Velocity Reviews - Computer Hardware Reviews



DOM with HTML

 
 
Alessio Pace
 
      07-01-2003
Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
but unfortunately is *not* valid XHTML. If I try to parse it with
xml.dom.minidom I get an error from expat (as I expected), so I was told to
try it this way, with a "forgiving" HTML parser:

from xml.dom.ext.reader import HtmlLib
reader = HtmlLib.Reader()
dom = reader.fromUri(url)  # 'url' is the address of the web page
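For reference, the expat failure described above can be reproduced with the standard library alone. This is a minimal sketch, assuming a small made-up page whose `<br>` is never closed; `parse_strict` is a hypothetical helper name. `minidom.parseString` uses expat underneath and raises `xml.parsers.expat.ExpatError` when the markup is not well-formed.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_strict(markup):
    """Parse with minidom (expat); return None if the markup is not well-formed."""
    try:
        return minidom.parseString(markup)
    except ExpatError as err:
        print("expat rejected the page:", err)
        return None


# Hypothetical sample: <br> is never closed, so expat sees </p> as a mismatched tag.
page = "<html><body><p>Line one<br>Line two</p></body></html>"
doc = parse_strict(page)
```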

FIRST ISSUE:
Reading the source code in
$MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/,
it seems to me that these are 4DOM APIs, so from what I know of Python
distributions they are an extra package, aren't they? I would like to use
*only* libraries that are available in the standard Python 2.2 distribution,
nothing extra.
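If the constraint is to stay inside the standard library, one possible approach (not 4DOM's) is to feed the page to the forgiving `html.parser.HTMLParser` (the `HTMLParser` module in Python 2) and build a `minidom` tree from its callbacks. This is a minimal Python 3 sketch under simplifying assumptions: `ForgivingDomBuilder` and `parse_html` are hypothetical names, a stray end tag is ignored, an end tag also closes every element opened after its match, and text before the first element is dropped. It is not a full HTML tree-construction algorithm.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class ForgivingDomBuilder(HTMLParser):
    """Build a minidom Document from possibly malformed HTML (sketch)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = minidom.Document()
        self.stack = [self.doc]  # currently open nodes, innermost last

    def _parent(self):
        # A Document may hold only one element and no text, so once the
        # root exists, anything left at top level goes under the root.
        top = self.stack[-1]
        if top is self.doc and self.doc.documentElement is not None:
            return self.doc.documentElement
        return top

    def handle_starttag(self, tag, attrs):
        el = self.doc.createElement(tag)
        for name, value in attrs:
            # Boolean attributes such as <input disabled> arrive with value None.
            el.setAttribute(name, "" if value is None else value)
        self._parent().appendChild(el)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything inside it;
        # an end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self._parent()
        if parent is self.doc:
            return  # no root element yet: drop leading text/whitespace
        parent.appendChild(self.doc.createTextNode(data))


def parse_html(markup):
    builder = ForgivingDomBuilder()
    builder.feed(markup)
    builder.close()
    return builder.doc


doc = parse_html("<html><body><p>Unclosed paragraph<br></body></html>")
print(doc.documentElement.toxml())
```

Because the result is an ordinary `minidom` tree, methods such as `getElementsByTagName()` and `toxml()` are available on its nodes.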

SECOND ISSUE:
If the above libraries were included in Python (so that I could keep using
them), how do I print a string representation of a (sub)tree of the DOM? I
tried .toxml() as in the XML tutorial, but that method does not exist for
the FtNode objects involved here... Any ideas?
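With `xml.dom.minidom`, every node (not only the document) has `toxml()` and `toprettyxml()`, so a subtree can be serialized by calling them on the element that roots it. A short sketch, assuming a small well-formed sample page; it covers minidom nodes only and says nothing about 4DOM's FtNode.

```python
from xml.dom import minidom

# Hypothetical, well-formed sample page.
doc = minidom.parseString(
    "<html><body><div id='menu'><a href='/'>Home</a></div></body></html>"
)

# Pick the element that roots the subtree to print.
menu = doc.getElementsByTagName("div")[0]

print(menu.toxml())                   # prints <div id="menu"><a href="/">Home</a></div>
print(menu.toprettyxml(indent="  "))  # same subtree, indented
```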

Thanks so much to whoever can help me.

--
bye
Alessio Pace
 
 
 
 
 
F. GEIGER
 
      07-01-2003
> Hi, I need to get a sort of DOM from an HTML page that is declared as XHTML
> but unfortunately is *not* xhtml valid.. If I try to parse it with


I use mx.Tidy in such cases, with great success.

Cheers
Franz





 