How to get an XML DOM while offline?

 
 
william tanksley
 
      03-19-2008
I want to parse my iTunes Library xml. All was well, until I unplugged
and left for the train (where I get most of my personal projects
done). All of a sudden, I discovered that apparently the presence of a
DOCTYPE in the iTunes XML makes xml.dom.minidom insist on accessing
the Internet... So suddenly I was unable to do any work.

I don't want to modify the iTunes XML; iTunes rewrites it too often.
How can I prevent xml.dom.minidom from dying when it can't access the
Internet?

Is there a simpler way to read the iTunes XML? (It's merely a plist,
so the format is much simpler than general XML.)

-Wm
 
Diez B. Roggisch
 
      03-19-2008
william tanksley wrote:

> I want to parse my iTunes Library xml. All was well, until I unplugged
> and left for the train (where I get most of my personal projects
> done). All of a sudden, I discovered that apparently the presence of a
> DOCTYPE in the iTunes XML makes xml.dom.minidom insist on accessing
> the Internet... So suddenly I was unable to do any work.
>
> I don't want to modify the iTunes XML; iTunes rewrites it too often.
> How can I prevent xml.dom.minidom from dying when it can't access the
> Internet?
>
> Is there a simpler way to read the iTunes XML? (It's merely a plist,
> so the format is much simpler than general XML.)


Normally, this should be solved using an entity-handler that prevents the
remote fetching. I presume the underlying implementation of a SAX-parser
does use one, but you can't override that (at least I didn't find anything
in the docs).
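
As a rough sketch of the entity-handler idea (not something confirmed in this thread): xml.dom.minidom.parse accepts an optional SAX parser, so one can hand it a parser that has external general entities switched off and an entity resolver that never touches the network. The names OfflineResolver, parse_offline and the SAMPLE document are made up for illustration, and whether this cures the behaviour on a given Python version is an assumption.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Made-up sample in the shape of an iTunes plist, including the DOCTYPE.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Major Version</key><integer>1</integer>
  <key>Application Version</key><string>7.6</string>
</dict>
</plist>
"""


class OfflineResolver(EntityResolver):
    """Hypothetical resolver: answer every external entity with empty content."""

    def resolveEntity(self, publicId, systemId):
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b""))  # nothing is fetched
        return source


def parse_offline(stream):
    """Build a minidom document without loading external entities."""
    parser = xml.sax.make_parser()
    # Ask the SAX parser not to load external general entities (the DTD).
    parser.setFeature(feature_external_ges, False)
    # Fallback in case entity loading is enabled elsewhere.
    parser.setEntityResolver(OfflineResolver())
    return minidom.parse(stream, parser)


doc = parse_offline(io.BytesIO(SAMPLE))
print(doc.documentElement.tagName)
```

For a real library file, pass an open binary file object (or the file name) instead of the BytesIO wrapper.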

The most pragmatic solution would be to rip the doctype out using simple
string methods and/or regexes.
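
A minimal sketch of stripping the doctype, assuming the DOCTYPE has no internal subset (no [...] part) and no ">" inside its quoted identifiers, which holds for the usual Apple plist header. The file on disk is left untouched; only the in-memory copy is changed. The helper names and the SAMPLE document are hypothetical.

```python
import re
from xml.dom import minidom

# Made-up sample in the shape of an iTunes plist, including the DOCTYPE.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Major Version</key><integer>1</integer>
</dict>
</plist>
"""

# Matches a simple DOCTYPE declaration: no internal subset, no '>' in quotes.
_DOCTYPE_RE = re.compile(rb"<!DOCTYPE[^>\[]*>")


def strip_doctype(xml_bytes):
    """Return the XML with its first simple DOCTYPE declaration removed."""
    return _DOCTYPE_RE.sub(b"", xml_bytes, count=1)


def parse_without_doctype(xml_bytes):
    """Parse XML bytes with minidom after removing the DOCTYPE."""
    return minidom.parseString(strip_doctype(xml_bytes))


doc = parse_without_doctype(SAMPLE)
print(doc.documentElement.tagName)
```

With a real library, read the file in binary mode (open(path, "rb").read()) and pass the bytes to parse_without_doctype.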

Diez
 
Paul Boddie
 
      03-19-2008
On 19 Mar, 16:27, "Diez B. Roggisch" wrote:
> william tanksley wrote:
> > I want to parse my iTunes Library xml. All was well, until I unplugged
> > and left for the train (where I get most of my personal projects
> > done). All of a sudden, I discovered that apparently the presence of a
> > DOCTYPE in the iTunes XML makes xml.dom.minidom insist on accessing
> > the Internet... So suddenly I was unable to do any work.


The desire to connect to the Internet for DTDs is documented in the
following bug:

http://bugs.python.org/issue2124

However, I can't reproduce the problem using xml.dom.minidom.parse/
parseString and plain XHTML, although I may be missing something which
activates the retrieval of the DTD.

> > I don't want to modify the iTunes XML; iTunes rewrites it too often.
> > How can I prevent xml.dom.minidom from dying when it can't access the
> > Internet?
> >
> > Is there a simpler way to read the iTunes XML? (It's merely a plist,
> > so the format is much simpler than general XML.)
>
> Normally, this should be solved using an entity-handler that prevents the
> remote fetching. I presume the underlying implementation of a SAX-parser
> does use one, but you can't override that (at least I didn't find anything
> in the docs)


There's a lot of complicated stuff in the xml.dom package, but I found
that the DOMBuilder class (in xml.dom.xmlbuilder) probably contains
the things which switch such behaviour on or off. That said, I've
hardly ever used the most formal DOM classes to parse XML in Python
(where you get the DOM implementation and then create other factory
classes - it's all very "Java" in nature), so I don't know, or no
longer remember, the precise incantation.

> The most pragmatic solution would be to rip the doctype out using simple
> string methods and/or regexes.


Maybe, but an example fragment of the XML might help us diagnose the
problem, ideally with some commentary from the people who wrote the
xml.dom software in the first place.

Paul
 
 
william tanksley
 
      03-31-2008
"Diez B. Roggisch" wrote:
> The most pragmatic solution would be to rip the doctype out using simple
> string methods and/or regexes.


Thank you, Diez and Paul; I took Diez's solution, and it works well
enough for me.

> Diez


-Wm
 
 
Stefan Behnel
 
      04-07-2008
william tanksley wrote:
> I want to parse my iTunes Library xml. All was well, until I unplugged
> and left for the train (where I get most of my personal projects
> done). All of a sudden, I discovered that apparently the presence of a
> DOCTYPE in the iTunes XML makes xml.dom.minidom insist on accessing
> the Internet... So suddenly I was unable to do any work.
>
> I don't want to modify the iTunes XML; iTunes rewrites it too often.
> How can I prevent xml.dom.minidom from dying when it can't access the
> Internet?
>
> Is there a simpler way to read the iTunes XML? (It's merely a plist,
> so the format is much simpler than general XML.)


Try lxml. Since version 2.0, its parsers will not access the network unless
you tell them to do so.

http://codespeak.net/lxml

It's also much easier to use than minidom and much faster:

http://blog.ianbicking.org/2008/03/3...r-performance/

Stefan
 
 
Fredrik Lundh
 
      04-07-2008
Stefan Behnel wrote:

>> Is there a simpler way to read the iTunes XML? (It's merely a plist,
>> so the format is much simpler than general XML.)
>
> Try lxml. Since version 2.0, its parsers will not access the network unless
> you tell it to do so.
>
> http://codespeak.net/lxml


which makes it true for all ET implementations (the whole idea that
parsing a file should result in unexpected network access is of course a
potential security risk and one of a number of utterly stupid design
decisions in XML).
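
To illustrate with the standard library's ElementTree (a sketch, not the code linked in this thread): a document carrying the plist DOCTYPE can be parsed directly, and the key/value pairs of the top-level dict read off by pairing up its children. The SAMPLE data and the helper name top_level_items are made up, and values are returned as plain text rather than converted to Python types.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of an iTunes plist, including the DOCTYPE.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Major Version</key><integer>1</integer>
  <key>Application Version</key><string>7.6</string>
</dict>
</plist>
"""


def top_level_items(plist_bytes):
    """Map each key of the top-level <dict> to the text of its value element."""
    root = ET.fromstring(plist_bytes)   # the <plist> element
    children = list(root.find("dict"))  # alternating <key>, value, <key>, value...
    keys, values = children[0::2], children[1::2]
    return {k.text: v.text for k, v in zip(keys, values)}


print(top_level_items(SAMPLE))
```

Nested dicts and arrays are not handled here; their value element's text is only whitespace, so a fuller reader would recurse into them.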

you'll find plist reading code here, btw:

http://effbot.org/zone/element-iterp...ental-decoding

replace the import with "from xml.etree import cElementTree" if you're
running Python 2.5.

(not sure if that one works with lxml, though, but that should be
fixable. you can at least reuse the unmarshaller dict).
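
On a current Python, the standard library's plistlib module is another way to read a plist; it is not mentioned in this thread, so treat the following as an aside. The sketch assumes Python 3.4 or later (for plistlib.loads) and uses a made-up SAMPLE with a Tracks dict shaped like an iTunes library; track_names is a hypothetical helper.

```python
import plistlib

# Made-up sample in the shape of an iTunes library, including the DOCTYPE.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Major Version</key><integer>1</integer>
  <key>Tracks</key>
  <dict>
    <key>1</key>
    <dict>
      <key>Track ID</key><integer>1</integer>
      <key>Name</key><string>Example Song</string>
    </dict>
  </dict>
</dict>
</plist>
"""


def track_names(plist_bytes):
    """Return the sorted track names from an iTunes-style plist."""
    library = plistlib.loads(plist_bytes)  # nested dicts, lists, ints, strs
    tracks = library.get("Tracks", {})
    return sorted(track["Name"] for track in tracks.values())


print(track_names(SAMPLE))
```

For a file on disk, plistlib.load takes a file object opened in binary mode.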

</F>

 