ElementTree.XML(string XML) and ElementTree.fromstring(string XML) not working

 
 
Kee Nethery
Guest
Posts: n/a
 
      06-26-2009
Summary: I have XML as string and I want to pull it into ElementTree
so that I can play with it but it is not working for me. XML and
fromstring when used with a string do not do the same thing as parse
does with a file. How do I get this to work?

Details:
I have a CGI that receives XML via an HTTP POST as a POST variable
named 'theXml'. The POST data is a string that the CGI receives; it is
not a file on a hard disk.

The POSTed string looks like this when viewed in pretty format:

<xml>
<purchase id="1" lang="en">
<item id="1" productId="369369">
<name>Autumn</name>
<quantity>1</quantity>
<price>8.46</price>
</item>
<javascript>YES</javascript>
</purchase>
<customer id="123456" time="1227449322">
<shipping>
<street>19 Any Street</street>
<city>Berkeley</city>
<state>California</state>
<zip>12345</zip>
<country>People's Republic of Berkeley</country>
<name>Jon Roberts</name>
</shipping>
<email>(E-Mail Removed)</email>
</customer>
</xml>


The pseudocode in Python 2.6.2 looks like:

import cgi
import xml.etree.ElementTree as et

# Read the POSTed form field and parse its value as XML.
formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree = et.XML(theXmlData)

and when this runs, theXmlDataTree is set to:

theXmlDataTree instance <Element xml at 7167b0>
attrib dict {}
tag str xml
tail NoneType None
text NoneType None

I get the same result with fromstring:

formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree = et.fromstring(theXmlData)

I can put the xml in a file and reference the file by its URL and use:

et.parse(urllib.urlopen(theUrl))

and that will set theXmlDataTree to:

theXmlDataTree instance <xml.etree.ElementTree.ElementTree instance at
0x67cb48>

This result I can play with. It contains all the XML.

et.parse seems to pull in the entire XML document and give me
something to play with whereas et.XML and et.fromstring do not.

Questions:
How do I get this to work?
Where in the docs did it give me an example of how to make this work
(what did I miss from reading the docs)?

.... and for bonus points ...

Why isn't et.parse the only way to do this? Why have XML or fromstring
at all? Why not enhance parse and deprecate XML and fromstring with
something like:

formPostData = cgi.FieldStorage()
theXmlData = formPostData['theXml'].value
theXmlDataTree = et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

Thanks in advance,
Kee Nethery
 
 
 
 
 
Nobody
Guest
Posts: n/a
 
      06-26-2009
On Thu, 25 Jun 2009 18:02:25 -0700, Kee Nethery wrote:

> Summary: I have XML as string and I want to pull it into ElementTree
> so that I can play with it but it is not working for me. XML and
> fromstring when used with a string do not do the same thing as parse
> does with a file. How do I get this to work?


Why do you need an ElementTree rather than an Element? XML(string) returns
the root element, as if you had used et.parse(f).getroot(). You can turn
this into an ElementTree with e.g. et.ElementTree(et.XML(string)).
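
The relationship described above can be sketched roughly as follows. This is written for Python 3 rather than the Python 2.6 used in the thread, and the sample document and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, trimmed down from the document in the question.
xml_text = "<xml><purchase id='1'><name>Autumn</name></purchase></xml>"

# XML() and fromstring() return the root Element directly.
root = ET.XML(xml_text)

# Wrapping the Element gives an ElementTree; getroot() hands the same object back.
tree = ET.ElementTree(root)

print(type(root).__name__)
print(type(tree).__name__)
print(tree.getroot() is root)
```

So nothing is lost by getting an Element back: the wrapper simply holds a reference to it.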

> Why isn't et.parse the only way to do this? Why have XML or fromstring
> at all? Why not enhance parse and deprecate XML and fromstring with
> something like:
>
> formPostData = cgi.FieldStorage()
> theXmlData = formPostData['theXml'].value
> theXmlDataTree =
> et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))


If you want to treat a string as a file, use StringIO.
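
A small sketch of that suggestion, assuming Python 3, where the file-like wrappers live in the io module (in Python 2 the equivalent was the StringIO module). The sample data is invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the POSTed string.
xml_text = "<xml><customer id='123456'><city>Berkeley</city></customer></xml>"

# io.StringIO wraps text in a file-like object; io.BytesIO does the same for bytes.
tree = ET.parse(io.StringIO(xml_text))

root = tree.getroot()
customer_id = root.find("customer").get("id")
print(root.tag, customer_id)
```

With this, parse() returns an ElementTree just as it would for a file on disk.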

 
 
 
 
 
unayok
Guest
Posts: n/a
 
      06-26-2009
On Jun 25, 9:02 pm, Kee Nethery <(E-Mail Removed)> wrote:
> Summary: I have XML as string and I want to pull it into ElementTree
> so that I can play with it but it is not working for me. XML and
> fromstring when used with a string do not do the same thing as parse
> does with a file. How do I get this to work?
>
> Details:
> I have a CGI that receives XML via an HTTP POST as a POST variable
> named 'theXml'. The POST data is a string that the CGI receives, it is
> not a file on a hard disk.
>
> The POSTed string looks like this when viewed in pretty format:

[...]
> et.parse seems to pull in the entire XML document and give me
> something to play with whereas et.XML and et.fromstring do not.
>
> Questions:
> How do I get this to work?
> Where in the docs did it give me an example of how to make this work
> (what did I miss from reading the docs)?
>

[skipping bonus points question]

I'm not sure what you're expecting. It looks to me like things are
working okay:

My test script:

import xml.etree.ElementTree as ET

data="""<xml>
<purchase id="1" lang="en">
<item id="1" productId="369369">
<name>Autumn</name>
<quantity>1</quantity>
<price>8.46</price>
</item>
<javascript>YES</javascript>
</purchase>
<customer id="123456" time="1227449322">
<shipping>
<street>19 Any Street</street>
<city>Berkeley</city>
<state>California</state>
<zip>12345</zip>
<country>People's Republic of Berkeley</country>
<name>Jon Roberts</name>
</shipping>
<email>(E-Mail Removed)</email>
</customer>
</xml>"""

xml = ET.fromstring( data )

print xml
print "attrib ", xml.attrib
print "tag ", xml.tag
print "text ", xml.text
print "contents "
for element in xml:
    print element
print "tostring"
print ET.tostring( xml )

when run, produces:

<Element xml at 7f582c2e82d8>
attrib {}
tag xml
text

contents
<Element purchase at 7f582c2e8320>
<Element customer at 7f582c2e85a8>
tostring
<xml>
<purchase id="1" lang="en">
<item id="1" productId="369369">
<name>Autumn</name>
<quantity>1</quantity>
<price>8.46</price>
</item>
<javascript>YES</javascript>
</purchase>
<customer id="123456" time="1227449322">
<shipping>
<street>19 Any Street</street>
<city>Berkeley</city>
<state>California</state>
<zip>12345</zip>
<country>People's Republic of Berkeley</country>
<name>Jon Roberts</name>
</shipping>
<email>(E-Mail Removed)</email>
</customer>
</xml>

Which seems to me quite useful (i.e. it has the full XML available).
Maybe you can explain how you were trying to "play with" the results
of fromstring() that you can't do from parse().

The documentation for elementtree indicates:

> The ElementTree wrapper type adds code to load XML files as trees
> of Element objects, and save them back again.


and

> The Element type can be used to represent XML files in memory.
> The ElementTree wrapper class is used to read and write XML files.


In the above case, you should find that getroot() on your loaded
ElementTree instance (parse().getroot()) returns an Element equivalent to
the one generated by fromstring().
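
To illustrate "playing with" the Element returned by fromstring(), here is a rough Python 3 sketch that pulls a few values out of a shortened version of the posted document. The variable names are illustrative only. One possible reason a debugger view of the root looks empty is that child elements are not stored in attrib, tag, tail, or text; they are reached by iterating or searching.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the document from the question.
data = """<xml>
  <purchase id="1" lang="en">
    <item id="1" productId="369369">
      <name>Autumn</name>
      <quantity>1</quantity>
      <price>8.46</price>
    </item>
  </purchase>
  <customer id="123456">
    <shipping><city>Berkeley</city></shipping>
  </customer>
</xml>"""

root = ET.fromstring(data)

# Children are reached by iterating over the element or by path lookups.
child_tags = [child.tag for child in root]
item_name = root.findtext("purchase/item/name")
price = float(root.findtext("purchase/item/price"))
customer_id = root.find("customer").get("id")
city = root.findtext("customer/shipping/city")

print(child_tags)
print(item_name, price, customer_id, city)
```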
 
 
Carl Banks
Guest
Posts: n/a
 
      06-26-2009
On Jun 25, 6:02 pm, Kee Nethery <(E-Mail Removed)> wrote:
> Summary: I have XML as string and I want to pull it into ElementTree
> so that I can play with it but it is not working for me. XML and
> fromstring when used with a string do not do the same thing as parse
> does with a file. How do I get this to work?
>
> Details:
> I have a CGI that receives XML via an HTTP POST as a POST variable
> named 'theXml'. The POST data is a string that the CGI receives, it is
> not a file on a hard disk.
>
> The POSTed string looks like this when viewed in pretty format:
>
> <xml>
>     <purchase id="1" lang="en">
>         <item id="1" productId="369369">
>             <name>Autumn</name>
>             <quantity>1</quantity>
>             <price>8.46</price>
>         </item>
>         <javascript>YES</javascript>
>     </purchase>
>     <customer id="123456" time="1227449322">
>         <shipping>
>             <street>19 Any Street</street>
>             <city>Berkeley</city>
>             <state>California</state>
>             <zip>12345</zip>
>             <country>People's Republic of Berkeley</country>
>             <name>Jon Roberts</name>
>         </shipping>
>         <email>(E-Mail Removed)</email>
>     </customer>
> </xml>
>
> The pseudocode in Python 2.6.2 looks like:
>
> import xml.etree.ElementTree as et
>
> formPostData = cgi.FieldStorage()
> theXmlData = formPostData['theXml'].value
> theXmlDataTree = et.XML(theXmlData)
>
> and when this runs, theXmlDataTree is set to:
>
> theXmlDataTree instance <Element xml at 7167b0>
>     attrib dict {}
>     tag str xml
>     tail NoneType None
>     text NoneType None
>
> I get the same result with fromstring:
>
> formPostData = cgi.FieldStorage()
> theXmlData = formPostData['theXml'].value
> theXmlDataTree = et.fromstring(theXmlData)
>
> I can put the xml in a file and reference the file by its URL and use:
>
> et.parse(urllib.urlopen(theUrl))
>
> and that will set theXmlDataTree to:
>
> theXmlDataTree instance <xml.etree.ElementTree.ElementTree instance at
> 0x67cb48>
>
> This result I can play with. It contains all the XML.


I believe you are misunderstanding something. et.XML and
et.fromstring return Elements, whereas et.parse returns an
ElementTree. These are two different things; however, both of them
"contain all the XML". In fact, an ElementTree (which is returned by
et.parse) is just a container for the root Element (returned by
et.fromstring)--and it adds no important functionality to the root
Element as far as I can tell.

Given an Element (as returned by et.XML or et.fromstring) you can pass
it to the ElementTree constructor to get an ElementTree instance. The
following line should give you something you can "play with":

theXmlDataTree = et.ElementTree(et.fromstring(theXmlData))

Conversely, given an ElementTree (as returned by et.parse) you can
call the getroot method to obtain the root Element, like so:

theXmlRootElement = et.parse(xmlfile).getroot()

I have no use for ElementTree instances so I always call getroot right
away and only store the root element. You may prefer to work with
ElementTrees rather than with Elements directly, and that's perfectly
fine; just use the technique above to wrap up the root Element if you
use et.fromstring.
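
For what it is worth, the wrapper's extras are mostly file-oriented. The sketch below, assuming Python 3, compares serializing an Element with tostring() against writing through an ElementTree; the sample element is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<xml><name>Autumn</name></xml>")

# An Element can be serialized directly with tostring().
as_bytes = ET.tostring(root)

# The ElementTree wrapper adds write(), which targets a file or file-like
# object and can emit an XML declaration.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
document = buffer.getvalue()

print(as_bytes)
print(document)
```

Either form carries the full content; which one to keep around is largely a matter of whether the data is headed back to a file.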


[snip]
> Why isn't et.parse the only way to do this? Why have XML or fromstring
> at all?


Because Fredrik Lundh wanted it that way. Unlike most Python
libraries, ElementTree is under the control of one person, which means
it was not designed or vetted by the community, which means it would
tend to have some interface quirks. You shouldn't complain: the
library is superb compared to XML solutions like DOM. A few minor
things should be no big deal.


Carl Banks
 
 
Kee Nethery
Guest
Posts: n/a
 
      06-26-2009
Thank you, everyone. I'll play with these suggestions tomorrow at
work and report back.

On Jun 25, 2009, at 8:04 PM, Carl Banks wrote:

> Because Fredrik Lundh wanted it that way. Unlike most Python
> libraries, ElementTree is under the control of one person, which means
> it was not designed or vetted by the community, which means it would
> tend to have some interface quirks.


Yep

> You shouldn't complain: the
> library is superb compared to XML solutions like DOM.


Which is why I want to use it.

> A few minor
> things should be no big deal.


True, and I will eventually get past the minor quirks. As a newbie, I
figured I'd point out the difficult portions, things that conceptually
are confusing. I know that after lots of use I'm not going to notice
that it is strange that I have to stand on my head and touch my nose 3
times to open the fridge door. The contortions will seem normal.

Results tomorrow, thanks everyone for the assistance.

Kee Nethery
 
 
Carl Banks
Guest
Posts: n/a
 
      06-26-2009
On Jun 25, 8:53 pm, Kee Nethery <(E-Mail Removed)> wrote:
> On Jun 25, 2009, at 8:04 PM, Carl Banks wrote:
> > A few minor
> > things should be no big deal.
>
> True, and I will eventually get past the minor quirks. As a newbie, I
> figured I'd point out the difficult portions, things that conceptually
> are confusing. I know that after lots of use I'm not going to notice
> that it is strange that I have to stand on my head and touch my nose 3
> times to open the fridge door. The contortions will seem normal.


Well it's not *that* bad.

(That would be PIL.)


Carl Banks
 
 
Stefan Behnel
Guest
Posts: n/a
 
      06-26-2009
Carl Banks wrote:
>> Why isn't et.parse the only way to do this? Why have XML or fromstring
>> at all?
>
> Because Fredrik Lundh wanted it that way. Unlike most Python
> libraries, ElementTree is under the control of one person, which means
> it was not designed or vetted by the community, which means it would
> tend to have some interface quirks.


Just for the record: Fredrik doesn't actually consider it a design "quirk".
He argues that it's designed for different use cases. While parse() parses
a file, which normally contains a complete document (represented in ET as
an ElementTree object), fromstring() and especially the 'literal wrapper'
XML() are made for parsing strings, which (most?) often only contain XML
fragments. With a fragment, you normally want to continue doing things like
inserting it into another tree, so you need the top-level element in almost
all cases.
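
The fragment use case might look something like this in Python 3. The element names and contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# An existing document, already parsed to its root Element.
catalog = ET.fromstring("<catalog><item><name>Autumn</name></item></catalog>")

# A fragment written as a literal in source code.
fragment = ET.XML("<item><name>Winter</name></item>")

# Because XML() returns an Element, it can be appended directly.
catalog.append(fragment)

names = [item.findtext("name") for item in catalog.findall("item")]
print(names)
```

Had XML() returned an ElementTree, the append would need an extra getroot() call first.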

Stefan
 
 
Carl Banks
Guest
Posts: n/a
 
      06-26-2009
On Jun 25, 10:11 pm, Stefan Behnel <(E-Mail Removed)> wrote:
> Carl Banks wrote:
> >> Why isn't et.parse the only way to do this? Why have XML or fromstring
> >> at all?
> >
> > Because Fredrik Lundh wanted it that way. Unlike most Python
> > libraries, ElementTree is under the control of one person, which means
> > it was not designed or vetted by the community, which means it would
> > tend to have some interface quirks.
>
> Just for the record: Fredrik doesn't actually consider it a design "quirk".


Well of course he wouldn't--it's his library.

> He argues that it's designed for different use cases. While parse() parses
> a file, which normally contains a complete document (represented in ET as
> an ElementTree object), fromstring() and especially the 'literal wrapper'
> XML() are made for parsing strings, which (most?) often only contain XML
> fragments. With a fragment, you normally want to continue doing things like
> inserting it into another tree, so you need the top-level element in almost
> all cases.


Whatever, like I said I am not going to nit-pick over small things,
when all the big things are done right.


Carl Banks
 
 
Stefan Behnel
Guest
Posts: n/a
 
      06-26-2009
Carl Banks wrote:
> On Jun 25, 10:11 pm, Stefan Behnel wrote:
>> Carl Banks wrote:
>>>> Why isn't et.parse the only way to do this? Why have XML or fromstring
>>>> at all?
>>> Because Fredrik Lundh wanted it that way. Unlike most Python
>>> libraries, ElementTree is under the control of one person, which means
>>> it was not designed or vetted by the community, which means it would
>>> tend to have some interface quirks.
>>
>> Just for the record: Fredrik doesn't actually consider it a design "quirk".
>
> Well of course he wouldn't--it's his library.


That's not an argument at all. Fredrik put out an alpha of ET 1.3 (long ago,
actually), which is (or was?) meant as a clean-up release for a number of
real quirks in the library (lxml also fixes most of them since 2.0). The
above definitely hasn't changed, simply because it's not considered 'wrong'
by the author(s).

Stefan
 
 
Stefan Behnel
Guest
Posts: n/a
 
      06-26-2009
Hi,

Kee Nethery wrote:
> Why isn't et.parse the only way to do this? Why have XML or fromstring
> at all?


Well, use cases. XML() is an alias for fromstring(), because it's
convenient (and well readable) to write

section = XML('<section id="XYZ"><title>A to Z</title></section>')
section.append(paragraphs)

for XML literals in source code. fromstring() is there because when you
want to parse a fragment from a string that you got from whatever source,
it's easy to express that with exactly that function, as in

el = fromstring(some_string)

If you want to parse a document from a file or file-like object, use
parse(). Three use cases, three functions. The fourth use case of parsing a
document from a string does not have its own function, because it is
trivial to write

tree = parse(BytesIO(some_byte_string))

I do not argue that fromstring() should necessarily return an Element, as
parsing fragments is more likely for literals than for strings that come
from somewhere else. However, given that the use case of parsing a document
from a string is so easily handled with parse(), I find it ok to give the
second use case its own function, simply because

tree = fromstring(some_string)
fragment_top_element = tree.getroot()

absolutely does not cut it.


> Why not enhance parse and deprecate XML and fromstring with
> something like:
>
> formPostData = cgi.FieldStorage()
> theXmlData = formPostData['theXml'].value
> theXmlDataTree =
> et.parse(makeThisUnicodeStringLookLikeAFileSoParseWillDealWithIt(theXmlData))

This will not work because ET cannot parse from unicode strings (unless
they only contain plain ASCII characters and you happen to be using Python
2.x). lxml can parse from unicode strings, but it requires that the XML
must not have an encoding declaration (which would render it non
well-formed). This is convenient for parsing HTML, it's less convenient for
XML usually.

If what you meant is actually parsing from a byte string, this is easily
done using BytesIO(), or StringIO() in Py2.x (x<6).
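
A minimal sketch of the byte-string route, assuming Python 3 and an invented payload that carries its own encoding declaration. The parser reads the declaration from the bytes, so no manual decoding is needed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical bytes as they might arrive over the wire, with an encoding declaration.
payload = '<?xml version="1.0" encoding="iso-8859-1"?><name>Jos\u00e9</name>'.encode("iso-8859-1")

# parse() reads the declaration and decodes the content accordingly.
tree = ET.parse(io.BytesIO(payload))
name = tree.getroot().text

print(name == "Jos\u00e9")
```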

Stefan
 
 
 
 