Python - Re: ElementTree: How to return only unicode?

 
Thread Tools Search this Thread
03-14-2009, 09:57 PM   #1


Torsten Bronger wrote:
> I parse an XML file with ElementTree and get the contents with
> the .attrib, .text, .get etc. methods of the tree's nodes.
> Additionally, I use the "find" and "findtext" methods.
>
> My problem is that if there is only ASCII, these methods return
> ordinary strings instead of unicode. So sometimes I get str,
> sometimes I get unicode. Can one change this globally so that they
> only return unicode?


That's a convenience measure to reduce memory and processing overhead.
Could you explain why this is a problem for you?

Stefan


Stefan Behnel
03-15-2009, 09:48 AM   #2
Stefan Behnel
Torsten Bronger wrote:
> Hi there!


And hello back to you!


> Stefan Behnel writes:
>
>> Torsten Bronger wrote:
>>
>>> [...]
>>>
>>> My problem is that if there is only ASCII, these methods return
>>> ordinary strings instead of unicode. So sometimes I get str,
>>> sometimes I get unicode. Can one change this globally so that
>>> they only return unicode?

>> That's a convenience measure to reduce memory and processing
>> overhead.

>
> But is this really worth the inconsistency of having partly str and
> partly unicode, given that the common origin is unicode XML data?


Yes. It makes no difference in almost all use cases, as long as you assume
Py2 string handling semantics, where an ASCII-only str compares equal to the
corresponding unicode object. In Py3, you will always get Unicode strings anyway.


>> Could you explain why this is a problem for you?

>
> I feed ElementTree's output to functions in the unicodedata module.
> And they want unicode input. While it's not a big deal to write
> e.g. unicodedata.category(unicode(my_character)), I find this rather
> wasteful.
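
The workaround mentioned in the quote can be sketched as a small helper. This
is only an illustration: the names char_category and text_type are made up
here, and the version check assumes the code may run on either Py2 or Py3.

```python
import sys
import unicodedata

# Assumption: `unicode` only exists on Python 2; on Python 3 `str` is Unicode.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def char_category(character):
    """Return the Unicode general category of a one-character string."""
    # Coerce an ASCII byte string (Python 2 str) to unicode before the lookup.
    return unicodedata.category(text_type(character))


print(char_category("a"))  # Ll (lowercase letter)
print(char_category("1"))  # Nd (decimal digit)
```

The coercion happens on every call, which is the overhead the quote calls
wasteful.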


I just looked at the code. It seems that you can use your own
XMLTreeBuilder subclass and override the "._fixtext()" method like this:

    def _fixtext(self, text):
        return text

Then pass an instance of that as "parser" when parsing in ElementTree. That
should do the trick.
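
Put together, a minimal sketch could look like the following. The class name
UnicodeTreeBuilder and the sample document are made up for illustration, and
"_fixtext" is a private method, so this depends on the ElementTree version in
use. The fallback to XMLParser is an assumption for installations that no
longer provide the XMLTreeBuilder name; as far as I can tell, Py3 does not
call "_fixtext" at all, so the override is simply a no-op there.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# XMLTreeBuilder is the class name used in this thread (Python 2 era).
# Assumption: where that name is missing, XMLParser is the equivalent class.
BaseParser = getattr(ET, "XMLTreeBuilder", ET.XMLParser)


class UnicodeTreeBuilder(BaseParser):
    """Hypothetical parser subclass that skips the ASCII-to-str shortcut."""

    def _fixtext(self, text):
        # Private hook from the ElementTree source discussed above; returning
        # the text unchanged keeps it as unicode instead of encoding to ASCII.
        return text


# Made-up sample document containing only ASCII characters.
source = BytesIO(b"<root name='abc'>hello</root>")
tree = ET.parse(source, parser=UnicodeTreeBuilder())
root = tree.getroot()

unicode_type = type(u"")  # unicode on Python 2, str on Python 3
print(isinstance(root.text, unicode_type))
print(isinstance(root.get("name"), unicode_type))
```

If the override takes effect, both checks should report unicode even though
the document is pure ASCII.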

Stefan

