Unicode problem.... as always

 
 
Todd Jenista
 
      07-01-2003
I have a parser I am building with python and, unfortunately, people
have decided to put unicode characters in the files I am parsing.
The parser chokes when I search for one \uXXXX symbol while another
Unicode symbol is also present in the file. In this case, a search and
replace of © with µ in the file causes the infamous "ordinal not in
range" error.
My quick fix, since both symbols appear in well-defined contexts, is to
replace them both with the placeholder "UTF8" and then, at the end, swap
the placeholder back to the original µ. The problem is that I am getting
a µ when I try to re-insert using \u00b5, which is the UTF8 code.
Words of wisdom would be greatly appreciated.
 
 
Thomas Güttler
 
      07-01-2003
Todd Jenista wrote:

> I have a parser I am building with python and, unfortunately, people
> have decided to put unicode characters in the files I am parsing.


Maybe this helps. It decodes a latin1 byte string to unicode
and then encodes the result as utf8.
>>> s="ä"
>>> s_u=unicode(s, "latin1")
>>> s_utf8=s_u.encode("utf8")
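Note that \u00b5 is the Unicode code point of µ rather than its UTF-8 form; UTF-8 stores that character as two bytes. A small sketch of the difference, written in Python 3 syntax (an assumption, since the session above is Python 2), which also shows what happens when those two bytes are read back as latin1:

```python
# Python 3 sketch: code point versus encoded bytes for the micro sign.
micro = "\u00b5"                        # one Unicode character, U+00B5

utf8_bytes = micro.encode("utf-8")      # two bytes: 0xC2 0xB5
latin1_bytes = micro.encode("latin1")   # one byte: 0xB5

# Decoding the UTF-8 bytes with the wrong codec yields two characters
# (U+00C2 followed by U+00B5) instead of one.
misread = utf8_bytes.decode("latin1")

print(utf8_bytes, latin1_bytes, repr(misread))
```

If stray extra characters show up next to the µ after re-inserting it, a mismatch like this between the codec used for writing and the one used for reading is one possible cause.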


You need to know the encoding of the input (e.g. utf8 or utf16).
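Once the encoding is known, one way to do the search and replace is to decode the bytes first, replace on the decoded text, and encode again at the end. A minimal sketch in Python 3 syntax (an assumption; `replace_symbol` and the sample data are made-up names for illustration, not part of any library):

```python
def replace_symbol(raw, old, new, encoding):
    """Decode raw bytes, replace old with new as text, then re-encode."""
    # bytes -> str; may raise UnicodeDecodeError if the bytes are not
    # valid in the given encoding
    text = raw.decode(encoding)
    text = text.replace(old, new)
    # str -> bytes, using the same encoding the input came in
    return text.encode(encoding)


# Hypothetical input: a UTF-8 file containing both symbols.
data = "\u00a9 2003 and \u00b5".encode("utf-8")
result = replace_symbol(data, "\u00a9", "\u00b5", "utf-8")
print(result.decode("utf-8"))
```

Because the replacement happens on decoded text, it does not matter how many other non-ASCII symbols the file contains, as long as the encoding passed in is the right one.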

thomas

 