Unicode problem.... as always
I have a parser I am building with python and, unfortunately, people
have decided to put unicode characters in the files I am parsing.
The parser seems to have a fit when I search for one \uXXXX symbol,
and there is another unicode symbol in the file. In this case, a
search and replace for © with a µ in the file causes the infamous
My quick fix, because they have good context, is to change them both
to the placeholder "UTF8", and then attempt to replace that placeholder
at the end with the original µ. The problem is that I am getting a Âµ
when I try to re-insert using \u00b5, which is the Unicode escape for µ.
Words of wisdom would be greatly appreciated.
Re: Unicode problem.... as always
Todd Jenista wrote:
> I have a parser I am building with python and, unfortunately, people
> have decided to put unicode characters in the files I am parsing.
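Maybe this helps you. It converts a latin1 byte string to unicode
and then converts it to utf8.
>>> s_u = unicode(s, "latin1")   # s: the latin1 byte string read from the file
>>> s_utf8 = s_u.encode("utf8")  # s_utf8 is just an illustrative name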
You need to know the encoding of the input (utf8, utf16, ...) before you can decode it.
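To make the decode, replace, encode round trip concrete, here is a minimal sketch. It assumes the input really is latin1, which is only a guess about the files in question; the helper name replace_symbol and the sample bytes are made up for illustration. The last two lines show one likely way a Âµ can appear: the utf8 bytes for µ get read back as latin1.

```python
# Sketch only: assumes the input is latin1; swap in the real encoding.
# replace_symbol is a hypothetical helper, not part of any library.
def replace_symbol(raw, old, new, encoding="latin1"):
    """Decode bytes to text, replace on the text, encode back to bytes."""
    text = raw.decode(encoding)  # bytes -> unicode text
    return text.replace(old, new).encode(encoding)  # text -> bytes

latin1_input = b"Copyright \xa9 2014"  # \xa9 is the copyright sign in latin1
fixed = replace_symbol(latin1_input, u"\u00a9", u"\u00b5")

# The micro sign is one byte in latin1 but two bytes in utf8.
utf8_micro = u"\u00b5".encode("utf-8")
# Reading those two utf8 bytes as latin1 gives two characters, not one.
mojibake = utf8_micro.decode("latin1")
```

The point is to do the search and replace on decoded text and to write the result back with the same encoding the file was read with. In Python 2, raw.decode("latin1") does the same job as unicode(raw, "latin1").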