Using codecs.EncodedFile() with Python 2.5

 
 
David Hughes, 01-03-2007
I used this function successfully with Python 2.4 to alter the encoding
of a set of database records from latin-1 to utf-8, but the same
program raises an exception using Python 2.5. This small example shows
the problem:

import codecs
fo = open('test.dat', 'w')
fo.write('G\xe2teaux')
fo.close()

fi = open("test.dat",'r')
fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
astring = fx.readline()
print astring
ustring = unicode(astring, 'utf-8' )
print repr(ustring)
print ustring.encode('latin-1')
print ustring.encode('utf-8')

Python 2.4 gives:

Gâteaux
u'G\xe2teaux'
Gâteaux
Gâteaux

which I believe is correct, while 2.5 produces

Traceback (most recent call last):
  File "test_codec.py", line 8, in <module>
    astring = fx.readline()
  File "C:\Python25\lib\codecs.py", line 709, in readline
    data = self.reader.readline()
  File "C:\Python25\lib\codecs.py", line 471, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Python25\lib\codecs.py", line 418, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Is there a genuine problem here, or have I been misusing this function?
--
Regards
David Hughes
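
For reference, here is a minimal sketch of what the call in the example is expected to do on an interpreter without the problem: the second argument (data_encoding) is the encoding the caller sees, and the third (file_encoding) is the encoding stored in the file. It uses an in-memory io.BytesIO stream instead of test.dat, which is an assumption made only to keep the snippet self-contained, and the b'' literals need Python 2.6 or later.

```python
import codecs
import io

# 'Gâteaux' as it is stored in the file: one latin-1 byte (0xE2) for 'â'.
latin1_bytes = b'G\xe2teaux'

# What a latin-1 -> utf-8 conversion should yield, computed by hand.
expected_utf8 = latin1_bytes.decode('latin-1').encode('utf-8')

# The same conversion through EncodedFile: the caller reads utf-8
# (data_encoding) from a stream whose contents are latin-1 (file_encoding).
fx = codecs.EncodedFile(io.BytesIO(latin1_bytes), 'utf-8', 'latin-1')
recoded = fx.readline()

print(repr(expected_utf8))
print(repr(recoded))
```

If the two printed values differ, or the readline() call raises UnicodeDecodeError as in the traceback above, the interpreter's EncodedFile is not performing the latin-1 to utf-8 conversion that the argument order asks for.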

 
Peter Otten, 01-03-2007
This is indeed a bug in Python 2.5. It has been fixed in Subversion:

http://svn.python.org/view/python/tr...52517&view=log

Peter
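
Until a fixed release is installed, one possible workaround is to skip EncodedFile and convert each line by hand. The sketch below is only an illustration under stated assumptions: read_transcoded_lines is a hypothetical helper name, the temporary file stands in for the poster's test.dat, and the b'' literals require Python 2.6 or later (on 2.5 itself, drop the b prefix).

```python
import os
import tempfile


def read_transcoded_lines(path, file_encoding='latin-1', data_encoding='utf-8'):
    """Return the lines of path re-encoded from file_encoding to data_encoding.

    Hypothetical helper: it reads raw bytes and splits on newline bytes, so it
    suits ASCII-compatible encodings such as latin-1, not UTF-16.
    """
    fi = open(path, 'rb')
    try:
        return [line.decode(file_encoding).encode(data_encoding) for line in fi]
    finally:
        fi.close()


# Recreate the test file from the original example in a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'test.dat')
fo = open(path, 'wb')
fo.write(b'G\xe2teaux\n')
fo.close()

lines = read_transcoded_lines(path)
print(repr(lines))

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

The explicit decode and encode steps make the direction of the conversion visible, which avoids depending on how EncodedFile orders its two encodings.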

 