Using codecs.EncodedFile() with Python 2.5

David Hughes · Jan 3, 2007

I used this function successfully with Python 2.4 to alter the encoding
of a set of database records from latin-1 to utf-8, but the same
program raises an exception using Python 2.5. This small example shows
the problem:

import codecs
fo = open('test.dat', 'w')
fo.write('G\xe2teaux')    # 'Gâteaux' as latin-1 bytes
fo.close()

fi = open("test.dat", 'r')
fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')    # data_encoding, file_encoding
astring = fx.readline()    # expected to return utf-8 bytes
print astring
ustring = unicode(astring, 'utf-8')
print repr(ustring)
print ustring.encode('latin-1')
print ustring.encode('utf-8')

Python 2.4 gives:

GÃ¢teaux
u'G\xe2teaux'
Gâteaux
GÃ¢teaux

which I believe is correct, while 2.5 produces

Traceback (most recent call last):
  File "test_codec.py", line 8, in <module>
    astring = fx.readline()
  File "C:\Python25\lib\codecs.py", line 709, in readline
    data = self.reader.readline()
  File "C:\Python25\lib\codecs.py", line 471, in readline
    data = self.read(readsize, firstline=True)
  File "C:\Python25\lib\codecs.py", line 418, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data

Is there a genuine problem here, or have I been misusing this function?

Peter Otten · Jan 3, 2007

This is indeed a bug in Python 2.5, and it has already been fixed in Subversion:

http://svn.python.org/view/python/trunk/Lib/codecs.py?rev=52517&view=log

Peter
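
For anyone who cannot pick up the fix yet: the traceback shows the 'utf8' codec being applied to the raw file bytes, which suggests the file_encoding argument is not taking effect in 2.5. A possible workaround is to skip EncodedFile and do the decode/encode step by hand. The sketch below is only an illustration under that assumption; transcode_file is a made-up helper name, not something from the codecs module, and it reads the whole file at once rather than line by line.

```python
def transcode_file(path, file_encoding='latin-1', data_encoding='utf-8'):
    """Read bytes stored as file_encoding and return them as data_encoding.

    Hypothetical helper for illustration; not part of the codecs module.
    """
    fi = open(path, 'rb')
    try:
        raw = fi.read()
    finally:
        fi.close()
    # Decode to unicode first, then encode to the target byte encoding.
    return raw.decode(file_encoding).encode(data_encoding)


# Recreate the latin-1 test file from the original post.
fo = open('test.dat', 'wb')
fo.write(u'G\xe2teaux'.encode('latin-1'))
fo.close()

result = transcode_file('test.dat')
# The single latin-1 byte 0xE2 becomes the two UTF-8 bytes 0xC3 0xA2.
assert result == u'G\xe2teaux'.encode('utf-8')
```

Those two bytes are what show up as "Ã¢" when the utf-8 string is printed on a latin-1 console, which matches the Python 2.4 output in the original post.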
