Eric Brunel
Hi all,
I just found a problem in the xreadlines method/module when used with codecs.open: the codec specified in the open call does not seem to be taken into account by xreadlines, which returns byte strings instead of unicode strings.
For example, if a file foo.txt contains some text encoded in latin1:
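>>> import codecs
>>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
>>> [l for l in f.readlines()]
[u'\ufffd\ufffd']

But:

>>> import codecs
>>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
>>> [l for l in f.xreadlines()]
['\xe9\xe0\xe7\xf9\n']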
The characters in latin1 are correctly "dumped" with readlines, but are still in latin1 encoding in byte-strings with xreadlines.
I tested with Python 2.1 and 2.3 on Linux and Windows, with the same result (I don't have Python 2.4 installed here).
Can anybody confirm the problem? Is this a bug? I searched this newsgroup and the known Python bugs, but the problem does not seem to have been reported yet.
TIA
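
For reference, here is a minimal sketch of a possible workaround. It assumes the cause is that xreadlines is forwarded to the underlying byte stream, whereas readline goes through the codec's reader; this is not confirmed in the report above. The temporary sample file, its content and the helper name read_decoded_lines are illustrative only. The sketch uses a bytes literal and so targets a recent Python, not the 2.1/2.3 versions tested above:

```python
import codecs
import os
import tempfile


def read_decoded_lines(path, encoding):
    """Return the lines of `path`, decoded by reading through the codec.

    Hypothetical helper: it loops on readline(), which is handled by the
    codec's reader, instead of relying on xreadlines().
    """
    lines = []
    f = codecs.open(path, 'r', encoding)
    try:
        while True:
            line = f.readline()  # decoded by the codec's stream reader
            if not line:         # empty string means end of file
                break
            lines.append(line)
    finally:
        f.close()
    return lines


# Build a small latin1-encoded sample file (illustrative content only).
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'\xe9\xe0\xe7\xf9\n')
os.close(fd)

decoded = read_decoded_lines(sample_path, 'latin-1')
os.remove(sample_path)
print(repr(decoded))
```

Each element of decoded is a decoded text string, not a raw byte string, so the latin1 characters can be compared or re-encoded without a separate decoding step.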