gb345
I'm getting a UnicodeEncodeError during a call to repr:
Traceback (most recent call last):
  File "bug.py", line 142, in <module>
    element = parser.parse(INPUT)
  File "bug.py", line 136, in parse
    ps = Parser.Parse(open(filename,'r').read(), 1)
  File "bug.py", line 97, in end_item
    r = repr(CURRENT_ENTRY)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3003' in position 0: ordinal not in range(128)
This is what CURRENT_ENTRY.__repr__ looks like:
    def __repr__(self):
        k = SEP.join(self.k)
        r = SEP.join(self.r)
        s = SEP.join(self.s)
        ret = u'\t'.join((k, r, s))
        print type(ret)   # prints "<type 'unicode'>", as expected
        return ret
If I "inline" this CURRENT_ENTRY.__repr__ code so that the call to
repr(CURRENT_ENTRY) can be bypassed altogether, then the error
disappears.
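To make the comparison concrete, here is a stripped-down sketch of the two
call paths. The names Entry and try_repr are made up for this post and are
not in bug.py, and the sample data is only an assumption about what the
real entries contain. It is meant as an illustration, not the actual code.

```python
import sys


class Entry(object):
    """Hypothetical stand-in for the class of CURRENT_ENTRY."""

    def __init__(self, k):
        self.k = k

    def __repr__(self):
        # Returns a text (unicode) object that may contain non-ASCII characters.
        return u'\t'.join(self.k)


def try_repr(obj):
    """Call repr(obj); return (True, text) or (False, name of the error)."""
    try:
        return True, repr(obj)
    except UnicodeEncodeError as exc:
        return False, type(exc).__name__


entry = Entry([u'\u3003', u'abc'])

# Path 1: call the method directly, bypassing the repr() built-in.
direct = entry.__repr__()

# Path 2: go through the repr() built-in, as bug.py does on line 97.
ok, result = try_repr(entry)

print('Python %d, repr() succeeded: %s' % (sys.version_info[0], ok))
```

With the default ascii encoding on Python 2, I would expect the first path
to succeed and the second to fail. On Python 3 both should succeed.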
So the problem, whatever it is, occurs during the execution of the
repr() built-in *after* it gets the value returned by
CURRENT_ENTRY.__repr__. It is also clear that repr is trying to
encode something using the ascii codec, but I don't understand why
it needs to encode anything.
Do I need to do something special to get repr to work strictly
with unicode?
Or should __repr__ *always* return bytes rather than unicode? What
about __str__ ? If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?
Thanks!
Gabe