UnicodeEncodeError during repr()

gb345 · Apr 19, 2010

I'm getting a UnicodeEncodeError during a call to repr:

Traceback (most recent call last):
  File "bug.py", line 142, in <module>
    element = parser.parse(INPUT)
  File "bug.py", line 136, in parse
    ps = Parser.Parse(open(filename,'r').read(), 1)
  File "bug.py", line 97, in end_item
    r = repr(CURRENT_ENTRY)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3003' in position 0: ordinal not in range(128)

This is what CURRENT_ENTRY.__repr__ looks like:

def __repr__(self):
    k = SEP.join(self.k)
    r = SEP.join(self.r)
    s = SEP.join(self.s)
    ret = u'\t'.join((k, r, s))
    print type(ret)  # prints "<type 'unicode'>", as expected
    return ret

If I "inline" this CURRENT_ENTRY.__repr__ code so that the call to
repr(CURRENT_ENTRY) can be bypassed altogether, then the error
disappears.

Therefore, it is clear from the above that the problem, whatever
it is, occurs during the execution of the repr() built-in *after*
it gets the value returned by CURRENT_ENTRY.__repr__. It is also
clear that repr is trying to encode something using the ascii
codec, but I don't understand why it needs to encode anything.

Do I need to do something special to get repr to work strictly
with unicode?

Or should __repr__ *always* return bytes rather than unicode? What
about __str__ ? If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?

Thanks!

Gabe

Martin v. Loewis · Apr 19, 2010

Do I need to do something special to get repr to work strictly
with unicode?

Yes, you need to switch to Python 3.

Or should __repr__ *always* return bytes rather than unicode?

In Python 2.x: yes.

What about __str__ ?

Likewise.

If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?

__unicode__.
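
As a rough sketch of that split, assuming a stand-in class (Entry, its constructor, and the value of SEP are all hypothetical, since the original class is not shown): __unicode__ carries the real text, while __str__ and __repr__ only hand back something that is safe as a byte string on Python 2. The version check is there only so the same file also runs on Python 3.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2
SEP = u','  # hypothetical separator; the original SEP is not shown


class Entry(object):
    """Hypothetical stand-in for the class behind CURRENT_ENTRY."""

    def __init__(self, k, r, s):
        self.k, self.r, self.s = k, r, s

    def __unicode__(self):
        # Full Unicode representation; Python 2 calls this for unicode(obj).
        return u'\t'.join(SEP.join(part) for part in (self.k, self.r, self.s))

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects a byte string from __str__, so encode explicitly.
        return text.encode('utf-8') if PY2 else text

    def __repr__(self):
        text = self.__unicode__()
        # Escape non-ASCII characters so nothing is left for the implicit
        # ASCII encoding step in Python 2's repr() to choke on.
        escaped = text.encode('unicode_escape')
        return escaped if PY2 else escaped.decode('ascii')


entry = Entry([u'\u3003'], [u'a', u'b'], [u'c'])
print(repr(entry))
```

The choice of unicode_escape for __repr__ and UTF-8 for __str__ is only one option; any encoding works as long as the method itself does the encoding and returns a byte string on 2.x.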

HTH,
Martin

gb345 · Apr 19, 2010

Yes, you need to switch to Python 3

In Python 2.x: yes.

__unicode__.

Thanks!

Dave Angel · Apr 20, 2010

gb345 said:
Thanks!

More precisely, __str__() and __repr__() return str objects. A str
holds 8-bit characters (bytes) on Python 2.x and Unicode characters
on 3.x. If you need a Unicode representation on 2.x, define
__unicode__().
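
That also explains where the encoding in the traceback comes from: when __repr__ hands back a unicode object on 2.x, repr() converts it to str with the default encoding, which is normally ascii. The snippet below is only an illustration of that step done by hand (the sample text is made up); it behaves the same on 2.x and 3.x.

```python
# -*- coding: utf-8 -*-
text = u'\u3003\tfoo'  # made-up sample starting with the character from the traceback

error = None
try:
    # The explicit form of what Python 2's repr() does implicitly
    # with a unicode return value from __repr__.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# An error handler avoids the exception by escaping what ASCII cannot hold.
safe = text.encode('ascii', 'backslashreplace')
```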

DaveA
