UnicodeEncodeError during repr()

gb345

I'm getting a UnicodeEncodeError during a call to repr:

Traceback (most recent call last):
  File "bug.py", line 142, in <module>
    element = parser.parse(INPUT)
  File "bug.py", line 136, in parse
    ps = Parser.Parse(open(filename,'r').read(), 1)
  File "bug.py", line 97, in end_item
    r = repr(CURRENT_ENTRY)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u3003' in position 0: ordinal not in range(128)

This is what CURRENT_ENTRY.__repr__ looks like:

    def __repr__(self):
        k = SEP.join(self.k)
        r = SEP.join(self.r)
        s = SEP.join(self.s)
        ret = u'\t'.join((k, r, s))
        print type(ret)  # prints "<type 'unicode'>", as expected
        return ret

If I "inline" this CURRENT_ENTRY.__repr__ code so that the call to
repr(CURRENT_ENTRY) can be bypassed altogether, then the error
disappears.

Therefore, it is clear from the above that the problem, whatever
it is, occurs during the execution of the repr() built-in *after*
it gets the value returned by CURRENT_ENTRY.__repr__. It is also
clear that repr is trying to encode something using the ascii
codec, but I don't understand why it needs to encode anything.

Do I need to do something special to get repr to work strictly
with unicode?

Or should __repr__ *always* return bytes rather than unicode? What
about __str__ ? If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?

Thanks!

Gabe
 
Martin v. Loewis

Do I need to do something special to get repr to work strictly
with unicode?

Yes, you need to switch to Python 3 :)

Or should __repr__ *always* return bytes rather than unicode?

In Python 2.x: yes.

What about __str__?

Likewise.

If both of these are supposed to return bytes,
then what method should I use to define the unicode representation
for instances of a class?

__unicode__.

HTH,
Martin
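
For readers who want to see the advice above in code, here is a minimal sketch. The class name Entry, the SEP value, and the choice of UTF-8 and unicode_escape are assumptions for illustration; the original class and SEP are not shown in the thread. The version check is only there so the same sketch also runs on Python 3, where __str__ and __repr__ must return text.

```python
import sys

PY2 = sys.version_info[0] == 2
SEP = u','  # hypothetical separator; the original SEP is not shown


class Entry(object):
    """Hypothetical stand-in for the class behind CURRENT_ENTRY."""

    def __init__(self, k, r, s):
        self.k, self.r, self.s = k, r, s

    def __unicode__(self):
        # The text (unicode) representation lives here on Python 2.
        return u'\t'.join(SEP.join(part) for part in (self.k, self.r, self.s))

    def __str__(self):
        text = self.__unicode__()
        # Python 2 wants bytes from __str__, so encode explicitly.
        return text.encode('utf-8') if PY2 else text

    def __repr__(self):
        # Escape non-ASCII characters so the result is always ASCII-safe.
        escaped = self.__unicode__().encode('unicode_escape')
        return escaped if PY2 else escaped.decode('ascii')


entry = Entry([u'\u3003'], [u'a', u'b'], [u'c'])
print(repr(entry))
```

On Python 2, unicode(entry) would then give the unicode text, while repr(entry) and str(entry) return byte strings and no implicit ASCII encoding takes place.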
 
Dave Angel

gb345 said:
More precisely, __str__() and __repr__() return str objects. A str
holds 8-bit characters on Python 2.x and Unicode text on 3.x. If you
need a unicode representation on 2.x, define __unicode__().

DaveA
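
This also explains the traceback in the question. On Python 2, when __repr__ returns a unicode object, repr() converts it to a byte string using the default encoding, which is normally ASCII. The sketch below mimics that step with an explicit encode call. The sample value and the helper name to_default_bytes are made up for illustration.

```python
ret = u'\u3003\tfoo'  # hypothetical value like the one __repr__ returned


def to_default_bytes(text, encoding='ascii'):
    """Mimic the implicit conversion: encode text with a strict codec."""
    return text.encode(encoding)


try:
    to_default_bytes(ret)
    error = None
except UnicodeEncodeError as exc:
    # Same failure as in the traceback: U+3003 is outside the ASCII range.
    error = exc

# An explicit encoding that covers the character works fine.
utf8_bytes = to_default_bytes(ret, 'utf-8')
print(error)
```

Encoding explicitly inside __repr__ or __str__ avoids the implicit ASCII step altogether.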