Simon Willison
Hello,
I'm using ElementTree to parse an XML file which includes some data
encoded as cp1252, for example:
<name>Bob\x92s Breakfast</name>
If this were a regular bytestring, I would convert it to UTF-8 using the
following:
Bob's Breakfast
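A minimal sketch of the kind of bytestring conversion described here, assuming the raw bytes are at hand. The variable names are illustrative and not from the original post:

```python
# Hypothetical raw data: 0x92 is the cp1252 byte for a right single quote.
raw = b'Bob\x92s Breakfast'

# Step 1: interpret the bytes as cp1252 to get a unicode string.
text = raw.decode('cp1252')

# Step 2: encode that unicode string as UTF-8 bytes.
utf8_bytes = text.encode('utf-8')
```

This only works when starting from bytes; calling decode on a value that is already a unicode string is what triggers the implicit ASCII encode in Python 2.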
But ElementTree gives me back a unicode string, so I get the following
error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/cp1252.py", line 15, in decode
        return codecs.charmap_decode(input,errors,decoding_table)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\x92' in position 3: ordinal not in range(128)
How can I tell Python "I know this says it's a unicode string, but I
need you to treat it like a bytestring"?
Thanks,
Simon Willison
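One possible approach to the question above, offered as a sketch rather than a definitive fix: if the parser turned each byte into the code point of the same value (as a Latin-1 decode would), encoding back to latin-1 recovers the original bytes, which can then be decoded as cp1252. The helper name reinterpret_as_cp1252 is hypothetical, and the Latin-1 assumption may not hold for every input.

```python
def reinterpret_as_cp1252(text):
    """Treat the code points of `text` as raw bytes and decode them as cp1252.

    Assumption: every character in `text` is in the range U+0000..U+00FF,
    i.e. each one stands for a single original byte.
    """
    raw = text.encode('latin-1')   # code point N -> byte N
    return raw.decode('cp1252')    # byte 0x92 -> U+2019, and so on


# Example with the string from the post (u'' literal works in 2.x and 3.3+).
mis_decoded = u'Bob\x92s Breakfast'
fixed = reinterpret_as_cp1252(mis_decoded)
utf8_bytes = fixed.encode('utf-8')
```

Note that the latin-1 step raises UnicodeEncodeError if the string contains any character above U+00FF, and the cp1252 step raises UnicodeDecodeError for the few byte values that Python's cp1252 codec leaves undefined (for example 0x81).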