J
J Peyret
Well, as usual I am confused by unicode encoding errors.
I have a string with problematic characters in it which I'd like to
put into a postgresql table.
That results in a postgresql error so I am trying to fix things with
Trying for an encode:
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
OK, that's pretty much as expected, I know this is not valid utf-8.
But I should be able to fix this with the errors parameter of the
encode method.
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
Same exact error I got without the errors parameter.
Did I mistype the error handler name? Nope.
<built-in function xmlcharrefreplace_errors>
Same results with 'ignore' as an error handler.
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
And with a bogus error handler:
print s.encode('utf-8','bogus')
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
This all looks unusually complicated for Python.
Am I missing something incredibly obvious?
How does one use the errors parameter on strings' encode method?
Also, why are the exceptions above complaining about the 'ascii' codec
if I am asking for 'utf-8' conversion?
Version and environment below. Should I try to update my python from
somewhere?
./$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Cheers
I have a string with problematic characters in it which I'd like to
put into a postgresql table.
That results in a postgresql error so I am trying to fix things with
Trying for an encode:
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
OK, that's pretty much as expected, I know this is not valid utf-8.
But I should be able to fix this with the errors parameter of the
encode method.
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
Same exact error I got without the errors parameter.
Did I mistype the error handler name? Nope.
<built-in function xmlcharrefreplace_errors>
Same results with 'ignore' as an error handler.
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
And with a bogus error handler:
print s.encode('utf-8','bogus')
Traceback (most recent call last):
File "<input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position
10: ordinal not in range(128)
This all looks unusually complicated for Python.
Am I missing something incredibly obvious?
How does one use the errors parameter on strings' encode method?
Also, why are the exceptions above complaining about the 'ascii' codec
if I am asking for 'utf-8' conversion?
Version and environment below. Should I try to update my python from
somewhere?
./$ python
Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Cheers