Robert Kern
John said: I'm trying to clean up a bad ASCII string, one read from a
web page that is supposedly in the ASCII character set but has some
characters above 127. And I get this:
  File "D:\projects\sitetruth\InfoSitePage.py", line 285, in httpfetch
    sitetext = sitetext.encode('ascii','replace') # force to clean ASCII
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 29151: ordinal not in range(128)
Why is that exception being raised when the codec was told 'replace'?
.encode('ascii') converts unicode strings to str strings. Since you gave it a
str string, it first tried to convert it to a unicode string using the default
codec ('ascii') with strict error handling, just as if you had written
unicode(sitetext).encode('ascii', 'replace').
The 'replace' argument applies only to the encode step, so the implicit decode
still raises on byte 0x92.
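To see which step fails, here is a small sketch with the implicit conversion written out explicitly. The sample bytes and the variable names raw and failed are made up for illustration; this is not the original page. It uses a bytes literal so that it also runs on Python 3, where str and bytes are separate types and the implicit decode no longer happens.

```python
# Hypothetical sample: 0x92 is the same byte the traceback complains about.
raw = b"It\x92s supposedly ASCII"

failed = None
try:
    # The implicit step spelled out: a strict ASCII decode comes first,
    # and only then would the encode with 'replace' run.
    raw.decode('ascii').encode('ascii', 'replace')
except UnicodeDecodeError as exc:
    failed = exc
    print(exc)
```

The exception comes from the decode call; the encode with 'replace' is never reached.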
I think you want something like this:
sitetext = sitetext.decode('ascii', 'replace').encode('ascii', 'replace')
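As a rough illustration of that round trip, assuming a made-up sample string rather than the real page contents (raw, text and clean are names chosen for this sketch):

```python
# Hypothetical sample: 0x92 is the same byte the traceback complains about.
raw = b"It\x92s supposedly ASCII"

# Step 1: decode explicitly; 'replace' turns each undecodable byte into U+FFFD.
text = raw.decode('ascii', 'replace')

# Step 2: encode back to ASCII; 'replace' turns each U+FFFD into '?'.
clean = text.encode('ascii', 'replace')

print(repr(clean))
```

Every byte above 127 ends up as a question mark, so the result is plain ASCII but the original character is lost. If you know the page's real encoding, decoding with that codec in step 1 would preserve more.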
--
Robert Kern