Tim Arnold
Hi, I'm using the codecs module to read in utf8 and write out cp1252
encodings. For some characters I'd like to override the default behavior.
For example, the mdash character comes out as the code point \227 and I'd
like to translate it as &mdash; instead.
Example: the file myutf8.txt contains this string:
'factor one - initially'
====================
import codecs
fd0 = codecs.open('myutf8.txt', 'rb', encoding='utf8')
line = fd0.read()
fd0.close()
fd1 = codecs.open('my1252.txt', 'wb', encoding='cp1252')
fd1.write(line)
fd1.close()
====================
The codec is doing its job, but I want to override the codepoint for this
character (plus others) to use the HTML entity instead (from \227 to
&mdash; in this case).
I've seen hints about writing your own codec and updating the decoding_map,
but I could use some more detail.
Is that the best way to solve the problem?
thanks,
--Tim Arnold
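
For illustration, here is a minimal sketch of one alternative to a custom codec: replace the chosen characters with their entity text before encoding, then let cp1252 handle the rest. It assumes Python 3 (the snippet above is Python 2 style), and the names ENTITY_MAP and to_cp1252_with_entities are hypothetical, not part of the codecs module. It is not necessarily the best way, only one that avoids touching the codec tables.

```python
# Hypothetical mapping from Unicode code point to HTML entity text.
# Extend it with whatever other characters need overriding.
ENTITY_MAP = {
    0x2014: "&mdash;",  # em dash; cp1252 alone encodes it as byte 0x97 (octal \227)
    0x2013: "&ndash;",  # en dash
}


def to_cp1252_with_entities(text, entity_map=ENTITY_MAP):
    """Swap mapped characters for entities, then encode as cp1252.

    Characters cp1252 cannot represent fall back to numeric
    character references via the 'xmlcharrefreplace' error handler.
    """
    return text.translate(entity_map).encode("cp1252", errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Same file names as in the question above.
    with open("myutf8.txt", encoding="utf8") as src:
        data = src.read()
    with open("my1252.txt", "wb") as dst:
        dst.write(to_cp1252_with_entities(data))
```

The substitution happens on the decoded text, so the standard cp1252 codec is left unmodified and the mapping can grow without registering anything.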