Guillermo
Hi,
I would appreciate it if someone could point out what I am doing wrong
here.
Basically, I need to save a string containing non-ASCII characters to
a file encoded in UTF-8.
If I stay in Python, everything seems to work fine, but the moment I
try to read the file with another Windows program, everything goes to
hell.
So here's the script unicode2file.py:
===================================================================
# encoding=utf-8
import codecs

# Write the Unicode string to m.txt, encoded as UTF-8.
f = codecs.open("m.txt", mode="w", encoding="utf8")
a = u"mañana"
print(repr(a))
f.write(a)
f.close()

# Read the file back, decoding it as UTF-8.
f = codecs.open("m.txt", mode="r", encoding="utf8")
a = f.read()
print(repr(a))
f.close()
===================================================================
That gives the expected output: both calls to repr() yield the same
result.
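One way to confirm what the script actually puts on disk is to read the file back in binary mode and look at the raw bytes. The sketch below is written for Python 3 and uses the built-in open() with encoding="utf8" in place of codecs.open (for this string both write the same bytes); the file name m.txt is taken from the script above.

```python
# Sketch (Python 3): write the string as the script does, then inspect the
# raw bytes on disk instead of the decoded text.
import codecs

with open("m.txt", mode="w", encoding="utf8") as f:
    f.write(u"mañana")

with open("m.txt", "rb") as f:
    raw = f.read()

print(raw)                              # b'ma\xc3\xb1ana'
print(raw.startswith(codecs.BOM_UTF8))  # False: plain "utf8" writes no BOM
```

The two bytes C3 B1 are the UTF-8 encoding of "ñ", so the file is valid UTF-8; it simply has no byte order mark at the start.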
But now, if I run type m.txt in cmd.exe, I get garbled characters
instead of "ñ".
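A likely reading of the garbling is that the console interprets the file's bytes in its own code page rather than as UTF-8. The sketch below assumes the console uses a legacy OEM code page such as cp437 or cp850 (that is an assumption; running chcp in cmd.exe shows the active one) and shows what the UTF-8 bytes of "mañana" look like when decoded that way. Python 3 is assumed.

```python
# Sketch (Python 3): decode UTF-8 bytes with an ASSUMED console code page
# to show how a two-byte UTF-8 "ñ" turns into two unrelated characters.
text = u"mañana"
utf8_bytes = text.encode("utf-8")       # b'ma\xc3\xb1ana'

# cp437 and cp850 are common OEM code pages; which one applies is an assumption.
results = {cp: utf8_bytes.decode(cp) for cp in ("cp437", "cp850")}
for cp, shown in results.items():
    print(cp, shown)                    # ma├▒ana in both code pages
```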
I then open the file with my editor (Sublime Text), and I see "mañana"
normally. I save (nothing to be saved, really), go back to the DOS
prompt, run type m.txt, and again get the same garbled characters.
I then open m.txt with Notepad, and I see "mañana" normally.
I save (again, no actual modifications), go back to the DOS prompt, run
type m.txt, and this time it works! I get "mañana". When Notepad opens
the file, the encoding is already UTF-8, so apart from a UTF-8 BOM being
added to the file, I don't know what changes when I save the
unmodified file. Also, I would think that the Python script should
save a valid UTF-8 file in the first place...
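The BOM hypothesis can be checked directly. Python's "utf-8-sig" codec writes the UTF-8 BOM (EF BB BF) before the text and strips it again when reading, while "utf8" does neither. The sketch below (Python 3, built-in open()) produces such a file; the file name m_bom.txt is hypothetical. Whether the BOM is what makes type behave differently is not established here; comparing the first three bytes of m.txt before and after saving it in Notepad, with the same binary read, would show whether Notepad added one.

```python
# Sketch (Python 3): write with "utf-8-sig" to get a leading BOM, then
# verify it in the raw bytes. m_bom.txt is a hypothetical file name.
import codecs

with open("m_bom.txt", mode="w", encoding="utf-8-sig") as f:
    f.write(u"mañana")

with open("m_bom.txt", "rb") as f:
    raw = f.read()

print(raw.startswith(codecs.BOM_UTF8))  # True: file starts with EF BB BF
print(raw)                              # b'\xef\xbb\xbfma\xc3\xb1ana'

# Reading with "utf-8-sig" removes the BOM again, so the text is unchanged.
with open("m_bom.txt", mode="r", encoding="utf-8-sig") as f:
    roundtrip = f.read()
print(repr(roundtrip))                  # 'mañana'
```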
What's going on here?
Regards,
Guillermo