Marcel Rodrigues
I'm using Python 3.3 (CPython) and am having trouble getting the standard
gettext module to handle Unicode messages.
My problem can be isolated as follows:
I have 3 files in a folder: greeting.py, greeting.po and msgfmt.py.
-- greeting.py --
import gettext
t = gettext.translation("greeting", "locale", ["pt"])
_ = t.lgettext
print("_charset = {0}\n".format(t._charset))
print(_("hello"))
-- EOF --
-- greeting.po --
msgid ""
msgstr ""
"Project-Id-Version: 1.0\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
msgid "hello"
msgstr "olá"
-- EOF --
msgfmt.py was downloaded from
http://hg.python.org/cpython/file/9e6ead98762e/Tools/i18n/msgfmt.py, since
this tool apparently isn't included in the python3 package in the official
Arch Linux repositories.
It's probably also worth noting that greeting.po is itself encoded as UTF-8.
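Because msgfmt.py had to be fetched separately, it may help to see roughly what it produces. The following is a minimal sketch, assuming the little-endian GNU MO layout with an empty hash table. `build_mo` is a hypothetical helper, not msgfmt.py's actual code, and the catalog is loaded from an in-memory file with `gettext.GNUTranslations` instead of from locale/pt/LC_MESSAGES/greeting.mo. It checks that the charset from the PO header is detected and that `gettext()` returns a `str`.

```python
import io
import struct
import gettext

MO_MAGIC = 0x950412DE  # GNU MO magic number

def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into minimal MO bytes (no hash table)."""
    keys = sorted(messages)  # the header entry "" sorts first
    ids = b""
    strs = b""
    entries = []  # (id offset, id length, str offset, str length) within each blob
    for key in keys:
        kb = key.encode("utf-8")
        vb = messages[key].encode("utf-8")
        entries.append((len(ids), len(kb), len(strs), len(vb)))
        ids += kb + b"\0"
        strs += vb + b"\0"
    n = len(keys)
    header_size = 7 * 4
    ids_start = header_size + 16 * n  # after the header and both (length, offset) tables
    strs_start = ids_start + len(ids)
    originals = []
    translations = []
    for id_off, id_len, str_off, str_len in entries:
        originals += [id_len, ids_start + id_off]
        translations += [str_len, strs_start + str_off]
    # magic, revision, count, originals table offset, translations table offset,
    # hash table size (0 = none), hash table offset
    header = struct.pack("<7I", MO_MAGIC, 0, n, header_size, header_size + 8 * n, 0, 0)
    tables = struct.pack("<%dI" % (4 * n), *(originals + translations))
    return header + tables + ids + strs

# Same header and message as greeting.po.
catalog = {
    "": ("Project-Id-Version: 1.0\n"
         "MIME-Version: 1.0\n"
         "Content-Type: text/plain; charset=UTF-8\n"
         "Content-Transfer-Encoding: 8bit\n"),
    "hello": "olá",
}

t = gettext.GNUTranslations(io.BytesIO(build_mo(catalog)))
print(t.charset())                  # charset parsed from the Content-Type header
print(ascii(t.gettext("hello")))    # ascii() keeps this print safe on an ASCII stdout
```

If the sketch is right, the charset line prints UTF-8, matching the `_charset` line in the output below, so the catalog itself is being read correctly.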
From that folder, I run the following commands:
$ mkdir -p locale/pt/LC_MESSAGES
$ python msgfmt.py -o !$/greeting.mo greeting.po
$ python greeting.py
The output is:
_charset = UTF-8
Traceback (most recent call last):
  File "greeting.py", line 7, in <module>
    print(_("hello"))
  File "/usr/lib/python3.3/gettext.py", line 314, in lgettext
    return tmsg.encode(locale.getpreferredencoding())
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 2: ordinal not in range(128)
My interpretation of this output is that even though gettext correctly
detects the MO file charset as UTF-8, it tries to encode the translated
message with the system's "preferred encoding", which happens to be ASCII.
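The failing step in the traceback can be reproduced without gettext at all. This is a minimal sketch: `encode_like_lgettext` is a hypothetical stand-in that mirrors the `tmsg.encode(locale.getpreferredencoding())` line shown above, with the encoding passed explicitly so an ASCII locale can be simulated. It does not call `lgettext` itself, which was deprecated in Python 3.8 and removed in 3.11.

```python
import locale

def encode_like_lgettext(translated, preferred_encoding):
    """Hypothetical stand-in for the traceback's tmsg.encode(locale.getpreferredencoding())."""
    return translated.encode(preferred_encoding)

translated = "olá"  # the str the catalog lookup yields for "hello"

# Simulated ASCII locale: 'á' (U+00E1) at index 2 cannot be encoded.
ascii_error = None
try:
    encode_like_lgettext(translated, "ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(exc)

# Simulated UTF-8 locale: the same message encodes without error.
utf8_bytes = encode_like_lgettext(translated, "utf-8")
print(utf8_bytes)

# What this process actually reports as its preferred encoding.
print(locale.getpreferredencoding(False))
```

On Python 3, `t.gettext` already returns a `str`, so binding `_ = t.gettext` skips this particular encode step; `print` may still fail if stdout itself is ASCII-encoded.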
Does anyone know why this happens? Is this a bug in my code? Maybe I have
misunderstood gettext...
Thanks,
Marcel