klappnase
Hello all,
I am trying to internationalize my Tkinter program using gettext and
have run into various problems, so it looks like it's not a trivial
task.
After some "research" I came up with a few rules that I hope will let
me avoid further encoding trouble, but I would feel more confident if
some of the experts here had a look at my thoughts so far and told me
whether I am still going wrong somewhere (BTW, the program is supposed
to run on Linux only). So here is what I have so far:
1. use unicode instead of byte strings wherever possible. This can be
a little tricky, because in some situations I cannot know in advance
whether a certain string is unicode or a byte string; I wrote a helper
module for this which defines convenience functions for fail-safe
decoding/encoding of strings and a UnicodeVar class (a subclass of
Tkinter.StringVar) which I use to convert user input to unicode on the
fly (see the code below).
2. so I will have to call gettext.install() with unicode=1
3. make sure to NEVER mix unicode and byte strings within one
expression
4. in order to maintain code readability it's better to risk excess
decode/encode cycles than to have one too few.
5. file operations seem to be delicate; at least I got an error when I
passed a filename that contains special characters as unicode to
os.access(), so I guess that whenever I do file operations
(os.remove(), shutil.copy() ...) the filename should be encoded back
into the system encoding first. The filename manipulations done by the
os.path functions seem to be plain string manipulations, so encoding
the filenames doesn't seem to be necessary there.
6. messages that are printed to stdout should be encoded first, too;
the same with strings I use to call external shell commands.
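As a minimal sketch of rules 3, 5 and 6 taken together: text stays
unicode inside the program and is converted only at the boundaries.
The names ENCODING, name and message are made up for this example, and
UTF-8 is an assumption standing in for whatever the system encoding
really is. The snippet is written so that it should run on Python 2.6+
as well as Python 3.

```python
import os
import shutil
import tempfile

# Assumption: UTF-8 stands in for the real system encoding.
ENCODING = 'utf-8'

# Text as it might arrive from an Entry widget (hypothetical value).
name = u'gr\xfc\xdfe.txt'

tmpdir = tempfile.mkdtemp()
# mkdtemp() already returns a byte string on Python 2.
if isinstance(tmpdir, bytes):
    tmpdir_bytes = tmpdir
else:
    tmpdir_bytes = tmpdir.encode(ENCODING)

try:
    # Rule 5: hand only byte strings to the file system functions.
    path = os.path.join(tmpdir_bytes, name.encode(ENCODING))
    open(path, 'wb').close()
    exists = os.access(path, os.F_OK)

    # Rule 3: decode the byte string before combining it with text.
    message = u'created ' + os.path.basename(path).decode(ENCODING)

    # Rule 6: encode once, right before the text leaves the program.
    encoded_message = message.encode(ENCODING)
finally:
    shutil.rmtree(tmpdir)
```

The point of the layout is that every encode() or decode() call sits
next to the I/O operation that needs it, so there is no doubt about
which type a variable holds in between.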
############ file UnicodeHandler.py ##################################
# -*- coding: iso-8859-1 -*-
import Tkinter
import sys
import locale
import codecs

def _find_codec(encoding):
    # return True if the requested codec is available, else return False
    try:
        codecs.lookup(encoding)
        return True
    except LookupError:
        print 'Warning: codec %s not found' % encoding
        return False

def _sysencoding():
    # try to guess the system default encoding
    guesses = []
    try:
        guesses.append(locale.getpreferredencoding())
    except AttributeError:
        # our python is too old, try something else
        pass
    try:
        guesses.append(locale.getdefaultlocale()[1])
    except ValueError:
        # the locale environment variables could not be parsed
        pass
    # the last try; the attribute can be None, e.g. if stdin is not a tty
    guesses.append(getattr(sys.stdin, 'encoding', None))
    for enc in guesses:
        # skip missing values (None or '') before calling lower()
        if enc and _find_codec(enc.lower()):
            print 'Using system encoding %s' % enc.lower()
            return enc.lower()
    # aargh, nothing good found, fall back to latin1 and hope for the best
    print 'Warning: cannot find usable locale, using latin-1'
    return 'iso-8859-1'

sysencoding = _sysencoding()

def fsdecode(input, errors='strict'):
    '''Fail-safe decodes a string into unicode.'''
    if not isinstance(input, unicode):
        return unicode(input, sysencoding, errors)
    return input

def fsencode(input, errors='strict'):
    '''Fail-safe encodes a unicode string into system default encoding.'''
    if isinstance(input, unicode):
        return input.encode(sysencoding, errors)
    return input

class UnicodeVar(Tkinter.StringVar):
    '''A StringVar that converts its value to unicode on every write.'''
    def __init__(self, master=None, errors='strict'):
        Tkinter.StringVar.__init__(self, master)
        self.errors = errors
        self.trace('w', self._str2unicode)

    def _str2unicode(self, *args):
        # trace callback: replace a byte string value by its unicode version
        old = self.get()
        if not isinstance(old, unicode):
            new = fsdecode(old, self.errors)
            self.set(new)
#######################################################################
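To illustrate why rule 4 should be safe with helpers like these: each
one passes through a value that already has the target type, so an
extra call is a no-op. The following stand-alone sketch uses
decode_safe and encode_safe as hypothetical stand-ins for fsdecode and
fsencode, with UTF-8 hard-coded as an assumption instead of the
detected sysencoding, and checks against bytes so that it should run on
Python 2.6+ as well as Python 3.

```python
# Assumption: fixed encoding; the module above detects it at import time.
ENCODING = 'utf-8'

def decode_safe(value, errors='strict'):
    """Return text; decode only if a byte string was passed in."""
    if isinstance(value, bytes):
        return value.decode(ENCODING, errors)
    return value

def encode_safe(value, errors='strict'):
    """Return a byte string; encode only if text was passed in."""
    if isinstance(value, bytes):
        return value
    return value.encode(ENCODING, errors)

raw = b'caf\xc3\xa9'               # UTF-8 bytes for "cafe" with e-acute
text = decode_safe(raw)

# An excess decode or encode leaves the value unchanged.
decoded_twice = decode_safe(text)
encoded_twice = encode_safe(encode_safe(text))

# errors='replace' trades the exception for U+FFFD on undecodable input;
# b'caf\xe9' is latin-1, not valid UTF-8.
lenient = decode_safe(b'caf\xe9', errors='replace')

try:
    decode_safe(b'caf\xe9')        # errors='strict' is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With errors='strict' a wrongly guessed encoding shows up immediately as
an exception, while 'replace' keeps the program running at the cost of
losing the original character.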
So before I start to mess up all of my code, maybe someone can tell me
whether I have forgotten something I should keep in mind, or whether I
am completely wrong somewhere.
Thanks in advance
Michael