Karsten Hilbert
Dear all,
I have a problem with logging an exception.
environment:
Python 2.4, Debian testing
${LANGUAGE} not set
${LC_ALL} not set
${LC_CTYPE} not set
${LANG}=de_DE.UTF-8
activating user-default locale with <locale.setlocale(locale.LC_ALL, '')> returns: [de_DE.UTF-8]
locale.getdefaultlocale() - default (user) locale: ('de_DE', 'utf-8')
encoding sanity check (also check "locale.nl_langinfo(CODESET)" below):
sys.getdefaultencoding(): [ascii]
locale.getpreferredencoding(): [UTF-8]
locale.getlocale()[1]: [utf-8]
sys.getfilesystemencoding(): [UTF-8]
_logfile = codecs.open(filename = _logfile_name, mode = 'wb', encoding = 'utf8', errors = 'replace')
logging.basicConfig (
    format = fmt,
    datefmt = '%Y-%m-%d %H:%M:%S',
    level = logging.DEBUG,
    stream = _logfile
)
I am using psycopg2 which in turn uses libpq. When trying to
connect to the database and providing faulty authentication
information:
try:
    ... try to connect ...
except StandardError, e:
    _log.error(u"login attempt %s/%s failed:", attempt+1, max_attempts)
    print "exception type :", type(e)
    print "exception dir :", dir(e)
    print "exception args :", e.args
    msg = e.args[0]
    print "msg type :", type(msg)
    print "msg.decode(utf8):", msg.decode('utf8')
    t,v,tb = sys.exc_info()
    print "sys.exc_info() :", t, v
    _log.exception(u'exception detected')
The following output is generated:
exception type : <type 'instance'>
exception dir : ['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args']
exception args : ('FATAL: Passwort-Authentifizierung f\xc3\xbcr Benutzer \xc2\xbbany-doc\xc2\xab fehlgeschlagen\n',)
msg type : <type 'str'>
msg.decode(utf8): FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
sys.exc_info() : psycopg2.OperationalError FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen
Traceback (most recent call last):
File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
self.stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 191: ordinal not in range(128)
Now, the string "FATAL: Passwort-Auth..." comes from libpq
via psycopg2. It is translated to German via gettext within
libpq (at the C level). As we can see, it is of type "str".
I know from the environment that it is likely encoded in
UTF-8, and decoding it manually as UTF-8 (see the decode
call) succeeds.
On _log.exception() the logging module wants to write the
message encoded as UTF-8 (that is how the log file is set
up). So it looks at the string, sees that it is of type
"str", and decodes it with the *Python default encoding* to
get to type "unicode". It then re-encodes the result as
UTF-8 to get back to type "str", ready to be written to the
log file.
However, since the Python default encoding is "ascii" that
conversion fails.
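The failing step can be reproduced without logging or
psycopg2. This is only a minimal sketch: the sample text is
copied from the output above, and can_decode is a
hypothetical helper name used for illustration.

```python
# Build the UTF-8 byte string that libpq hands back (text taken from the
# output above). Encoding a unicode literal works on Python 2 and 3 alike.
raw = u'f\xfcr Benutzer \xbbany-doc\xab'.encode('utf-8')


def can_decode(data, encoding):
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


ascii_ok = can_decode(raw, 'ascii')   # what the implicit conversion attempts
utf8_ok = can_decode(raw, 'utf-8')    # what the environment suggests
```

Here ascii_ok comes out False and utf8_ok comes out True,
which matches the traceback: the bytes are fine, only the
codec chosen for the implicit conversion is wrong.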
Changing the Python default encoding isn't really an option
as it is advocated against and would have to be made to work
reliably on other users' machines.
One could, of course, write code to specifically check for
this condition and manually pre-convert the message string
to unicode, but that does not seem to be how things should
work.
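For reference, such a manual pre-conversion might look
roughly like the sketch below. to_unicode is a hypothetical
helper name, and using locale.getpreferredencoding() as the
default is an assumption about where the bytes came from,
not something libpq or psycopg2 guarantees.

```python
import locale


def to_unicode(value, encoding=None):
    """Return a text string; decode byte strings with an explicit encoding.

    Hypothetical helper. The default encoding is a guess based on the
    locale, so undecodable bytes are replaced instead of raising.
    """
    if isinstance(value, type(u'')):
        return value                      # already text, nothing to do
    if encoding is None:
        encoding = locale.getpreferredencoding() or 'ascii'
    # 'replace' keeps logging alive even if the guessed encoding is wrong.
    return value.decode(encoding, 'replace')
```

This covers strings that are passed to the logger
explicitly, e.g. to_unicode(e.args[0]). The traceback text
that _log.exception() generates is produced inside the
logging module and is not touched by this helper.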
How can I cleanly handle this situation?
Should the logging module internally use an encoding obtained
from the locale module rather than the default string encoding?
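To make the second question concrete, here is a sketch of
what "use the locale encoding" could look like from outside
the logging module. LocaleDecodingFormatter and install are
hypothetical names, this is not how the logging module
behaves by itself, and I have not verified it on 2.4. It
only relies on logging.Formatter.formatException() and
Handler.setFormatter(), which exist in the standard library.

```python
import locale
import logging


class LocaleDecodingFormatter(logging.Formatter):
    """Hypothetical formatter that decodes traceback text explicitly."""

    def formatException(self, exc_info):
        text = logging.Formatter.formatException(self, exc_info)
        if not isinstance(text, type(u'')):
            # Byte string (Python 2): decode with the locale's encoding
            # instead of relying on the implicit ASCII default.
            encoding = locale.getpreferredencoding() or 'ascii'
            text = text.decode(encoding, 'replace')
        return text


def install(handler, fmt=None, datefmt=None):
    """Attach the formatter to an existing handler and return it."""
    formatter = LocaleDecodingFormatter(fmt, datefmt)
    handler.setFormatter(formatter)
    return formatter
```

With basicConfig() as above, the handler to pass to
install() would presumably be the one on the root logger.
Whether this is cleaner than pre-converting by hand is
exactly what I am unsure about.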
Karsten