problem with logging exceptions with non-ASCII __str__ result

Karsten Hilbert

Dear all,

I have a problem with logging an exception.

environment:

Python 2.4, Debian testing

${LANGUAGE} not set
${LC_ALL} not set
${LC_CTYPE} not set
${LANG}=de_DE.UTF-8

activating user-default locale with <locale.setlocale(locale.LC_ALL, '')> returns: [de_DE.UTF-8]

locale.getdefaultlocale() - default (user) locale: ('de_DE', 'utf-8')
encoding sanity check (also check "locale.nl_langinfo(CODESET)" below):
sys.getdefaultencoding(): [ascii]
locale.getpreferredencoding(): [UTF-8]
locale.getlocale()[1]: [utf-8]
sys.getfilesystemencoding(): [UTF-8]

_logfile = codecs.open(filename = _logfile_name, mode = 'wb', encoding = 'utf8', errors = 'replace')

logging.basicConfig (
    format = fmt,
    datefmt = '%Y-%m-%d %H:%M:%S',
    level = logging.DEBUG,
    stream = _logfile
)

I am using psycopg2 which in turn uses libpq. When trying to
connect to the database and providing faulty authentication
information:

try:
    ... try to connect ...
except StandardError, e:
    _log.error(u"login attempt %s/%s failed:", attempt+1, max_attempts)

    print "exception type :", type(e)
    print "exception dir :", dir(e)
    print "exception args :", e.args
    msg = e.args[0]
    print "msg type :", type(msg)
    print "msg.decode(utf8):", msg.decode('utf8')
    t,v,tb = sys.exc_info()
    print "sys.exc_info() :", t, v
    _log.exception(u'exception detected')

the following output is generated:

exception type : <type 'instance'>
exception dir : ['__doc__', '__getitem__', '__init__', '__module__', '__str__', 'args']
exception args : ('FATAL: Passwort-Authentifizierung f\xc3\xbcr Benutzer \xc2\xbbany-doc\xc2\xab fehlgeschlagen\n',)
msg type : <type 'str'>
msg.decode(utf8): FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen

sys.exc_info() : psycopg2.OperationalError FATAL: Passwort-Authentifizierung für Benutzer »any-doc« fehlgeschlagen

Traceback (most recent call last):
File "/usr/lib/python2.4/logging/__init__.py", line 739, in emit
self.stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 191: ordinal not in range(128)

Now, the string "FATAL: Passwort-Auth..." comes from libpq
via psycopg2. It is translated to German via gettext within
libpq (at the C level). As we can see, it is of type "str".
I know from the environment that it is most likely encoded in
UTF-8, and decoding it manually as UTF-8 (see the decode call)
succeeds.

On _log.exception() the logging module wants to write the
message to the log file encoded as UTF-8 (that's what the log
file is set up as). As far as I can tell, it looks at the
string, finds it is of type "str", and decodes it with the
*Python default encoding* to get to type "unicode". After that
it would re-encode with UTF-8 to get back to type "str", ready
for output to the log file.

However, since the Python default encoding is "ascii" that
conversion fails.
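
The failing step can be shown in isolation, without logging or psycopg2. The following is a minimal sketch, not the code path inside the logging module: it takes the UTF-8 bytes of the word "für" from the error message and decodes them once as ASCII and once as UTF-8. The names raw and decode_with are made up for illustration, and the snippet avoids newer syntax so that it should also parse on old Python versions.

```python
# UTF-8 bytes for the word "für"; built via encode() so no bytes literal is needed.
raw = u'f\xfcr'.encode('utf-8')


def decode_with(encoding):
    """Return the decoded text, or None if the bytes are not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# ASCII cannot represent byte 0xc3, so this decode fails and yields None.
ascii_result = decode_with('ascii')

# The same bytes decode fine once the correct encoding is named explicitly.
utf8_result = decode_with('utf-8')
```

The ASCII attempt fails on the same byte (0xc3) that the traceback above complains about, while the explicit UTF-8 decode succeeds.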

Changing the Python default encoding isn't really an option
as it is advocated against and would have to be made to work
reliably on other users' machines.

One could, of course, write code to specifically check for
this condition and manually pre-convert the message string
to unicode, but that does not seem to be how things should work.
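
For reference, such a manual pre-conversion could look roughly like the sketch below. The helper name to_text is hypothetical, and the choice of locale.getpreferredencoding() as the fallback is an assumption: it matches this environment, where libpq's gettext output follows the user's locale, but it is not guaranteed in general. The helper only handles text and byte strings.

```python
import locale

# unicode on Python 2, str on Python 3
_TEXT_TYPE = type(u'')


def to_text(value, encoding=None, errors='replace'):
    """Return value as text, decoding byte strings with an explicit encoding."""
    if isinstance(value, _TEXT_TYPE):
        return value  # already text, nothing to do
    if encoding is None:
        # Assumption: the bytes are encoded according to the user's locale.
        encoding = locale.getpreferredencoding() or 'ascii'
    # errors='replace' keeps logging alive even if the guess is wrong.
    return value.decode(encoding, errors)


# Example: the UTF-8 bytes of "für" become a text string again.
message = to_text(u'f\xfcr'.encode('utf-8'), 'utf-8')
```

In the except block one could then pass to_text(e.args[0]) as an argument to _log.error() rather than relying on the traceback text. Whether that alone avoids the error on Python 2.4 is untested here.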

How can I cleanly handle this situation?

Should the logging module internally use an encoding obtained
from the locale module rather than the default string encoding?
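
To make the idea concrete, the following sketch shows what locale-based decoding might look like in user code rather than inside the logging module. The class name LocaleDecodingFormatter is hypothetical; it overrides logging.Formatter.formatException() and decodes the traceback text with the locale's preferred encoding if it comes back as a byte string. This is an assumption about where the fix could go, and it has not been tried on Python 2.4.

```python
import locale
import logging


class LocaleDecodingFormatter(logging.Formatter):
    """Hypothetical formatter that decodes byte-string traceback text."""

    def formatException(self, ei):
        text = logging.Formatter.formatException(self, ei)
        if isinstance(text, type(u'')):
            # Already text (this is always the case on Python 3).
            return text
        # Assumption: exception messages follow the user's locale encoding.
        encoding = locale.getpreferredencoding() or 'ascii'
        return text.decode(encoding, 'replace')
```

It would have to be attached to a handler explicitly with handler.setFormatter(LocaleDecodingFormatter(fmt)), since the logging.basicConfig() call shown earlier does not take a formatter class.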

Karsten
 
Vinay Sajip

Please reduce to a minimal program which demonstrates the issue and
log an issue on bugs.python.org.

Best regards,

Vinay Sajip