logging of strings with broken encoding

Thomas Guettler · Jul 2, 2009

Hi,

I have a bug in my code, which results in the same error as this one:

https://bugs.launchpad.net/bzr/+bug/295653
{{{
Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 765, in emit
self.stream.write(fs % msg.encode("UTF-8"))
..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
}}}

I run Python 2.6. In SVN the code is the same (StreamHandler ... def emit...):
http://svn.python.org/view/python/b...ogging/__init__.py?revision=72507&view=markup

I think msg.encode("UTF-8", 'backslashreplace') would be better here.

What do you think?

Should I file a bug report?

Thomas

David Smith · Jul 2, 2009

Thomas said:
Hi,

I have a bug in my code, which results in the same error as this one:

https://bugs.launchpad.net/bzr/+bug/295653
{{{
Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 765, in emit
self.stream.write(fs % msg.encode("UTF-8"))
..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
}}}

I run Python 2.6. In SVN the code is the same (StreamHandler ... def emit...):
http://svn.python.org/view/python/b...ogging/__init__.py?revision=72507&view=markup

I think msg.encode("UTF-8", 'backslashreplace') would be better here.

What do you think?

Should I file a bug report?

Thomas

I think you have to decode it first using the string's original encoding,
whether that be cp1252 or mac-roman or any of the other 8-bit encodings.
Once that's done, you can encode it in UTF-8.
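
As a rough illustration of the decode-then-encode order described above, here is a minimal sketch written for Python 3. The sample bytes and the choice of cp1252 are assumptions made up for the example; in practice you have to know (or find out) which encoding the bytes really use.

```python
# Assumed sample data: "Grüße" encoded as cp1252 (one byte per character).
raw = b"Gr\xfc\xdfe"

# Step 1: decode with the encoding the bytes were actually written in.
text = raw.decode("cp1252")

# Step 2: only now encode the resulting text as UTF-8.
utf8_bytes = text.encode("utf-8")

# Decoding with the wrong 8-bit codec does not fail, it just yields
# different (wrong) characters, which is why the original encoding matters.
wrong_guess = raw.decode("mac-roman")

print(repr(text), repr(utf8_bytes), repr(wrong_guess))
```

Note that an 8-bit codec such as cp1252 or mac-roman will accept almost any byte, so a wrong guess tends to show up as garbled text rather than as an exception.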

--David

Peter Otten · Jul 2, 2009

Thomas said:
I have a bug in my code, which results in the same error as this one:

https://bugs.launchpad.net/bzr/+bug/295653
{{{
Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 765, in emit
self.stream.write(fs % msg.encode("UTF-8"))
..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
}}}

I run Python 2.6. In SVN the code is the same (StreamHandler ... def emit...):
http://svn.python.org/view/python/branches/release26-maint/Lib/logging/__init__.py?revision=72507&view=markup

I think msg.encode("UTF-8", 'backslashreplace') would be better here.

What do you think?

That won't help. It's a *decoding* error. You are feeding it a non-ascii
byte string.
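
To make the point concrete: in Python 2, calling .encode() on a byte string first decodes it implicitly with the default codec (ASCII), and that hidden step is what raises. The sketch below imitates that behaviour in Python 3, where the implicit step no longer exists; py2_style_encode is a hypothetical helper invented for this illustration, not part of any library.

```python
# Assumed sample data: "Käse" as Latin-1 bytes, so 0xe4 is at index 1.
raw = "K\xe4se".encode("latin-1")


def py2_style_encode(data, encoding="utf-8", errors="strict"):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode."""
    return data.decode("ascii").encode(encoding, errors)


try:
    # The error handler only applies to the encode step, so it never
    # gets a chance to help: the ASCII decode fails first.
    py2_style_encode(raw, "utf-8", "backslashreplace")
except UnicodeDecodeError as exc:
    error = exc

print(error.encoding, error.start)
```

The exception is a UnicodeDecodeError from the ASCII codec, which matches the traceback quoted above even though the source line only mentions encode().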

Peter

Lie Ryan · Jul 2, 2009

Thomas said:
Hi,

I have a bug in my code, which results in the same error as this one:

https://bugs.launchpad.net/bzr/+bug/295653
{{{
Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 765, in emit
self.stream.write(fs % msg.encode("UTF-8"))
..
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
}}}

What's the encoding of self.stream? Is it sys.stdout/sys.stderr or a
file object?

Thomas Guettler · Jul 2, 2009

My quick fix is this:

import logging

class MyFormatter(logging.Formatter):
    def format(self, record):
        msg = logging.Formatter.format(self, record)
        # Python 2: turn a byte string into unicode, replacing undecodable bytes
        if isinstance(msg, str):
            msg = msg.decode('utf8', 'replace')
        return msg

But I still think handling of non-ascii byte strings should be better.
A broken logging message is better than none.
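
For readers on Python 3, the formatter above does not carry over directly, because str there is already text and has no decode() method. A roughly equivalent "broken message is better than none" approach could be a filter that decodes bytes messages before they are formatted. This is only a sketch: DecodeBytesFilter and the logger name are made up for the example, and UTF-8 is an assumed default.

```python
import io
import logging


class DecodeBytesFilter(logging.Filter):
    """Hypothetical helper: turn a bytes message into text before formatting."""

    def __init__(self, encoding="utf-8"):
        super().__init__()
        self.encoding = encoding

    def filter(self, record):
        if isinstance(record.msg, bytes):
            # "replace" swaps undecodable bytes for U+FFFD instead of raising.
            record.msg = record.msg.decode(self.encoding, "replace")
        return True  # never drop the record


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(DecodeBytesFilter())

logger = logging.getLogger("broken_encoding_demo")
logger.propagate = False
logger.addHandler(handler)

logger.warning(b"K\xe4se")  # Latin-1 bytes, not valid UTF-8
output = stream.getvalue()
print(repr(output))
```

The message still gets logged, with the undecodable byte shown as a replacement character, so the damage is visible rather than silent.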

And, if there is a UnicodeError, handleError() should not send the message
to sys.stderr, but it should use emit() of the current handler.
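
One way to approximate this without changing the logging module is to override handleError() in a handler subclass so that a fallback line goes to the handler's own stream. The sketch below is for Python 3; FallbackStreamHandler, the "LOGGING FAILED" prefix and the logger name are assumptions made up for the example. A formatting error is used to trigger the failure, since bytes messages no longer fail in Python 3.

```python
import io
import logging


class FallbackStreamHandler(logging.StreamHandler):
    """Hypothetical handler: report emit() failures on its own stream."""

    def handleError(self, record):
        try:
            # repr() keeps the fallback line safe for any message type.
            self.stream.write(
                "LOGGING FAILED: %r %r\n" % (record.msg, record.args)
            )
            self.flush()
        except Exception:
            # If even the fallback fails, use the default behaviour.
            logging.StreamHandler.handleError(self, record)


stream = io.StringIO()
logger = logging.getLogger("fallback_demo")
logger.propagate = False
logger.addHandler(FallbackStreamHandler(stream))

# "%d" with a string argument makes formatting fail inside emit().
logger.warning("%d items", "many")
fallback_output = stream.getvalue()
print(repr(fallback_output))
```

The default handleError() only reports to sys.stderr (and only when logging.raiseExceptions is true), so a subclass like this keeps the failure visible in setups where stderr is discarded.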

In my case sys.stderr gets discarded. It's very hard to debug if you don't
see any logging messages.

Thomas

Stefan Behnel · Jul 2, 2009

Thomas said:
My quick fix is this:

class MyFormatter(logging.Formatter):
    def format(self, record):
        msg = logging.Formatter.format(self, record)
        if isinstance(msg, str):
            msg = msg.decode('utf8', 'replace')
        return msg

But I still think handling of non-ascii byte strings should be better.
A broken logging message is better than none.

Erm, may I note that this is not a problem in the logging library but in
the code that uses it? How should the logging library know what you meant
by passing that byte string in the first place? And where is the difference
between accidentally passing a byte string and accidentally passing another
non-printable object? Handling this "better" may simply hide the bugs in
your code; I don't find that any "better" at all.

Anyway, this has been fixed in Py3.

Stefan

Lie Ryan · Jul 2, 2009

Thomas said:
My quick fix is this:

class MyFormatter(logging.Formatter):
    def format(self, record):
        msg = logging.Formatter.format(self, record)
        if isinstance(msg, str):
            msg = msg.decode('utf8', 'replace')
        return msg

But I still think handling of non-ascii byte strings should be better.
A broken logging message is better than none.

The problem is that Python 2.x assumes a default encoding of `ascii`
whenever you don't explicitly specify one, and your code apparently
broke that assumption. I haven't looked at your code, but others have
suggested that you've fed the logging module non-ascii byte strings.
The logging module can only work with 1) unicode strings and
2) ascii-encoded byte strings.

If you want a quick fix, you may be able to get away with repr()-ing
your log texts. A proper fix, however, is to pass a unicode string to
the logging module instead.

Traceback (most recent call last):
File "/usr/lib64/python2.6/logging/__init__.py", line 773, in emit
stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 13: ordinal not in range(128)
WARNING:root:ы

Thomas Guettler · Jul 3, 2009

Stefan said:
Erm, may I note that this is not a problem in the logging library but in
the code that uses it?

I know that my code passes the broken string to the logging module. But maybe
I get the non-ascii byte string from a third party (psycopg2 sometimes passes
latin1 byte strings from postgres in error messages).

I like Python very much because "it refuses to guess". But in this case, "best effort"
is a better approach.
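
A "best effort" conversion of third-party byte strings could look roughly like the sketch below (Python 3). to_text is a hypothetical helper, and the candidate list of UTF-8 followed by Latin-1 is an assumption based on the psycopg2 case mentioned above; it is guessing, so the result can be wrong text rather than an error.

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: best-effort conversion of bytes to text."""
    if isinstance(value, str):
        return value  # already text, nothing to do
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate
    # Last resort: keep the bad bytes visible as \xNN escapes.
    return value.decode("ascii", "backslashreplace")


from_latin1 = to_text(b"K\xe4se")                 # invalid UTF-8, falls back
from_utf8 = to_text("K\xe4se".encode("utf-8"))    # valid UTF-8
escaped = to_text(b"K\xe4se", encodings=("utf-8",))
print(from_latin1, from_utf8, escaped)
```

Because Latin-1 maps every byte to a character, the escape fallback is only reached when the caller passes a candidate list without such a catch-all codec.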

It worked in 2.5 and will work in py3k. I think it is a bug that it does not work in 2.6.

Thomas

Lie Ryan · Jul 3, 2009

Thomas said:
I know that my code passes the broken string to the logging module. But maybe
I get the non-ascii byte string from a third party (psycopg2 sometimes passes
latin1 byte strings from postgres in error messages).

If the database contains non-ascii byte strings, then you could repr()
them before logging (repr also adds some niceties such as quotes). I
think that's the best solution, unless you want to decode the byte
strings (which might be overkill, depending on the situation).
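
The repr() suggestion can be sketched like this, using the %r conversion so the logging module applies repr() for you. The example is written for Python 3, where the result carries a b prefix; under Python 2 the same idea would give 'K\xe4se' without the prefix. The logger name and message text are made up for the illustration.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("repr_demo")
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

# Assumed sample data: Latin-1 bytes as they might arrive from a driver.
raw = b"K\xe4se"

# %r applies repr(), so the non-ASCII byte is logged as an ASCII escape.
logger.warning("database said: %r", raw)
logged = stream.getvalue()
print(logged)
```

The logged line is pure ASCII, so it cannot trigger an encoding error, at the cost of showing escapes instead of the real characters.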

I like Python very much because "it refuses to guess". But in this case, "best effort"
is a better approach.

One time it refused to guess, then the next time it tries best effort. I
don't think Guido liked such inconsistency.

It worked in 2.5 and will work in py3k. I think it is a bug that it does not work in 2.6.

In Python 3.x, the default string type is a unicode string. If it works in
Python 2.5, then it is a bug in 2.5.
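
A small sketch of what that means in practice, assuming Python 3: text and bytes are separate types, and mixing them raises immediately instead of triggering a hidden ASCII decode. The sample bytes are made up for the example.

```python
# Assumed sample data: "Käse" as Latin-1 bytes.
raw = "K\xe4se".encode("latin-1")

try:
    combined = "message: " + raw  # str + bytes is refused outright
except TypeError as exc:
    mixing_error = exc

# The caller has to say which encoding the bytes use.
as_text = "message: " + raw.decode("latin-1")
print(type(mixing_error).__name__, as_text)
```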
