error messages containing unicode


Jim

Hello,

I'm trying to write exception-handling code that is OK in the presence
of unicode error messages. I seem to have gotten all mixed up and I'd
appreciate any un-mixing that anyone can give me.

I'm used to writing code like this.

class myException(Exception):
    pass

fn='README'
try:
    f=open(fn,'r')
except Exception, err:
    mesg='unable to open file '+fn+': '+str(err)
    raise myException, mesg

But what if fn is non-ascii? The following code raises the
dreaded (to me) UnicodeEncodeError.

class myException(Exception):
    pass

def fcnCall():
    fn=u'a\N{LATIN SMALL LETTER O WITH DIAERESIS}k'
    try:
        # was: f=open(fn,'r')
        2/0  # just substitute something that will raise an exception
    except Exception, err:
        mesg='unable to open file '+fn+': '+str(err)
        raise myException, mesg

try:
    fcnCall()
except Exception, err:
    print 'trouble calling fcnCall: '+str(err)
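For comparison, here is a minimal sketch of the same pattern in Python 3 (an assumption: the thread itself is about Python 2.4, where `str` is a byte string and `str(err)` on a unicode message triggers an implicit ASCII encode). In Python 3 every `str` is Unicode, so the concatenation no longer fails. `fcn_call` and `MyException` are illustrative renamings, not code from the thread.

```python
class MyException(Exception):
    """Application-specific error; inherits Exception's args handling."""


def fcn_call():
    fn = "a\N{LATIN SMALL LETTER O WITH DIAERESIS}k"
    try:
        2 / 0  # stand-in for open(fn), as in the original post
    except Exception as err:
        # Every piece is text (str), so no implicit ASCII encode happens.
        mesg = "unable to open file " + fn + ": " + str(err)
        # "from err" records the original exception as __cause__.
        raise MyException(mesg) from err


try:
    fcn_call()
except Exception as err:
    print("trouble calling fcnCall: " + str(err))
```

The `from err` clause is optional; it simply keeps the underlying error attached for tracebacks and logging.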

Maybe my trouble is the "str()", which is supposed to return a regular
string? (BTW, unicode() makes no difference, and help(BaseException)
didn't give me any inspiration.) So I looked for an end-around past
the str() call.

As I understand lib/module-exceptions.html, "For class
exceptions, [err] receives the exception instance. If the exception
class is derived from the standard root class BaseException, the
associated value is present as the exception instance's args
attribute.", I should be able to get the string out of err.args. Sure
enough, putting the above text into test.py and changing str(err)
to repr(err.args) yields this.

$ python test.py
trouble calling fcnCall: (u'unable to open file a\xf6k: integer division or modulo by zero',)

so that changing the above repr(err.args) to err.args[0] gives the
desired result.

$ python test.py
trouble calling fcnCall: unable to open file aök: integer division or modulo by zero

(In case this doesn't show up as I intended on your screen, I see an
o with a diaeresis in the filename.)

But the documentation ("This may be a string or a tuple containing
several items of information (e.g., an error code and a string
explaining the code).") gives me no reason to believe that all
exceptions have the desired unicode string as the 0-th element of the
tuple. I confess that I'm unable to read exceptions.c with confidence.
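The worry is justified. A quick check in Python 3 (used here so the sketch runs today; the Python 2 `EnvironmentError` family behaves the same way) shows that when a built-in exception is created with an error code and a description, `args[0]` is the code, while `str(err)` still produces the readable message.

```python
import errno

# Two-argument form: (error code, description), as the docs describe.
err = OSError(errno.ENOENT, "No such file or directory")

print(err.args)      # the raw tuple
print(err.args[0])   # the numeric code, not the text
print(err.strerror)  # the description lives in a dedicated attribute
print(str(err))      # str() combines them into a readable message
```

So relying on `args[0]` works for single-message exceptions but not for every built-in.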

No doubt I've missed something (I googled around the net and on this
group, but I didn't have any luck). I'd be grateful if someone could
set me straight.

Thanks,
Jim
 

Steven D'Aprano

Hello,

I'm trying to write exception-handling code that is OK in the presence
of unicode error messages. I seem to have gotten all mixed up and I'd
appreciate any un-mixing that anyone can give me.
[snip]
Traceback (most recent call last):

Works fine with an ASCII argument, but not with Unicode:
Traceback (most recent call last):
File "<stdin>", line 1, in ?
__main__.MyException>>>

Notice the terminal problem? (The error message doesn't print, and the
prompt ends up stuck after the exception.)

Let's capture the exception and dissect it:
... except Exception, err:
...     print type(err)
...     print err
...
<type 'instance'>
Traceback (most recent call last):
File "<stdin>", line 4, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 1: ordinal not in range(128)

Now we have the answer: your exception (which just sub-classes Exception)
does the simplest conversion of Unicode to ASCII possible, and when it
hits a character it can't deal with, it barfs. That happens when you
try to print the exception, not when you create it.
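The failure can be reproduced directly. This sketch is Python 3, where the conversion Python 2's `str()` performed implicitly has to be spelled out as `.encode('ascii')`; the `errors` argument chooses between raising (the default, `'strict'`) and substituting.

```python
name = "a\xf6k"  # the file name from the original post

try:
    name.encode("ascii")  # strict: effectively what Python 2's str() did
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason, "at position", exc.start)

print(name.encode("ascii", "replace"))           # b'a?k'
print(name.encode("ascii", "backslashreplace"))  # b'a\\xf6k'
```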

The easiest ways to fix that are:

(1) subclass an exception that already knows about Unicode;

(2) convert the file name to ASCII before you store it; or

(3) add a __str__ method to your exception that is Unicode aware.

I'm going to be lazy and do a real simple-minded version of (2):
...     def __init__(self, arg):
...         self.args = arg.encode('ascii', 'replace')
...         self.unicode_arg = arg # save the original in case


Traceback (most recent call last):
File "<stdin>", line 1, in ?
__main__.MyBetterException: a?k


And now it works.
 

Jim

Thank you for the reply. It happens that, as I understand it, none of
the options that you mentioned is a solution for my situation.

The easiest ways to fix that are:

(1) subclass an exception that already knows about Unicode;
But I often raise one of Python's built-in errors. And also, is it
really true that subclassing one of Python's built-ins gives me
something that is unicode deficient? I assumed that I had missed
something (because that's happened so many times before :) ).

For instance, I write a lot of CGI and I want to wrap everything in a
try .. except.
try:
    main()
except Exception, err:
    print "Terrible blunder: ",str(err)
so that the err can be one of my exceptions, or can be one that came
with Python. (And, as far as I can see, err.args can be either the
relevant string or a tuple containing the relevant string, and the
documentation is silent on whether, for the built-in exceptions, the
string is guaranteed to be first when err.args is a tuple.)
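A catch-all of the kind described above might look like this in Python 3 (a sketch; `AppError`, `run`, `bad_lookup` and `bad_file` are hypothetical names). Because custom and built-in exceptions both derive from `Exception`, one handler sees both, and formatting with `str(err)` is safe for non-ASCII text there.

```python
class AppError(Exception):
    """Hypothetical application exception."""


def run(main):
    """Call main(); turn any Exception into a one-line user-facing message."""
    try:
        main()
    except Exception as err:
        # type(err).__name__ distinguishes custom errors from built-in ones.
        return "Terrible blunder: %s: %s" % (type(err).__name__, err)
    return "ok"


def bad_lookup():
    {}["missing"]  # a built-in KeyError nobody anticipated


def bad_file():
    raise AppError("unable to open file a\xf6k")
```

Calling `run(bad_file)` yields a message containing the original non-ASCII file name; `run(bad_lookup)` reports the built-in `KeyError` through the same path.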
(2) convert the file name to ASCII before you store it; or
I need the non-ascii information, though, which is why I included it
in the error message.
(3) add a __str__ method to your exception that is Unicode aware.
I have two difficulties with this: (1) as above, I often raise Python's
built-in exceptions, and for those __str__() is what it is; and (2)
this goes against the meaning of __str__() that I find in the
documentation in ref/customization.html, which says that the return
value must be a string object. Obviously no one will come to my house
and slap me if I violate that, but I'll venture that it would be odd
if the best practice were to do the opposite of what the documentation
says.
I'm going to be lazy and do a real simple-minded version of (2):

...         self.args = arg.encode('ascii', 'replace')
...         self.unicode_arg = arg # save the original in case
This is illuminating. How do you know that for exceptions __init__()
should take one non-self argument? I missed finding this information.

Thanks again,
Jim
 

Diez B. Roggisch

(2) convert the file name to ASCII before you store it; or
I need the non-ascii information, though, which is why I included it
in the error message.

Then convert it to utf-8, or some encoding you know will be used by your
terminal.
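The suggestion amounts to encoding at output time. A minimal Python 3 sketch, assuming the terminal expects UTF-8; the `'replace'` error handler guards against characters a narrower encoding cannot represent.

```python
message = "unable to open file a\xf6k"

utf8_bytes = message.encode("utf-8", "replace")      # every character survives
latin1_bytes = message.encode("latin-1", "replace")  # ö also fits in Latin-1
ascii_bytes = message.encode("ascii", "replace")     # ö becomes '?'

# Decoding with the same codec recovers the original text.
assert utf8_bytes.decode("utf-8") == message
```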

Diez
 

Jim

Then convert it to utf-8, or some encoding you know will be used by your
terminal.
Thank you for the suggestion. Please remember that I am asking for a
safe way to pull the unicode object out of the exception object
(derived from a Python built-in), so I can't store it as unicode first
and then convert to a regular string when I need to print it out; my
exact question is how to get the unicode. So I take your answer to be:
refuse to put a unicode-not-ascii string in there in the first place.

It then seems to me that you are saying that the best practice is that
every function definition should contain a parameter, like so.

def openNewFile(fn,errorEncoding='utf-8'):
    :
    try:
        open(fn,'r')
    except Exception, err:
        raise myException, 'unable to open '+fn.encode(errorEncoding,'replace')

Beyond the ugliness of passing those parameters around and putting
encode on every variable in my routines that occurs in an error
message, it seems to me that this violates the principle that you
should do everything inside the program in unicode and encode only at
the instant you need the output, because the exception object ends up
carrying around an ascii-not-unicode object.
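The "Unicode inside, encode at the edge" principle can be applied to the output stream rather than to each message. A Python 3 sketch using `io.TextIOWrapper`: the wrapper owns the encoding and error policy, so the exception carries plain text. The in-memory `io.BytesIO` is an assumption standing in for a real terminal, socket, or CGI response, and `OpenError` is a hypothetical class.

```python
import io


class OpenError(Exception):
    """Hypothetical exception that simply stores a text message."""


raw = io.BytesIO()  # stand-in for the real byte-oriented output
# The stream, not each message, decides how text becomes bytes.
out = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace",
                       newline="\n")

try:
    raise OpenError("unable to open a\xf6k")
except OpenError as err:
    out.write(str(err) + "\n")  # the message stays text until here

out.flush()
encoded = raw.getvalue()
```

Changing the output encoding then means changing one `encoding=` argument instead of every `raise`.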

Jim
 

Peter Otten

Printing to a terminal should work:
...     raise Exception(u"gewöhnlich ähnlich üblich")
... except Exception, e:
...     print e.message
...
gewöhnlich ähnlich üblich

If you're writing to a file you still have to encode explicitly.
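In Python 3 that explicit encoding is usually given once, when the file is opened, rather than on each string. A sketch using a temporary file; the path and the choice of UTF-8 are illustrative assumptions.

```python
import os
import tempfile

message = "gew\xf6hnlich \xe4hnlich \xfcblich"

fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)
try:
    # The file object encodes text to UTF-8 bytes as it writes.
    with open(path, "w", encoding="utf-8") as f:
        f.write(message)
    with open(path, "rb") as f:
        raw = f.read()          # the bytes actually on disk
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()   # decoded back to text
finally:
    os.remove(path)
```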

Peter
 

Jim

... except Exception, e:
...     print e.message
...
gewöhnlich ähnlich üblich
Ah, so that's what "If there is a single argument (as is preferred),
it is bound to the message attribute" means. Through some imbecility
I failed to understand it. Thank you.
If you're writing to a file you still have to encode explicitly.
Yes; I know that. It wasn't what to do with the unicode object that
confused me, it was how to get it in the first place.

Much obliged,
Jim
 

Steven D'Aprano

Thank you for the reply. It happens that, as I understand it, none of
the options that you mentioned is a solution for my situation.



But I often raise one of Python's built-in errors. And also, is it
really true that subclassing one of Python's built-ins gives me
something that is unicode deficient? I assumed that I had missed
something (because that's happened so many times before :) ).

If the built-in isn't Unicode aware, subclassing it won't magically make
it so :)

For instance, I write a lot of CGI and I want to wrap everything in a
try .. except.
try:
    main()
except Exception, err:
    print "Terrible blunder: ",str(err)
so that the err can be one of my exceptions, or can be one that came
with Python.
(And, as far as I can see, err.args can be either the relevant
string or a tuple containing the relevant string, and the documentation
is silent on whether, for the built-in exceptions, the string is
guaranteed to be first when err.args is a tuple.)

Does it matter? Just print the tuple.

I need the non-ascii information, though, which is why I included it
in the error message.

If you have the exception captured in "err", then you can grab it with
err.where_i_put_the_unicode.

I have two difficulties with this: (1) as above I often raise Python's
built-in exceptions and for those __str__() is what it is, and

Then don't use the built-in exception. If it won't do what you want it to
do, use something else.

(2) this
goes against the meaning of __str__() that I find in the documentation
in ref/customization.html which says that the return value must be a
string object.

I didn't mean return a unicode object :)

You're absolutely correct. Your __str__ would need to return a string
object, which means encoding the Unicode correctly to get a string object
without raising an exception.

e.g. something like this maybe (untested, not thought-through, probably
won't work correctly, blah blah blah):

def __str__(self):
    s = self.args.encode('ascii', 'replace')
    return "Unicode error converted to plain ASCII:\n" + s

or whatever encoding scheme works for your application.
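The sketch above calls `.encode` on `self.args`, which is a tuple for ordinary exceptions. A working variant, written in Python 3 terms, might use the lone argument when there is exactly one. The class name and the ASCII-with-`'replace'` policy are assumptions for illustration; since `__str__` must return `str`, the encoded bytes are decoded back to text.

```python
class AsciiSafeError(Exception):
    """Hypothetical exception whose str() is always pure ASCII."""

    def __str__(self):
        # self.args is a tuple; use the lone argument if there is one.
        if len(self.args) == 1:
            text = str(self.args[0])
        else:
            text = str(self.args)
        # Replace anything outside ASCII, then return text, not bytes.
        return text.encode("ascii", "replace").decode("ascii")


err = AsciiSafeError("unable to open file a\xf6k")
```

The original Unicode message is still available untouched in `err.args[0]`.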

[snip]
This is illuminating. How do you know that for exceptions __init__()
should take one non-self argument? I missed finding this information.

It can take whatever you want it to take:

class MyStupidException(Exception):
    def __init__(self, dayofweek, breakfast="spam and baked beans", *everythingelse):
        self.day = dayofweek
        self.breakfast = breakfast
        self.args = everythingelse
    def __str__(self):
        s = "On %s I ate %s and then an error '%s' occurred." % \
            (self.day.title(), self.breakfast, self.args)
        return s

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyStupidException: On Monday I ate cheese and then an error '('bad things', 'happened', 2)' occurred.
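One possible refinement, sketched in Python 3 with a hypothetical `BreakfastError`: calling the base class `__init__` keeps `args` populated the normal way, so generic handlers that inspect `args` or call `str(err)` still behave, while the extra context lives in named attributes.

```python
class BreakfastError(Exception):
    """Hypothetical exception carrying extra, named context."""

    def __init__(self, dayofweek, breakfast="spam and baked beans", *details):
        # Let Exception store every positional argument in self.args.
        super().__init__(dayofweek, breakfast, *details)
        self.day = dayofweek
        self.breakfast = breakfast
        self.details = details

    def __str__(self):
        return "On %s I ate %s and then an error %r occurred." % (
            self.day.title(), self.breakfast, self.details)


err = BreakfastError("monday", "cheese", "bad things", "happened", 2)
```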
 

Jim

Thanks Steve, I appreciate your patience.

If the built-in isn't Unicode aware, subclassing it won't magically make
it so :)
Oh, I agree. If I have a string mesg that is unicode-not-ascii and I say
try:
    raise Exception, mesg
except Exception, err:
    print "Trouble"+mesg
then I have problems. I am, however, under the impression, perhaps
mistaken, that the built-in exceptions in the library return only
ascii as error strings. (I take as evidence of my understanding that
the built-in exceptions have a __str__() method but no explicit
__unicode__(), and so rely on a unicode(err) call being passed on to
__str__().) But as I've said above, I've been wrong so many times
before. ;-)

My main point about the built-ins is that I want to catch them along
with my own exceptions. That's what I meant by the next paragraph.
My class myException is a subclass of Exception so I can catch my
stuff and the standard stuff with an all-in-one panic button.
Does it matter? Just print the tuple.
In truth, it does matter. In that example, for instance, some error
message is passed on to the user and I don't want it to look too bad.
"Database cannot be opened" is better than a "(u'Database cannot be
opened', 1)"-type thing. Besides which, Python is a nice language, and
I'm certain there is a nice way to do this; it is just that I'm having
trouble making it out.
If you have the exception captured in "err", then you can grab it with
err.where_i_put_the_unicode.
I want a method of grabbing it that is the same as the method used by
the built-ins, for the uniformity reasons that I gave above. As far as
I could make out, the documentation was silent on the approved way to
grab the string.
Then don't use the built-in exception. If it won't do what you want it to
do, use something else.
I use my exceptions for errors in my logic, etc. But not being
perfect, sometimes I raise exceptions that I had not anticipated;
these are built-ins.
I didn't mean return a unicode object :)

You're absolutely correct. Your __str__ would need to return a string
object, which means encoding the Unicode correctly to get a string object
without raising an exception.

e.g. something like this maybe (untested, not thought-through, probably
won't work correctly, blah blah blah):

def __str__(self):
    s = self.args.encode('ascii', 'replace')
    return "Unicode error converted to plain ASCII:\n" + s

or whatever encoding scheme works for your application.
I did discuss this suggestion from another person above. That would
mean either (a) throwing away the unicode-not-ascii parts of the error
message (but I want those parts, which is why I put them in there), (b)
hard-coding the output encoding for error strings in hundreds of error
cases (yes, I have hundreds), or (c) passing the errorEncoding as a
parameter to each function that I write. That last option doesn't seem
to me to be a likely best practice for such a nice language as Python;
I want a way to get the unicode object and go forward in the program
with that.
It can take whatever you want it to take:

class MyStupidException(Exception):
    def __init__(self, dayofweek, breakfast="spam and baked beans", *everythingelse):
        self.day = dayofweek
        self.breakfast = breakfast
        self.args = everythingelse
    def __str__(self):
        s = "On %s I ate %s and then an error '%s' occurred." % \
            (self.day.title(), self.breakfast, self.args)
        return s


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
__main__.MyStupidException: On Monday I ate cheese and then an error '('bad things', 'happened', 2)' occurred.
Thank you for the example; I learned something from it. But as I
mentioned above, I need to guard against the system raising built-ins
also and so I am still a bit puzzled by how to get at the error
strings in built-ins.

In case anyone is still reading this :) someone else suggested the
err.message attribute. I had missed that in the documentation
somehow, but on rereading it, I thought he had solved my problem.
However, sadly, I cannot get Python to like a call to err.message:
...........................................................
$ python
Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
...     raise Exception, 'this is the error message'
... except Exception, err:
...     print "result: ",err.message
...
result:
Traceback (most recent call last):
.......................................................................................

So, in case it helps anyone, I am going with this:
......................................................................................
def errorValue(err):
    """Return the string error message from an exception instance.
      err  exception instance
    Note: I cannot get err.message to work. I sent a note to clp on
    Jan 29 2007 with a related query and this is the best that I
    figured out.
    """
    return err[0]

class jhError(StandardError):
    """Subclass this to get exceptions that behave correctly when
    you do this.
      try:
          raise subclassOfJhError, 'some error message with unicode chars'
      except subclassOfJhError, err:
          mesg='the message is '+unicode(err)
    """
    def __unicode__(self):
        return errorValue(self)

class myException(jhError):
    pass
.....................................................................................
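Two version facts bear on this. The `message` attribute was added to exceptions in Python 2.5 (PEP 352), which would explain why Python 2.4.4 cannot find it; it was deprecated in 2.6 and removed in 3.0. Indexing an exception as `err[0]` also works in Python 2 but not in Python 3. A Python 3 sketch of an `errorValue`-style helper that avoids both and does not assume the text sits at index 0; `error_value` is a hypothetical name.

```python
def error_value(err):
    """Return a readable text message for any exception instance."""
    args = err.args
    if len(args) == 1 and isinstance(args[0], str):
        return args[0]  # the common single-message case, returned as-is
    # Let the exception class format codes, tuples, or no arguments.
    return str(err)
```

For a single text argument this returns the original Unicode message unchanged; for exceptions such as `OSError(2, "No such file or directory")` it falls back on the class's own formatting instead of returning the numeric code.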

No doubt I'll discover what is wrong with it today. :)

Jim
 

Jim

Oops, there is a typo in what I wrote above. Sorry.

Oh, I agree. If I have a string mesg that is unicode-not-ascii and I say
try:
    raise Exception, mesg
except Exception, err:
    print "Trouble"+mesg
then I have problems.
should say:
try:
    raise Exception, mesg
except Exception, err:
    print "Trouble"+str(err)

Jim
 
