API for custom Unicode error handlers

  • Thread starter Steven D'Aprano

Steven D'Aprano

I have some custom Unicode error handlers, and I'm looking for advice on
the right API for dealing with them.

I have a module containing custom Unicode error handlers. For example:

# Python 3
import unicodedata

def namereplace_errors(exc):
    c = exc.object[exc.start]
    try:
        name = unicodedata.name(c)
    except (KeyError, ValueError):
        n = ord(c)
        if n <= 0xFFFF:
            replace = "\\u%04x"
        else:
            assert n <= 0x10FFFF
            replace = "\\U%08x"
        replace = replace % n
    else:
        replace = "\\N{%s}" % name
    return replace, exc.start + 1


Before I can use the error handler, I need to register it using this:


import codecs
codecs.register_error('namereplace', namereplace_errors)

And now:

py> 'abc\u04F1'.encode('ascii', 'namereplace')
b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'


Now, my question:

Should the module holding the error handlers automatically register them?
In other words, if I do:

import error_handlers

just importing it will have the side-effect of registering the error
handlers. Normally, I dislike imports that have side-effects of this
sort, but I'm not sure that the alternative is better, that is, to put
responsibility on the caller to register some, or all, of the handlers:

import error_handlers
error_handlers.register(error_handlers.namereplace_errors)
error_handlers.register_all()
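A minimal sketch of the second option, assuming a hypothetical `error_handlers` module that exposes the `register()` and `register_all()` names used in the question. The `_HANDLERS` table and the `name` parameter are illustrative assumptions, and the handler is a condensed copy (with \x/\u/\U fallbacks) so the sketch runs on its own. Importing it registers nothing; only the caller's explicit calls touch the global registry.

```python
import codecs
import unicodedata

def namereplace_errors(exc):
    # Condensed single-character handler: \N{NAME} if the character has a
    # name, otherwise a \x, \u or \U escape of its code point.
    c = exc.object[exc.start]
    try:
        replace = "\\N{%s}" % unicodedata.name(c)
    except ValueError:
        n = ord(c)
        fmt = "\\x%02x" if n < 0x100 else "\\u%04x" if n < 0x10000 else "\\U%08x"
        replace = fmt % n
    return replace, exc.start + 1

# Hypothetical table: default registration name -> handler function.
_HANDLERS = {"namereplace": namereplace_errors}

def register(handler, name=None):
    """Register one handler; `name` defaults to its entry in _HANDLERS."""
    if name is None:
        for default_name, known in _HANDLERS.items():
            if known is handler:
                name = default_name
                break
        else:
            raise ValueError("unknown handler; pass name= explicitly")
    codecs.register_error(name, handler)

def register_all():
    """Register every handler in _HANDLERS under its default name."""
    for name, handler in _HANDLERS.items():
        codecs.register_error(name, handler)

# Nothing at module level calls codecs.register_error(), so importing
# this module has no side effect on the codec registry.
```

The trade-off is the one described in the question: callers must remember to call `register()` or `register_all()`, but nothing changes globally just because the module was imported.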


As far as I know, there is no way to find out what error handlers are
registered, and no way to deregister one after it has been registered.

Which API would you prefer if you were using this module?
 

Chris Angelico

> Should the module holding the error handlers automatically register them?
> In other words, if I do:
>
> import error_handlers
>
> just importing it will have the side-effect of registering the error
> handlers. Normally, I dislike imports that have side-effects of this
> sort, but I'm not sure that the alternative is better, that is, to put
> responsibility on the caller to register some, or all, of the handlers:
>
> import error_handlers
> error_handlers.register(error_handlers.namereplace_errors)
> error_handlers.register_all()

Caveat: I don't actually use codecs much, so I don't know the specifics.

I'd be quite happy with importing having a side-effect here. If you
import a module that implements a numeric type, it should immediately
register itself with the Numeric ABC, right? This is IMO equivalent to
that.
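Chris's analogy, sketched: a module that defines a number-like type typically registers it as a virtual subclass of one of the standard `numbers` ABCs as a side effect of being imported (his "Numeric ABC" presumably means `numbers.Number` or one of its subclasses). `Money` is a hypothetical class used only for illustration.

```python
import numbers

class Money:
    """Hypothetical number-like type (illustration only)."""

    def __init__(self, cents):
        self.cents = cents

# Import-time side effect: ABCMeta.register() makes Money a virtual
# subclass of numbers.Number as soon as this module is imported.
numbers.Number.register(Money)
```

Here the side effect is harmless because any code that uses `Money` has to import this module first.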

> As far as I know, there is no way to find out what error handlers are
> registered, and no way to deregister one after it has been registered.

The only risk that I see is of an accidental collision. Having a codec
registered that you don't use can't hurt (afaik). Is there any
mechanism for detecting a name collision? If not, I wouldn't worry
about it.
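On the collision question: there is no public call that lists every registered handler or removes one, but `codecs.lookup_error(name)` returns the handler currently registered under `name` and raises `LookupError` if there is none. A minimal sketch of a collision check built on that; `is_registered` and `register_checked` are hypothetical helpers, not part of `codecs`.

```python
import codecs

def is_registered(name):
    """Return True if some error handler is registered under `name`."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

def register_checked(name, handler):
    """Register `handler` under `name`, refusing to replace a different handler."""
    if is_registered(name):
        existing = codecs.lookup_error(name)
        if existing is handler:
            return  # same function already registered: nothing to do
        raise ValueError("error handler %r is already registered to %r" % (name, existing))
    codecs.register_error(name, handler)
```

This only detects a clash at registration time; since registrations cannot be undone, it cannot protect against someone else overwriting the name later.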

ChrisA
 

Ethan Furman

Should the module holding the error handlers automatically register them?

I think it should.

Registration only needs to happen once, the module is useless without being registered, no threads or processes are
being started, and the only reason to import the module is to get the functionality... isn't it?

What about help(), Sphinx, or other introspection tools, which import a module just to inspect it?

This sounds similar to cgitb -- another module which you only import if you want the html'ized traceback, and yet it
requires a separate cgitb.enable() call...

I change my mind: it shouldn't.

Throw in a .enable() function and call it good. :)
 

Serhiy Storchaka

04.10.13 20:22, Chris Angelico wrote:
> I'd be quite happy with importing having a side-effect here. If you
> import a module that implements a numeric type, it should immediately
> register itself with the Numeric ABC, right? This is IMO equivalent to
> that.

There is a difference. You can't use a numeric type without importing its
module, but you can use an error handler that was registered outside of
your module.

This leads to subtle bugs. Suppose module A imports error_handlers and
uses the error handler. Module B uses the error handler but doesn't import
error_handlers. C.py imports A and then B, and everything works. D.py
imports B and then A, and fails (if B uses the handler at import time,
before A has registered it).
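The hidden dependency can be reproduced in one file by standing in for modules A and B with functions. The handler name `"sketch_order_demo"` and the stand-in handler are hypothetical; the point is that B's call succeeds or fails depending only on whether A's registration has already run, because the registry is global to the process.

```python
import codecs

HANDLER_NAME = "sketch_order_demo"  # hypothetical handler name

def _stand_in_handler(exc):
    # Stand-in for a custom handler: replace each unencodable character with "?".
    return "?" * (exc.end - exc.start), exc.end

def import_module_a():
    # Plays "import A": A imports error_handlers, which registers the
    # handler as an import-time side effect.
    codecs.register_error(HANDLER_NAME, _stand_in_handler)

def module_b_work():
    # Plays module B: uses the handler by name but never imports
    # error_handlers, so it silently relies on someone else having done so.
    return "caf\xe9".encode("ascii", HANDLER_NAME)

# D.py order: B's code runs before A has been imported.
try:
    module_b_work()
    b_first = "worked"
except LookupError:
    b_first = "LookupError"

# C.py order: A has been imported, so the same call in B now succeeds.
import_module_a()
a_first = module_b_work()
```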
 

Serhiy Storchaka

04.10.13 16:56, Steven D'Aprano wrote:
> I have some custom Unicode error handlers, and I'm looking for advice on
> the right API for dealing with them.
>
> I have a module containing custom Unicode error handlers. For example:
>
> [namereplace_errors code snipped]

I'm planning to build this error handler into 3.4 (see
http://comments.gmane.org/gmane.comp.python.ideas/21296).

Actually, a Python implementation should look like this:

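import unicodedata

def namereplace_errors(exc):
    # \N{...} escapes only make sense when encoding; re-raise other errors.
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    replace = []
    # Replace every character in the reported span, not just the first.
    for c in exc.object[exc.start:exc.end]:
        try:
            replace.append(r'\N{%s}' % unicodedata.name(c))
        except ValueError:
            # unicodedata.name() raises ValueError (not KeyError) for
            # code points without a name, e.g. control characters.
            n = ord(c)
            if n < 0x100:
                replace.append(r'\x%02x' % n)
            elif n < 0x10000:
                replace.append(r'\u%04x' % n)
            else:
                replace.append(r'\U%08x' % n)
    return ''.join(replace), exc.end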
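Why loop over `exc.object[exc.start:exc.end]` instead of handling only `exc.object[exc.start]`: an encoder may report a whole run of unencodable characters in one exception (CPython's ASCII encoder does, at least). A handler that replaces one character and returns `exc.start + 1` still produces the same output, but it is called once per character. A small sketch to observe this, using two hypothetical recording handlers:

```python
import codecs

spans = []  # (start, end) pairs reported to the handlers below

def whole_run_handler(exc):
    # Replace the entire reported span, as the loop-based version does.
    spans.append((exc.start, exc.end))
    return "?" * (exc.end - exc.start), exc.end

def first_char_handler(exc):
    # Replace only the first character and resume right after it,
    # as the single-character version does.
    spans.append((exc.start, exc.end))
    return "?", exc.start + 1

# Hypothetical handler names, chosen so they do not clash with real ones.
codecs.register_error("sketch_whole_run", whole_run_handler)
codecs.register_error("sketch_first_char", first_char_handler)

def encode_and_record(text, handler_name):
    """Encode `text` to ASCII; return the bytes and the spans the handler saw."""
    spans.clear()
    result = text.encode("ascii", handler_name)
    return result, list(spans)
```

For example, `encode_and_record("ab\xe9\xe8cd", "sketch_whole_run")` lets you compare how many times each handler is invoked for the same two-character run.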

> Now, my question:
>
> Should the module holding the error handlers automatically register them?

This question interests me too.
 

Terry Reedy

04.10.13 16:56, Steven D'Aprano wrote:
> I'm planning to build this error handler into 3.4 (see
> http://comments.gmane.org/gmane.comp.python.ideas/21296).
>
> This question interests me too.

I did not respond on the python-ideas thread, but +1 for 'namereplace' also. Like
others, I would prefer auto-registration unless that creates a problem. If
it is a problem, perhaps the registry mechanism needs improvement. On
the other hand, if it is built in, it will be pre-registered.
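For reference, `namereplace` did become a built-in, pre-registered error handler, though in Python 3.5 rather than 3.4. A minimal check, assuming Python 3.5 or later: no `register_error()` call is needed, and `codecs.lookup_error()` finds the handler immediately.

```python
import codecs
import sys

# The built-in 'namereplace' handler first appeared in Python 3.5.
assert sys.version_info >= (3, 5), "needs Python 3.5+"

# Pre-registered: found without any codecs.register_error() call.
handler = codecs.lookup_error("namereplace")

named = "abc\u04F1".encode("ascii", "namereplace")
unnamed = "\x80".encode("ascii", "namereplace")  # U+0080 has no Unicode name
```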
 
