Steven D'Aprano
I have some custom Unicode error handlers, and I'm looking for advice on
the right API for dealing with them.
I have a module containing custom Unicode error handlers. For example:
    # Python 3
    import unicodedata

    def namereplace_errors(exc):
        c = exc.object[exc.start]
        try:
            name = unicodedata.name(c)
        except (KeyError, ValueError):
            n = ord(c)
            if n <= 0xFFFF:
                replace = "\\u%04x"
            else:
                assert n <= 0x10FFFF
                replace = "\\U%08x"
            replace = replace % n
        else:
            replace = "\\N{%s}" % name
        return replace, exc.start + 1
Before I can use the error handler, I need to register it using this:
    import codecs
    codecs.register_error('namereplace', namereplace_errors)
And now:
    py> 'abc\u04F1'.encode('ascii', 'namereplace')
    b'abc\\N{CYRILLIC SMALL LETTER U WITH DIAERESIS}'
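For completeness, here is a small standalone sketch that also exercises the fallback branch, for characters that have no Unicode name. The handler is a condensed restatement of the one above, and the registration name 'namereplace_demo' is only an assumption for illustration, chosen so it does not collide with any handler that might already be registered under 'namereplace'.

```python
import codecs
import unicodedata

def namereplace_errors(exc):
    # Condensed restatement of the handler above, so this runs on its own.
    c = exc.object[exc.start]
    try:
        replace = "\\N{%s}" % unicodedata.name(c)
    except ValueError:
        # unicodedata.name() raises ValueError for unnamed code points.
        n = ord(c)
        replace = ("\\u%04x" if n <= 0xFFFF else "\\U%08x") % n
    return replace, exc.start + 1

# 'namereplace_demo' is a hypothetical name used only in this sketch.
codecs.register_error('namereplace_demo', namereplace_errors)

# A named character takes the \N{...} branch.
named = 'abc\u04F1'.encode('ascii', 'namereplace_demo')
# Private-use code points have no name, so they take the escape branch.
unnamed_bmp = '\uE000'.encode('ascii', 'namereplace_demo')       # b'\\ue000'
unnamed_astral = '\U000F0000'.encode('ascii', 'namereplace_demo')  # b'\\U000f0000'
```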
Now, my question:
Should the module holding the error handlers automatically register them?
In other words, if I do:
    import error_handlers
just importing it will have the side-effect of registering the error
handlers. Normally, I dislike imports that have side-effects of this
sort, but I'm not sure that the alternative is better, that is, to put
responsibility on the caller to register some, or all, of the handlers:
    import error_handlers
    error_handlers.register(error_handlers.namereplace_errors)
    error_handlers.register_all()
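To make the second option concrete, this is roughly what I imagine the module side of the explicit API looking like. It is only a sketch: the `_HANDLERS` table, the optional `name` parameter, and the rule of deriving the registered name by stripping a trailing "_errors" from the function name are all assumptions made for illustration, not settled design.

```python
import codecs
import unicodedata

def namereplace_errors(exc):
    # Condensed restatement of the handler shown earlier.
    c = exc.object[exc.start]
    try:
        replace = "\\N{%s}" % unicodedata.name(c)
    except ValueError:
        n = ord(c)
        replace = ("\\u%04x" if n <= 0xFFFF else "\\U%08x") % n
    return replace, exc.start + 1

# Hypothetical table of every handler the module provides.
_HANDLERS = {'namereplace': namereplace_errors}

def register(handler, name=None):
    """Register one handler; nothing happens until the caller asks."""
    if name is None:
        # Assumed convention: 'namereplace_errors' registers as 'namereplace'.
        name = handler.__name__
        if name.endswith('_errors'):
            name = name[:-len('_errors')]
    codecs.register_error(name, handler)
    return name

def register_all():
    """Register every handler in the table and return the names used."""
    return [register(handler, name) for name, handler in _HANDLERS.items()]
```

With this shape, importing the module has no side effects, and the caller can pass an explicit name if the default one is already taken.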
As far as I know, there is no way to find out what error handlers are
registered, and no way to deregister one after it has been registered.
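The closest thing I am aware of is codecs.lookup_error(name), which returns the handler registered under a given name and raises LookupError if there is none. That does not enumerate the registry or undo a registration, but it does allow a check on a single name. A rough sketch, where the helper names `is_registered` and `register_if_absent` are hypothetical:

```python
import codecs

def is_registered(name):
    """Return True if some error handler is registered under `name`."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True

def register_if_absent(name, handler):
    """Register `handler` only if `name` is free; report whether we did."""
    if is_registered(name):
        return False
    codecs.register_error(name, handler)
    return True
```

This would at least let a register() function avoid silently replacing a handler that somebody else has already installed under the same name.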
Which API would you prefer if you were using this module?