Perry Johnson
Python's re module does not support POSIX character classes such as
[:alpha:]. Simulating them with character ranges is trivial when the
text to be matched uses the ASCII character set, but I need to process
Unicode text. The re module has its own character classes that do
support Unicode; however, they are not sufficient.
I would find it extremely useful to have information on the
Unicode code points that map to each of the POSIX character classes.
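
To make the request concrete, here is a rough sketch of the kind of mapping I mean, built from the standard unicodedata module. It assumes that [:alpha:] can be approximated by the Unicode general categories beginning with "L" (letters); that is only an approximation, since unicodedata does not expose the Alphabetic property. The names unicode_class and ALPHA are made up for illustration.

```python
import re
import sys
import unicodedata

def unicode_class(prefixes):
    """Build the body of a regex character class covering every code point
    whose general category starts with one of the given prefixes."""
    ranges = []
    start = prev = None
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)).startswith(prefixes):
            if start is None:
                start = cp          # open a new run of matching code points
            prev = cp
        elif start is not None:
            ranges.append((start, prev))   # close the current run
            start = None
    if start is not None:
        ranges.append((start, prev))
    # Collapse each run into "a" or "a-b", escaping regex metacharacters.
    return "".join(
        re.escape(chr(a)) if a == b
        else re.escape(chr(a)) + "-" + re.escape(chr(b))
        for a, b in ranges
    )

# Assumption: [:alpha:] is approximated by the letter categories (Lu, Ll, Lt, Lm, Lo).
ALPHA = "[" + unicode_class(("L",)) + "]"

alpha_word = re.compile(ALPHA + "+")
print(alpha_word.findall("naïve Ωmega 123 жук_x"))
```

The same helper could presumably be called with ("Lu",) for [:upper:], ("Ll",) for [:lower:] or ("P",) for [:punct:], but whether those category choices match what POSIX implementations actually do is exactly the information I am missing.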