Regex for unicode letter characters

schickb

I need a regex that will match strings containing only Unicode letter
characters (not including numeric characters or the _ character). I was
surprised to find the 're' module does not include a special character
class for this already (Python 2.6). Or did I miss something?

It seems like this would be a very common need. Is the following the
only option to generate the character class (based on an old post by
Martin v. Löwis)?

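import unicodedata, sys

def letters():
    # Build a regex character class covering every Unicode letter (category L*).
    start = end = None
    result = []
    for index in xrange(sys.maxunicode + 1):
        c = unichr(index)
        if unicodedata.category(c)[0] == 'L':
            # Start a new run of letters or extend the current one.
            if start is None:
                start = end = c
            else:
                end = c
        elif start:
            # A run just ended: emit it as a single character or a range.
            if start == end:
                result.append(start)
            else:
                result.append(start + "-" + end)
            start = None
    return u'[' + u''.join(result) + u']'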

Seems rather cumbersome.
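
For anyone wanting to try the idea end to end, here is a minimal sketch of how the generated class might be used. It is a Python 3 port of the function above (chr and range in place of unichr and xrange), not the code from the original post, and the names letter_class, ONLY_LETTERS and is_all_letters are made up for illustration.

```python
import re
import sys
import unicodedata


def letter_class():
    """Return a regex character class covering every code point in a Unicode L* category."""
    ranges = []
    start = end = None
    for index in range(sys.maxunicode + 1):
        c = chr(index)
        if unicodedata.category(c).startswith('L'):
            # Start a new run of letters or extend the current one.
            if start is None:
                start = c
            end = c
        elif start is not None:
            # A run just ended: emit a single character or a "first-last" range.
            ranges.append(start if start == end else start + '-' + end)
            start = None
    if start is not None:
        # Flush a run that reaches the very last code point (defensive).
        ranges.append(start if start == end else start + '-' + end)
    return '[' + ''.join(ranges) + ']'


# Hypothetical helper names; building the class scans every code point once,
# so it is done a single time at import.
ONLY_LETTERS = re.compile(letter_class() + '+')


def is_all_letters(text):
    """True if text is non-empty and consists solely of Unicode letters."""
    return ONLY_LETTERS.fullmatch(text) is not None
```

Building the class takes a noticeable moment because it walks the whole code point range, so it is worth compiling the pattern once and reusing it.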

-Brad
 
MRAB

schickb said:
I need a regex that will match strings containing only unicode letter
characters (not including numeric or the _ character). I was surprised
to find the 're' module does not include a special character class for
this already (python 2.6). Or did I miss something?

It seems like this would be a very common need. Is the following the
only option to generate the character class (based on an old post by
Martin v. Löwis)?
[snip]
Basically, yes.

The re module was last worked on in 2003 (remember it's all voluntary!).
Such omissions should be addressed in Python 2.7.
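
Until then, one shorter workaround that is sometimes used with the stock re module is a double negative: take \w and subtract digits and the underscore. The sketch below is written for Python 3, where str patterns are Unicode-aware by default; on Python 2.6 the same pattern would need the re.UNICODE flag, unicode strings, and a trailing $ with match() instead of fullmatch(). The names LETTERS_ONLY and only_letters are illustrative. Note that this is an approximation rather than an exact match for the L categories: as far as I know, \w also accepts a few numeric characters that \d does not cover (superscript digits, for example), so those would slip through.

```python
import re

# [^\W\d_] reads as "not a non-word character, not a digit, not an underscore",
# which leaves (approximately) the letters. re.UNICODE is redundant for str
# patterns in Python 3 and is kept only to make the intent explicit.
LETTERS_ONLY = re.compile(r'[^\W\d_]+', re.UNICODE)


def only_letters(text):
    """True if text is non-empty and every character matches [^\\W\\d_]."""
    return LETTERS_ONLY.fullmatch(text) is not None
```

This avoids generating a large character class, at the cost of the imprecision described above.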
 
Steve Holden

MRAB said:
schickb said:
I need a regex that will match strings containing only unicode letter
characters (not including numeric or the _ character). I was surprised
to find the 're' module does not include a special character class for
this already (python 2.6). Or did I miss something?

It seems like this would be a very common need. Is the following the
only option to generate the character class (based on an old post by
Martin v. Löwis)?
[snip]
Basically, yes.

The re module was last worked on in 2003 (remember it's all voluntary!).
Such omissions should be addressed in Python 2.7.

By "should be" do you mean "ought to be (but I have no intention of
helping)", "are expected to be (but someone else will be doing the
work)", "it's on my list and I am expecting to get finished in time for
2.7 integration" or something else?

regards
Steve
 
MRAB

Steve said:
MRAB said:
schickb said:
I need a regex that will match strings containing only unicode letter
characters (not including numeric or the _ character). I was surprised
to find the 're' module does not include a special character class for
this already (python 2.6). Or did I miss something?

It seems like this would be a very common need. Is the following the
only option to generate the character class (based on an old post by
Martin v. Löwis)?
[snip]
Basically, yes.

The re module was last worked on in 2003 (remember it's all voluntary!).
Such omissions should be addressed in Python 2.7.

By "should be" do you mean "ought to be (but I have no intention of
helping)", "are expected to be (but someone else will be doing the
work)", "it's on my list and I am expecting to get finished in time for
2.7 integration" or something else?
The third one.
 
Steve Holden

MRAB said:
Steve said:
MRAB said:
schickb wrote:
I need a regex that will match strings containing only unicode letter
characters (not including numeric or the _ character). I was surprised
to find the 're' module does not include a special character class for
this already (python 2.6). Or did I miss something?

It seems like this would be a very common need. Is the following the
only option to generate the character class (based on an old post by
Martin v. Löwis)?

[snip]
Basically, yes.

The re module was last worked on in 2003 (remember it's all voluntary!).
Such omissions should be addressed in Python 2.7.

By "should be" do you mean "ought to be (but I have no intention of
helping)", "are expected to be (but someone else will be doing the
work)", "it's on my list and I am expecting to get finished in time for
2.7 integration" or something else?
The third one.

Well, that's good news. Let me know if you need help.

regards
Steve
 
