i18n for Character Classes in Patterns.


Luke

Hi

Does anybody know how to match character classes in i18n mode? E.g. ü
(Unicode 00FC, a German umlaut) should actually be matched by the
pattern "[a-z]", but it is not.

Regards,
Lukas
 

Lothar Kimmeringer

Luke said:
Does anybody know how to match character classes in i18n mode? E.g. ü
(Unicode 00FC, a German umlaut) should actually be matched by the
pattern "[a-z]", but it is not.

Of course not, because [a-z] means all characters from 'a' to 'z'
inclusive, and ü is not in that range. Neither is 'A', by the way.

See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#sum
for a list of available patterns. I think \p{javaLowerCase} should do
the trick: it matches every character that Character.isLowerCase()
reports as lowercase, including ü. If you only want to allow the
German umlauts, you have to add them "manually", as in [a-zäöüß].
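
To make the difference concrete, here is a minimal sketch comparing the three patterns on the word "über". The class name LowerCaseMatch and the helper matches() are made up for this illustration; the non-ASCII letters are written as \u escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class LowerCaseMatch {
    // ASCII-only range: 'a' through 'z' and nothing else.
    static final Pattern ASCII_LOWER = Pattern.compile("[a-z]+");
    // Every character for which Character.isLowerCase() returns true.
    static final Pattern UNICODE_LOWER = Pattern.compile("\\p{javaLowerCase}+");
    // ASCII range extended by hand with ä, ö, ü and ß.
    static final Pattern GERMAN_LOWER = Pattern.compile("[a-z\u00E4\u00F6\u00FC\u00DF]+");

    // True if the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u00FCber"; // "über"
        System.out.println(matches(ASCII_LOWER, word));   // false
        System.out.println(matches(UNICODE_LOWER, word)); // true
        System.out.println(matches(GERMAN_LOWER, word));  // true
    }
}
```

Note that \p{javaLowerCase} also accepts lowercase letters from other scripts (Greek, Cyrillic, ...), so the hand-written class is the stricter choice if only German text is expected.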

If you want to check for specific Unicode blocks, you can do that
with e.g. \p{InGreek} for the Greek block. See the API description
linked above for more information.
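
A block check could look roughly like this. Again only a sketch: the class name GreekBlockMatch and the method containsGreek() are invented for the example, and it uses find() so that a single Greek character anywhere in the string counts.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class GreekBlockMatch {
    // Matches one character from the Unicode block "Greek".
    static final Pattern GREEK = Pattern.compile("\\p{InGreek}");

    // True if the string contains at least one character from the Greek block.
    static boolean containsGreek(String s) {
        return GREEK.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsGreek("\u03BBambda")); // "λambda" -> true
        System.out.println(containsGreek("\u00FCber"));   // "über"   -> false
    }
}
```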


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
