i18n for Character Classes in Patterns.

Luke · Feb 18, 2008

Hi

Does anybody know how to match character classes in i18n mode? E.g. ü
(Unicode 00FC, a German umlaut) should actually be matched by the
pattern "[a-z]", but it is not.

Regards,
Lukas

Lothar Kimmeringer · Feb 18, 2008

Luke said:
Does anybody know how to match character classes in i18n mode? E.g. ü
(Unicode 00FC, a German umlaut) should actually be matched by the
pattern "[a-z]", but it is not.

Of course not, because [a-z] means all characters between 'a' and 'z'
(inclusive). And ü is not part of that range; neither is 'A', by the way.

See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#sum
for a list of the available character classes. I think
\p{javaLowerCase} should do the trick: it matches every character that
Character.isLowerCase() treats as lowercase, which includes ü. If you
simply want to check for umlauts, you have to add them "manually" with
[a-zäöüß].
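
To make the difference between the three patterns concrete, here is a minimal sketch. The class name LowerCaseClassDemo and the helper matches() are made up for illustration; only java.util.regex.Pattern comes from the JDK. The German letters are written as \u escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class LowerCaseClassDemo {
    // ASCII-only range: the code points from 'a' (U+0061) to 'z' (U+007A).
    static final Pattern ASCII_LOWER = Pattern.compile("[a-z]");

    // Every character for which Character.isLowerCase() returns true.
    static final Pattern JAVA_LOWER = Pattern.compile("\\p{javaLowerCase}");

    // ASCII range plus the German letters ä, ö, ü and ß added by hand.
    static final Pattern GERMAN_LOWER =
            Pattern.compile("[a-z\u00E4\u00F6\u00FC\u00DF]");

    // Illustrative helper: true if the whole input matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String uUmlaut = "\u00FC"; // ü
        System.out.println("[a-z]             : " + matches(ASCII_LOWER, uUmlaut));
        System.out.println("\\p{javaLowerCase} : " + matches(JAVA_LOWER, uUmlaut));
        System.out.println("[a-z + umlauts]   : " + matches(GERMAN_LOWER, uUmlaut));
    }
}
```

Note that \p{javaLowerCase} is much broader than the hand-written class: it also accepts lowercase letters from other scripts, such as Greek α, so the explicit list is the better fit if only German text is expected.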

If you want to check for specific Unicode blocks, you can do
that with e.g. \p{InGreek} for the Greek block. See the API
description linked above for more information.
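
A block check along those lines could look like the sketch below. The class name UnicodeBlockDemo and the method isAllGreek() are hypothetical names chosen for this example; the pattern syntax \p{InGreek} is the one from the Pattern documentation.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class UnicodeBlockDemo {
    // One or more characters that all lie in the Unicode Greek block.
    static final Pattern GREEK_ONLY = Pattern.compile("\\p{InGreek}+");

    // Illustrative helper: true if every character of s is in the Greek block.
    static boolean isAllGreek(String s) {
        return GREEK_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllGreek("\u03B1\u03B2\u03B3")); // alpha, beta, gamma
        System.out.println(isAllGreek("abc"));
        System.out.println(isAllGreek("\u00FC"));             // ü is not Greek
    }
}
```

A block test only says where a character sits in the Unicode code charts; it says nothing about case, so it complements \p{javaLowerCase} rather than replacing it.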

Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
