Unicode property problems in RegExp

vnick · Jun 3, 2005

I have a problem with a relatively simple regular expression that uses Unicode properties:

[280] tmp% perl -Dr -e '$f = "KURZ_1"; if ($f =~ /[_\d\p{IsUpper}]+/)
{print"$&\n"};' |& less

[281] tmp%

The RegExp debugger output shows this:

Matching REx `[_\d\p{IsUpper}]+' against `KURZ_1'
Matching stclass `ANYOF[0-9_{unicode}+utf8::IsDigit +utf8::IsUpper]'
against `KURZ_1'
Guessing start of match, REx `^_<' against
`/cadappl/perl/5.8.5/lib/5.8.5/utf8.pm'...
String not equal...
Match rejected by optimizer

So I ran a few more tests with Unicode-property regexes, and their
output is even stranger:

[289] tmp% perl -e '$f = "KURZ"; if ($f =~ /[\p{IsUpper}]+/) {print
"$&\n"};'

@8(p
[290] tmp% perl -e '$f = "KURZ"; if ($f =~ /\p{IsUpper}+/) {print
"$&\n"};'

[291] tmp% perl -e '$f = "KURZ"; if ($f =~ /\p{IsLu}+/) {print
"$&\n"};'
@4v
[292] tmp% perl -e '$f = "KURZ"; if ($f =~ /(\p{IsLu}+)/) {print
"$1\n"};'
@1îd
[293] tmp% perl -e '$f = "KURZ"; if ($f =~ /[A-Z]+/) {print "$&\n"};'
KURZ
[294] tmp%
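
A variant that may help narrow this down: the debugger trace above shows a second pattern (`^_<`) being run against the path of utf8.pm while the first match is still in progress. Assuming (this is not confirmed) that the stray output is related to that property lookup happening in the middle of the match, the sketch below forces the lookup with a throwaway match first, then captures explicitly and copies the result at once instead of reading `$&`. The helper name `match_upper` is hypothetical and only serves the example.

```shell
# Hypothetical helper (not from the original post): prints the first run of
# underscores, digits and uppercase letters found in its argument.
match_upper() {
  perl -e '
    my $f = shift;
    # Throwaway match so the \p{IsUpper} lookup happens before the real match.
    # Assumption: the lookup is what disturbs the later result.
    "A" =~ /\p{IsUpper}/;
    # Capture explicitly and copy the result immediately instead of using $&.
    if (my ($m) = $f =~ /([_\d\p{IsUpper}]+)/) {
      print "$m\n";
    }
  ' "$1"
}

match_upper "KURZ_1"
```

If this prints the expected string while the one-liners above still print garbage, that would point at the property loading rather than at the pattern itself.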

Can anybody tell me what is wrong here?
Thanks
vnick

