peter pilsl
I use Unicode and locales (de_AT.UTF-8) and, despite the warnings
against combining them, everything works fine (finally!).
I can sort(), lc() and pattern-match, but there is one very interesting
problem left with m//i: characters with a multibyte representation only
match if the pattern is preceded by something (an anchor or another
character). I would not be surprised if it never matched, but I am
surprised that it only matches "sometimes" and that a character does
not even match itself!
Example (in German, Ä is the uppercase of ä):
Ä =~ m/Ä/i => no match !!
Ä =~ m/^Ä/i => match !!
Ä =~ m/^ä/i => match
bÄc =~ m/bä/i => match
bÄc =~ m/ä/i => no match
real source:
use locale;
$s="\x{e4}";
utf8::upgrade($s);
print $s=~/$s/i?"ok\n":"fail\n"
==> fail !!!
There is an easy workaround: call lc() on both the search term and the
pattern first and then use m// without the i flag. Still, it's
interesting behaviour.
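A minimal sketch of that workaround, in case it helps anyone else. The
helper name imatch() is made up for illustration, and \Q...\E is an
addition that assumes the search term should be treated as a literal
string rather than as a regex:

```perl
use strict;
use warnings;
use locale;

# Hypothetical helper: case-insensitive match WITHOUT the /i flag.
# Both sides are lowercased first, then compared with a plain m//.
sub imatch {
    my ($text, $term) = @_;
    my $lc_text = lc $text;
    my $lc_term = lc $term;
    # \Q...\E quotes regex metacharacters (assumption: literal search term)
    return $lc_text =~ /\Q$lc_term\E/ ? 1 : 0;
}

my $s = "\x{e4}";
utf8::upgrade($s);
print imatch($s, $s) ? "ok\n" : "fail\n";
```

Whether lc() actually maps Ä to ä here depends on the locale in effect,
so this only sidesteps the m//i problem and not the locale setup itself.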
As far as I can see so far, this phenomenon does not depend on the
particular locale used (de_AT.UTF-8, en_US, C, ...) but occurs as soon
as "use locale" is in effect.
peter