David Hag
Hi,
I'm a bit puzzled. I'm stuck at the point where one has to deal with
different character sets, and it sucks.
Here goes my problem:
I want to do lots of transformations on Swedish texts, using s// and
m// etc. However, perl's \w, for example, does not match any of the
Swedish chars "å", "ä" or "ö", and it also misses accented chars such
as "á". This is a REAL problem for me. I could use nasty workarounds
like defining the set of characters to look for each time it is needed
(like: [A-Za-zÅÄÖåäöéèáÀàÀ...]), but that is just crazy.
As far as I understand, it should be possible to tell perl to use a
specific locale, so that e.g. \w knows about the correct char set.
First of all, my locale is already Swedish-ISO-8859-1 on my Linux
system. I also tried setting it explicitly with the POSIX module:
use POSIX qw(setlocale LC_ALL);
setlocale(LC_ALL, "sv_SE.iso-8859-1");
No complaints from perl, but "m/\w+/" still misses any word
containing "å", "ä" or "ö"...
Any tips on how to solve this in a general way will be very much
appreciated!
Thanks,
David