Daniel Bretoi
how do I match non-english alphabetical characters? Such as the german
double-s ? (ß)
db
Hi,
In message "non-english characters"
|how do I match non-english alphabetical characters? Such as the german
|double-s ? (ß)
Which encoding do you wish to use?
Hi,
In message "Re: non-english characters"
|I'm not sure, how can I find out what the germans use? and once I know
|that part, how do I use it?
Ask somebody around you to find out. Then, if you're going to use
Unicode (UTF-8), write your script in UTF-8 and invoke Ruby with the
-Ku option. If you use ISO-8859-* or any other single-byte encoding,
you don't have to do anything special.
matz.
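For readers on a current interpreter, here is a minimal sketch of the UTF-8 route. It assumes Ruby 2.0 or later, where source files default to UTF-8 and strings carry their own encoding, so it is not the Ruby 1.6.8 setup discussed in this thread. The helper name all_letters? is made up for illustration.

```ruby
# encoding: utf-8
# Sketch only: assumes Ruby 2.0+, not the 1.6.8 / -Ku setup from the thread.

# Hypothetical helper: true if str consists only of letters (any script).
# \p{L} is the Unicode "Letter" property, so it covers ß, é, ü and so on.
def all_letters?(str)
  !(str =~ /\A\p{L}+\z/).nil?
end

puts all_letters?("Straße")   # prints true
puts all_letters?("abc123")   # prints false (digits are not letters)
```

The POSIX bracket class [[:alpha:]] should behave the same way on UTF-8 strings, if you prefer that spelling.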
messju mohr said:
hmm.
regexp works fine for me with unicode. either with "ruby -Ku" on
startup or with the /u as regexp-option.
but with ISO-8859-* (1 or 15 in my case) i don't get \w to match
accented characters.
I guess \w is defined in terms of ASCII - and there you don't have "ß", "é"
and similar chars.
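A rough sketch of one way around this on a current interpreter: transcode the Latin-1 bytes to UTF-8 first, then use a Unicode-aware class instead of \w. This assumes Ruby 2.0 or later (not the 1.6.8 from this thread), and latin1_to_utf8 is a made-up helper name.

```ruby
# encoding: utf-8
# Sketch only: assumes Ruby 2.0+, where \w is ASCII-only ([a-zA-Z0-9_]).

# Hypothetical helper: treat the bytes as ISO-8859-1 and return a UTF-8 string.
def latin1_to_utf8(bytes)
  bytes.encode("UTF-8", "ISO-8859-1")
end

word = latin1_to_utf8("Stra\xDFe")   # 0xDF is ß in ISO-8859-1

ascii_only = !(word =~ /\A\w+\z/).nil?            # \w does not cover ß
any_letter = !(word =~ /\A[[:alpha:]]+\z/).nil?   # [[:alpha:]] is Unicode-aware

puts ascii_only   # prints false
puts any_letter   # prints true
```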
yes, it looks like i got confused by the PCRE library which treats \w
according to the current locale. too-many-languages error.
depends on your definition of 'treats' and 'locale' ;-)
-bash-2.05b$ cat /etc/redhat-release
Red Hat Enterprise Linux WS release 3 (Taroon)
-bash-2.05b$ perl -v | head -2 # why so much output!
This is perl, v5.8.0 built for i386-linux-thread-multi
-bash-2.05b$ ruby -v
ruby 1.6.8 (2002-12-24) [i386-linux-gnu]
# BROKEN "TREATMENT" OF LOCALE
-bash-2.05b$ export LANG=en_US.UTF-8
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
# THIS IS OK
-bash-2.05b$ export LANG=en_US
-bash-2.05b$ echo abc | perl -ne 'print if /[^\s]+/'
abc
-bash-2.05b$ echo abc | ruby -ne 'print if /[^\s]+/'
abc
definitely need to examine output carefully wherever regexes and locale
are both in effect. probably better off using ruby, since matz presumably
has more experience with multibyte chars than ol' larry!
I'm not sure, how can I find out what the germans use? and once I
know that part, how do I use it?