Rob Biedenharn
Why don't you just find out which characters are in the [:alnum:] and
\w sets?
$ LANG=nl_NL irb
alnums = (0..0377).select {|c| c.chr =~ /[[:alnum:]]/ }.map {|c| c.chr}.join
=> "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
\252\265\272\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316
\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336\337\340
\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361
\362\363\364\365\366\370\371\372\373\374\375\376\377"
dubyas = (0..0377).select {|c| c.chr =~ /\w/ }.map {|c| c.chr}.join
=> "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
Yes, but all this really does is indicate that the irb behaviour is
the correct one.
When I run this in a stand-alone script, I get this:
$ LANG=nl_NL ./foo
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
It's almost as if the locale isn't being propagated to the process via
the environment. But...
$ LANG=nl_NL ruby -e "puts ENV['LANG']"
nl_NL
...it _is_ being propagated.
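LANG is only one of the variables that can select the character-classification locale. Under POSIX rules LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG, so it may be worth checking all three. A minimal sketch of that lookup follows; the method name requested_ctype_locale is hypothetical, and the "C" fallback is an assumption about the platform default:

```ruby
# Return the locale name the environment requests for character
# classification, following POSIX precedence: LC_ALL, then LC_CTYPE,
# then LANG. Unset or empty variables are skipped.
# Falling back to "C" is an assumption about the platform default.
def requested_ctype_locale(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    return value unless value.nil? || value.empty?
  end
  "C"
end

puts requested_ctype_locale
```

This only reports what the environment asks for. A C program starts in the "C" locale and sees the requested one in the ctype functions only after it calls setlocale(LC_CTYPE, "") or setlocale(LC_ALL, ""); whether that accounts for the difference seen here is not established in this thread.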
Is it the same for you?
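The ./foo script itself is not shown in this thread. For anyone wanting to compare, a stand-alone script along these lines should do; it is a sketch that assumes the script runs the same two expressions as the irb session, and the helper name chars_matching is hypothetical:

```ruby
#!/usr/bin/env ruby
# Hypothetical stand-in for ./foo; the original script is not shown.

# Return every single-byte character (0..0377 octal, i.e. 0..255)
# whose one-character string matches the given pattern, joined
# into a single string.
def chars_matching(pattern)
  (0..0377).select { |c| c.chr =~ pattern }.map { |c| c.chr }.join
end

alnums = chars_matching(/[[:alnum:]]/)
dubyas = chars_matching(/\w/)

# p shows non-printable and high bytes as escapes.
p alnums
p dubyas
```

Running it as "LANG=nl_NL ./foo" and again as "LANG=C ./foo" shows whether the locale changes either set. What gets printed depends on the Ruby version and platform.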
Ian
--
Ian Macdonald | When a man is tired of London, he is tired
(e-mail address removed) | of life. -- Samuel Johnson
http://www.caliban.org/ |
Yes, LANG is affecting the result in irb, but not in ruby.
$ irb -v
irb 0.9.5(05/04/13)
Whether the irb behavior is "correct" or anomalous is probably a
question for the maintainers to debate. The man page for ctype(3)
(on my Mac OS X 10.4.8) indicates that the macros are supposed to be
based on the locale and my copy of the pickaxe (p.71) says that the
character classes are based on the ctype macros of the same name.
However, a quick C program shows effectively the same behavior as
ruby (i.e., only the characters [0-9A-Za-z] satisfy isalnum(), even for nl_NL).
I'm now more curious as to how irb is finding the character classes.
-Rob
Rob Biedenharn http://agileconsultingllc.com
(e-mail address removed)