sorting issue when using locales


Heinz Werntges

Hi all,

Ever used Ruby to sort arrays of strings that contain ISO-8859-1 encoded
"umlaut" characters, as is typical for German text? Recently I found that
Ruby itself does not seem to sort according to my locale setup. Then I
learned about the module "locale", which seemed to make Ruby do just what
I was looking for.

My initial tests using "locale" failed, however: Ruby's sort order does
not seem to change. In contrast, the Linux command "sort" reacts as
expected when I change the locale settings (see the attached test
protocol; note that using LC_ALL instead of LC_COLLATE gave the same
results).

Actually, I need a solution that (also) works under Windows, but I already
got stuck here. Am I trying something stupid? Does the "locale" module
perhaps need an update? I am running it under Ruby 1.8.0, while it was
specified for Ruby 1.6.7.

I tried to contact the author of "locale" a week ago, so far without luck.
Armin Roehrl kindly provided me with a workaround, but perhaps the regular
approach could be fixed, since I expect it to run faster. Any help would be
greatly appreciated.
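One portable approach is to sort with Array#sort_by on a computed collation key. Because it does not depend on the C library's collation tables, it should behave the same on Linux and Windows. The sketch below is under stated assumptions and is not necessarily the workaround Armin Roehrl provided. The folding table FOLD and the helper german_collation_key are hypothetical names. The table roughly follows DIN 5007 variant 1 and is an assumption. The sketch expects UTF-8 strings and Ruby 2.4 or later, for Unicode-aware String#downcase. It only handles the German special letters; other accented letters fall back to byte order.

```ruby
# Hypothetical folding table (assumption, roughly DIN 5007 variant 1):
# each special letter is compared as its base letter(s) at the first level.
FOLD = { "ä" => "a", "ö" => "o", "ü" => "u", "ß" => "ss",
         "Ä" => "a", "Ö" => "o", "Ü" => "u" }.freeze

# Hypothetical helper: builds a three-level key for a UTF-8 string.
#   1. primary:   letters with umlauts/ß folded and case ignored
#   2. secondary: 0 for a plain letter, 1 for an umlaut or ß
#   3. tertiary:  0 for lowercase, 1 for uppercase (lowercase sorts first)
def german_collation_key(str)
  chars = str.each_char.to_a
  primary   = chars.map { |c| FOLD.fetch(c, c.downcase) }.join
  secondary = chars.map { |c| FOLD.key?(c) ? 1 : 0 }
  tertiary  = chars.map { |c| c == c.downcase ? 0 : 1 }
  [primary, secondary, tertiary]
end

samples = %w[a z ä ü ö ß A Z Ä Ö Ü]
sorted = samples.sort_by { |s| german_collation_key(s) }
puts sorted.join(" ")
# prints "a A ä Ä ö Ö ß ü Ü z Z"  (same order as the de_DE "sort" run)
```

To apply this to the ISO-8859-1 file from the protocol, the lines could be transcoded while reading, e.g. `File.readlines('sort_samples', chomp: true, encoding: 'ISO-8859-1:UTF-8')`. Using sort_by computes each key only once per string, which keeps the extra cost over a plain byte sort modest.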

Kind regards,

-- Heinz

------ Attachment: Test protocol ------------
heinz@linux:~> uname
Linux
heinz@linux:~> ruby -v
ruby 1.8.0 (2003-08-04) [i686-linux]
heinz@bibo:~> set | grep LC_
LC_COLLATE=POSIX
heinz@linux:~> cat sort_samples
a
z
ä
ü
ö
ß
A
Z
Ä
Ö
Ü
heinz@linux:~> od -x sort_samples
0000000 0a61 0a7a 0ae4 0afc 0af6 0adf 0a41 0a5a
0000020 0ac4 0ad6 0adc
0000026
heinz@linux:~> sort < sort_samples
A
Z
a
z
Ä
Ö
Ü
ß
ä
ö
ü
heinz@linux:~> export LC_COLLATE=de_DE
heinz@linux:~> sort < sort_samples
a
A
ä
Ä
ö
Ö
ß
ü
Ü
z
Z
heinz@linux:~> cat nlsort.rb
require "locale"

Locale.setlocale(Locale::LC_COLLATE, 'de_DE')

a = IO.readlines('sort_samples')
puts a.sort

heinz@linux:~> ruby nlsort.rb
A
Z
a
z
Ä
Ö
Ü
ß
ä
ö
ü
#--------- End of protocol -------------