Tom Link
Hi,
Right now, I'm not exactly thrilled by the way ruby 1.9 handles
encodings. Could somebody please explain things or point me to some
reference material:
I have the following files:
testEncoding.rb:
#!/usr/bin/env ruby
# encoding: ISO-8859-1
p __ENCODING__
text = File.read("text.txt")
text.each_line do |line|
  p line =~ /foo/
end
text.txt:
Foo äöü bar.
I use: ruby 1.9.1 (2008-12-01 revision 20438) [i386-cygwin]
If I run: ruby19 testEncoding.rb, I get:
#<Encoding:ISO-8859-1>
testEncoding.rb:8:in `block in <main>': broken US-ASCII string
(ArgumentError)
Ruby detects the encoding line but nevertheless treats the text file as
7-bit ASCII. The source file encoding is only respected for the file's
contents if I add the command-line option -E ISO-8859-1. I could also set
the encoding explicitly for each string, but ...
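For reference, here is a minimal sketch of what setting the encoding explicitly can look like. It assumes the file really contains ISO-8859-1 bytes. The temporary file and the variable names are hypothetical stand-ins for text.txt, not part of my actual script.

```ruby
require "tempfile"

# Hypothetical stand-in for text.txt: "Foo äöü bar." stored as ISO-8859-1 bytes.
latin1_bytes = [0x46, 0x6F, 0x6F, 0x20, 0xE4, 0xF6, 0xFC,
                0x20, 0x62, 0x61, 0x72, 0x2E, 0x0A].pack("C*")

file = Tempfile.new("text")
file.binmode
file.write(latin1_bytes)
file.close

# Option 1: tag the bytes after reading. No transcoding takes place,
# only the encoding label attached to the string changes.
tagged = File.read(file.path).force_encoding("ISO-8859-1")

# Option 2: declare the external encoding when opening the file.
declared = File.open(file.path, "r:ISO-8859-1") { |f| f.read }

# With a correctly labelled string the regexp match no longer raises.
match_positions = declared.each_line.map { |line| line =~ /foo/ }
p declared.encoding
p match_positions
```

Both variants have to be repeated at every place where external data is read, which is exactly what I would like to avoid.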
I found some hints that the default charset for external sources is
deduced from the locale. So I set LANG to de_AT, de_AT.ISO-8859-1 and
some more variants, to no avail.
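A small diagnostic sketch for checking what Ruby actually derives from the environment. The `report` hash is only an illustrative name. Note that in POSIX locales LC_ALL and LC_CTYPE, where set, take precedence over LANG, so it may be worth printing all three.

```ruby
# Collect the locale-related environment variables next to the encodings
# Ruby reports for the locale, for external data, and for this source file.
report = {
  "LANG"             => ENV["LANG"],
  "LC_ALL"           => ENV["LC_ALL"],
  "LC_CTYPE"         => ENV["LC_CTYPE"],
  "locale encoding"  => Encoding.find("locale"),
  "default_external" => Encoding.default_external,
  "source encoding"  => __ENCODING__
}

report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

If "default_external" still shows US-ASCII after changing LANG, the locale setting is not reaching Ruby, or is not recognized on the platform.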
How exactly is this supposed to work? What other options do I have to
make ASCII8BIT or Latin-1 the default encoding without having to
supply an extra command-line option and without having to rely on an
environment variable? Why isn't ASCII8BIT the default in the first
place? Why isn't __ENCODING__ a global variable I can assign a value
to?
Thanks,
Thomas.