Kless said:
I need to store raw strings such as this one:
"V-\243\230mJ\262.\031\023-4\301\324\241Y"
and I would like to know if there will be any problems with Ruby 1.9
The answer is, "that depends": Ruby 1.9's string handling is extremely
complicated.
* If the string is a literal within the program source, then adding the
magic comment
    # encoding: ASCII-8BIT
as the very first line of your program (or the second line, if you have a
shebang line) will make literals have this encoding by default. That
said, literals built from backslash escapes like the one above will
probably be ASCII-8BIT even without it, since Ruby 1.9 reads source files
as US-ASCII unless told otherwise.
* If the string comes from reading a file, then you need to open it in
binary mode: File.open("xxx","rb") { |f| ... }
* If the string comes from reading from a socket, then I believe it will
be ASCII-8BIT by default.
* If the string comes from reading STDIN, then you will have to be very
careful; for safety you need something like
    STDIN.set_encoding "ASCII-8BIT"
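The file and STDIN cases can be sketched as follows, assuming Ruby 1.9 or later. The temporary file and the IO.pipe reader are illustration only: the pipe stands in for STDIN, since the same set_encoding call works on any IO. Both reads should hand back the bytes from the question unchanged and tagged ASCII-8BIT.

```ruby
require 'tempfile'

# The bytes from the question. .dup gives a mutable copy, so force_encoding
# relabels the copy rather than the literal itself.
RAW = "V-\243\230mJ\262.\031\023-4\301\324\241Y".dup.force_encoding(Encoding::ASCII_8BIT)

# File case: write the bytes out, then read them back in binary mode.
tmp = Tempfile.new('raw')
tmp.binmode
tmp.write(RAW)
tmp.close
from_file = File.open(tmp.path, 'rb') { |f| f.read }  # "rb": external encoding is ASCII-8BIT

# STDIN case, with a pipe as a stand-in: set the encoding before reading.
reader, writer = IO.pipe
reader.set_encoding(Encoding::ASCII_8BIT)  # same idea as STDIN.set_encoding "ASCII-8BIT"
writer.write(RAW)
writer.close
from_pipe = reader.read
reader.close

puts from_file.encoding, from_pipe.encoding  # ASCII-8BIT, twice
puts from_file == RAW && from_pipe == RAW    # true
tmp.unlink
```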
Your program may or may not work without these changes, because Ruby
1.9's behaviour at runtime depends on settings in your environment: the
locale (LANG, LC_ALL and friends) determines Encoding.default_external,
which is the encoding Ruby attaches to data read in text mode, STDIN
included. That is, the same program with the same data might work on one
computer but crash on another. Using the above incantations is your first
line of defense against this stupidity.
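To make the environment dependence concrete, here is a sketch (Ruby 1.9 or later) that simulates two machines in one process by temporarily changing Encoding.default_external, the setting Ruby normally derives from the locale. The helpers with_default_external and labelled are hypothetical, written only for this illustration. The file is deliberately read in text mode, so its data is tagged with whatever default_external is in force.

```ruby
require 'tempfile'

# Run the block as if the locale had chosen +external+ as
# Encoding.default_external, then put the real setting back.
def with_default_external(external)
  saved = Encoding.default_external
  Encoding.default_external = external
  yield
ensure
  Encoding.default_external = saved
end

# Hypothetical helper: read a file in text mode ("r"), so the data is tagged
# with Encoding.default_external, and prepend a UTF-8 label.
def labelled(path)
  "\u00FCber: " + File.open(path, 'r') { |f| f.read }
end

# The same bytes on disk for both runs: "café" encoded as UTF-8.
tmp = Tempfile.new('env')
tmp.binmode
tmp.write("caf\xC3\xA9")
tmp.close

# "Machine A": a UTF-8 locale. The concatenation succeeds.
ok = with_default_external(Encoding::UTF_8) { labelled(tmp.path) }
puts ok  # über: café

# "Machine B": a Latin-1 locale. Same program, same file, different result.
crashed = begin
  with_default_external(Encoding::ISO_8859_1) { labelled(tmp.path) }
  false
rescue Encoding::CompatibilityError
  true
end
puts crashed  # true
tmp.unlink
```

Opening the file with "rb" instead would make both runs see ASCII-8BIT; concatenating that with the UTF-8 label would then fail on both machines alike, which is at least predictable.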
Then you need to be sure that every single method that you call in other
people's libraries, which takes string arguments or returns string
values, behaves in the way you want. For example, if you call
Library.foo and it returns a UTF-8 string containing characters with the
high bit set, and you try to concatenate it with one of your own binary
strings (one that also contains bytes above 127), the program will crash
with an Encoding::CompatibilityError.
Here's a somewhat contrived example:
-------- main.rb (your program) --------
# encoding: ASCII-8BIT
require 'library'

binary_data = "\xff\xee\xdd"
msg = Library.err_to_str
binary_data << [msg.bytesize].pack("N")
binary_data << msg
-------- library.rb (someone else's code that you don't control) --------
# encoding: UTF-8
module Library
  def self.err_to_str
    "über-error"
  end
end

$ ruby19 main.rb
main.rb:7:in `<main>': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
The only way to protect against this is to force encodings at every
point where two strings of differing provenance might meet. For
example:
    msg = Library.err_to_str
    binary_data << [msg.bytesize].pack("N")
    msg.force_encoding "ASCII-8BIT"
    binary_data << msg
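One caveat with that fix: force_encoding changes the receiver in place, so if a library returns a string it keeps a reference to (a constant, say), relabelling it also changes what the library sees. A minimal sketch that relabels a copy instead, assuming Ruby 1.9 or later; the binary helper is hypothetical, and the Library module is a stand-in for the library.rb shown earlier. On Ruby 2.0 and later, String#b returns such a copy directly.

```ruby
# Stand-in for the library.rb above: returns a UTF-8 string.
module Library
  def self.err_to_str
    "\u00FCber-error"
  end
end

# Hypothetical helper: a copy of +str+ labelled ASCII-8BIT. The bytes are
# unchanged; only the label differs, and the caller's string is untouched.
def binary(str)
  str.dup.force_encoding(Encoding::ASCII_8BIT)
end

binary_data = binary("\xff\xee\xdd")
msg = Library.err_to_str
binary_data << [msg.bytesize].pack("N")  # 4-byte big-endian length prefix
binary_data << binary(msg)               # both sides are now ASCII-8BIT

puts msg.encoding          # UTF-8 (the library's string keeps its label)
puts binary_data.bytesize  # 18 = 3 + 4 + 11
```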
Beware also that ruby 1.9's documentation is often either missing or
misleading when it comes to character encodings. For example, ri19
Array#pack says:
    Directive    Meaning
    ---------------------------------------------------------------
        @     |  Moves to absolute position
        A     |  arbitrary binary string (space padded, count is width)
        a     |  arbitrary binary string (null padded, count is width)
So you might expect that an arbitrary String can be packed using a*:
    # encoding: ASCII-8BIT
    require 'library'
    binary_data = "\xff\xee\xdd"
    msg = Library.err_to_str
    binary_data << [msg.bytesize,msg].pack("Na*")   # CRASH
    puts binary_data.inspect
No, you still need a msg.force_encoding "ASCII-8BIT" before the pack.
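Applied to the pack version, the same idea gives length-prefixed framing: relabel a copy before Array#pack, and reverse the process with String#unpack. A sketch assuming Ruby 1.9 or later; Library is the same stand-in as before, frame and unframe are hypothetical names, and the reading side is assumed to know the payload is UTF-8 so it can restore that label. Relabelling first does not depend on how a given Ruby version treats encodings inside pack.

```ruby
# Stand-in for the library.rb above: returns a UTF-8 string.
module Library
  def self.err_to_str
    "\u00FCber-error"
  end
end

# Hypothetical helper: 4-byte big-endian length, then the raw bytes.
def frame(str)
  bytes = str.dup.force_encoding(Encoding::ASCII_8BIT)  # relabel a copy before packing
  [bytes.bytesize, bytes].pack("Na*")
end

# Hypothetical helper: inverse of frame; returns the payload as raw bytes.
def unframe(data)
  len, rest = data.unpack("Na*")
  rest[0, len]
end

binary_data = "\xff\xee\xdd".dup.force_encoding(Encoding::ASCII_8BIT)
binary_data << frame(Library.err_to_str)
puts binary_data.inspect

# Reader side: skip the 3 leading bytes, unframe, then restore the label.
text = unframe(binary_data[3..-1]).force_encoding(Encoding::UTF_8)
puts text  # über-error
```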
If all this scares you - and it does me - then remember that staying
with ruby 1.8 is a reasonable alternative. Ruby 1.8.6 will be maintained
for a long time to come, thanks to the people at EngineYard and Phusion
Passenger.
HTH,
Brian.