gabriele renzi said:
I found this article really interesting, maybe it can help you too.
http://www.joelonsoftware.com/articles/Unicode.html
Nicely written, but nothing I didn't know already. Still the question
remains what Ruby does about handling mixed content internally. IMHO the
most efficient way is to store code points internally. An alternative
would be to store a raw binary stream together with its encoding but that
would make comparisons (which happen all the time, just think of hash
lookups) slow for strings with different encodings.
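
To make the comparison problem concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where a String is a byte sequence tagged with an encoding (that is, the "raw binary stream plus encoding" alternative). The helper name same_text? is hypothetical and only serves the illustration.

```ruby
# Hypothetical helper: compare two strings by text rather than by raw
# bytes, by transcoding both to a common encoding (UTF-8) first.
def same_text?(a, b)
  a.encode("UTF-8") == b.encode("UTF-8")
end

utf8   = "caf\u00E9"                # "café", stored as UTF-8 bytes
latin1 = utf8.encode("ISO-8859-1")  # same text, stored as ISO-8859-1 bytes

p utf8.bytes    # [99, 97, 102, 195, 169]
p latin1.bytes  # [99, 97, 102, 233]

# The byte streams differ, so a plain comparison reports "not equal".
p utf8 == latin1             # false
# Equality by text needs a transcoding step on every comparison.
p same_text?(utf8, latin1)   # true

# Hash lookups hit the same problem: the key is only found once both
# sides share one encoding.
table = { utf8 => :found }
p table[latin1]                  # nil
p table[latin1.encode("UTF-8")]  # :found
```

The transcoding in same_text? is the extra cost meant above: it has to run for each comparison unless all strings are normalized to one internal form up front.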
IMHO the Java approach* (although it burns mem by using 16 bit per char)
is the most practical among current programming languages. And I wouldn't
mind Ruby borrowing that - especially when considering attempts to use
Java bytecode and a JVM as runtime system.
Regards
robert
* Characters are stored internally with 16 bits, thus allowing many
(although not all) Unicode code points to be represented directly. Input
and output always use an encoding (either an explicit one or, implicitly,
the platform's default encoding). There's built-in support for a number of
well-known encodings, including UTF-8, UTF-16, ISO-8859-1 etc.
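
A small sketch of what the 16-bit limit means in practice. It is written in Ruby rather than Java and only imitates the model described in the footnote by transcoding to UTF-16. The helper name utf16_units is hypothetical.

```ruby
# Hypothetical helper: the 16-bit units a string occupies when it is
# transcoded to UTF-16 (big-endian, no byte order mark).
def utf16_units(str)
  str.encode("UTF-16BE").unpack("n*")
end

bmp    = "\u00E9"     # U+00E9, fits into a single 16-bit unit
astral = "\u{1D11E}"  # U+1D11E, lies beyond the 16-bit range

p utf16_units(bmp)     # [233]
p utf16_units(astral)  # [55348, 56606], i.e. 0xD834 0xDD1E

# On output the same text becomes different bytes depending on the
# encoding that is chosen explicitly.
p bmp.encode("UTF-8").bytes       # [195, 169]
p bmp.encode("ISO-8859-1").bytes  # [233]
p bmp.encode("UTF-16BE").bytes    # [0, 233]
```

The second result shows why not every code point is representable in one 16-bit slot: U+1D11E has to be split into two units.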