Mark J. Reed
Okay, last I checked, strings were just treated as collections of bytes, and
any multibyte character semantics were up to the programmer to implement. But
I just noticed that in 1.8.3, utf8string.split(//) yields an array of
strings, each containing a single UTF-8 character, irrespective of byte
count.
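
To make the observation concrete, here is a minimal sketch of the kind of check I mean. The sample string is made up for illustration, and the sketch assumes the data really is UTF-8. On 1.8.x it also assumes $KCODE is set to 'u' (for example via ruby -Ku), which may well be what triggers the behavior; on 1.9 and later that line is skipped.

```ruby
# Assumption: on 1.8.x the interpreter must be told to treat strings as UTF-8.
# $KCODE has no effect on 1.9+, so only set it on older interpreters.
$KCODE = 'u' if RUBY_VERSION < '1.9'

# Hypothetical sample: "h", e-acute (two bytes in UTF-8: 0xC3 0xA9), "llo".
utf8string = "h\xC3\xA9llo"

# Splitting on the empty regex: one element per character.
chars = utf8string.split(//)

# Raw byte values, for comparison with the character count.
bytes = utf8string.unpack('C*')

puts "characters: #{chars.length}"
puts "bytes:      #{bytes.length}"
```

If the split is character-aware, the character count comes out one lower than the byte count, because the accented letter occupies two bytes.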
So are regexes in general Unicode-aware now? Any other UTF-8 tidbits
in there I should know about?
Thanks!