Miles Bader
MikeP said: I guess I have a hard time seeing how anything multi-byte is a boon. But,
and it's a big but (not to be confused with a phat azz!), if one doesn't
need "internationalization" (I mean other than English), it's a waste of
effort. Yes?
But that's the thing: if you're just doing things casually but want
to, e.g., use a few special chars here and there, or allow users
more freedom in the filenames they can use, then UTF-8 _doesn't_
require much effort; it's a fairly easy tweak to ASCII-only code.
If the bulk of your strings are English, then UTF-8 is also very
space-efficient.
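
To make the "easy tweak" point concrete, here is a minimal sketch in Python (my choice of language for illustration; the path and the helper name count_code_points are made up). It treats the string as raw bytes, the way ASCII-only code would, and relies on the UTF-8 property that bytes below 0x80 never appear inside a multi-byte sequence:

```python
# Byte-oriented handling, written with ASCII in mind, applied to UTF-8 input.
# The sample path and helper name are hypothetical, for illustration only.
text = "docs/résumé/日本語.txt"
path = text.encode("utf-8")

# Splitting on an ASCII delimiter is safe: in UTF-8, byte values below 0x80
# only ever encode ASCII characters, never part of a multi-byte sequence.
parts = path.split(b"/")

def count_code_points(data: bytes) -> int:
    """Count code points by skipping continuation bytes (bit pattern 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

# Pure-ASCII (e.g. English) text takes exactly one byte per character.
english = "The quick brown fox"
ascii_is_one_byte_each = len(english.encode("utf-8")) == len(english)
```

The split needs no change at all from the ASCII version; only the places that care about "characters" rather than bytes (like the count) need the small continuation-byte check.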
It's UTF-16 that's the pain in the butt: it requires even the most
trivial parts of string-handling paths to be completely replaced,
and then offers almost no advantage to offset the various
disadvantages!
The only reasons I can see to use UTF-16 are: (1) you're writing
Windows-only code, never expect to port it, and want to fit better
with Windows library functions that expect UTF-16 strings, or (2)
you're writing an app to handle absolutely _massive_ amounts of CJK
text, and the space savings for CJK text in UTF-16 compared to UTF-8
are critical for you.
Very, very few people are doing (2), so basically that leaves (1).
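
To put rough numbers on the space trade-off, here is a small Python sketch (the sample strings and the helper name encoded_sizes are just illustrative) comparing encoded sizes for English and Japanese text:

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

english = "Hello, world"      # 12 ASCII characters
japanese = "こんにちは世界"    # 7 characters, all in the Basic Multilingual Plane

english_sizes = encoded_sizes(english)    # (12, 24): UTF-16 doubles the size
japanese_sizes = encoded_sizes(japanese)  # (21, 14): UTF-8 is 1.5x larger
```

So for this kind of CJK text UTF-16 saves about a third over UTF-8, while for English it costs double. Real documents with ASCII markup or whitespace mixed in would narrow the CJK gap further.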
-Miles