The forward slash character poses a problem, because it can't
appear in file names.
A 0x2F byte will never appear in a UTF-8 string except as a real U+002F
'/', unlike in several other character encodings. One of the design goals
of UTF-8 was strict compatibility with ASCII for all code units in the
range 0x00-0x7F: every byte of a multi-byte sequence has its high bit
set, so it can never be mistaken for an ASCII character. That is
sufficient (actually, overkill) for dealing with the special meanings
assigned to various ASCII characters. As a result, you can safely pass
UTF-8 strings to/through the Unix API.
If Windows had better support for CP_UTF8, the same would be true of the
(original) Win32 "ANSI" (i.e. non-UTF-16) API, completely removing the
need for the parallel "Unicode" (i.e. UTF-16) API and all the attendant
problems that having two APIs causes.
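The claim about 0x2F is easy to check exhaustively. Below is a minimal Python sketch; the helper name `first_false_slash` is hypothetical, and UTF-16 (little-endian, no BOM) is used only as a contrasting encoding. It scans every Unicode scalar value and returns the first character other than '/' whose encoded form contains the byte 0x2F.

```python
SLASH = 0x2F  # byte value of ASCII '/'


def first_false_slash(encoding: str, limit: int = 0x110000):
    """Return the first code point other than U+002F whose encoding in
    `encoding` contains the byte 0x2F, or None if none exists below `limit`."""
    for cp in range(limit):
        if cp == SLASH or 0xD800 <= cp <= 0xDFFF:
            continue  # skip the real slash and the unencodable surrogate range
        if SLASH in chr(cp).encode(encoding):
            return cp
    return None


if __name__ == "__main__":
    print(first_false_slash("utf-8"))           # None
    print(hex(first_false_slash("utf-16-le")))  # 0x12f
```

For UTF-8 the scan finds nothing; for UTF-16 it stops at U+012F, whose little-endian encoding is the byte pair 0x2F 0x01. The full UTF-8 scan covers about 1.1 million code points, so expect it to take a moment.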
UTF-8 was not the first multi-byte character set encoding to aim for
ASCII compatibility. Big5, Shift-JIS, and the ISO-2022 family are all
functionally ASCII compatible.
They're also more complicated and can only represent a small subset of
Unicode code points, so they're not suitable candidates for worldwide
use. UTF-8 is.
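To see what "functionally" (as opposed to strictly) ASCII compatible means in practice, and how limited the coverage is, here is a short Python sketch using the standard library's `shift_jis` codec. In Shift-JIS the second byte of a two-byte character can fall in the ASCII range: the katakana ソ (U+30BD) encodes as 0x83 0x5C, and 0x5C is the ASCII backslash, the Windows path separator. The helper names `ascii_bytes_in` and `can_encode` are made up for illustration.

```python
def ascii_bytes_in(text: str, encoding: str) -> list[int]:
    """Return every byte value below 0x80 in `text` encoded as `encoding`."""
    return [b for b in text.encode(encoding) if b < 0x80]


def can_encode(text: str, encoding: str) -> bool:
    """True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    so = "\u30bd"  # KATAKANA LETTER SO
    print(ascii_bytes_in(so, "shift_jis"))  # [92], i.e. 0x5C, an ASCII backslash
    print(ascii_bytes_in(so, "utf-8"))      # []
    print(can_encode("\U0001F600", "shift_jis"))  # False: emoji not in repertoire
    print(can_encode("\U0001F600", "utf-8"))      # True
```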
And these may actually still be more common in Asia than UTF-8.
According to web survey statistics, UTF-8 is used by a stunning ~77% of
websites. ISO-8859-1 comes in at ~12%, and everything else is in the
single digits.
Certain countries complain that UTF-8 is larger than UTF-16 for their
scripts, which fall in the range U+0800 to U+FFFF (3 bytes per character
in UTF-8 versus 2 in UTF-16), but that is only true for pure text. Add
punctuation, Arabic numerals, HTML markup, etc. to a document and UTF-8
usually wins. Even if not, given the other problems that UTF-16 causes,
UTF-8 is the clear winner.
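The trade-off is easy to quantify: a character in U+0800 to U+FFFF costs 3 bytes in UTF-8 and 2 in UTF-16, while an ASCII character costs 1 byte in UTF-8 and 2 in UTF-16. A minimal Python sketch comparing the two (the sample strings are made up for illustration):

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, UTF-16 size) in bytes; UTF-16 without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


pure = "日本語"                        # 3 CJK characters, no ASCII
marked_up = '<p lang="ja">日本語</p>'  # same text plus 17 ASCII characters of markup

if __name__ == "__main__":
    print(encoded_sizes(pure))       # (9, 6): UTF-16 is smaller
    print(encoded_sizes(marked_up))  # (26, 40): UTF-8 is smaller
```

For text made only of ASCII and BMP characters at or above U+0800, UTF-8 is no larger than UTF-16 whenever the ASCII characters are at least as numerous as the others, since each ASCII character saves one byte and each of the others costs one extra.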
OS X has settled on UTF-8 internally, which has already caused
interoperability issues, e.g. with networked file systems. My money
says it was a similarly poor decision. I'm sure UTF-8 will have more
staying power than UCS-2, but most of the caveats stem from relying
on any non-ASCII encoding. They shouldn't have chosen any poison.
You've got to choose something, and UTF-8 is the most backward
compatible and most universal. If everyone (e.g. Microsoft) would quit
trying to force obsolete, script-specific encodings down other people's
throats and just get on the UTF-8 bandwagon, we'd all be better off.
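The comment above doesn't say what the networked-file-system problems on OS X were. One commonly cited cause, offered here as an assumption rather than as what the commenter meant, is Unicode normalization: HFS+ stores file names in a decomposed form (a variant of NFD), while most other systems produce precomposed (NFC) names, so the "same" name can travel over the network as two different UTF-8 byte strings. A minimal Python sketch; `same_file_name` is a hypothetical helper, not an OS API.

```python
import unicodedata

name = "caf\u00e9"  # "café" with a precomposed é


def same_file_name(a: str, b: str) -> bool:
    """Hypothetical comparison that ignores normalization differences."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    nfc = unicodedata.normalize("NFC", name)
    nfd = unicodedata.normalize("NFD", name)
    print(nfc.encode("utf-8"))       # b'caf\xc3\xa9'
    print(nfd.encode("utf-8"))       # b'cafe\xcc\x81'
    print(nfc == nfd)                # False: different code points and bytes
    print(same_file_name(nfc, nfd))  # True
```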