Again, languages are not the issue; character encodings are, though
naturally the language has an impact on the repertoire of feasible
encodings. If you have pages with different encodings, then the
simplest way, on Apache, is to put files in one encoding into one
directory and create a .htaccess file in that directory, with a
suitable directive to Apache in it, e.g.

  AddType text/html;charset=utf-8 .html
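For instance, with one directory per encoding (the directory names here
are only illustrative), each directory gets its own one-line .htaccess:

  site/utf8/.htaccess:  AddType text/html;charset=utf-8 .html
  site/big5/.htaccess:  AddType text/html;charset=big5 .html

Apache then sends the matching charset parameter in the Content-Type
header for every .html file under each directory.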
This is not always suitable if you are providing hosting to various people
and don't want to give them permission to use .htaccess files. I would much
rather the character set be defined within the HTML itself than dished out
by the server. Also, if you are providing hosting for other people, you
cannot really force them to use different directories to separate character
sets - they may have a website with all the files in one single directory,
for example, but using multiple character sets.
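(To be fair, if .htaccess files are allowed at all, mod_mime can label
files individually within a single directory using <Files> sections - the
file names below are made up, just to show the idea:

  <Files "page-utf8.html">
  AddCharset UTF-8 .html
  </Files>
  <Files "page-big5.html">
  AddCharset Big5 .html
  </Files>

But that still requires .htaccess rights, which was the objection in the
first place.)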
Whether you can do that depends on Apache 2. Have you checked its
documentation? I would guess that using an AddType without a charset
parameter would do it. But that's really _not_ the WWW way. The WWW way
is to specify the encoding in actual HTTP headers; <meta> tags are just
surrogates that some people need to resort to (and that _might_ be worth
including for certain reasons even when you have made the server send
adequate headers).
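Concretely, the real thing is a header line in the HTTP response itself:

  Content-Type: text/html; charset=utf-8

and the <meta> surrogate merely mimics that header from inside the
document:

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">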
It may _not_ be the WWW way, but sometimes the proper way doesn't fulfil
one's requirements. And I think much of the world that is used to romanised
alphabets takes an oversimplified view of the difficulties countries such
as China face when it comes to language representation.
Or too early. But it is true that UTF-8 is _inefficient_ for most East
Asian languages: a typical CJK character takes three bytes in UTF-8 but
only two in Big5 or GB2312.
Naah... too late. If the world had adopted UTF-8 before anything else
(such as Big5 or GB) was devised, then everyone would be using it and
sticking to it. The problem now is that so much software and so many
websites have already been made using GB/Big5 as the encoding, not to
mention that Windows releases still use them natively, that people are
rather reluctant to change.
Again, encodings, not languages. And the software needs to grow up.
UTF-8 is the way the WWW and the Internet are going, in the sense that
support for UTF-8 is the primary goal (according to official IETF
policy) - any new protocols and software _should_ support it and
_may_ support other encodings.
You're quite pedantic about this language/charset thing, aren't you? Yes,
okay, I mean character set and not language. Still, I think you realised
that quite early on in my post.
I've got nothing against people using UTF-8 to represent Chinese
characters, but the fact of the matter is that many still aren't - never
mind what *should* be used. Using UTF-8 rather than more native formats
isn't just an easy matter of taking the text, dumping it on a website and
then setting the character set. The text itself, as I'm sure you already
know, has to be in the right format to begin with - but unless UTF-8 is
explicitly chosen, most native versions of Windows (which, let's face it,
most people use rather than, say, Linux or Mac OS) will save the text as
GB/Big5. Then there is the problem of fonts. Unicode fonts are not the
same as Big5 fonts, of which there are many well-established ones in this
part of the world. If you slap UTF-8 text onto a web page and expect a
certain Big5 font to be used, it obviously won't work. Yet the choice of
Unicode fonts that comes standard with IE/Windows isn't exactly very
inspiring.
Also, when it comes to fonts: trying to view English using UTF-8 in, say,
Netscape 7 results in ugly-looking text, and English viewed as Big5 also
occasionally looks strange. So it's better to be able to set different
character sets for different pages - not just for different directories,
which isn't flexible enough.
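With the <meta> approach above that is easy enough - each page carries its
own declaration (the file names are purely illustrative):

  english.html: <meta http-equiv="Content-Type"
                      content="text/html; charset=iso-8859-1">
  chinese.html: <meta http-equiv="Content-Type"
                      content="text/html; charset=big5">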
Can you name a web browser (still in use) that can handle BIG5/GB but
not UTF-8?
No... but... (read above) - that's not really the point.
Terence