Smike said:
I do not know about Chinese, but Cyrillic (Russian) version is
suggested to be done in 16-bit code version.
Are you kidding? Suggested by whom? Surely not by the Internet Architecture
Board, which clearly favors UTF-8.
No charset specification is required,
You _are_ kidding, are you not?
Cyrillic text will be visible in most modern web browsers
immediately without Encoding selection.
Web browsers recognize the encoding from the HTTP headers (the charset
parameter of Content-Type) and need no manual encoding selection, for any
registered encoding they support.
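To make the header mechanism concrete, here is a minimal sketch in Python of how a client might pick the encoding out of a Content-Type header and decode the body with it. The function name and the sample header and body are made up for illustration; real browsers do considerably more (sniffing, meta elements, fallbacks).

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the charset parameter of a Content-Type value, lowercased,
    or `default` when the header declares none."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


# Hypothetical response: a Russian word sent as UTF-8 octets.
body = "Привет".encode("utf-8")
declared = charset_from_content_type("text/html; charset=UTF-8")
text = body.decode(declared)

# With no charset parameter there is nothing to go on but guessing.
undeclared = charset_from_content_type("text/html")
```

When the parameter is present, the octets decode without anyone selecting an encoding by hand; when it is absent, the function returns None and the client is left to guess.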
View source of this example:
Huh? Why should we view source, in an issue like this? There's no sensible
way of viewing source without knowing or guessing the encoding, so what
could it possibly demonstrate?
These all appear to be documents that
a) lack any declaration about character encoding, which is a protocol error
and leaves it to browsers to make their guesses
b) contain just octets < 128, so any reasonable guess, such as US-ASCII or
ISO-8859-1 or ISO-8859-5 or ISO-8859-6 or UTF-8, will do
c) represent non-ASCII characters as character references, which is of
course possible but rather inefficient and hopelessly obscure unless you
have an editing tool that interprets the references; and if you have one,
you could put it to better use by saving the data as UTF-8
d) demonstrate nothing relevant to the topic.
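Points b) and c) can be checked with a small Python sketch. The sample word is my own choice, not taken from the pages in question. It shows that a document written entirely with character references decodes identically under several ASCII-compatible encodings, and what the same text costs in octets when saved as UTF-8 instead.

```python
import html

word = "Привет"  # illustrative sample text, not from the cited pages

# Point c): every Cyrillic letter written as a decimal character reference.
references = "".join(f"&#{ord(ch)};" for ch in word)
raw = references.encode("ascii")  # only octets < 128, as in point b)

# Point b): ASCII-compatible guesses all yield the same markup.
guesses = ["us-ascii", "iso-8859-1", "iso-8859-5", "iso-8859-6", "utf-8"]
decoded = {enc: raw.decode(enc) for enc in guesses}
all_same = len(set(decoded.values())) == 1

# What a browser shows after resolving the references,
# and the same text stored directly as UTF-8.
resolved = html.unescape(references)
utf8 = word.encode("utf-8")

print(all_same, resolved == word, len(raw), len(utf8))
```

For this six-letter word the reference form takes 42 octets against 12 for UTF-8, and the source is unreadable to a human without a tool that resolves the references.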
Long ago, a "conservative approach" as described in c) made sense, but it's
hardly fruitful these days for authoring in a language that uses a non-Latin
script. Besides, it has absolutely nothing to do with using some "16 bit
version".