Ben Morrow ([email protected]) wrote:
: (e-mail address removed) (Malcolm Dew-Jones) wrote:
: > Ben Morrow ([email protected]) wrote:
: > : OK, your problem here is that Win2k is being stupid about Unicode: any
: > : sensible OS that understood UTF8 would be fine.
: >
: > Hum, NT has been handling Unicode for at least ten years (NT 3.1, 1993)
: > by the simple expedient of using 16 bit characters. It is hardware that
: > is stupid, by continuing to use ancient tiny 8 bit elementary units.
: OK, I invited that with gratuitous OS-bashing... nevertheless:
: 2. Given that the world does, in fact, use 8-bit bytes, any 16-bit
: encoding has this small problem of endianness... again, solved
: (IMHO) less-than-elegantly by the Unicode Consortium.
Endianness is a hardware problem. 16 bit character hardware would not
have this problem. For other hardware, it is the same problem you hit
when transmitting numeric values over media such as the Internet, and it
is solved just as easily (as indeed it has been): specify the order in
which the hi and lo bytes are transmitted.
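
To make that concrete, here is a little C sketch of my own (nothing from
the Unicode spec or the original post) of what "specifying the order"
amounts to: write the high byte first, read it back the same way, and
the host's endianness never enters into it. I'm assuming UTF-16BE-style
ordering here, i.e. high byte first, like any other network-order
integer.

    /* Minimal sketch: serialise a 16 bit code unit high byte first,
     * then read it back; the result is the same on any host. */
    #include <stdint.h>
    #include <stdio.h>

    /* Write one 16 bit code unit, high byte first. */
    static void put_u16be(uint16_t cu, unsigned char out[2])
    {
        out[0] = (unsigned char)(cu >> 8);   /* high byte */
        out[1] = (unsigned char)(cu & 0xFF); /* low byte  */
    }

    /* Read one 16 bit code unit back, high byte first. */
    static uint16_t get_u16be(const unsigned char in[2])
    {
        return (uint16_t)((in[0] << 8) | in[1]);
    }

    int main(void)
    {
        unsigned char buf[2];
        put_u16be(0x0041, buf);              /* 'A' = U+0041 */
        printf("%02X %02X -> U+%04X\n",
               (unsigned)buf[0], (unsigned)buf[1],
               (unsigned)get_u16be(buf));
        return 0;
    }

Compiled and run, that prints "00 41 -> U+0041" on big-endian and
little-endian machines alike.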
: 3. Given that the most widespread character set is likely to be either
: ASCII or Chinese ideograms, and ideograms won't fit into less than
: 16 bits anyway, it seems pretty silly to encode a 7-bit charset
: with 16 bits per character.
Hum, did you notice the contradiction in what you say: "character set"
versus "ideograms"?
You might also say it seems silly to encode a 7-bit charset in 8 bits. I
think it's silly to worry about a few bits of storage when the software
for handling text would be greatly simplified by just making the
character a larger unit. Look at the number of years and bugs it has
taken to introduce Unicode handling to many applications, and compare
that to the time it took the NT team originally to update Notepad - it
consisted basically of recompiling it with the character type defined to
be a larger size.
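
For what it's worth, here is a rough sketch of that "recompile with
bigger characters" idea. The names CHAR_T, T() and str_len are mine,
standing in for the Win32 TCHAR/TEXT()/_tcslen convention; this is
obviously not the actual Notepad source, just the shape of the trick.

    /* The same source builds with 8 bit or 16 bit (or wider)
     * characters depending on one macro. */
    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    #ifdef WIDE_CHARS
    typedef wchar_t CHAR_T;     /* 16 bit on Win32, wider on some platforms */
    #define T(s) L##s           /* wide string literals */
    #define str_len wcslen
    #else
    typedef char CHAR_T;        /* the classic 8 bit build */
    #define T(s) s
    #define str_len strlen
    #endif

    int main(void)
    {
        const CHAR_T *msg = T("hello, world");
        /* Same logic either way; only the size of a "character" changes. */
        printf("length = %zu code units\n", str_len(msg));
        return 0;
    }

Build it with -DWIDE_CHARS and every literal and length call switches to
the wider character type without touching the program logic, which is
essentially what defining UNICODE does for TCHAR-based Win32 code.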
: 4. It also seems pretty silly to break everything in the world that
: relies on a byte of 0 meaning end-of-string, not to mention '/'
: being '/' (or '\', or whatever, as appropriate).
Huh? If you used 16 bit characters throughout, then the (16 bit)
character with numeric value 0 is still the string terminator, and '/'
still has its usual value.
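
A quick C illustration of that point, assuming a platform whose wchar_t
holds the 16 bit (or wider) characters and where the wide '/' has its
ASCII value, which it does everywhere I know of:

    /* The terminator is still the code unit with value 0, and L'/'
     * still compares equal to 0x2F. */
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        const wchar_t path[] = L"/usr/bin";   /* ends with a 0 code unit */
        size_t len = 0;

        while (path[len] != 0)   /* same end-of-string test as with char */
            len++;

        printf("length = %zu, '/' == 0x2F ? %s\n",
               len, (L'/' == 0x2F) ? "yes" : "no");
        return 0;
    }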