Ben Bacarisse
Bartc said: I thought Unicode used ASCII (character codes 32 thru 127 at
least). (And the 'sixbit' I once used was just ASCII 32 to 95, offset to
start at 0.)
That does not mean that either of them is ASCII -- it just simplifies a
part of the transform to and from ASCII. The key point is that if you
stick to the common characters you fail to get the benefit of the larger
sets, and if you assume a particular encoding you get code that is even
more tied down by that assumption.
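A minimal sketch of that last point, assuming only standard C: the range
test hard-wires one particular layout of the character set, while islower()
asks the implementation and so stays correct under whatever encoding it
uses.

    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
        int c = 'k';

        /* Ties the code to encodings where 'a'..'z' are contiguous
           (ASCII, Latin-1, UTF-8) -- not guaranteed by the C standard. */
        if (c >= 'a' && c <= 'z')
            printf("lower case by range test\n");

        /* Asks the implementation, so it is also correct on EBCDIC and
           respects the current locale. */
        if (islower((unsigned char)c))
            printf("lower case by islower()\n");

        return 0;
    }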
Bartc said: EBCDIC should be put painlessly to sleep... It was a terrible
encoding. The A-Z range is ubiquitous and deserves a concise, consecutive
encoding.
Maybe, but it can't be done painlessly.
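The pain comes from how much code quietly assumes the letters are
consecutive. In EBCDIC the upper-case letters sit in three separate runs
(A-I at 0xC1-0xC9, J-R at 0xD1-0xD9, S-Z at 0xE2-0xE9), so an assumption
like the one in this small sketch holds on ASCII systems but fails there:

    #include <stdio.h>

    int main(void)
    {
        /* On ASCII 'J' - 'A' is 9; on EBCDIC it is 0xD1 - 0xC1 = 16,
           because the upper-case letters come in three separate runs. */
        if ('J' - 'A' == 9)
            printf("letters look consecutive on this system\n");
        else
            printf("letters are not consecutive (EBCDIC-style set)\n");
        return 0;
    }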
Bartc said: The whole of Unicode is a nightmare to deal with (for example,
the exact same glyph having dozens of character codes), and it's nice to
have one sane corner of it where things are kept simple.
You would probably not be saying that if you had been a Swedish C
programmer in the 80s or if Japan had been the dominant culture at the
time programming flourished!
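The "same glyph, dozens of character codes" complaint above is easy to
demonstrate, and it is also why naive byte comparison is not enough:
U+00E9 (precomposed é) and U+0065 followed by U+0301 (e plus a combining
acute) render identically but compare unequal. A sketch with the two UTF-8
sequences written out by hand, assuming a UTF-8 terminal for the displayed
output:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "café" with é as the precomposed code point U+00E9 (UTF-8). */
        const char *precomposed = "caf\xC3\xA9";
        /* "café" with é as 'e' plus U+0301 COMBINING ACUTE ACCENT. */
        const char *decomposed  = "cafe\xCC\x81";

        /* Same glyphs on screen, different bytes underneath. */
        printf("%s vs %s: strcmp says %s\n",
               precomposed, decomposed,
               strcmp(precomposed, decomposed) == 0 ? "equal" : "different");
        return 0;
    }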
[2] I know it's more than the US and UK, but I needed a shorthand. You
can't print the alphabet that Spanish children are taught by iterating
from ASCII 'a' to ASCII 'z'.
Bartc said: And the Italian alphabet had just 21 letters at one time. Now,
with foreign words and the www, everyone will recognise the same 26 letters
we use in English. (Although I understand that in Spanish things like ñ and
ll might be letters in their own right.)
You seem to be supporting my case. There are both technical and
linguistic reasons why for (c = 'a'; c <= 'z'; c++) is not the right way
to iterate over "the alphabet". It iterates over *an* alphabet on
*some* systems.
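In other words, a program that is supposed to print the alphabet Spanish
children learn has to spell that alphabet out; the code-point loop only
yields whatever lies between 'a' and 'z' on the system at hand. A sketch of
the contrast (the 27-letter Spanish string is an illustration, not from the
original posts, and assumes UTF-8 source and output):

    #include <stdio.h>

    int main(void)
    {
        /* Iterates over whatever codes lie between 'a' and 'z' here:
           *an* alphabet on *some* systems, not "the" alphabet. */
        for (int c = 'a'; c <= 'z'; c++)
            putchar(c);
        putchar('\n');

        /* The 27-letter Spanish alphabet has to be stated explicitly;
           no code-point range produces it. */
        const char *spanish = "abcdefghijklmnñopqrstuvwxyz";
        printf("%s\n", spanish);

        return 0;
    }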