Skarmander
P.J. Plauger said:
> Good. Now tell me the practical upper limit that we can use
> to standardize the all-singing, all-dancing physical address
> for now and all future times.

Oh, I don't know, let's be generous. 256 bits. That's about on par with
the estimates for the number of particles in the universe. It's not that
impractical, if you use some form of segmenting or banking. A flat
256-bit memory model is probably a bit much, though.
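For concreteness, here's the back-of-the-envelope arithmetic, sketched
in Python (the 10^80 particle count is itself only an order-of-magnitude
estimate):

    import math

    addresses = 2**256                # distinct 256-bit addresses
    print(math.log10(addresses))      # ~77.1, i.e. about 10**77

    # Within a few orders of magnitude of the ~10**80 particle
    # estimate -- "on par" at this scale.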
Now you can of course object that people will one day use extra bits for
bytes in different colors or fonts... or perhaps parity, quantum flux or
heaven knows what. But like the character set examples, meta-information
like that doesn't count.
Or you could argue that we will discover radical new physics that give
us computers existing as pure energy, or computers that can address
branes... But this does not translate to character sets.
P.J. Plauger said:
> It may not be unreasonable, but I maintain that, on the basis of
> history, it's wildly optimistic.

And I think you're making an invalid extrapolation, backed up by
analogies that don't apply.
P.J. Plauger said:
> IIRC, SC2/WG2 (the ISO committee corresponding to the Unicode
> Consortium) even saw fit to pass a resolution that UTF-16 will
> forever more be adequate to express all expansions of ISO 10646 (the
> ISO standard corresponding to Unicode). I consider that either a) a
> mark of remarkable self confidence, or b) whistling in the dark. Take
> your pick.

I'll take what's behind door number three, Monty: a way to reassure
standards adopters that UTF-16 would always be practical and sufficient
to "do" all of Unicode with. If they're wrong, UTF-16 as it is will
become worthless, but so what? Unicode will "break" as well, and a new
standard would be required. Is that "remarkable" self-confidence? I'd
say it's a statement of practicality.
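To make the stakes concrete: UTF-16's surrogate-pair arithmetic is
exactly what fixes the ceiling. A minimal Python sketch, purely for
illustration:

    def utf16_encode(cp):
        """Encode one code point as a list of 16-bit code units."""
        if cp < 0x10000:                 # BMP character: one unit
            return [cp]
        v = cp - 0x10000                 # 20 bits left for the pair
        return [0xD800 + (v >> 10),      # high surrogate: top 10 bits
                0xDC00 + (v & 0x3FF)]    # low surrogate: bottom 10 bits

    # One unit covers 2**16 code points; a pair adds 2**20 more:
    print(0x10000 + 0x100000)            # 1114112 -- and not one more
    print([hex(u) for u in utf16_encode(0x10FFFF)])  # ['0xdbff', '0xdfff']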
P.J. Plauger said:
> So I did. The question I raised, however, was whether Unicode can
> resist the inevitable pressures to grow beyond their currently
> self-imposed barrier of 1,114,112 codes.
The question *I* raised was whether that pressure is inevitable.
1,114,112 codes may well *be* enough, and not just because the Unicode
Consortium says so. Though that plays a role as well: crazy folk who
want to push beyond this barrier will have to come up with very good
and plausible reasons why Unicode must be overhauled. And keep in mind
that that's going to involve thousands upon thousands of new code
points.
If I told you 1,114,112 bytes of main memory were enough, you'd laugh
at me, and rightly so. Why 1,114,112? Can't I meaningfully want double
that amount? But memory size and character sets are different things.
There *are* only so many characters on this Earth, and if we take their
actual growth rate into account (not the rate at which they're added to
Unicode), it levels off.
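Some rough headroom arithmetic backs this up (the assigned-character
count here is an approximation; Unicode 4.1 defines a little under
100,000 characters):

    capacity = 0x110000   # 1,114,112 code points
    assigned = 98000      # approximate number of assigned characters
    print(f"{assigned / capacity:.0%} of the code space used")

That comes to roughly 9%, with the major scripts already encoded.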
P.J. Plauger said:
> Oh, my, I think you really believe that. When "politics" is backed
> by the odd billion dollars worth of contracts, you'd be surprised
> what it can get.

Oh, please. If that's the best you've got to argue that character sets
will grow bigger, I'm not impressed. Character sets will continue to
grow because political intrigue and financial interests are also
boundless?
Yes, standards can and have been ruined by interests that have nothing
to do with technical merits. (Cynics might say that's the rule rather
than the exception.) But we're not talking about someone slipping a
misfeature into a language here; we're talking about breaking the
entire Unicode standard by adding large numbers of redundant characters
(which ones?) for some purpose (whatever could that be?). Could it
happen? Sure. Would
it mean that 21 bits really aren't enough? Only if you also believe that
black is white if the government says so. And it certainly has nothing
to do with arguments of exponential growth.
Okay. My "argument" was that 21 bits will not long prove to be enough.
Just one order of magnitude will be enough to blow UTF-16 to kingdom
come. And that was my point.
Conceded, given my cautious formulation. Although I will also state I'm
with the Unicode peeps on this one: 21 bits is going to be enough. Yes,
that *exact* order of magnitude. For the next five years at least, and
I'd wager for longer.
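To pin down what that one order of magnitude would mean mechanically
(again just a Python sketch):

    current = 0x110000                   # 1,114,112 code points: 21 bits
    print(current.bit_length())          # 21
    print((10 * current).bit_length())   # 24 -- beyond what surrogate
                                         # pairs can reach, so UTF-16
                                         # breaks while UTF-32 would not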
P.J. Plauger said:
> Having just survived several years of UTF-16 jingoism, however, I
> expect to be ungracious if Unicode does indeed have to issue a new
> standard that leaves UTF-16 in the same rest home as UCS-2. I also
> hope to remain intellectually honest enough to issue a mea culpa in
> five years if I prove to be wrong.

Oh, don't worry. Google archives everything these days, and I plan to
be around five years from now. I'll take care of it for you.
You're going to lose this one, easily.
S.