wchar_t


Skarmander

P.J. Plauger said:
Good. Now tell me the practical upper limit that we can use
to standardize the all-singing, all-dancing physical address
for now and all future times.
Oh, I don't know, let's be generous. 256 bits. That's about on par with
the estimates for the number of particles in the universe. It's not that
impractical, if you use some form of segmenting or banking. A flat
256-bit memory model is probably a bit much, though.

Now you can of course object that people will one day use extra bits for
bytes in different colors or fonts... or perhaps parity, quantum flux or
heaven knows what. But like the character set examples, meta-information
like that doesn't count.

Or you could argue that we will discover radical new physics that give
us computers existing as pure energy, or computers that can address
branes... But this does not translate to character sets.
It may not be unreasonable, but I maintain that, on the basis of
history, it's wildly optimistic.

And I think you're making an invalid extrapolation, backed up by
analogies that don't apply.
IIRC, SC2/WG2 (the ISO committee corresponding to the Unicode
Consortium) even saw fit to pass a resolution that UTF-16 will
forever more be adequate to express all expansions of ISO 10646 (the
ISO standard corresponding to Unicode). I consider that either a) a
mark of remarkable self confidence, or b) whistling in the dark. Take
your pick.
I'll take what's behind door number three, Monty: a way to reassure
standards adopters that UTF-16 would always be practical and sufficient
to "do" all of Unicode with. If they're wrong, UTF-16 as it is will
become worthless, but so what? Unicode will "break" as well, and a new
standard would be required. Is that "remarkable" self-confidence? I'd
say it's a statement of practicality.

So I did RC. The question I raised, however, was whether Unicode can
resist the inevitable pressures to grow beyond their currently
self-imposed barrier of 1,114,112 codes.

The question *I* raised was whether that pressure is inevitable.
1,114,112 codes may well *be* enough, and not just because the Unicode
consortium says so. Though that plays a role as well: crazy folk who
want to push beyond this barrier will have to come up with very good and
plausible reasons why Unicode must be overhauled. And keep in mind that
that's going to involve thousands upon thousands of new code points.

If I told you 1,114,112 bytes of main memory are enough, you'd laugh at
me, and rightly so. Why 1,114,112? Can't I meaningfully want double that
amount? But memory size and character sets are different things. There
*are* only so many characters on this Earth, and if we take their actual
growth rate into account (not the rate at which they're added to
Unicode), it bottoms out.
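
For the record, 1,114,112 isn't plucked from thin air either: it's 17 planes
of 65,536 code points, which is everything UTF-16 can express, directly or
via surrogate pairs. A rough C sketch of the arithmetic (illustrative only,
the function name is mine):

#include <stdio.h>

/* UTF-16 surrogate pairs carry 10 + 10 = 20 payload bits above U+FFFF,
 * so the highest reachable code point is 0x10000 + 0xFFFFF = 0x10FFFF.
 * Add the 65,536 BMP code points: 17 * 65,536 = 1,114,112 in total. */
static unsigned long decode_surrogate_pair(unsigned hi, unsigned lo)
{
    return 0x10000UL + (((unsigned long)(hi - 0xD800) << 10) | (lo - 0xDC00));
}

int main(void)
{
    printf("largest code point: U+%lX\n", decode_surrogate_pair(0xDBFF, 0xDFFF));
    printf("total code points:  %lu\n", 0x10FFFFUL + 1);
    return 0;
}
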
Oh, my, I think you really believe that. When "politics" is backed
by the odd billion dollars worth of contracts, you'd be surprised
what it can get.
Oh, please. If that's the best you've got to argue the character sets
will grow bigger, I'm not impressed. Character sets will continue to
grow because political intrigue and financial interests are also boundless?

Yes, standards can and have been ruined by interests that have nothing
to do with technical merits. (Cynics might say that's the rule rather
than the exception.) But we're not talking about someone slipping a
misfeature in a language here, we're talking breaking the entire Unicode
standard by adding large numbers of redundant characters (which ones?)
for some purpose (whatever could that be?) Could it happen? Sure. Would
it mean that 21 bits really aren't enough? Only if you also believe that
black is white if the government says so. And it certainly has nothing
to do with arguments of exponential growth.
Okay. My "argument" was that 21 bits will not long prove to be enough.
Just one order of magnitude will be enough to blow UTF-16 to kingdom
come. And that was my point.

Conceded, given my cautious formulation. Although I will also state I'm
with the Unicode peeps on this one: 21 bits is going to be enough. Yes,
that *exact* order of magnitude. For the next five years at least, and
I'd wager for longer.

Having just survived several years of UTF-16 jingoism, however, I
expect to be ungracious if Unicode does indeed have to issue a new
standard that leaves UTF-16 in the same rest home as UCS-2. I also
hope to remain intellectually honest enough to issue a mea culpa in
five years if I prove to be wrong.
Oh, don't worry. Google archives anything these days, and I plan to be
around five years from now. I'll take care of it for you. :)

You're going to lose this one, easily.

S.
 

Jordan Abel

Used only by computer scientists. (Commerce on computing being
non-existent.)


Used only in English-speaking countries.

And other 7-bit codes were used in other countries. Generally these
lacked things like [brackets], etc, in favor of accented letters or
other such things. Since ISO 646 [which specifies all of these] is a
direct descendant of ANSI X3.4, it's fair to colloquially call all these
"ASCII". and even if you won't, that doesn't change the fact that
they're all 7 bits.
Used only in English, and *some* European countries.

Eh? What European countries didn't use some form of 'extended ascii'?
Note that he didn't say ISO 8859-*1* in particular.
A nonsensical hack.

EUC is a nonsensical hack? [and the approximate number of bits would be
13.11, for a 94x94 space] [incidentally, the JIS X 0201 fonts on my
system are dated 1983, so his date estimate is wrong, or more likely
chosen arbitrarily to fit a curve]
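
The 13.11, for what it's worth, is just log2(94*94); trivial to check:

#include <stdio.h>
#include <math.h>

/* Bits needed for a 94x94 code space: log2(8836). Uses C99's log2(). */
int main(void)
{
    printf("log2(94*94) = %.2f bits\n", log2(94.0 * 94.0));
    return 0;
}
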
 

Jordan Abel

Used only by computer scientists. (Commerce on computing being
non-existent.)

Forgot to answer this one - While the various 6-bit codes were only used
by computer scientists, a number of 5-bit codes were used by
telegraphers, and thus in actual commerce. A five-bit code with
context-based shifting [for letters vs digits, or for case] could be
considered to have six or six and a half bits.
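
Rough arithmetic behind the "six or six and a half": a 5-bit code with k
shift states can distinguish at most about 32*k symbols (ignoring the codes
spent on the shifts themselves), so:

#include <stdio.h>
#include <math.h>

/* Upper bound only: k shift states over a 5-bit code give ~32*k symbols. */
int main(void)
{
    int k;
    for (k = 2; k <= 3; k++)
        printf("%d shift states: log2(%d) = %.2f effective bits\n",
               k, 32 * k, log2(32.0 * k));
    return 0;
}
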
 

Mark McIntyre

Oh, I don't know, let's be generous. 256 bits. That's about on par with
the estimates for the number of particles in the universe.

The one thing you can be almost certain of is that in say 100 years,
your "generous" estimate will seem laughably small.
Now you can of course object that people will one day use extra bits for
bytes in different colors or fonts... or perhaps parity, quantum flux or
heaven knows what. But like the character set examples, meta-information
like that doesn't count.

I hate to say it, but arbitrarily ruling out certain sorts of data
because it would be inconvenient for your model is generally regarded
as dodgy science!
 

websnarf

Mark said:
The one thing you can be almost certain of is that in say 100 years,
your "generous" estimate will seem laughably small.

You must be new to computers. Moore's law says 150 years before 256
bits are needed.
 

Skarmander

Mark said:
The one thing you can be almost certain of is that in say 100 years,
your "generous" estimate will seem laughably small.

I think you fail to appreciate the magnitude of the number. This limit
will *not* be reached in 100 years, no more than spacecraft will fly
faster than the speed of light in 100 years. (Confident much? Yes.
Sorry, sci-fi fans.)

The estimates for the number of particles in the universe range from
10^72 to 10^87. 2^256 is about 10^77. We are talking about a computer
that can assign a bit to every single particle in the universe (ignoring
the problem of identifying particles with memory addresses).
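
A quick check of that figure, since it carries the whole argument:
2^256 = 10^(256 * log10 2).

#include <stdio.h>
#include <math.h>

/* 2^256 expressed as a power of ten. */
int main(void)
{
    printf("2^256 is about 10^%.1f\n", 256.0 * log10(2.0));  /* ~10^77.1 */
    return 0;
}
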

To give you an idea of how big that is: if every single atom comprising
the planet Earth were used to store a byte, the resulting memory would
have a capacity of a paltry 10^50 bytes. Let's call this a "blue memory
marble". If you replaced all the grains of sand on all the beaches of
our Earth with blue memory marbles, you'd still be about a million times
short of the actual capacity of our hypothetical supercomputer... Of
course, there could be better ways of building such a computer, one
that's not so wasteful with atoms -- this is just to give a feel for the
number.
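
For anyone who wants to quibble with the marble arithmetic, here it is in
log10 form; the sand-grain count is an order-of-magnitude guess (~10^21) and
nothing more:

#include <stdio.h>

/* Order-of-magnitude guesses only: ~1e50 atoms in the Earth, one byte
 * each; ~1e21 grains of sand on its beaches, one "marble" each. */
int main(void)
{
    double marble = 50.0;   /* log10 of bytes in one blue memory marble */
    double grains = 21.0;   /* log10 of the number of marbles           */
    double target = 77.0;   /* log10 of 2^256, roughly                  */

    printf("marbles give ~10^%.0f bytes; 2^256 is ~10^%.0f; short by ~10^%.0f\n",
           marble + grains, target, target - (marble + grains));
    return 0;
}
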

I can be more generous, if you like. 512 bits. That well exceeds the
estimates for the total amount of information that can possibly be
computed in this universe (http://arxiv.org/abs/astro-ph/0404510), and
any physical addresses larger than that are wasteful. But I was asked
for a *practical* upper limit, and 256 bits still seems very practical
in that regard.

You are probably misled by two things when you confidently assert these
limits will be broken. The first is that historically, no address space
has been sufficient yet. From this it does not follow that this will
continue to be the case, however.

Second, we've gone from 16 to 32 to 64 and even 128 bit machines... and
256 bit machines are actually already in the works. But the question was
about a practical upper limit for *physical addresses*, not *machine
words*, which may well grow larger as more and more memory is fetched by
the CPU every cycle. Some day there may be CPUs with a 512 bit word size
-- maybe they can even address that much memory! But will the full 512
bits be needed for addresses? No.
I hate to say it, but arbitrarily ruling out certain sorts of data
because it would be inconvenient for your model is generally regarded
as dodgy science!

What model? This is not a scientific theory. I was specifically asked
for an upper limit on physical addresses. Using meta-bits to add
information to addresses invalidates my estimates, obviously, because
we're no longer talking about physical addresses. Nothing dodgy about that.

To come back to my earlier statement, a CPU with addresses of 512 bits
could not use all of them to address physical memory, so it's very
likely it'll use the bits for something else -- that's simple
engineering efficiency. But those bits don't count towards physical
address size.

S.
 

Mark McIntyre

You must be new to computers. Moore's law says 150 years before 256
bits are needed.

Grin. Lemme see... I started in around 1980, when 8-bit micros were
all the rage, and 16-bit was just coming along. Nowadays, 32-bit is
all the rage and 64-bit is just happening. That's, say, n^2 in 25 years.
So in another 25 years, we ought to have 32^2 bits, in 50 years 32^2^2
and in 100 32^2^2^2^2. == a very silly number of bits.

:)
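
Taking that literally -- square the bit count every 25 years, starting from
32 in 2005 -- the "very silly number" works out to roughly 1.2e24 bits by
2105:

#include <stdio.h>

/* Literal-minded version of the joke: square the word size every 25 years. */
int main(void)
{
    double bits = 32.0;
    int year;
    for (year = 2005; year <= 2105; year += 25) {
        printf("%d: about %g bits\n", year, bits);
        bits *= bits;
    }
    return 0;
}
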
 

Mark McIntyre

I think you fail to appreciate the magnitude of the number.

Not in the slightest.
This limit
will *not* be reached in 100 years, no more than spacecraft will fly
faster than the speed of light in 100 years. (Confident much? Yes.
Sorry, sci-fi fans.)

*shrug*.

I'm sure dozens, nay millions of people said 100 years ago "pah, fly
to another continent, not any time in the next century".
The estimates for the number of particles in the universe range from
10^72 to 10^87. 2^256 is about 10^77.

What sort of particles? Atoms? Electrons? Quarks? Whatever quarks are
made out of?
We are talking about a computer
that can assign a bit to every single particle in the universe (ignoring
the problem of identifying particles with memory addresses).

And 50 years ago, 4Gigs would have required more magnetic beads and
wire than existed in the entire planet, let alone how on earth you'd
get enough valves to identify each one.
 

Skarmander

Mark said:
On Sun, 20 Nov 2005 18:12:44 +0100, in comp.lang.c, Skarmander wrote:


*shrug*.

I'm sure dozens, nay millions of people said 100 years ago "pah, fly
to another continent, not any time in the next century".
Yes, and "we broke the sound barrier, so why can't we go faster than the
speed of light"? Maybe because one requires violating known and
well-tested laws of physics, and the other is merely hard to do.

Accuse me of a lack of imagination if you will. But 100 years from now,
we'll see who's right!

(I am assuming we discover the secret of immortality, of course. That
shouldn't take long, right?)
Hmm, mea culpa. Various sources show the low end of the range is
probably closer to 10^79, not 10^72. So I'm off two orders of magnitude
at least. Not that it matters much.
What sort of particles? Atoms? Electrons? Quarks? Whatever quarks are
made out of?
Doesn't matter. The range is fairly encompassing. The low end is
counting hydrogen atoms as four particles (one electron, three quarks
for the proton). The high end is when you start counting neutrinos.
Virtual particles are excluded, since they're tricky to pin down
(certainly for purposes of addressing, you'd imagine...) See
http://home.earthlink.net/~mrob/pub/math/numbers-13.html.
And 50 years ago, 4Gigs would have required more magnetic beads and
wire than existed in the entire planet, let alone how on earth you'd
get enough valves to identify each one.

I'm not talking beads, wires, transistors or quantum gates. I'm talking
elementary particles. What you're saying is that we're going to discover
orders upon orders of new elementary particles to build computers out
of, to the point where the ability to address 10^77 units of memory will
become impractical as a constraint. Now that's what I call a bold
assumption.

There is a huge difference between a computer we cannot imagine being
built with today's technologies, and a computer whose physical address
space encompasses the known universe.

But, hey, let's take the 512 bit address space if you feel more
comfortable, and assume the laws of physics will be overhauled to the
point where estimates of absolute information limits (that is,
*regardless* of what hardware is used) no longer apply. It could happen.
I just don't think it will. But at this point we'll just have to wait
and see.

I'll stand by my estimate. See you in a century.

S.
 

Eric Sosman

Mark said:
Grin. Lemme see... I started in around 1980, when 8-bit micros were
all the rage, and 16-bit was just coming along.

In 1980 the 32-bit (or 24-bit, if you want to quibble)
IBM S/360 had been around for almost two decades. The
solidly 32-bit DEC VAX 11/780 had been on the market for
two years, with a "compatibility mode" to support programs
for the 16-bit PDP-11 that had been on the market since 1970.
How does that square with "16-bit was just coming along?"
Nowadays, 32-bit is
all the rage and 64-bit is just happening.

This will come as a shock to the late great DEC and
their 64-bit "Alpha" processor, introduced thirteen years
ago. It will also unsettle IBM, H-P, and Sun, all of whom
have been making and selling 64-bit computers since, oh,
gee, at least August.
That's, say, n^2 in 25 years.
So in another 25 years, we ought to have 32^2 bits, in 50 years 32^2^2
and in 100 32^2^2^2^2. == a very silly number of bits.

Starting from invalid data, any damn' fool can produce
silly results.
 

Mark McIntyre

Yes, and "we broke the sound barrier, so why can't we go faster than the
speed of light"? Maybe because one requires violating known and
well-tested laws of physics, and the other is merely hard to do.

Come now. Two things
1) the speed of light can be exceeded without breaking any laws of
physics. If you don't know this, please go away and do a physics
degree specialising in particle physics.

2) there are no absolute laws of physics - all of them are best
estimates based on measurement available at the time. Consider the
meaning of the word 'atom'...
(I am assuming we discover the secret of immortality, of course. That
shouldn't take long, right?)

I read in the paper that my parents won't last much past 80, my
generation can expect to live to 100+, my kids to 120+.
Doesn't matter.

It most certainly does.
Virtual particles are excluded,

Again, I consider this cheating :)
I'm not talking beads, wires, transistors or quantum gates. I'm talking
elementary particles. What you're saying is that we're going to discover
orders upon orders of new elementary particles

Go back 40 years, and check out how many elementary particles we'd
discovered then.
 

Mark McIntyre

In 1980 the 32-bit (or 24-bit, if you want to quibble)
IBM S/360 had been around for almost two decades.

Does the word "micro" ring a bell?
This will come as a shock to the late great DEC and
their 64-bit "Alpha" processor, introduced thirteen years
ago.

Roaring success, wasn't it?
It will also unsettle IBM, H-P, and Sun, all of whom
have been making and selling 64-bit computers since, oh,
gee, at least August.

Sure, I have about 100 of them in my compute farm at work. I don't
recall any of them being sold as micros.
Starting from invalid data, any damn' fool can produce
silly results.

Indeed. If you propose to enter a bogus argument half-way through, be
prepared to look a fool.
 

Skarmander

Mark said:
Come now. Two things
1) the speed of light can be exceeded without breaking any laws of
physics. If you don't know this, please go away and do a physics
degree specialising in particle physics.

I do know this, and I will not go away. I suspect you're deliberately
misinterpreting me, since you know as well as I do (from the context you
helpfully cut away) that we're not talking about Cherenkov radiation or
the Casimir effect or what-have-you. I'm talking about Marvin the
Martian revving up his spaceship to Ludicrous Speed. I'm fairly
confident even those people with physics degrees will admit they enter
the realm of pure and hopeful speculation on that.

No, wait. Marvin the Martian was the one with the death ray. Dark Helmet
had the Ludicrous Speed. Sorry.
2) there are no absolute laws of physics - all of them are best
estimates based on measurement available at the time. Consider the
meaning of the word 'atom'...
Yep, can't disagree here. Nor do I have to. The laws of physics could be
wrong or incomplete, I'll give you that. No, better, the ones we have
likely *are* wrong or incomplete.

That's not to say we'll wake up tomorrow and find we can go faster than
light after all. And call me a ninny, but I think we won't wake up in a
century and tell the Enterprise to go to warp 9, either.
It most certainly does.
"Does too!" "Does not!" This is pointless. My argument was a bit longer
than just "doesn't matter".
Go back 40 years, and check out how many elementary particles we'd
discovered then.
Yeah, you know what? I'm going to give up, here.

You're right. We are not just going to discover many more elementary
particles, but it'll turn out that they are millions of times more
numerous than all elementary particles known so far. Not only that, but
we'll learn how to build computers with circuits of just a few of these
new particles, and as a result they're going to have way more than 10^77
memory units and still fit on your desk (or in your solar system, at
least). My estimates will look stupid, and so will I. Luckily I was just
one Usenet poster, so hardly anyone will care, and my descendants won't
be *too* embarrassed.

There. You win. Now let's go do useful things to fill up the century we
have to wait.

S.
 

Eric Sosman

Mark McIntyre wrote on 11/20/05 18:42:
Does the word "micro" ring a bell?

My apologies. I had entirely forgotten that C
is a language for micros exclusively.
 

Walter Roberson

On Sun, 20 Nov 2005 17:14:22 -0500, in comp.lang.c, Eric Sosman
Sure, I have about 100 of them in my compute farm at work. I don't
recall any of them being sold as micros.

I did some checking around last night, but I could not seem to come
up with a firm definition of what a "micro" was or was not.

When the term "microcomputer" was first being used, the distinctions
were "mainframe", "midi", "mini" and then "micro".

SGI introduced deskside and desktop MIPS R4000-based machines in March
and September 1993 (respectively). You can see from the following images
that the desktop machine, the Indy, was smaller than today's typical PC.
The deskside machine, the Indigo^2, was pretty much the size of today's
"tower" PC cases.

http://www.schrotthal.de/sgi/indy/indy_r5000_front.html
http://www.unix-ag.uni-hannover.de/163.html
 

Michael Wojcik

It is
likely that all current and past scripts will fit into those 21 bits,
and it is unlikely that new scripts will be invented.

Unlikely? Tell that to the Klingon fans. For that matter, tell it
to the Cherokee and the Cree. Klingon isn't in the Unicode
standard, but the ConScript Unicode Registry has standardized its
code points in the UPUA. The Cherokee script is in the Unicode
standard, and the Cree is covered by the Unified Canadian Aboriginal
Syllabics section of the standard, I believe.

The Cherokee syllabary is less than 200 years old. The Cree
syllabary is about 20 years younger.

The Blackfoot syllabary is barely over a century old, and I don't think
it's in Unicode yet. But I bet it will be.

The (likely pre-Columbian) pictographic Micmac script isn't in
Unicode. At the moment it's relegated to the UPUA, but politics or
new archeological evidence could change that.

Over 700 languages are spoken in Papua New Guinea, many by peoples
who have thus far had minimal contact with the outside world, spread
over an area of 450,000 square kilometers. Someday some of them may
decide to create their own writing systems.

For that matter, there's a lot of questionable stuff in Unicode now
(who really needed \u2328, the "keyboard" character?). And whatever
silliness the Unicode Consortium rejects, groups like the CSUR will
probably standardize under their own name. And if enough people pay
attention, that's all it takes.

Whether enough new scripts will be invented to start to crowd the
Unicode 21-bit space is another question, but the evidence suggests
that there will be new scripts.

--
Michael Wojcik (e-mail address removed)

Auden often writes like Disney. Like Disney, he knows the shape of beasts --
(& incidently he, too, might have a company of artists producing his lines) --
unlike Lawrence, he does not know what shapes or motivates these beasts.
-- Dylan Thomas
 

lawrence.jones

Used only by computer scientists. (Commerce on computing being
non-existent.)

Nonsense. The IBM 1401 -- the first commercially available
transistorized computer -- was announced in 1959, was primarily intended
for business applications (replacing accounting machines), was widely
used in commerce (it was the first computer to ship 10,000 units), and
had a six-bit character set (BCD). Certainly Univac and others had
similar commercial systems in the same timeframe.

-Larry Jones

What's the matter? Don't you trust your own kid?! -- Calvin
 

Dik T. Winter

>
> Nonsense. The IBM 1401 -- the first commercially available
> transistorized computer -- was announced in 1959, was primarily intended
> for business applications (replacing accounting machines), was widely
> used in commerce (it was the first computer to ship 10,000 units), and
> had a six-bit character set (BCD).

From what I remember of the manuals, it had a 48-character set, barely 6 bits.
 

Mike Wahler

Nonsense. The IBM 1401 -- the first commercially available
transistorized computer -- was announced in 1959, was primarily intended
for business applications (replacing accounting machines), was widely
used in commerce (it was the first computer to ship 10,000 units), and
had a six-bit character set (BCD). Certainly Univac and others had
similar commercial systems in the same timeframe.

And this old fart :) had his first hands-on computer experience
(in school) with the 1602. Most of our learning exercises:
accounting applications.


-Mike
 
