Just out of curiosity: why should the graph be directed; don't the nodes
have two-way communication?
Yes, but bandwidth is also an issue. If you connect to more and more peers and
just push data out to all of them, then your upload bandwidth just gets split
more and more. d=1 leads to a trivial solution but with unacceptable lag. So
d=2 is the minimum divisor that gives rich enough possibilities for topologies
to try to solve this problem in interesting ways.
So although actually you will, on average, have 4 connections established (two
out, and two in), you are transmitting on two of them, and receiving on the
other two.
[...] And you're not considering a lag metric other than just number of
hops?
Solving variants of Travelling Salesman is next on my list of things that I'll
spend my most masochistic moments on.
Perhaps I will, I'll have some time on my hands round Xmas. Sounds like
a nice problem.
It's nice to think about, not so nice if you actually have to solve it.
Dijkstra's weighted shortest path algorithms were nice for client-server based
problems, but the future is peer-to-peer. I believe that this problem and
problems like it are going to be at the center of a lot of internet technology
going forward.
How about generating C source that you #include from a resident file in
your project, would that be an option?
I don't know that MSVC's project mechanisms can be finagled to compile and
execute one program just to build source code for another program that you
compile up. With makefiles, what you are saying is easy, and I've done it in
the past, but you can see how this kind of thing puts pressure on your
development process. A sufficiently useful preprocessor mechanism would allow
one to write such code generators in a completely portable way.
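For reference, the two-stage pattern being described usually boils down to a
small generator program along these lines (the file names and generated table
are purely illustrative); a makefile rule builds and runs it, and the real
program simply #includes its output:

/* gen_tables.c -- illustrative generator: a makefile rule compiles and
   runs this first, redirects its output to squares.h, and the real
   program then does #include "squares.h". */
#include <stdio.h>

int main(void)
{
    printf("/* generated file -- do not edit */\n");
    printf("static const unsigned squares[16] = {\n");
    for (int i = 0; i < 16; i++)
        printf("    %uu,\n", (unsigned)(i * i));
    printf("};\n");
    return 0;
}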
The C standard specifies what is UB and what's not. Grepping an
implementation is a fragile way of deducing semantics, I would say.
Yes, and what percentage of C programmers in the world do you think have ever
had the standard in their hands that they actually refer to?
Well, I'm glad that you extracted my intended meaning from the name of
the function I used. Good thing I didn't use an operator! ;-)
Yeah, well I noticed that you didn't catch *my* bug. (I missed one detail.)
I'm _still_ talking about 'the stack thing' yes, since you still seem to
think that this is a triviality. I think it's not.
Dude: malloc(65536) is not portable by the same reasoning.
If you think that, then you didn't follow my reasoning. I can find out
at compile time if malloc(65536) is allowed, and if it is, I can find
out at runtime if it has succeeded. I can do no such thing with stack
overflows. The program will exhibit all the symptoms of UB without a
proper cause that can be traced down to the standard.
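Something like the following sketch captures the asymmetry being argued
(illustrative only):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Heap exhaustion is detectable: the standard guarantees that a
       failed malloc() reports it by returning NULL. */
    void *p = malloc(65536);
    if (p == NULL) {
        fputs("allocation failed, handled gracefully\n", stderr);
        return EXIT_FAILURE;
    }
    free(p);

    /* There is no analogous check for the stack: no standard way exists
       to ask "can I safely make this call, or recurse N levels deeper?"
       before doing it, and no NULL-like result when it fails. */
    return EXIT_SUCCESS;
}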
What are you trying to say here? Help me out a bit.
I'm trying to say that *ALL* languages have this issue. The resources required
to retain a call stack are unknown and often difficult to determine from any
reasonable code analysis. Something like "counting the depth" is not very
useful since there are no language provisions for actually counting out to
such a depth. Some compilers implement tail recursion, though doing so in all
cases is equivalent to the halting problem -- so you can't even know for sure
when it's happening.
If you want a really good case, look up the Ackermann function. It looks very
simple, with the only concrete constant operation being a "+1". Knowing that
there is some kind of stack bound is totally useless for this function. Most
computations of the Ackermann function blow out any fixed-size stack, and even
a fully addressable 64-bit machine would not have the required memory to
implement a stack that could compute any of its non-trivial range. The point
being there is no kind of reasonable metric for measuring the stack
requirements that can be generally applied.
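For concreteness, the textbook definition is tiny (a sketch; the return value
overflows quickly too, but the point here is only the call depth):

/* Two-argument Ackermann function.  The recursion depth explodes so
   fast that even modest arguments such as (4, 2) blow out any realistic
   stack long before the arithmetic itself becomes a problem. */
unsigned long ackermann(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ackermann(m - 1, 1);
    return ackermann(m - 1, ackermann(m - 1, n - 1));
}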
My objection to your proposal can simply be restated as "all machines have
limited resources; deal with it." At best you could insist that upon reaching
some limited stack depth the program should immediately halt or something
(with stack checking turned on, most x86 compilers provide for this in the form
of a runtime error.)
Ok, I'll try to summarize my understanding of the point you try to make.
Let's consider it a priori desirable that we have a super-duper heap
manager, with all kinds of nice things like separate heaps. We agree
that this is often useful. Now let's also assume a certain platform
where multithreading/multitasking is so important that the
platform-specific library provides support for it. We can assume that it
also sports a thread-safe malloc (yes?). You'd still want the
super-duper heap manager. Your point is that mandating a better heap
manager by the standard (without even mentioning multithreading) would
force each vendor to implement these, and implement them thread-safe if
they think multithreading is important to have.
Basically, yes.
Now, as I see it, even if _just_ the standard malloc()/free()/etc are
thread-safe, you can build a more powerful heap manager on that alone,
using just standard C features.
This is completely and utterly untrue. *THAT* is my point. None of the
functions I proposed as extensions can be implemented (even in a prototypical
way) on top of just malloc()/free()/etc and retain their intended semantics.
(Of course, you could implement them with memory leaks out the wazoo -- i.e.,
just never free anything, but that's clearly not acceptable.)
[...] If the thing needs to be able to run in
a multithreaded environment, you could even do that, with some extra
effort. You need some form of mutex locking, but you can resort to
Lamport's bakery algorithm for that.
I had to look this one up -- that's one of the classic useless non-solutions
(especially if you want to call malloc more than 4 billion times, for example),
which explains why I've never committed it to memory. You still need to be
able to identify process instances, which *BY ITSELF* is not portable. Process
instance identifiers would count as something *BEYOND* just malloc/free/etc.
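For reference, a rough sketch of the bakery algorithm's shape makes the
objection concrete (the fixed thread count N and the per-thread index i are
assumptions the algorithm cannot do without):

#define N 8                      /* must know the number of threads up front */

volatile int entering[N];
volatile unsigned number[N];     /* ticket numbers grow without bound */

void bakery_lock(int i)          /* i: per-thread identity, 0..N-1 */
{
    int j;
    unsigned max = 0;

    entering[i] = 1;
    for (j = 0; j < N; j++)
        if (number[j] > max) max = number[j];
    number[i] = max + 1;         /* take the next ticket */
    entering[i] = 0;

    for (j = 0; j < N; j++) {
        while (entering[j])
            ;                    /* wait until j has picked its ticket */
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i)))
            ;                    /* defer to lower tickets, ties broken by id */
    }
}

void bakery_unlock(int i)
{
    number[i] = 0;
}

Note that it leans on per-thread identifiers and on shared-memory ordering that
plain C volatile does not actually guarantee -- all of which is already beyond
what malloc()/free()/etc alone provide.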
[...] Surely it won't be pretty, but it is, in principle, possible. That's
my point.
No it's not. It's not portable, that's for sure.
If the malloc returns either NULL or a pointer (i.e., it conforms
minimally to the standard) then, even in a multithreaded environment, it
is quite possible to build a heap manager on top of that.
Why don't you just try to do it. I mean without race conditions, and without
hooking out or using any of the spawn/mutex/etc primitives that you'll find in
the multithreading API. That ought to be a good trick.
[...] If you need
high performance even in such demanding circumstances, then, yes, you'd
need some support from preferably hardware-assisted mutexes or similar.
High performance is just something I realized you could get as a nice side
effect from such extensions. But the functionality enhancements alone are
worth it. *BUT* whether or not this is properly implemented with respect to
the existence of multithreading has no relevance to its performance -- that's
just a question of correctness.
Ok. Let's try to get some closure here, this is eating into my free time
too much for my taste.
I would say that writing a more powerful heap manager is possible in
principle, even in a multithreaded environment, as long as you have a
well-implemented malloc() (and friends) at your disposal.
*BZZZT!!!* Wrong.
It would be a drag to do so and it would be slow, but it is possible.
Impossible. Slow, fast or otherwise. It's just impossible.
I think I have a clearer bearing on your statement why it would be good
to put this stuff in the library. You do need mutexes, and if the
implementation supports them, it will work better than a shared-memory
simulated mutex. So for performance reasons, I think your idea of
extending the basic heap manager, by extending its functionality as
mandated by the standard, is a good idea.
That is, as long as I can get my msize() function as well.
Well ok, msize() (implemented as _msize() in WATCOM C/C++) does not require
multithreading support. That would be a good addition as well. One has to be
very careful about what kind of implementation impact this has. For example,
if an implementation has an extension which has a heap-compatible
automatically growable allocation (for example touching the adjacent memory
page, which would be "paged out" by default, might automatically append it to a
"growable" allocation), then knowing its size might not make a lot of sense.
Yes, as close as gcc is, it does. And in other cases, I have the standard.
But you said you were just going to look in your man pages. With my proposal,
there would be no question -- the source would have to be in the project
somewhere. If someone closed their source, well, presumably you're going to
expect documentation anyway, if not, you've got the same problem that you
always expect from closed source libraries.
Ah, the "tit-for-tat" technique...!
So far, nobody has challenged me to come up with a specific phrasing for
this (which actually surprises me). Here you go. It's a bit rough, but
you get the idea.
---- Proposed addition to the standard to address stack overflows:
---- Draft
a. For an active function invocation, define an upper bound for the
amount of memory in use by this function as follows (in the
following, denote "Upper Limit on Memory Usage" by "ULMU"):
And what if the locals/parameters are not stored, or storable, on the stack?
What if an implementation uses multiple stacks depending on the *type* of the
parameter?
- For every parameter/local variable of basic type there is a
corresponding constant macro of type size_t:
You missed all the implicit temporaries required. Homework exercise: for any
fixed number n, create a function which requires at least n implicit
temporaries (not explicitly declared) to implement.
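For instance, a sketch along these lines: every parenthesized product below has
to be held somewhere while the enclosing expression is evaluated, so a
straightforward compilation needs several unnamed temporaries on top of the
four declared parameters -- and deeper nesting manufactures as many more as you
like:

int needs_temporaries(int a, int b, int c, int d)
{
    return ((a * b + 1) * (c * d + 2) + 3) *
           ((a * c + 4) * (b * d + 5) + 6);
}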
(unsigned) char: ULMU_CHAR
(unsigned) short: ULMU_SHORT
...
double: ULMU_DOUBLE
...
pointer types: ULMU_POINTER
- The ULMU for a struct shall be equal to
the sum of upper limits of its members.
This doesn't sound right with implementation defined struct alignment issues.
- The ULMU for a union shall be equal to
the maximum of upper limits of its members.
Doesn't that assume that the implementation implements overlapping union
members? (It makes sense not to overlap floating point with integer if, for
example, the memory units in the CPU internally tag the type of the contents of
the memory in order to allow the FPU and Integer units to execute totally
asynchronously even when both are accessing memory, if, say, that tag switching
had a high cost.)
- The ULMU for an array shall be the number
of elements, multiplied by the upper-limit
of its base type.
((the last one is certainly wasteful, but this could be refined))
The ULMU of a function invocation shall be calculated as the
sum of ULMU values of the types of its parameters,
local variables, and return type, PLUS the constant
ULMU_FUNC_OVERHEAD.
And what about the number of alloca() calls? alloca() is a common extension
which just eats extra space off the stack dynamically -- the big advantage
being that you don't have to call free or anything like it, for it to be
properly cleaned up. It will be cleaned up upon the function's return (the
simplest implementation of a "freeall".)
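A typical use looks something like this (alloca() is nonstandard, and the
header it lives in varies -- <alloca.h> on many UNIXes, <malloc.h> elsewhere):

#include <alloca.h>   /* nonstandard: location of the declaration varies */
#include <stdio.h>
#include <string.h>

void greet(const char *name)
{
    /* Size is only known at run time; the space comes straight off the
       stack and is reclaimed automatically when greet() returns. */
    char *buf = alloca(strlen(name) + 8);
    sprintf(buf, "Hello, %s", name);
    puts(buf);
}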
Would it not make more sense to remove all these details, and just say that
each function creates a macro named _STACK_USAGE_<functionname> which is not
available inside the scope of any function or prior to the definition of the
function? (minus alloca() calls) This would then allow you to track the stack
usage without (incorrectly, I might add) trying to figure it out from details
that are implementation-specific?
Regardless, none of this helps if you have multiple stacks, for example. In
such cases, if *any* stack runs out you are in trouble. But it's obvious that
your intention is to assume a *unified* stack implementation. The math won't
translate in a portable way.
b. The implementation shall define a macro ULMU_TOTAL,
which will evaluate to a size_t with value at least
4095. This value is guaranteed to remain constant during
execution of the program.
But in implementations that *I* am aware of, the stack size is set at *LINK*
time. So it can't *BE* a macro available at compile time without giving up on
this flexibility. This wouldn't work for me at all, since I *use* this feature
in my own projects. At the very least make this a global variable -- that *IS*
settable at link time.
c. A function invocation is guaranteed to succeed only if the
total summed ULMU values of all active function invocations
(including the to-be active function after invocation)
does not exceed ULMU_TOTAL.
You have to trust me on this one: I'm not trying to coerce you into a
laborious writing-down of a precise rule-set, or anything. It's just
that I found that making something concrete (which you intuitively feel
should be possible) gives a lot of insight into the kind of troubles you
would get into when actually doing it.
But *YOU* could have written up a possible coding standard, and explain why you
think such a thing could not work in practice. There are as many coding
standards as programming projects -- picking one is always controversial and
instantly makes you open to attack.
As noted before, I think it would not be a good development if different
organisations settled on a different set of operators, even if they
meticulously documented their convention. Even things as simple as a
wrong (in the sense of: other than you're used to) bracing style can
hinder your productivity for quite some time before you get used to
that. Having a different set of condoned operators at each programming
shop sounds like a small tower of Babel to me.
But I don't understand how this is any different from free form function names.
I'm not talking about abuse, I'm talking about the recognition capacity
of unknown operators (which is zilch, and which takes ages to get used
to). Now that's bad enough for languages, but I think it would be a
mistake of quite monumental proportions to allow some sort of C
"dialects" to emerge with different sets of operators, "operator wars",
and so on.
You mean like function-name wars? Like string library wars (actually that sort
of is happening ...)? The point is that there's no chance that anyone would
mistake an *extensible* operator as being anything other than it is. To
understand it you have to look up its definition, just like a function.
When reading the code that used (rather than defines) the operator, it
helps tremendously if the meaning jumps from the page, rather than
having to look it up in the Big Manual of Operators in use at your local
workplace.
You are completely ignoring that large population of people who happily use
operator overloading today. You don't understand that what you are saying is
nothing more than your very narrow opinion. And you're using your opinion to
try to trump a proposal whose merits lean very heavily on its technical
features.
Too much freedom is a dangerous thing, in programming at least.
Yeah we should all rename our functions as func0000, func0001, etc ...
It would make for an interesting experiment, sociologically
This "experiment" is already ongoing -- notice how BSD has been forked out the
wazoo, how there are about a million regex implementations, and how Microsoft
has "winsock" as an alternative to UNIX's sockets. The world hasn't ended
because of any of this.
A small hypothetical case study. Suppose engineer A, working on nuclear
reactor core controllers, works in a team where "?<" means "minimum" and
"#<" means "x is implied by y" for some data structure involved in logic
reasoning on the state of the system. Now he switches companies; he goes
to work on electronic toys and gadgets. Here the convention is
different, but the same operators are used.
Or suppose a programmer uses clock() all his life in DOS/Windows as a higher
resolution realtime, yet portable, timer. This programmer then moves to UNIX
and is shocked to find that all mutexes turn in a clock() measurement of 0
microseconds regardless of how long they block! Even Sleep(1000) measures 0
microseconds!
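A sketch of the surprise being described (POSIX sleep() used for illustration):

#include <stdio.h>
#include <time.h>
#include <unistd.h>      /* POSIX sleep(); the point is UNIX-specific */

int main(void)
{
    clock_t start = clock();
    sleep(1);            /* blocked for a second, consuming no CPU time */
    clock_t end = clock();

    /* On UNIX, clock() reports processor time, not wall-clock time, so
       this prints roughly 0 even though a full second went by. */
    printf("%f seconds of CPU time\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}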
The example you've made up has to do with basic comprehension of the engineer --
since "?<" would clearly be one of the "extended operators", the programmer
is supposed to look up its definition. Just like they would have to look
up logMsg() or any other function name that might show up in more than one
project. Just like they are supposed to magically know how clock() behaves on
different OS'es.
You haven't shown me that it can be implemented in a compiler, even in
theory. That should count for something.
I don't think there's a credible assertion that my idea is impossible. You
want me to describe a whole compiler implementation just to support my idea?
As long as one can create a well-defined definition, why should you expect it
would be impossible? Have I created a halting problem or some kind of
NP-complete algorithm or something? -- I don't think so.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
What's so bad about using third-party libraries?
From a usability point of view, GMP is crap, RSA's bignum library is crap (and
it's expensive).
How's that? How are you going to improve things by adding a carry operator?
In any way I choose -- that's the point. Right now the barrier to entry is
very high, because I would have to implement a competitive library in assembly
language. I'm not kidding when I tell you these libraries are pure excrement.
Not in terms of performance -- they both perform reasonably well -- I mean in
terms of how the heck do I use these things?
You see, with language support, an intrepid developer could sacrifice *some*
performance for portability and demonstrate a much nicer and more easily usable
interface. This *competition* might *force* these popular vendors to fix
their interfaces so that they could deliver both the performance and usability
benefits. But because both of these have wads of assembly language for various
architectures, it is very difficult and expensive to compete with them. Even if I
made a single x86 implementation, it would be difficult for me to beat their
performance by any measurable margin, and my benefit of having possibly a
better interface would have to be balanced against the fact that I have no
portability, while GMP does (by virtue of having been ported.)
Now there's an application where the economy of scale suggests doing a
hand-written assembly routine, if I ever saw one...
About a year ago, Schlumberger gave an amazing talk about implementing pico-java
bytecode into a smart card. It was incredible, because it was obvious that
they would be able to use all the Java tools/libraries, they could simulate it
almost perfectly in software -- mostly inheriting the pico-java architecture,
and because of Java's bignum class, it meant that exposing a widening multiply
and carry-propagating add functionality in their smart card would be enough to
automatically have all the software they need for RSA all ready to go.
Do you understand? Although C compiler porting frameworks exist, the lack of a
standard for widening multiply and carry-propagating add means they would have
to expose a mechanism themselves, then write all the software themselves just
to get to the point of having a bignum library (which Java basically comes with)
before proceeding to their RSA implementation.
The question remains: how much time is your typical e-commerce server
involved in an actual transaction? How many transactions per second can
a machine handle now? I would suppose that most of the time spent in a
transaction is actually a database lookup at a creditcard company, not a
powermod.
No. It's the powermod. One instruction -- 30% of all processor time. The rest
-- including hitting a disk or talking to some authentication mechanism over a
network fits into the 70% (though obviously you would expect some amount
of DMA overlapping.) Get it? Powermod is slower than disk swapping -- that's
what Intel was measuring, and from my own calculations, their claim is quite
credible. As I recall, you only need about a hundred simultaneous
transactions for this to show up. Remember, Intel's marketing point is that
they claim Itanium will save you money by using fewer systems per number of
maximum sustained transactions. In a very real sense, their claim is very
credible, based solely on the fact that they have dual fully pipelined widening
multipliers.
[...] The frontend webserver is just waiting for the bank I think.
But these are just half-educated guesses.
No -- servers obviously amortize this over multiple threads, and performing
queued up block transfers. While these things take a long time, they are all
queued, blocked, buffered etc. Serial solutions are not acceptable. Your
guess is way off the mark. This wouldn't take any appreciable webserver CPU
power at all.
I don't mean "client load" when I write "transaction load", I'm still
strictly speaking about the server. I could be convinced that the amount
of time spent in powermod for a 1-shot transaction could be as high as
30%, but I'd be hard-pressed to believe a front-end webserver is busy
for more than 10% doing actual transactions.
No. You misunderstood. 30% of *THE WHOLE THING* is in just the powermod. I'm
talking about serving static webpages while some consumer is staring at
advertising or whatever else before they decide to buy whatever. Just their
action of pressing the "checkout" button to buy the stuff in their shopping
cart just pegs the server's CPU.
I'm not at all surprised you have a hard time believing this, as your grasp of
multitasking seems tenuous at best.
[...] I think the latency of an
e-commerce transaction is mostly caused by frontend<->bank
communication, and intra-bank database lookup. But I cannot back this up.
The *LATENCY* is something you measure from the client. The latency is not
relevant to the real bottlenecks in the webserver. The client sees the latency
of waiting for the bank, but the server does *NOT* have code like:
while (HeyBankAreYouDoneYet()) {
    runSetiAtHome();
}
in it. At the very least there is a blocking operation like "Sleep (1)" in the
place of "runSetiAtHome()". (Hint: the blocking is when a thread comes in
where powermod takes over and spikes the CPU usage.)
But for signed numbers, carry is almost meaningless. So what remains of
the semantics in this case?
Why is the carry meaningless? Ok, you obviously don't know the details of how
bignum is implemented. Ok, a large 2's complement integer is filled with an
array of unsigned sub-integers, except for the one at the top. *That* one is
signed, and that's how you know if the bignum integer is negative or not. The
problem is that the act of addition can cause a roll around at the top word
which requires an expansion of the bignum by an extra integer in size. The logic
you use to determine this is just related to examining the carry flag.
It all just comes down to the carry flag. The other flags are not necessary
for bignum libraries. Maybe there's a way to transform the OF into something
that can be used in a bignum library, but I am not aware of it.
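For reference, the portable workaround that a carry operator would replace
looks roughly like this (the limb width is chosen for illustration):

typedef unsigned int limb;

/* One limb of a bignum addition: add two limbs plus an incoming carry,
   recovering the outgoing carry from the wrap-around comparisons.  A
   dedicated carry-capturing operator would express this in one step and
   let the compiler use the hardware carry flag directly. */
limb add_limb(limb a, limb b, unsigned carry_in, unsigned *carry_out)
{
    limb s = a + b;
    unsigned c = (s < a);        /* carry out of a + b */
    limb r = s + carry_in;
    *carry_out = c | (r < s);    /* carry out of the full sum */
    return r;
}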
But if you want the OF, then why not have another operator for that?
Of course, but you have to know whether your operands are signed or
unsigned to see whether the CF or the OF is the important flag.
OF is never the important flag. Its not relevant at all. I believe that
taking into account the signedness of the result and/or the operands and one of
these flags is redundant with the other flag (I'm not 100% sure, but the last
time I cared to look at OF, I believe I resolved never to look at it again for
precisely this reason). So while you could map between them, one suddenly
doesn't become more important in different contexts. I *KNOW* the solution for
implementing bignum with a CF alone (and I believe it's the standard way
everyone does it), so OF is just not relevant.
[...] On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
It might to you, but the carry is not related to the sign of the operands.
(Your understanding of x86 seems to be shaky.)
My x86 is not too strong, but I am quite well-versed in 6502, 68k and
PowerPC, which enables me to think about these things. They all share
carry and overflow bits in their designs, with little difference. After
performing an add-with-carry, both CF and OF are set (although the V
flag may not be reset, as in the 6502), but the CF generally does not
provide useful information when adding two numbers that are signed. So I
don't see how
c +< v = a + b
would provide useful information in c, when both a and b are signed, if
it is to copy the carry status.
In a bignum library, the signedness of each sub-word is *dynamic* (the top word
is signed, the rest are unsigned.) So regardless of the declared types of the
operands, you want the carry flag of an "as if the operands were unsigned" add.
Ok. That just leaves the case of two signed values, then.
See above. Carry is not a signedness sensitive semantic.
I'm still wondering about the high_order_value() and low_order_value()
semantics I mentioned in my previous post, actually the part which I was
most curious about.
I wasn't able to suss out a coherent question from your post, so I just deleted
this discussion. In this case, I am just copying the x86's MUL -> EDX:EAX or
AMD64's MUL -> RDX:RAX semantics. But other RISC CPUs usually have MULHI and
MULLO instructions -- they usually use some trickery by which if you issue
these instructions back to back with the same source parameters, then it's
actually computed in one widening ALU operation (so that it's comparable,
hardware-wise, to what the x86 does.)
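A sketch of those hi/lo semantics in portable terms, assuming a type twice the
operand width is available:

#include <stdint.h>

/* Widening multiply: the full 64-bit product of two 32-bit operands,
   split into the pieces x86 leaves in EDX:EAX (or that MULHI/MULLO
   return on typical RISC machines). */
void widening_mul(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t product = (uint64_t)a * b;
    *hi = (uint32_t)(product >> 32);
    *lo = (uint32_t)product;
}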