Is C99 the final C?

goose

Sidney Cadot said:
Well, I used a and b as stand-ins for "any two expressions".

(using indentation to make it easier to read)

<nit> :)
this should be here
said:
> By the way, got any funny looks talking to people today?
>
> (You forgot the </nit>)

then said:
>
> Best regards,
>
> Sidney

</nit>

</nit> (closing <nit> of previous post :)

goose,
the guys here at the office are already used
to me quoting parts of the standard at them.
 
Paul Hsieh

Sidney Cadot said:
That looks like a decent first stab at a proposed syntax. Some questions,
though:

- What would be the constraints on acceptable operator names?

As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.

Perhaps as an operator gains unary attributes, tacking things onto the
end of it becomes a syntax error as well.
- In what way will you handle the possible introduction of ambiguity in
the parser that gets to parse the new tokens?

By defining a good grammar.
- What if I want a ?< to work both on int and on double types?

Specific type overloading, as is done in C++.
- How are you going to define left, right, or lack of associativity?

- Does your syntax allow (and if so, how) the introduction of unary
prefix operators (such as !), binary infix operators that may have
compile-time identifiers as a parameter (such as ->), n-ary operators
(such as the ternary a?b:c or your proposed quaternary carry/add
operator), and operators that exist both in unary and binary form (+, -)?

The easiest way to handle these issues is to allow an operator to inherit
the attributes of a previously defined operator. So we add in the
"like" clause:

int _Operator ?< after + like + (int x, int y) { /* max */
    if (x > y) return x;
    return y;
}

I would propose that -> or . not be considered operators, as they have
a really different nature that one could not map to function-definition
semantics in a sensible way. As to the question of ? : or the quaternary
operators that I proposed -- the idea would be to first *ADD IN* the
quaternary base operators, then just use the "like" clause as I've
described above. To reduce ambiguity again, I would also not consider
the dereferencing meaning of * or the address-taking meaning of & to be
operators.

int _Operator ?> : after ? : like ? : (int x, int y, int z) {
    if (x >= 0) return y;
    return z;
}

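For comparison, what the proposed quaternary base operator computes can be sketched as an ordinary C function (sel3 is a made-up name for illustration):

```c
#include <assert.h>

/* What the proposed quaternary "?> :" selects, as plain C:
 * y if x is non-negative, z otherwise.  sel3 is a made-up name. */
int sel3(int x, int y, int z)
{
    return (x >= 0) ? y : z;
}
```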
My gut feeling is that this would effectively force the compiler to
maintain a dynamic parser on-the-fly while scanning through the source,
which would be wildly complex.

Yes it complexifies the parser quite a bit. I don't dispute that.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?

No, actually I am unaware of any language that has infinite operators.
Someone mentioned Mathematica, but I am unaware of its language
semantics. I mentioned ML as the inspiration of quaternary operators,
but that's about it.
 
Paul Hsieh

Keith Thompson said:
There's ample precedent in other languages (Pascal and Ada at
least) for packed structures. [...] You can't sensibly take the
address of packed_obj.i. [...] The simplest approach would be to
forbid taking the address of a member of a packed structure (think of
the members as fat bit fields). [...]

Then what would be the point of even calling it a "struct"? This is
what I am saying -- it leads to bus errors because of the rest of
the language concepts like taking the address of any value that is
stored in a memory location.

Surely it's no worse than calling a struct with bit fields a "struct".

Yeah, but this is one of the *weaknesses* of the language.
That's the same problem you have with any function that returns a
string. There are numerous solutions; programmers reinvent them all
the time.

Homey saywhat? You have to do the free *after* the printf. Yet you
still need to account for multitasking. You would have to LOCK before
the printf, then FREEALL / UNLOCK after the printf just to make it
work. Which can have a massive performance impact.
If you can come up with a specification for an enhanced printf that
can produce arbitrary user-defined output for arbitrary user-defined
types, we can discuss whether it's better than "%s" with an image
function.

Well, in this case I don't have to. As I've said, there are snprintf
format extension mechanisms in existence already out there. Just pick
up one of them.
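For concreteness, the single-threaded shape of the "%s with an image function" pattern being argued about might look like this; image_int is a hypothetical conversion function, and the caller frees the string only after printf has returned:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical "image" function: returns a freshly malloc'd string
 * representation of its argument; the caller owns (and frees) it. */
char *image_int(int v)
{
    char *s = malloc(32);
    if (s)
        sprintf(s, "%d", v);
    return s;
}

/* Usage: the free happens only after printf has consumed the string. */
void print_demo(void)
{
    char *s = image_int(42);
    if (s) {
        printf("value = %s\n", s);
        free(s);
    }
}
```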
 
Dan Pop

In said:
A single vendor?!?! Ooooh ... try not to set your standards too high.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already.

OTOH, the fact that they didn't make any progress in this area for
the last two years doesn't sound very encouraging.

http://gcc.gnu.org/c99status.html seems to be quite frozen.

Dan
 
CBFalconer

Paul said:
Homey saywhat? You have to do the free *after* the printf. Yet
you still need to account for multitasking. You would have to
LOCK before the printf, then FREEALL / UNLOCK after the printf
just to make it work. Which can have a massive performance impact.

Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.
 
Arthur J. O'Dwyer

As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.

Nitpick: &&& should be in the same category as !! or **, since it
currently also has useful semantics.
The basic rule for "new operator" candidacy is simple: If it ends
with a unary prefix operator, throw it out. If it begins with a
unary postfix operator, throw it out. If it contains the consecutive
characters "/*", "*/", or "//", throw it out. Everything else should
be acceptable, unless I'm missing a case.
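Arthur's rule follows from C's "maximal munch" lexing convention: the lexer always consumes the longest possible token. A small sketch of why an operator like +++ would be ambiguous with existing code:

```c
#include <assert.h>

/* Maximal munch: "a+++b" lexes as "a ++ + b", not "a + ++ b".
 * A new +++ operator would therefore collide with existing programs. */
int munch_demo(void)
{
    int a = 1, b = 2;
    return a+++b;   /* (a++) + b, i.e. 1 + 2 */
}
```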
By defining a good grammar.

The ambiguity won't be in the parser; it'll be in the lexer.
And the *semantics* of the code will affect the output of the
lexer, if you allow new operators to evolve on the fly. Which
will make this new language practically impossible to implement
using lex/yacc techniques anymore.
Yes it complexifies the parser quite a bit. I don't dispute that.

Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?

No, actually I am unaware of any language that has infinite operators.
Someone mentioned Mathematica, but I am unaware of its language
semantics. I mentioned ML as the inspiration of quaternary operators,
but that's about it.

ISTR that I brought up the topic in comp.lang.misc a year ago or
thereabouts, but I forget if anything interesting got said except
that which I've repeated here.

-Arthur
 
Paul Hsieh

CBFalconer said:
... snip stuff about multitasking ...

Nonsense. printf() simply has to make a private copy of its data
before returning. This is much easier in languages that use
references. Systems have been managing buffers for some time now.

Excuse me, but someone has to *free* the memory. Explain how this is
done without some kind of lock or handle grabbing operation prior to
the printf (in which case you might as well do your own separate printf
for each -> string operation, and free each result by hand) and
without modifying printf.
 
Sidney Cadot

Keith said:
Hey, this is Usenet! You're not supposed to admit mistakes here. The
least you could do is sneakily change the subject and start being
personally abusive. :cool:}

Once more, you're right. A very stupid lapse on my side :)

Best regards,

Sidney
 
CBFalconer

Paul said:
Excuse me, but someone has to *free* the memory. Explain how this is
done without some kind of lock or handle grabbing operation prior to
the printf (in which case you might as well do your own separate printf
for each -> string operation, and free each result by hand) and
without modifying printf.

printf (or anything else) secures the necessary memory, copies
things into it, and can now return to its caller while letting a
separate task process the data. When done, it releases that
memory.
 
Sidney Cadot

A single vendor?!?! Ooooh ... try not to set your standards too high.

One has to be conservative when engaging in bets.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already. That's not my
point. My point is: will Sun, Microsoft, Intel, MetroWerks, etc. join the
fray so that C99 is ubiquitous, to the point of obsoleting all previous C's
for all practical purposes for the majority of developers?

I think they will. Could take a couple of years though.
Maybe the Comeau guy
will join the fray to serve the needs of the "perfect spec compliance" market
that he seems to be interested in.

If not, then projects that have a claim of real portability will never
embrace C99 (like LUA, or Python, or the JPEG reference implementation, for
example.) Even average developers will forgo the C99 features for fear
that someone will try to compile their stuff on an old compiler.

Sure, there'll be market inertia, but this also happened with the
transition of K&R -> ANSI fifteen years ago.
Look, nobody uses K&R-style function declarations anymore. The reason is
that the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.

Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?
It would be, but only if you have the packed structure mechanism. Other
people have posted indicating that in fact _Packed is more common than I
thought, so perhaps my suggestion is not necessary.

Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.

It's the task of a standard to spell these out, I think.
There are far more pressing problems in the language that one would like to
fix.

Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.
[powerful heap manager]
But a third party library can't do this portably.

I don't see why not?

Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.

Now you're shifting the goalposts.
By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.

Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.
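Presuming POSIX, the lock-protected allocator under discussion can be sketched in a few lines; ts_malloc and ts_free are made-up wrapper names, and a real implementation would do this inside malloc itself:

```c
#include <stdlib.h>
#include <pthread.h>
#include <assert.h>

/* A sketch, assuming POSIX threads: serialize all heap access through
 * one mutex.  ts_malloc/ts_free are hypothetical names. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

void *ts_malloc(size_t n)
{
    void *p;
    pthread_mutex_lock(&heap_lock);
    p = malloc(n);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

void ts_free(void *p)
{
    pthread_mutex_lock(&heap_lock);
    free(p);
    pthread_mutex_unlock(&heap_lock);
}
```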

I disagree. POSIX is for things like this.
See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is more or less impossible
on non-flat memory architectures (like segmented architectures.)

What does the latter remark have to do with C's suitability for doing it?
[...] I want this more for reasons of orthogonality in design than anything
else.
You want orthogonality in the C language? You must be joking ...

Not at all.
Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?

It already did, it seems -- you just stated your decision, with regard to
binary output. Fine by me.
[...] Without being belligerent: why not use that if you want this
kind of thing?

Well, when I am programming in C++ I will use it. But I'm not going to move
all the way to using C++ just for this single purpose by itself.

I used "%x" as an example of a format specifier that isn't defined ('x'
being a placeholder for any letter that hasn't been taken by the
standard). The statement is that there'd be only about 15 letters left
for this kind of thing (including 'x', by the way -- it's not a hex
specifier). Sorry for the confusion, I should've been clearer.

Well, what's wrong with %@, %*, %_, %^, etc.?

%* will clash with already legal format specifiers like %*d. All the
others are just plain ugly :)
But a string has variable length. If you allow strings to be mutable, then
the actual sequence of characters has to be put into some kind of dynamic
storage somewhere. Either way, the base part of the string would in some way
have to be storable into, say, a struct. But you can copy a struct via
memcpy or however. But this then requires a count increment, since there is
now an additional copy of the string. So how is memcpy supposed to know that
its contents contain a string that it needs to increase the ref count for?
Similarly, memset needs to know how to *decrease* such a ref count.

It's not very practical, is it..... Hmmmm. Instead of thinking of
perverse ways of circumventing these rather fundamental problems, I'll
just concede the point. Good show :)
Hey, it's not me -- apparently it's people like you who want more operators.

Just a dozen or so! But _standardized_ ones.

Seriously, though, your "operator introduction" idea is something
different from "operator overloading" altogether. We should try not to
mix up these two.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.

My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.

That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.
Yes, but if instead of actual operator overloading you only allow redefinition
of these new operators, there will not be any of the *surprise* factor.

I don't know if you've ever experienced the displeasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is. Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.
If you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition, obviously.

There is an important difference: functions have a "name" that has a
mnemonic function. Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, but I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".
And allowing people to define their own functions with whatever names they
like doesn't lead to unreadable code? It's just the same thing.

Nope. See above.
What makes your code readable is adherence to an agreed-upon coding
standard that exists outside of what the language defines.

There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages. The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.
It was just in the notes to some meeting Bjarne had in the last year or so to
discuss the next C++ standard. His quote was something like this: while
adding a feature to C++ can have value, removing one would have even more
value. Maybe someone who is following the C++ standardization threads can
find a reference -- I just spent a few minutes on google and couldn't find it.

Ok. I appreciate the effort.
But so do user definable function names. Yet, functionally they are almost
the same.

"names" refer to (often tangible) objects, whereas "operators" refer to
abstract ideas. I'm no psychologist, but I would guess they could back
up my claim that it's easier for us to handle names than symbols. For
one thing, I have yet to see the first 2-year old that utters "greater
than" as first words.
You missed the "etc., etc., etc." part.

In a sense, I truly missed it. Your suggestions were rather interesting! ;-)
I could keep coming up with them
until the cows come home: a! for factorial, a ^< b for "a choose b" (you want
language support for this because of overflow concerns of using the direct
definition), <-> a for endian swapping, $% a for the fractional part of a
floating point number, a +>> b for the average (there is another overflow
issue), etc., etc.

Golly! You truly are good at this :)
No because I want *MORE* operators -- not just the ability to redefine the
ones I've got (and therefore lose some.)

Ok. Your opinion on this is quite clear. I disagree for technical
(implementability) and psychological (names versus symbols) reasons. We
could just leave it at that.
[snipped a bit...]
I find the idea freaky, yet interesting. I think C is not the place for
this (really, it would be too easy to compete in the IOCCC) but perhaps
in another language... Just to follow your argument for a bit, what
would an "operator definition" declaration look like for, say, the "?<"
min operator in your hypothetical extended C?

This is what I've posted elsewhere:

int _Operator ?< after + (int a, int b) {
    if (a > b) return a;
    return b;
}

I already saw that and reacted. Will come to that in another post.
In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:

high *% low = a * b;

Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?
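At least for the unsigned 32-bit case, the semantics Paul appears to intend can be pinned down in C99 by going through the double-width type; widemul32 is a made-up helper name:

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of "high *% low = a * b" for unsigned 32-bit operands:
 * form the exact product in the double-width type, then split it.
 * widemul32 is a made-up name for illustration. */
void widemul32(uint32_t a, uint32_t b, uint32_t *high, uint32_t *low)
{
    uint64_t p = (uint64_t)a * b;     /* exact 64-bit product */
    *high = (uint32_t)(p >> 32);
    *low  = (uint32_t)p;
}
```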
It's not me -- it's Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions.

I don't see them working the committees to get these supported in
non-assembly languages. I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.

I think you would find that bignum operations are a small part of the
load on e-commerce servers. All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.
How about:

- carry is set to either 1 or 0, depending on whether or not a + b overflows
(just follow the 2s complement rules if one of a or b is negative.)

Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?
- var is set to the result of the addition; the remainder if a carry occurs.

What happens if the signedness of var, a, and b are not equal?

What happens if the bit-widths of var, a, and b are not equal?
- The whole expression (if you put the whole thing in parentheses) returns
the result of carry.

..... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.
In fact, if I were only interested in the carry, I'd be out of luck: I'd
still need the 'var'. That's a bit ugly.

Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b
+< would not be an operator in and of itself -- the whole syntax is required.
For example: c +< v = a * b would just be a syntax error. The "cuteness" was
stolen from an idea I saw in some ML syntax. Obviously +< - would also be
useful.

I would think you don't need the "c" as well, to make a valid
expression. But I would still need to know what happens with all the
bit-widths and signedness issues.
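For the unsigned case at least, the tuple assignment can be approximated in standard C today by returning a struct; the wraparound test supplies the carry (the names below are made up for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of "carry +< var = a + b" for unsigned 32-bit operands.
 * The struct and function names are made up for illustration. */
struct addc32_result { uint32_t sum; uint32_t carry; };

struct addc32_result addc32(uint32_t a, uint32_t b)
{
    struct addc32_result r;
    r.sum = a + b;           /* wraps modulo 2^32 on overflow */
    r.carry = (r.sum < a);   /* 1 iff the addition carried out */
    return r;
}
```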
You can find a binary gcd algorithm that I wrote here:

http://www.pobox.com/~qed/32bprim.c

That's not the "binary GCD algorithm", that's just Knuth's version that
avoids modulos. Below is a binary GCD.

unsigned bgcd(unsigned a, unsigned b)
{
    unsigned c, e;
    if (a == 0) return b;   /* the halving loops below require both */
    if (b == 0) return a;   /* arguments to be nonzero              */
    c = a | b;
    for (e = 0; c % 2 == 0; e++) c /= 2;   /* e = common factors of 2 */
    a >>= e;
    b >>= e;

    while (a % 2 == 0) a /= 2;
    while (b % 2 == 0) b /= 2;

    while (a != b)
    {
        if (a < b)
        {
            b -= a;
            do b /= 2; while (b % 2 == 0);
        }
        else
        {
            a -= b;
            do a /= 2; while (a % 2 == 0);
        }
    }
    return a << e;
}
You will notice how I don't use or care about carries coming out of a right
shift. There wouldn't be enough of a savings to matter.

Check bgcd().
Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes n^2
transistors to perform the complete operation, where n is the largest bit
word that the machine accepts for the multiplier. If the multiply were not
widened they could save half of those transistors. So multiply those extra
transistors by the number of CPUs shipped with a widening multiply (PPC,
x86s, Alphas, UltraSparcs, ... etc) and you easily end up in the billion
dollar range.

This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever come across :)

Yup. And it is used too. From machine language.
And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce.

Correction: the Intel marketing department promised that it would be
great for e-commerce.
The problem is that the SWOX guys were having a hard time with IA64 assembly
language (as apparently lots of people are.)

Yes, it's close to a VLIW architecture. Hard to code manually.
So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.

The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.
OTOH, if instead the C language had exposed a carry-propagating add and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.

I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?

My guess is that they have problems with raw performance and/or compiler
technique. I have some experience with a VLIW compiler, and these things
need a compiler pass to do instruction to execution-pipeline allocation.
This is a very active area of research, and notoriously difficult. My
guess is that there are inherent problems of getting high performance
out of IA64 for this kind of algorithm. VLIW and VLIW-like
architectures can do wonders on high-throughput, low-branching types of
work, but they tend to break down on some very simple algorithms, if
there is a lot of branching.

I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?

Best regards, Sidney
 
Sidney Cadot

Paul said:
As I said it would be defined in some sort of grammar. I am not a
parsable language expert but basically it has to allow for things
like:

<<<, ?<, ===, ^^, |||, &&&, etc

but disallow things like:

+++, ---, ~~, !!, **, etc

due to parsing ambiguities with existing semantics.

I'm sorry, but that's not much of an answer. We are now both stuck, not
knowing whether it is possible even in principle to do this. And the
onus is on you to show it is, I would think.
Perhaps as an operator gains unary attributes, tacking things onto the
end of it becomes a syntax error as well.

That statement does not mean a lot without some definitions.
By defining a good grammar.

I seriously doubt this is possible even in principle.
specific type overloading, as is done in C++?

With or without implicit conversions?
The easiest way to handle these issues is to allow an operator to inherit
the attributes of a previously defined operator. So we add in the
"like" clause:

int _Operator ?< after + like + (int x, int y) { /* max */
    if (x > y) return x;
    return y;
}

What would the rules look like to know whether such a thing yields a
non-ambiguous tokenizer/parser? This is just to assess whether it could
be made to work _even in theory_. (That's the domain this idea is bound
to anyway.)
I would propose that -> or . not be considered operators, as they have
a really different nature that one could not map to function-definition
semantics in a sensible way.

Why not? What if I want an operator a@^&@b that gives me the address of
the field following a.b in struct a?
As to the question of ? : or the quaternary operators that I proposed --
the idea would be to first *ADD IN* the quaternary base operators, then
just use the "like" clause as I've described above.

So we're back to square one: more operators in the core language?
To reduce ambiguity again, I would also not consider the dereferencing
meaning of * or the address-taking meaning of & to be operators either.

That's a pretty arbitrary limit.
int _Operator ?> : after ? : like ? : (int x, int y, int z) {
    if (x >= 0) return y;
    return z;
}


Yes it complexifies the parser quite a bit. I don't dispute that.

....Beyond the complexity of a C++ compiler, for one thing. And quite a
bit beyond it as well.
[...] You mentioned that actual languages exist
that do this sort of thing; are they also compiled languages like C, or
are they interpreted languages of the functional variety?
No, actually I am unaware of any language that has infinite operators.

I think APL has a number of operators that could be described as "close
to infinite" ... :)
Someone mentioned Mathematica, but I am unaware of its language
semantics.

That was me. It provides a lot of free operators that you can assign
meaning to (but not an extensible set). But in Mathematica, all this is
just syntactic sugar. Your proposal looks more like syntactic vinegar to
me :)
I mentioned ML as the inspiration of quaternary operators,
but that's about it.

Pity. That could yield some useful lessons.

Best regards, Sidney
 
Sidney Cadot

Arthur said:
Nitpick: &&& should be in the same category as !! or **, since it
currently also has useful semantics.
The basic rule for "new operator" candidacy is simple: If it ends
with a unary prefix operator, throw it out. If it begins with a
unary postfix operator, throw it out. If it contains the consecutive
characters "/*", "*/", or "//", throw it out. Everything else should
be acceptable, unless I'm missing a case.

Yes, that would take care of tokens, I guess.
The ambiguity won't be in the parser; it'll be in the lexer.

Until you consider the precedence and associativity of the new
operator. Surely, this will impact the parser as well.
And the *semantics* of the code will affect the output of the
lexer, if you allow new operators to evolve on the fly. Which
will make this new language practically impossible to implement
using lex/yacc techniques anymore.

On the contrary... In a way, the language Paul proposes already exists:
it's called bison input files, with a C grammar pre-installed.
Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.

I would say the lexer is the least of the problems. It already has to
maintain dynamic tokens for typedefs, I guess it could handle this as
well. But the dynamic parser.... That would be a monster.

Best regards,

Sidney
 
Paul Hsieh

(e-mail address removed) says...
Not the parser so much as the lexer. Is "a %^&%*!^ b" three tokens,
four, five, six, seven, eight, or nine? It depends on the semantics
of the code we've already translated.
Note that this is *NOT*, repeat *NOT*, an idea that will ever make
it into the C programming language, for this reason -- as expressed
in this subthread, it would break almost *every* C-parsing tool on
the market.

So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?
 
Paul Hsieh

printf (or anything else) secures the necessary memory, copies
things into it, and can now return to its caller while letting a
separate task process the data. When done, it releases that
memory.

Which part of "without modifying printf" did you miss? Its own output string
is not at issue. What's at issue is that you have just created a function
which returns a string. If the allocation is static, then you have a
multitasking/re-entrancy problem. If it's dynamic, then you have a memory
leak issue to contend with. If you use a ref counting system, you have the
same problem as the memory leak issue -- who decrements the count when
printf is done?
 
Sidney Cadot

Paul said:
(e-mail address removed) says...



So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?

Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.

Regards,

Sidney
 
CBFalconer

Paul said:
CBFalconer <[email protected]> wrote:
.... snip ...

Which part of "without modifying printf" did you miss? Its own
output string is not at issue. What's at issue is that you have
just created a function which returns a string. If the allocation
is static, then you have a multitasking/re-entrancy problem. If it's
dynamic, then you have a memory leak issue to contend with. If you
use a ref counting system, you have the same problem as the memory
leak issue -- who decrements the count when printf is done?

printf etc are system procedures. If we are building them we
should be free to make them work, or to use suitable lower level
functions that are called by it. ISO C is not a multitasking
system, so to run such programs correctly in a multi-tasking
system requires that the various interfaces be carefully and
correctly designed.

One of the principles to be observed is that data storage not be
released before it is used.

You seem to be arguing about nothing at all.
 
P

Paul Hsieh

Ok. Can't speak for MW, but I think that by the end of 2007 we'll have
near-perfect C99 compilers from GNU, Sun, Microsoft, and Intel. Odds now
down to 5:1; you're in?

With the *and* in there? Sure. Since Microsoft alone will not do it
(otherwise why not back port MFC to C?), the GNU people may decide that the
last 10% just isn't worth it, Sun has two other languages to worry about that
take precedence in terms of development resources, and once C++0x emerges, C99
development by any of these vendors will almost certainly be halted. I'd have
to be wrong on all of these guys to lose this.
Ok. I agree with you on extending the capabilities of the preprocessor
in general, although I can't come up with actual things I miss in
everyday programming.

I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
even for very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2, for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12, d=2 might have a smaller
latency (but I don't know that either)!

Anyhow, the point is that to have a chance of squeezing enough computational
juice out of my PC to solve this, I had to hard-code huge amounts of the code,
and use a lot of bit-twiddling tricks for each special case. I used
preprocessor macros as much as possible to make my code manageable, but in
order to change n, I actually have to *modify* the code by adding and changing
*n* lines of code. There's not much I can do about this; I would have no
chance of solving it otherwise.

With a more powerful preprocessor, or a code generator, I could actually make
this so that I could modify one #define, or even possibly make it run time
settable (though this would make the code much larger.)
It's the task of a standard to spell these out, I think.


Yes, but most things that relate to the "encouraging of writing
extremely unsound and poor code", as you describe C, would be better
fixed by using another language. A lot of the inherently unsafe things
in C are sometimes needed, when doing low-level stuff.

Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?
[powerful heap manager]
But a third party library can't do this portably.

I don't see why not?

Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.

Now you're shifting the goalposts.

How so? The two are related. Writing a heap manager requires that you are
aware of multitasking considerations. If I want to extend the heap manager, I
have to solve them each one by one in different ways on different platforms.
And, of course, what sort of expectation of portability would an end user have
knowing that the library itself had to be carefully hand ported?

Compare this to the string library that I wrote (http://bstring.sf.net/). It's
totally portable. Although I don't have access to Mac OS X, and don't use GNU
C on Linux, in order to test to make sure, I know for a fact that end users
from both platforms have downloaded and are actively using it. I know it
works in 16-bit and 32-bit DOS/Windows environments, etc., etc. It's totally
portable, in the real-world sense of portable (semantically, as well as
syntactically). The point is that the end users know that there is absolutely
0 risk of losing portability or having porting problems because of the use of
this library.

Third party tools for advanced heaps make little sense. It would only be
worth consideration if it were actively ported to many platforms -- which
increases its cost. I.e., for whatever set of platforms I am considering using
such a library, I am paying for the development cost of every platform that it
would be ported to in the price of using it. It's also highly useless for new
hardware platforms which are in development. Even having access to the source
is of lesser value if there are platform specifics in each instance that
require munging just to port it.

The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler. And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendors lap -- i.e., the people who have
introduced this non-portable feature in the first place.
Obviously, you cannot write a multithreading heap manager portably if
there is no portable standard on multithreading (doh). If you need this
you can always presume POSIX, and be on your way.

You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.
From a programmer's point of view, the multithreading support is completely
transparent. This has no end user impact from a POSIX specification point of
view.
I disagree. POSIX is for things like this.


What does the latter remark have to do with C's suitability for doing it?

I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
de facto portable from the point of view that matters -- programmers of the C
language.

For most flat-memory architectures, it's actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.
It already did, it seems - You just stated your decision, with regard to
binary output. Fine by me.

Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.
Just a dozen or so! But _standardized_ ones.

And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.
That's too much freedom for my taste. If I would want this kind of
thing, I would yank out lex and yacc and code my own language.

Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?
I don't know if you've ever experienced the displeasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is.

Do it all the time.
[...] Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.

It's no worse than trying to add in &&& or ||| today.
There is an important difference: functions have a "name" that has a
mnemonic function.

But the name may be misleading -- as is the case, more often than not, just
reflecting the thought of the original instance by the original programmer,
who may have cut and pasted it from somewhere else.
[...] Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, but I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".

And how long will it take the rest of us to get used to your weird &&& or |||
operators?
There are several such standards for identifier names. No such standard
exists for operator names, except: use familiar ones; preferably, steal
them from other languages.

Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada. The coding standards are just
different.
[...] The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.

Just like freeform variable names, there is the same encumbrance on
programmers of managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.

BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle on your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?

Do you think perhaps it's possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?
Hmmm. I hate to do this again, but could you provide semantics? Just to
keep things manageable, I'd be happy to see what happens if high, low,
a, and b are any possible combinations of bit-widths and signedness.
Could you clearly define the meaning of this?

a, b, high and low must be integers. The signedness of the result of a * b (as
if it were not widened) dictates the result signedness. Coercion will happen
as necessary when storing to high and low. The whole expression will have a
side effect of returning high.
I don't see them working the committees to get these supported in
non-assembly languages.

That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?

Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.

The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.
I think you would find that bignum operations are a small part of the
load on e-commerce servers.

According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.) One single assembly instruction
(one *bundle* on Itanium) holds a whole server down for 30% of its computation,
versus the total 100K-line e-commerce software required to do the rest.
That's why HW manufacturers are keen on the number of transistors they spend
making this one operation reasonably fast.
[...] All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.

This is only useful for much larger secure transactions like ssh, or an
encrypted phone call or something. E-commerce is a much smaller, one shot
transaction, where the RSA computation dominates.
Hang on, are we talking about "overflow" or "carry" here? These are two
different things with signed numbers.

What happens if a is signed and b is unsigned?

My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction. The point is that this instruction is known to be proper for
doing correct bignum additions.
What happens if the signedness of var, a, and b are not equal?

It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.
What happens if the bit-widths of var, a, and b are not equal?

The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.
.... So this would presume the actual expression is: "+< var = a + b" .
There's no need to introduce a mandatory "carry" variable, then.

True. Is this a problem? Perhaps you would like it to return var as a side
effect instead to avoid this redundancy? I don't have that strong of a feeling
on it.
In fact, if I were only interested in the carry, I'd be out of luck:
still need the 'var'. That's a bit ugly.

You could just omit it as a degenerate form:

+< = a + b
Basically, this is a C-esque syntax for a tuple assignment which
unfortunately is lacking in C:

(carry, value) = a+b

Yeah see, but the problem is that this encompasses existing C-language forms.
For all I know this might be legal C syntax already (I wouldn't know, I just
don't use the "," operator in this way) in which case we are kind of already
dead with backwards compatibility. There's also nothing in that syntax to
indicate some new thing was happening that is capturing the carry.
That's not the "binary GCD algorithm", that's just Knuth's version that
avoids modulos. Below is a binary GCD.

Sorry, a previous version that I never put out on the web used the binary
algorithm. I tested Knuth's as much faster and thus updated it, and forgot that
I had done this.
This is probably the most elaborate version of "yes, I made these
numbers up from thin air" I've ever come across :)

But I didn't. I used to work with one of these companies. People spend time
and consideration on this one instruction. You could just feel the impact that
this one instruction was going to have, and the considerations for the cost of
its implementation. I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.

For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply for Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.
Yup. And it is used too. From machine language.

And *only* machine language. That's the point.
Correction: the Intel marketing department promised that it would be
great for e-commerce.

I'll be sure to go track down the guy who gave an 2-hour long presentation
showing the guts of a clever 56/64 bit carry avoiding bignum multiply algorithm
that Intel was pushing for Itanium and SSE-2, that he's really just a marketing
guy. Intel's claims were real -- they had working code in-house.
The only thing your example shows is that a marketing angle sometimes
doesn't rhyme well with technical realities.

No, it shows that without having a proper path, technology can be bogged down
by inane weaknesses in standards. Intel had its *C* compilers ready to go for
the Itanium *LONG* before any of this happened. Even to this day, we are
*STILL* waiting for the code to make it into GMP:

http://www.swox.com/gmp/gmp-speed.html

The grey bars indicate where they *think* the performance will be (because they
can't get their hands on the platform, or because they are still hacking on the
code) and the pink bars are actual delivered performance. So is Itanium really
fast or really slow at this? From that chart it's impossible to tell for sure.
I would guess that Intel, being both a compiler maker and the IA64
manufacturer, could have introduced a macro widemul(hr,lr,a,b) to do
this, and help the SWOX guys out a bit?

They could have. But what kind of relationship do you think a proprietary
company like Intel has with a bunch of GPL geeks?
I don't know SWOX; what do they use for bignum multiplication?
Karatsuba's algorithm?

I think they have an option for that. But from my recollection of having
looked into this, by the time Karatsuba is useful, more advanced methods like
Toom Cook or straight to FFTs become applicable as well.
 
A

Arthur J. O'Dwyer

Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.

Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]

Remember *my* premise: *If* one were motivated to add one of those,
why not add in wings to pigs? :)

-Arthur
 
S

Sidney Cadot

Arthur said:
Paul said:
(e-mail address removed) says...
[...]

So would adding &&& and |||. Remember that my premise is that if one is
motivated to add one of those, why not add in something more general?

Bad comparison. Support for &&& and ||| would be infinitely easier to
add than support for what you propose.
Good comparison. Support for &&& and ||| is exactly as likely to
break existing tools, and is exactly as likely to make it into a
future version of C. [There may not be a firm cause-and-effect there,
but it is correlated.]

Yes, &&& and ||| would break existing tools, but there is precedent for
that (introduction of // comments). The not unimportant difference is
that adding support for &&& and ||| would be rather trivial, while
support for Paul's proposal is a complete nightmare.

As to the likelihood of either feature making it into the C standard: I
disagree. The chances of ||| and &&& being added are many orders of
magnitude greater than those of the "operator introduction" feature
championed by Paul. Still very close to zero, of course. One has to
approach probabilities of this magnitude logarithmically :)
Remember *my* premise: *If* one were motivated to add one of those,
why not add in wings to pigs? :)

This did not go down well at all at sci.bio.genetic-engineering.pigs. It
seems that lack of vision is not limited to c.l.c ;-)


Best regards,

Sidney
 
S

Sidney Cadot

Paul said:
With the *and* in there? Sure. Since Microsoft alone will not do it
(otherwise why not back port MFC to C?), the GNU people may decide that the
last 10% just isn't worth it, Sun has two other languages to worry about that
take precedence in terms of development resources, and once C++0x emerges, C99
development by any of these vendors will almost certainly be halted. I'd have
to be wrong on all of these guys to lose this.

That's why the odds are down to 1:5 :) Four years is quite a long time...
I run into this every now and then. For example, I was recently trying to
solve the following problem: Create a directed graph on n points, each with an
out degree of d (I am concerned with very small d, like 2), such that the path
length between any pair of points is minimized.

Turns out that this is a problem whose complexity grows super-exponentially
even for very small values of n. Despite my best efforts, I don't know the answer
for n=11, d=2, for example (I know it's either 3 or 4, but I can't prove that it's
not 3). More startling is the possibility that n=12, d=2 might have a smaller
latency (but I don't know that either)!

I used to do programming contests in university (both as a participant
and as a judge later on), so I quite like this sort of problem. Care to
provide some background (via regular e-mail), I think I'd like to think
about this for a bit.
Anyhow, the point is that to have a chance of squeezing enough computational
juice out of my PC to solve this, I had to hard-code huge amounts of the code,
and use a lot of bit-twiddling tricks for each special case. I used
preprocessor macros as much as possible to make my code manageable, but in
order to change n, I actually have to *modify* the code by adding and changing
*n* lines of code. There's not much I can do about this; I would have no
chance of solving it otherwise.

Ok, that's a case. But wouldn't it have been better then just to make a
program generator (written in C or something else)? Or perhaps use the
m4 macro processor?
Why is UB in isgraph(-1) needed? Why is UB in gets() needed? Why is the fact
that fgets() skips over '\0' characters needed? Why is a non-portable right
shift on signed integers needed (especially considering the one on unsigned
*is* portable)?

These are side cases, that are at least mentioned in the standard. No
big deal, IMHO. This is much more of a big deal:

int count_nodes(struct Tree *t)
{
return t ? 1 + count_nodes(t->left) + count_nodes(t->right)
         : 0;
}

Things like this are ubiquitous in many kinds of code. Trees with a big
depth are not uncommon. How can I ever claim such a function is portable
if I have no guarantee whatsoever that it will work on some architectures?
[powerful heap manager]

But a third party library can't do this portably.

I don't see why not?

Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument.
Now you're shifting the goalposts.

Since in your original statement there was no mention of multithreading.
Allow me to refresh your memory:

"But a third party library can't do this portably. Its actual useful
functionality that you just can't get from the C language, and there's
no way to reliably map such functionality to the C language itself.
One is forced to know the details of the underlying platform to
implement such things. Its something that really *should* be in the
language." [Paul Hsieh, 4 posts up]

See?
The two are related. Writing a heap manager requires that you are
aware of multitasking considerations.

Nonsense. There's many real hardware out there where multitasking is a
non-issue.
If I want to extend the heap manager, I have to solve them each one by one
> in different ways on different platforms

Nonsense. Unless you need support for something inherently
platform-specific like multithreading.
And, of course, what sort of expectation of portability would an end user have
knowing that the library itself had to be carefully hand ported?

"None whatsoever", I suspect the answer to this rhetorical question
would be? I still don't see a problem writing a super-duper heap manager
on top of malloc().
Compare this to the string library that I wrote (http://bstring.sf.net/). It's
totally portable. Although I don't have access to Mac OS X, and don't use GNU
C on Linux, in order to test to make sure, I know for a fact that end users
from both platforms have downloaded and are actively using it. I know it
works in 16-bit and 32-bit DOS/Windows environments, etc., etc. It's totally
portable, in the real-world sense of portable (semantically, as well as
syntactically). The point is that the end users know that there is absolutely
0 risk of losing portability or having porting problems because of the use of
this library.
Third party tools for advanced heaps make little sense. It would only be
worth consideration if it were actively ported to many platforms -- which
increases its cost. I.e., for whatever set of platforms I am considering using
such a library, I am paying for the development cost of every platform that it
would be ported to in the price of using it. It's also highly useless for new
hardware platforms which are in development. Even having access to the source
is of lesser value if there are platform specifics in each instance that
require munging just to port it.
The point is, if the C standard were simply to add this functionality straight
into the library, then it would be each compiler vendor's responsibility to add
this functionality into the compiler.

Sure but then the compiler would be more expensive to make, and I would
have to pay more $$$ for a feature I don't need.
> And the functionality would be
inherently portable as a result. The requirement of multitasking support would
then be pushed back into the vendors lap -- i.e., the people who have
introduced this non-portable feature in the first place.

Adding multithreading support to a language is a can of worms. I don't
think you quite know what you're suggesting here. Many difficult
semantic issues to resolve. The fun thing about C is that it's simple.

Huh? The C standard describes the library.
You do not understand. The *IMPLEMENTATION* needs to be multithreading aware.

Perhaps I don't understand. The implementation... of what?

From a programmer's point of view, the multithreading support is completely
transparent.

If you know a way of doing that, you'll make big headlines in the
computer science world. I have yet to see a multithreading model that is
not either severely limited in what it can do or hard to use.
This has no end user impact from a POSIX specification point of view.

Sorry, I don't understand this remark.
I am saying that I don't disagree with you, but I think you are missing the
point. By simply adding the features/function into C, that would make it
defacto portable from the point of view that matters -- programmers of the C
language.

Well, let's see your proposal then. How would you weave threads in the C
language portably? No handwaving please, you will have to come up with
rock-solid semantics.
For most flat-memory architectures, it's actually very straightforward to add in
all the features that I am requesting. I know this because I've written my own
heap manager, which of course uses platform specific behaviour for the one
platform I am interested in. (It only gets complicated for platforms with
unusual memory, like segmented architectures, which have correspondingly
complicated heap managers today.) This is in stark contrast with the
incredibly high bar of platform-specific complications set by trying to do this
outside of the language.

I'm eagerly awaiting your proposal that shows how to do all this in a
platform independent way.
Not fine by me. Because some other programmer whose code I have to look at
will use it, and I won't have any idea what it is.

Don't worry, you can look it up in the man pages. Unlike your
user-defined operators... :)
And you don't see the littlest problem with this proposal? If you add a dozen,
you'd better be sure that they are the last dozen that anyone could possibly want
to add to the language. Their value add would have to be worth the pain of having
everyone learn about 12 new symbols.

Well, let's start with ?< ?> and ?: for now; ?: instead of |||, and I
drop the &&& requests since it is much less useful. It's already in gcc
so it's implementable. Give everybody 10 years to get used to it, then
we may add a couple more.
Well how is that different from adding just &&& and |||? If you *REALLY
REALLY* want them, then why don't you yank out lex and yacc and code up a new
language?

Because I want to use them in C programs.
I don't know if you've ever experienced the displeasure of having to
maintain code that's not written by yourself, but it's difficult enough
as it is.

Do it all the time.
[...] Adding new operators might be interesting from a theoretical
perspective, but it surely is a bad idea from a broader software
engineering perspective.

It's no worse than trying to add in &&& or ||| today.

I disagree. Let's leave it at that, shall we?
But the name may be misleading -- as is the case, more often than not, just
reflecting the thought of the original instance by the original programmer,
who may have cut and pasted it from somewhere else.

Yes, well, you can practice bad coding style in any language. Names at
least have a fighting chance of being self-explanatory, which is not
true for operators.
[...] Operators are just a bunch of pixels with no link to
anything else. It's only by a lot of repetition that you get used to
weird things like '<' and '>'. I don't know about you, but I used to do
Pascal before I switched to C. It took me quite some time before I got
used to "!=".
And how long will it take the rest of us to get used to your weird &&& or |||
operators?

A couple of weeks?
Sounds like a reasonable convention to me. How about: All new operators must
be defined in a central module named ----. Or: Only these new operators may be
added as defined by ... yada, yada, yada.

More handwaving... Please humour me, and spell out a reasonable
convention for your operator-introduction.
The coding standards are just different.

I can't tell; I haven't seen a coding standard for operator introduction
yet.
[...] The common denominator of all the identifier
standards is: "use meaningful names". I maintain that there is no
parallel for operators; there's no such thing as a "meaningful"
operator, except when you have been drilled to know their meaning. Your
proposal is in direct collision with this rather important fact of how
the human mind seems to work.
Just like freeform variable names, there is the same encumbrance on
programmers of managing the meaning of the symbols for more generic operators.
You have not made a sufficient case to convince me that there is a real
difference between the two.

....Have you ever laid eyes on a moderately complicated APL program,
without knowing APL? Or perhaps a Forth program, without knowing Forth?
That would give you some clue as to what a program looks like when it
doesn't use a lot of names, but does use a lot of unknown symbols. The
effect is similar to what written Arabic looks like to people used to
the Roman alphabet, and vice versa.
BTW -- odd how you actually thought that these were "nice". Do you think you
would have a hard time remembering them? Would it rankle on your brain because
they were odd and unfamiliar operators that are new to the language? Would
you have a tendency to write abusive code because of the existence of these new
operators?

The meanings you gave for these rang some bells with my programming
experience. It leads me to believe you have encountered similar
situations, which gives you some bonus points for credibility in my book
:) Seriously, I don't think such operators would be a good idea, but
I'd welcome library functions (the old-fashioned ones, with names) for them.
Do you think perhaps it's possible to use an arbitrarily extendable operator
mechanism in order to *clarify* or make code actually more maintainable?

A resounding 'no'. It's an interesting thought, but for me it falls
short on technical and psychological grounds, as I previously tried to
explain.
a, b, high and low must be integers.

I'm with you so far ... ;-)
> The signedness of the result of a * b (as if it were not widened) dictates
> the result signedness.

Ok.... This would fit, I'll give you that.
Coercion will happen as necessary when storing to high and low.

Ok. Let's call the "as if it were not widened" result X; X=a*b (in a
mathematical sense).

Now before coercion can begin, we'll have to split X; that's the crucial
step of the semantics; pseudo-code:

high = coerce( high_order_value(X) )
low = coerce( low_order_value (X) )

Now all we need is a good definition of high_order_value() and
low_order_value(), based on the signedness and bit-width of X, and
possibly the signedness and bit-width of high/low. How would they look?

The whole expression will have a side effect of returning high.

I'd formulate this as "have a value", otherwise it looks ok. I'm curious
how you are going to define the high_order_value and low_order_value
semantics.

Well they "provide" them, if that's what you mean.
That's because they don't care about how it goes down once it hits
software. Someone has to pay something for a platform specific library? -- who
cares, as long as they sell their hardware. These same people didn't get
together to define the C standard did they? Why would they bother with an
extension just for widening multiplies?

Ok. So your point was, in the first place.... ?
Instead the hardware people waste their time on trivial little specifications
like IEEE-754, which the C standards idiots don't bother to look at until 15
years later.

You shouldn't trivialize IEEE-754, it's one hell of a standard, probably
one of the best I have ever seen. And as you are probably well aware,
there's a perfectly good reason why its only now that the C standard
committee is looking at it (only now hardware supporting it is widely
available). I really don't see your point here.
[...] I guess they're pretty satisfied with the bignum
libs that exist, that provide assembly implementations for all important
platforms (and even a slow fallback for others). The reality is that
no-one seems to care except you, on this.
The hardware people care that it exists, and not about the form of its
existence, so long as it gets used. For anyone who wants to *USE* this
functionality, though, they are stuck with assembly, or third party libraries.

What's so bad about using third-party libraries?
According to a paper by Intel, widening multiply accounts for something like
30% of the load on typical e-commerce transactions (typing in your credit card
over the net in a way that can't be snooped.)

Reference, please.

"load on typical e-commerce transaction" != "load on e-commerce server".

If the transaction is simply setup session/send credit-card
information/close session, this could be right. However, a jiffy ago we
were talking SSL.
One single assembly instruction (one *bundle* on Itanium) holds a whole server
down for 30% of its computation,

Again, you're equating server load with transaction load.
versus the total 100K line e-commerce software required to the do the rest.
That's why HW manufacturers are keen on the number of transistors they spend
making this one operation reasonably fast.

I'm happy they do, really. I love the speed of third-party bignum libraries
nowadays.
[...] All RSA-based protocols just do a small
amount of bignum work at session-establishment time to agree to a key
for a shared-secret algorithm.

This is only useful for much larger secure transactions like ssh, or an
encrypted phone call or something. E-commerce is a much smaller, one shot
transaction, where the RSA computation dominates.

I'm sure there are situations where this arises, although it's not
likely to happen when I order something with my credit card. More
inter-bank kind of stuff.

Now why would you not simply use a library for this?
My intent was for the operation to follow the semantics of the x86 ADC assembly
instruction.

Fun thing is: it handles both signed and unsigned numbers. Not-so-funny
is the fact that the x86 supports (as do most microprocessors) both an
overflow flag and a carry flag. So, armed with this, please elaborate on
the meaning.
The point is that this instruction is known to be proper for
doing correct bignum additions.

It sure is... There are libraries containing hand-written assembly that
do fine :)
It just behaves like the ADC x86 assembly instruction, the details of which I
will not regurgitate here.

So on adding a signed plus unsigned.... The carry gets a value "as if"
the signed value was actually unsigned? On adding two signed numbers,
the carry gets a value "as if" both numbers were unsigned (instead of
the much more useful "overflow" indicator)? This looks like badly
designed semantics.
The bit-widths would be converted as if the (a + b) operation were happening in
isolation, to match C language semantics.

Ok, so if either is signed, they will both be signed, and you get back a
useless carry bit. Doesn't sound very good to me.
True. Is this a problem? Perhaps you would like it to return var as a side
effect instead to avoid this redundancy? I don't have that strong of a feeling
on it.

No, carry/overflow will do fine, it's what you're most interested in,
most of the time.
You could just omit it as a degenerate form:

+< = a + b

Two operators sitting next to each other. I think it's ugly, but there
you go.
Yeah see, but the problem is that this encompasses existing C-language forms.
For all I know this might be legal C syntax already (I wouldn't know, I just
don't use the "," operator in this way) in which case we are kind of already
dead with backwards compatibility.

Unfortunately, there is no tuple assignment in C. It was just a piece of
pseudo-code to elucidate the meaning.
There's also nothing in that syntax to
indicate some new thing was happening that is capturing the carry.

Suppose we have tuples, we could write (c,v) = a <+ v, where the <+
operator is defined to return a tuple expression. Now tuples, that would
be _really_ nice. Finally I could swap with (a,b)=(b,a).
Sorry, a previous version that I never put out on the web used the binary
algorithm. I tested Knuth's as much faster and thus updated it, and forgot that
I had done this.

In assembly they are quite evenly matched :)
But I didn't.

Oh, but you did. A billion is 1,000,000,000, which is quite a huge number.
The only way we humans can handle this is by doing some calculations...
It doesn't help to just throw numbers of this magnitude around without a
reference.
I used to work with one of these companies. People spend time
and consideration on this one instruction. You could just feel the impact that
this one instruction was going to have, and the considerations for the cost of
its implementation.

Sure, it's an important instruction.
I could easily see a quarter million just in design
effort, then some quarter million in testing, not to mention the cost of the
extra die area once it shipped -- and this is inside of *ONE* of these
companies, for *ONE* chip generation.

We're still about 999,500,000 USD short of a billion.
For example, inside of Intel, they decided that they were going to reuse their
floating point multiplier for their widening integer multiply on Itanium. But that
meant that the multiplier had to be able to do 128 bit multiplies (as opposed
to 82 bits, which is all the Itanium would have apparently needed) and
couldn't run a floating point and integer multiply at the same time. This has
non-trivial layout and design impact on the chip.

Sure does. I estimate the cost of this design work would be in the range
of 10^6 dollars, but not 10^9 dollars.
And *only* machine language. That's the point.

....Which is fine, as far as I am concerned.
I'll be sure to go track down the guy who gave a 2-hour long presentation
showing the guts of a clever 56/64 bit carry avoiding bignum multiply algorithm
that Intel was pushing for Itanium and SSE-2, that he's really just a marketing
guy. Intel's claims were real -- they had working code in-house.

Sounds like a technical guy to me, who was given the nice task of
thinking about all this. Probably even had a sheet about "the importance
for e-commerce" as a rationale for all this work. That's all perfectly
fine, except that I still fail to see why it is so important to code
this kind of high-performance code in C instead of assembly.

It sounds like Intel should've contributed some hand-written assembly to
SWOX, then.
No, it shows that without having a proper path, technology can be bogged down
by inane weaknesses in standards. Intel had its *C* compilers ready to go for
the Itanium *LONG* before any of this happened. Even to this day, we are
*STILL* waiting for the code to make it into GMP:

http://www.swox.com/gmp/gmp-speed.html
The grey bars indicate where they *think* the performance will be (because they
can't get their hands on the platform, or because they are still hacking on the
code) and the pink bars are actual delivered performance. So is Itanium really
fast or really slow at this? From that chart it's impossible to tell for sure.

It sounds like Intel should contribute some hand-written assembly to
SWOX, then.
They could have. But what kind of relationship do you think a proprietary
company like Intel has with a bunch of GPL geeks?

Not a very good one, it seems. Perhaps some PHB should reconsider. These
"GPL geeks" have delivered the gcc compiler, and all. But perhaps they
are also trying to sell their own compiler. Oh, the complexities of
managing... As a technical guy, I say they should just contribute to GMP
without making a lot of fuss about it. Perhaps even spin it as being
pro open-source.
I think they have an option for that. But from my recollection of having
looked into this, by the time Karatsuba is useful, more advanced methods like
Toom Cook or straight to FFTs become applicable as well.

I think Karatsuba is best for the ranges dealt with by RSA.

Best regards,

Sidney
 
