So what Standard are we working off?

M

Mark F. Haigh

C is a language whose problems are paramount and staring everyone in
the face. But C99 basically addresses none of these problems over C89.
The rise of the importance of cryptography as a field points out two
problems with the C language: 1) No simple way to express a high-word
multiply operation (even though the vast majority of CPU architectures
do support this with direct, and often highly accelerated hardware
support), [...]

Practically speaking, this is a non-issue. People just insert the
machine instruction into the code via platform-specific inline assembly
support.
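
For concreteness, here is a sketch of the platform-specific route being described: GCC-style extended asm for a 64x64-to-128-bit multiply on x86-64. The function name mul_hi64 is hypothetical, and the snippet is deliberately compiler- and CPU-specific -- which is exactly the trade-off under discussion.

#include <stdint.h>

/* Hypothetical helper: returns the high 64 bits of a 64x64-bit multiply.
   GCC/Clang extended asm, x86-64 only; other compilers and CPUs need
   their own version -- which is the portability complaint in this thread. */
static inline uint64_t mul_hi64(uint64_t x, uint64_t y)
{
    uint64_t hi, lo;
    __asm__("mulq %3"              /* unsigned multiply RAX by operand 3 */
            : "=a" (lo), "=d" (hi) /* results come back in RAX:RDX       */
            : "a" (x), "r" (y)     /* x goes in RAX, y in any register   */
            : "cc");               /* the flags register is clobbered    */
    return hi;
}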

Some machines have a fast population count instruction. Does that mean
C needs simply-expressed support for it? Of course not.
[...] and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement. The crypto community responds simply by
ignoring the standard and instead uses de facto standards (long is
exactly 32 bits, all integers are 2s complement, right shift retains
the sign of signed integers, etc.) C99 did not address this problem.
Java and Python do.

Long is exactly 32 bits? A couple of the boxes at my desk disagree
with that "de facto standard" (and yet still manage to do crypto).

7.18.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with
width N, no padding bits, and a two's complement representation. Thus,
int8_t denotes a signed integer type with a width of exactly 8 bits.
But the big elephant in the room is the extreme fragility of the C
language with respect to erroneous behaviour. The language is littered
with undefined behavior, and basically embodies buffer overflows as
practically standard usage. The standard library includes functions
that are not practically implementable in re-entrant ways (many of
these have been addressed in TR 24731 (and also in the latest
Bstrlib)). Any library function which takes multiple pointers assumes
no aliasing, otherwise leading to UB (which C99 did nothing but
reaffirm more explicitly through syntax).

Huh? Any library function which takes multiple pointers __assumes
aliasing__ between types allowed to alias, not the other way around.
What do you think 'restrict' was for?

<snip>

Mark F. Haigh
(e-mail address removed)
 
K

Keith Thompson

C is a language whose problems are paramount and staring everyone in
the face. But C99 basically addresses none of these problems over C89.
The rise of the importance of cryptography as a field points out two
problems with the C language: 1) No simple way to express a high-word
multiply operation (even though the vast majority of CPU architectures
do support this with direct, and often highly accelerated hardware
support),

I'm not aware of any other languages that support this.
and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement.

I.e., the standard can be implemented on architectures that don't use
2's complement.
The crypto community responds simply by
ignoring the standard and instead uses de facto standards (long is
exactly 32 bits, all integers are 2s complement, right shift retains
the sign of signed integers, etc.) C99 did not address this problem.
Java and Python do.

Code that assumes long is exactly 32 bits will break on several of the
platforms I use every day.

[...]
3) variable
length arrays, which is just redundant with the common practice of
using the struct hack (why not just expand the standard to say that the
struct hack is not a violation of the standard?)

I think you're thinking of flexible array members, not variable length
arrays (they're two different things).
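
For readers who have not seen the two side by side, here is a minimal sketch of the difference Keith is pointing at (the struct and function names are made up for illustration):

#include <stdlib.h>

/* The pre-C99 "struct hack": a trailing array of size 1, deliberately
   over-allocated.  Its legality under C90 was always disputed. */
struct hack_string {
    size_t len;
    char data[1];
};

/* The C99 flexible array member: the trailing array has no declared size,
   and the standard explicitly blesses the over-allocation idiom. */
struct fam_string {
    size_t len;
    char data[];
};

/* Allocation looks essentially the same either way. */
struct fam_string *make_string(size_t len)
{
    struct fam_string *s = malloc(sizeof *s + len + 1);
    if (s != NULL)
        s->len = len;
    return s;
}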
4) arbitrary
positioned variable declarations, which serves no relevant purpose
other than some typing convenience or cosmetic effect.

Allowing declarations and statements to be mixed can make it easier to
declare variables whose initializations depend on previous statements.
You can also do that by introducing a new scope, but that can be
awkward.
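
A small sketch of what Keith means (the function and variable names are illustrative only):

#include <stdio.h>

/* C99: the declaration of 'size' can sit right where its initializer
   becomes meaningful, instead of at the top of the block. */
void show(FILE *fp)
{
    long pos = ftell(fp);          /* remember where we were...            */
    fseek(fp, 0L, SEEK_END);
    long size = ftell(fp);         /* ...then a declaration that depends on
                                      the fseek() statement above          */
    fseek(fp, pos, SEEK_SET);
    printf("file is %ld bytes\n", size);
}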
 
W

websnarf

Keith said:
I'm not aware of any other languages that support this.

Python and Java both have bignum classes. I don't know how Java's
implementation works, but Python uses GMP as its back-end. That means
that it includes all the assembly paths that have been encoded into
GMP. (It also means Python's back-end will never support
multithreading, since GMP is not multithreading-safe.) Bignum classes
are basically a superset of high-word multiplication, and bignums are in
fact the primary reason for having a high-word multiplication capability.

It means that *pure* Python code will outperform pure C code on large
bignum operations. This is not hyperbole. Python in general is at
least 10 times slower than C on nearly any kind of operation -- but on
RSA crypto, or bignums, the only chance C has is to use libraries like
GMP, which include platform-specific assembly and have various warts
like breaking multi-threading. In the limiting cases, pure Python
versus pure C will be a 4 to 1 speed advantage to Python.

Obviously C would never include a bignum library, but the mere
inclusion of high-word multiplication alone means the only barrier to
implementing bignum libraries in pure C that run at near-optimal
machine speed is just writing a fairly straightforward library to do
so. So while demanding that C keep up with other languages on variable-length
integers might be a bit much to ask, asking for a single
operation, namely high-word multiply, is not. And it would be more in
keeping with C's minimalist approach. A bignum library might implement
just another fixed-size integer type that is larger, or it might be
dynamically resizable, and there are issues with making such a library
go fast *and* be multithreaded (though I believe it's possible to do
both, even if the GMP people have not yet figured it out) -- so it makes
sense that C programmers be able to make that choice themselves.
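
To make the request concrete: with C99's fixed-width types, the high word of a 32x32-bit multiply can already be written portably by going through a 64-bit intermediate; what cannot be expressed is the same operation one level up, the high 64 bits of a 64x64-bit product. The helper below is a sketch, and the name is mine.

#include <stdint.h>

/* Portable high-word multiply for 32-bit operands: compute the full
   64-bit product and take the top half.  The analogous operation for
   64-bit operands has no portable C99 spelling, which is the gap being
   complained about here. */
static inline uint32_t mul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}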
I.e., the standard can be implemented on architectures that don't use
2's complement.

Right; and there was a time where this might have made sense. It was
certainly long before 1999.

*I* as a programmer will never be in a position to test the
validity of code on a non-2s complement machine. I can't realistically
buy such a machine. I can't gain access to one, and I don't personally
know of anyone who has access to such a machine. And 99.999% of all
programmers are in the same position. Cryptographers have decided they
don't know how to deal with those 0.001% and so they just push on ahead
assuming 2s complement so they can do useful things like cryptography.
Code that assumes long is exactly 32 bits will break on several of the
platforms I use every day.

That's nice. Do you also use C99 compilers on those platforms every
day? For your marginal platforms you already have C89.
Allowing declarations and statements to be mixed can make it easier to
declare variables whose initializations depend on previous statements.
You can also do that by introducing a new scope, but that can be
awkward.

Some people call that "structure". It has some arguable reason to be
like that in C++, because declaration and construction typically happen
at the same time (though I am personally not that impressed with this
compromise in C++, since it doesn't have a way of similarly controlling
points of destruction). There is no comparable argument for this to be
in C. As I said, it's purely cosmetic.
 
W

websnarf

Mark said:
C is a language whose problems are paramount and staring everyone in
the face. But C99 basically addresses none of these problems over C89.
The rise of the importance of cryptography as a field points out two
problems with the C language: 1) No simple way to express a high-word
multiply operation (even though the vast majority of CPU architectures
do support this with direct, and often highly accelerated hardware
support), [...]

Practically speaking, this is a non-issue. People just insert the
machine instruction into the code via platform-specific inline assembly
support.

Uh ... right, that's what they do. But then it's not C code anymore --
it's assembly. So why don't Python and Java simply call out to an
external library that is outside of their language (through native
methods or shell calls or whatever) to do this same thing? Obviously,
these are things that are naturally covered by those languages
directly (in their embodiment of bignums, of course).

The case for adding this in C is simple -- it makes efficient bignum
libraries implementable completely in standard C, it *can* be simulated
(though slowly) with all current C implementations, there is very
significant hardware support for it (all modern CPUs, especially if you
exclude UltraSparc from being a modern CPU), and other languages are
basically starting to include it with a view to enticing developers.
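
As a sketch of the "can be simulated (though slowly)" claim, here is one way to get the high 64 bits of a 64x64-bit product in plain C99 by splitting the operands into 32-bit halves (the function name is mine):

#include <stdint.h>

/* Slow but portable: compute the high 64 bits of x * y using four
   32x32->64 partial products, the way you would do it on paper. */
static uint64_t mul_hi64_portable(uint64_t x, uint64_t y)
{
    uint64_t x_lo = x & 0xFFFFFFFFu, x_hi = x >> 32;
    uint64_t y_lo = y & 0xFFFFFFFFu, y_hi = y >> 32;

    uint64_t lo_lo = x_lo * y_lo;
    uint64_t hi_lo = x_hi * y_lo;
    uint64_t lo_hi = x_lo * y_hi;
    uint64_t hi_hi = x_hi * y_hi;

    /* carry propagating out of the middle column */
    uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + lo_hi;
    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}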
Some machines have a fast population count instruction. Does that mean
C needs simply-expressed support for it? Of course not.

Oh! You were so close. You meant to say: "Of course it does". I
personally endorse the inclusion of pop-count operations in the C
standard library as well. Also bit scanning instructions. Following
the impeccable reasoning you just presented.
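
For what it's worth, the portable fallback such a library function would need is short; a sketch (the function name is mine, not a proposal for a standard name):

#include <stdint.h>

/* Portable population count for a 32-bit value, using the classic
   x &= x - 1 trick: each iteration clears the lowest set bit, so the
   loop runs once per set bit.  A platform with a POPCNT-style
   instruction could implement the same interface in one instruction. */
static int popcount32(uint32_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}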

The criterion should simply be that you can demonstrate an
interesting and potentially useful algorithm that exploits such
functionality to great effect, for which there is at least some
platform/CPU support, and which can be easily emulated in pure C on
other platforms. *This* is how you make a "high-level assembler".
It is also a way to stay relevant -- developers will have more of an
"I have to have this" functionality consideration going into their
thinking.
[...] and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement. The crypto community responds simply by
ignoring the standard and instead uses de facto standards (long is
exactly 32 bits, all integers are 2s complement, right shift retains
the sign of signed integers, etc.) C99 did not address this problem.
Java and Python do.

Long is exactly 32 bits? A couple of the boxes at my desk disagree
with that "de facto standard" (and yet still manage to do crypto).

Well, they manage to do so with a lot of extra redundant work. Unless
they are exactly an integer multiple of 32 bits, they are also going to
end up being a lot slower because of it.
7.18.1.1 Exact-width integer types

1 The typedef name intN_t designates a signed integer type with
width N, no padding bits, and a two's complement representation. Thus,
int8_t denotes a signed integer type with a width of exactly 8 bits.

Hmmm ... this appears to be different from my version of the
documentation (presumably some free older draft). "and a two's
complement representation" isn't there in my version of the
documentation. Obviously this makes a huge difference.
Huh? Any library function which takes multiple pointers __assumes
aliasing__ between types allowed to alias, not the other way around.
What do you think 'restrict' was for?

I was referring to how things stood in C89. I specifically point out
that C99 "fixes" the situation by making it explicit from a syntactical
point of view (adding restrict). But in real effect, compilers don't
behave differently, except for being able to go whole hog on no-alias
optimization whenever they see "restrict".

Aliasing is also an issue in terms of *correctness*. strcat(p,p) has a
very obvious intuitive meaning that doesn't match up with its actual
meaning (which is nothing, since it performs UB). But even now that we
have this explicit syntax, we don't see any of these compilers
enforcing compile-time checking for aliasing anyway. In general it
requires runtime checking to be fully enforced anyhow. Once you do
that, however, it becomes just as easy to go ahead and break out the
aliasing case and make it function in some well-defined way.
 
K

Kenny McCormack

The crypto community responds simply by
ignoring the standard and instead uses de facto standards (long is
exactly 32 bits, all integers are 2s complement, right shift retains
the sign of signed integers, etc.) C99 did not address this problem.
Java and Python do.
[/QUOTE]
Code that assumes long is exactly 32 bits will break on several of the
platforms I use every day.

It is mind-numbing how dumb of a comment this is.
 
R

Richard Heathfield

(e-mail address removed) said:

*I* as a programmer will never be in a position to test the
validity of code on a non-2s complement machine. I can't realistically
buy such a machine. I can't gain access to one, and I don't personally
know of anyone who has access to such a machine. And 99.999% of all
programmers are in the same position. Cryptographers have decided they
don't know how to deal with those 0.001% and so they just push on ahead
assuming 2s complement so they can do useful things like cryptography.

You can do cryptography with unsigned integers. No, really you can. And
people do. There is no need whatsoever to assume two's complement, because
there's no need whatsoever to use negative numbers.
 
K

Keith Thompson

Keith said:
(e-mail address removed) writes: [...]
and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement.

I.e., the standard can be implemented on architectures that don't use
2's complement.

Right; and there was a time where this might have made sense. It was
certainly long before 1999.

*I* as a programmer will never be in a position to test the
validity of code on a non-2s complement machine. I can't realistically
buy such a machine. I can't gain access to one, and I don't personally
know of anyone who has access to such a machine. And 99.999% of all
programmers are in the same position. Cryptographers have decided they
don't know how to deal with those 0.001% and so they just push on ahead
assuming 2s complement so they can do useful things like cryptography.

There's nothing stopping you from writing pure C code that assumes a
2's-complement representation. Just add something like this:

#include <limits.h>
#if INT_MIN == -INT_MAX
#error "This code only works on two's-complement systems."
#endif

in one of your application's header files. (I can imagine the test
not working properly in some bizarre circumstances, but it seems
unlikely.) Or, if you're really paranoid, do a test at run time that
examines the bits of a negative integer, and abort the program if it
fails (you can also check for padding bits if necessary). You're now
effectively programming in a subset of C that requires 2's-complement,
and there was no need to change the standard to forbid implementations
that use other representations.

(This is assuming you really need 2's-complement; as Richard said,
it's likely you can work just with unsigned integers and avoid the
issue altogether.)
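
A sketch of the run-time variant Keith mentions, examining the stored bytes of -1 (the function name is mine; it conservatively rejects padding bits as well):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Abort unless the object representation of (int)-1 is all one bits,
   as it is for two's complement with no padding bits.  Ones' complement
   and sign-magnitude representations of -1 contain a zero bit and fail. */
static void require_twos_complement(void)
{
    int minus_one = -1;
    unsigned char bytes[sizeof minus_one];
    size_t i;

    memcpy(bytes, &minus_one, sizeof minus_one);
    for (i = 0; i < sizeof bytes; i++) {
        if (bytes[i] != UCHAR_MAX) {
            fputs("two's complement representation required\n", stderr);
            exit(EXIT_FAILURE);
        }
    }
}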
That's nice. Do you also use C99 compilers on those platforms every
day? For your marginal platforms you already have C89.

On 64-bit systems, long is typically 64 bits. These are not "marginal
platforms", and they're becoming less marginal all the time. You can
buy an x86-64 system at your local computer store, and you can install
any of a number of mainstream operating systems on it.

And there's no need to assume that long is 32 bits anyway. If I have
C99, I can use int32_t and uint32_t. If I don't have C99, I can use
int32_t and uint32_t.
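
The point being that, absent <stdint.h>, you can supply the names yourself for the platforms you actually build on; a sketch (the #else branch assumes a typical 32-bit-int target, which is not something the standard guarantees):

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>
#else
/* Pre-C99 fallback: pick types verified per platform (here assuming a
   32-bit int, the common case on the systems under discussion). */
typedef int int32_t;
typedef unsigned int uint32_t;
#endif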

All the world was never a VAX, and all the world isn't an x86 now.
 
K

Keith Thompson

Mark said:
(e-mail address removed) wrote: [...]
[...] and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement. The crypto community responds simply by
ignoring the standard and instead uses de facto standards (long is
exactly 32 bits, all integers are 2s complement, right shift retains
the sign of signed integers, etc.) C99 did not address this problem.
Java and Python do.

Long is exactly 32 bits? A couple of the boxes at my desk disagree
with that "de facto standard" (and yet still manage to do crypto).

Well, they manage to do so with a lot of extra redundant work. Unless
they are exactly an integer multiple of 32 bits, they are also going to
end up being a lot slower because of it.

On a lot of current platforms, int is exactly 32 bits and long is
exactly 64 bits. If you want 32 bits on such a platform, there is
absolutely no problem; just use int, or some alias of int (such as
int32_t). I can only assume from your comments that you didn't know
this.

I have worked on systems that have no 32-bit integer type, but those
systems are fairly exotic, and they're optimized for floating-point
anyway. I don't know much about cryptography, but I would think that
the algorithms would work at least as well with 64-bit integers as with
32-bit integers, perhaps with some re-coding.
Hmmm ... this appears to be different from my version of the
documentation (presumably some free older draft). "and a two's
complement representation" isn't there in my version of the
documentation. Obviously this makes a huge difference.

Get a copy of n1124.pdf. It includes the official C99 standard plus
TC1 and TC2, and it's freely available.
I was referring to how things stood in C89. I specifically point out
that C99 "fixes" the situation by making it explicit from a syntactical
point of view (adding restrict). But in real effect, compilers don't
behave differently, except for being able to go whole hog on no-alias
optimization whenever they see "restrict".

The C89/C90 standard does *not* allow the compiler to assume in
general that pointer parameters are not aliased, but it has specific
statements for certain library functions that say they have undefined
behavior for certain arguments (e.g., overlapping strings for
memcpy()).

The addition of "restrict" in C99 allowed some of these restrictions
to be stated directly in the function prototype. It also allows
user-written functions to impose similar restrictions.

Note that adding "restrict" *only* causes some constructs that would
otherwise be legal to invoke undefined behavior. A conforming C99
compiler could legally ignore it altogether (as long as it recognizes
the keyword, of course). Its intent is to give the compiler
permission to perform certain optimizations that it otherwise couldn't
perform. It gives the programmer a mechanism to make certain promises
to the compiler. A compiler may take advantage of those promises, but
it isn't required to.
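
A small illustration of the kind of promise being described; the function is hypothetical, and calling it with overlapping arrays would be undefined behaviour, which is exactly the licence the compiler is being given:

#include <stddef.h>

/* The restrict qualifiers tell the compiler that dst, a and b do not
   overlap, so it may keep values in registers, reorder, or vectorize
   without re-checking memory.  The compiler may also ignore the hint
   entirely; the promise only constrains the caller. */
void add_arrays(double * restrict dst,
                const double * restrict a,
                const double * restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}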
 
F

Frederick Gotham

Keith Thompson posted:

#include <limits.h>
#if INT_MIN == -INT_MAX
#error "This code only works on two's-complement systems."
#endif


Perhaps the following would be preferable:


#if (-1 & 3) != 3
#error "This code only works on two's-complement systems."
#endif
 
K

Keith Thompson

Frederick Gotham said:
Keith Thompson posted:

<snip>
Perhaps the following would be preferable:


#if (-1 & 3) != 3
#error "This code only works on two's-complement systems."
#endif

Yes, that should work. Arithmetic expressions in a "#if" are evaluated
using the same representation as long int and unsigned long int (in
C90), or intmax_t and uintmax_t (in C99), so the expression should be
evaluated as if it were being evaluated at run time.

I'm not sure which one I'd prefer.
 
W

websnarf

Keith said:
Keith said:
(e-mail address removed) writes: [...]
and 2) Non-determinable integer scalar sizes, that are not
enforced to 2s complement.

I.e., the standard can be implemented on architectures that don't use
2's complement.

Right; and there was a time where this might have made sense. It was
certainly long before 1999.

*I* as a programmer will never be in a position to test the
validity of code on a non-2s complement machine. I can't realistically
buy such a machine. I can't gain access to one, and I don't personally
know of anyone who has access to such a machine. And 99.999% of all
programmers are in the same position. Cryptographers have decided they
don't know how to deal with those 0.001% and so they just push on ahead
assuming 2s complement so they can do useful things like cryptography.

There's nothing stopping you from writing pure C code that assumes a
2's-complement representation. Just add something like this:

#include <limits.h>
#if INT_MIN == -INT_MAX
#error "This code only works on two's-complement systems."
#endif

in one of your application's header files. (I can imagine the test
not working properly in some bizarre circumstances, but it seems
unlikely.)

How do I know this? As I said, I don't have a machine where I can test
this.
[...] Or, if you're really paranoid, do a test at run time that
examines the bits of a negative integer, and abort the program if it
fails (you can also check for padding bits if necessary). You're now
effectively programming in a subset of C that requires 2's-complement,
and there was no need to change the standard to forbid implementations
that use other representations.

Right, or I could do nothing and watch nobody complain. BTW, which
test should I use to eliminate all other number representation systems?
Because I have no idea what all the alternatives are.
(This is assuming you really need 2's-complement; as Richard said,
it's likely you can work just with unsigned integers and avoid the
issue altogether.)

How would I know this? I use right shift, wrap-around, mixing exclusive
or with addition, etc. as just a natural way of doing things. I know
that some of these things rely on the representation, but I don't
know what would fail on other systems.
On 64-bit systems, long is typically 64 bits.

Perhaps on marginal older 64-bit systems.
[...] These are not "marginal
platforms", and they're becoming less marginal all the time. You can
buy an x86-64 system at your local computer store, and you can install
any of a number of mainstream operating systems on it.

On x86-64 systems long is *32 bits*. This is because de facto standards
are far more compelling than unadopted ones. I'm pretty sure that
64-bit UltraSparc is the same way, and I'd bet that 64-bit PPC is also
the same.
And there's no need to assume that long is 32 bits anyway. If I have
C99, I can use int32_t and uint32_t. If I don't have C99, I can use
int32_t and uint32_t.

All the world was never a VAX, and all the world isn't an x86 now.

Tell that to Apple, Sun and DEC.
 
W

websnarf

Richard said:
(e-mail address removed) said:

You can do cryptography with unsigned integers. No, really you can. And
people do. There is no need whatsoever to assume two's complement, because
there's no need whatsoever to use negative numbers.

Even on 32-bit rotate operations? Go look up RFC 1321. They have
things like this in their source:

/* UINT4 defines a four byte word */
typedef unsigned long int UINT4;

and this:

/* ROTATE_LEFT rotates x left n bits. */
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))

in there. It's an RFC, and of all the things I've heard complaints
about MD5, portability is not one of them. So certainly you *can* do
cryptography on unsigned int, and people do, and it works because the
de facto standard is that it's exactly 32 bits.
 
R

Richard Heathfield

(e-mail address removed) said:
Even on 32-bit rotate operations?

Yes, there is no need whatsoever to use negative numbers in a 32-bit rotate
operation.
Go look up RFC 1321. They have
things like this in their source:

/* UINT4 defines a four byte word */
typedef unsigned long int UINT4;

That looks like an unsigned integer to me, so it seems you are arguing my
case for me. Thank you. Incidentally, the comment in the RFC is wrong.
UINT4 does not define a four byte word. It is merely a synonym for unsigned
long int, which is at least *one* byte wide and *at least* 32 bits wide.
But we are not discussing 32-bit issues here; we are discussing your claim
that there is a need to assume two's complement for crypto work.
and this:

/* ROTATE_LEFT rotates x left n bits. */
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))

That looks like it would be suicidally unwise to use on anything /except/ an
unsigned integer type.
in there. It's an RFC, and of all the things I've heard complaints
about MD5, portability is not one of them. So certainly you *can* do
cryptography on unsigned int,

...which was the point I was making. You seemed to think that there was some
need to assume two's complement. Am I right in thinking that you now
concede you were incorrect to think so?

and people do, and it works because the
de facto standard is that it's exactly 32 bits.

No, you can get crypto to work with unsigned integer types that are not 32
bits in size, too. I've proved you wrong on the need for two's complement,
and I can prove you wrong on the need for exactly-32-bit ints if you like.
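
To make that concrete, the RFC 1321 macro only needs a mask to tolerate an unsigned type wider than 32 bits; a sketch (UINT4 is the RFC's typedef, the LOW32 macro is mine, and n must stay in the range 1..31 just as in the original):

/* UINT4 as in RFC 1321: at least 32 bits, possibly more. */
typedef unsigned long int UINT4;

/* Keep only the low 32 bits, so the rotate behaves the same whether
   unsigned long is 32 or 64 bits wide. */
#define LOW32(x) ((x) & 0xFFFFFFFFUL)

/* ROTATE_LEFT rotates the low 32 bits of x left by n (1 <= n <= 31). */
#define ROTATE_LEFT(x, n) \
    (LOW32(((x) << (n)) | (LOW32(x) >> (32 - (n)))))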
 
I

Ike Naar

On x86-64 systems long is *32 bits*. This is because de facto standards
are far more compelling than unadopted ones. I'm pretty sure that
64-bit UltraSparc is the same way, and I'd bet that 64-bit PPC is also
the same.

System Configuration: Sun Microsystems sun4u Sun Ultra 45 Workstation
Solaris 10, Sun Studio 11 C compiler
$ cat a.c
#include <stdio.h>
int main(void)
{
printf("sizeof(long)=%lu\n", (unsigned long)sizeof(long));
return 0;
}
$ /opt/SUNWspro/bin/cc -xarch=generic64 a.c
$ ./a.out
sizeof(long)=8

Kind regards,
Ike
 
C

Chris Hills

Frederick Gotham said:
Keith Thompson posted:

<snip>
Do you think the C99 Standard will ever replace the C90 Standard, or do you
think the C community is content with C90 and does not want to change
anything?

Technically C99 has replaced C90. Most compilers are at C95, that is, C90
with Amendment 1 and the two TCs.

A lot of the compiler front ends are C99 compliant, as are some of the
libraries; it is just that the majority of working C programmers are not
that interested in C99, or it would have been implemented by now.
 
C

Chris Hills

Joe Wright said:
Is there a case where a correct C90 program cannot be successfully
compiled by a C99 compiler? Actually this is not a compiler question but
a language one. Is there anything in C90 that C99 must reject or handle
differently?

Yes, but more of a problem are the places where code will compile under
either C90 or C99 but the behaviour changed. There are a few places where
some "minor" things were changed, and it can have an effect. I must dig
out the list.
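
One frequently cited example of that sort of silent change (not necessarily one from the list Chris has in mind) is the interaction of line comments with division:

#include <stdio.h>

/* Under C90 rules, the slash-star after the two slashes opens a block
   comment that ends before b, so the expression is a divided by b.
   Under C99 rules, the two slashes start a line comment, so the
   expression is just a.  Same source, different result. */
int tricky(int a, int b)
{
    return a //* which comment style? */ b
        ;
}

int main(void)
{
    printf("%d\n", tricky(10, 2)); /* 5 with C90 semantics, 10 with C99 */
    return 0;
}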
 
I

Ian Collins

Ike said:
System Configuration: Sun Microsystems sun4u Sun Ultra 45 Workstation
Solaris 10, Sun Studio 11 C compiler
$ cat a.c
#include <stdio.h>
int main(void)
{
printf("sizeof(long)=%lu\n", (unsigned long)sizeof(long));
return 0;
}
$ /opt/SUNWspro/bin/cc -xarch=generic64 a.c
$ ./a.out
sizeof(long)=8

Same on AMD64.
 
C

Chris Hills

Richard Heathfield said:
I think the part of the C community that cares about portability wants a
widely-implemented standard, which it already has. Until C99 becomes as
widespread as C90, why would anyone use it if they need portability?

And the part of the C community that couldn't give two hoots about
portability aren't interested in the Standard anyway - they just want
something that works on /their/ compiler, and many such people think the
Standard (if they've even heard of it) takes second place to their compiler
documentation.

A lot of people are interested in the standard but not portability. This
is mainly in the embedded areas, where there are extensions to the
language due to the architecture of the MCU.

I.e., use standard C except where the hardware requires a deviation for
efficient programming. Embedded systems tend to have restricted memory.
 
C

Chris Hills

Andrew said:
<OT>
lot of new features anymore). My point is that people are going to just
give up on Microsoft sooner than they'll get a conforming implementation of
/anything/,
</OT>


completely incorrect.
let alone C.

Quite possibly. They have moved on a bit from C, so it is somewhat
irrelevant.
On the other hand, by 2010 GNU will likely have near-perfect C99 support, as
will plenty of other major compilers. At that point, Microsoft might wake up
and fix themselves. But maybe not.

They don't need to; they are going the ECMA route and fast-track ISO.
Of course, C89 certainly isn't broken, so C99 conformance isn't too much of
an actual issue.

You mean C95. C99 conformance is a major issue in some areas.
 
R

Roland Csaszar

Keith Thompson wrote:

Perhaps on marginal older 64-bit systems.

You want to look up the differences between LLP64 (used by MS's compiler)
and LP64 (used by essentially every Unix compiler on 64-bit platforms).
On x86-64 systems long is *32 bits*. This is because de facto standards
are far more compelling than unadopted ones. I'm pretty sure that
64-bit UltraSparc is the same way, and I'd bet that 64-bit PPC is also
the same.

No. MS uses 32-bit longs for their compiler on x86-64; Sun (SPARC, x86_64),
SGI (MIPS), IBM (Power), GNU (many ;), HP (HP-UX, Itanium, Alpha),
Pathscale, Portland, ... all use 64-bit longs in their compilers.
 
