Horrible Visual C Bug!

Joona I Palaste

Falcon Kirtarania <[email protected]> scribbled the following
Really, all it comes down to is that unless you really don't give a shit
what is in EAX after your program executes, you damn well better return int.
Theoretically, within standard, it could simply not set EAX on returning
void as it does for everything else. Then you would end up with return
codes that are pseudorandom numbers.

Do you really still think that all the world's a Wintel box?

--
/-- Joona Palaste ([email protected]) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"As we all know, the hardware for the PC is great, but the software sucks."
- Petro Tyschtschenko
 
Richard Heathfield

[Followups set to comp.lang.c]

Falcon said:
Christian Bau said:
On the other hand, imagine you are in a job interview and you are asked:
What will this statement do?

i = 3;
a [i++] = i;

I recommend that you answer: It could store the number 3 or the number 4
into a [3]. Anything beyond that and you might confuse the interviewer.

Theoretically, it would execute:

LINE 1 WATCH: i == 3 true
LINE 2: a[3]==3, and at the end i=4

Theoretically, it /could/ do that, at least partly on the grounds that,
theoretically, it could do anything at all.

Falcon said:
because i++ is a postincrement and takes place at the end of the line,
doesn't it? Or is it the statement?

The behaviour is undefined because it violates a "shall" outside a
constraint, just like void main. The relevant Standard text (3.3 in C89,
6.5(2) in C99), is: "Between the previous and next sequence point an object
shall have its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be accessed only to
determine the value to be stored."

The code

a[i++] = i; /* bug */

violates that "shall" clause, invoking UB. No diagnostic is required.
 
Martien Verbruggen

Christian Bau said:
On the other hand, imagine you are in a job interview and you are asked:
What will this statement do?

i = 3;
a [i++] = i;

I recommend that you answer: It could store the number 3 or the number 4
into a [3]. Anything beyond that and you might confuse the interviewer.

Theoretically, it would execute:

LINE 1 WATCH: i == 3 true
LINE 2: a[3]==3, and at the end i=4

because i++ is a postincrement and takes place at the end of the line,
doesn't it? Or is it the statement?

No, it doesn't, at least not necessarily. This subject is discussed at
least once per week in comp.lang.c, so I suggest you use
groups.google.com to find a few of those discussions, and also that you
have a look at the C FAQ, questions 3.1 to 3.3, where this is discussed.

Martien
 
Richard Heathfield

[Followups set to comp.lang.c]

Falcon said:
Really, all it comes down to is that unless you really don't give a shit
what is in EAX after your program executes, you damn well better return
int.

If you really understood, you'd know that not all machines /have/ a register
called EAX. Furthermore, the Standard doesn't mandate that a return value
be stored in a register at all.

Falcon said:
Theoretically, within standard, it could simply not set EAX on returning
void as it does for everything else. Then you would end up with return
codes that are pseudorandom numbers.

Theoretically, it could do anything at all, as far as the Standard is
concerned. That's what "undefined behaviour" means. Returning pseudorandom
numbers would be a relatively harmless outcome compared to some I can
imagine.

<snip>
 
Richard Heathfield

Joona I Palaste wrote:

BTW, aren't all implementation-specific extensions such as the Windows
API also undefined behaviour?

Depends on your definition of "undefined". Personally, I agree that
extensions effectively invoke undefined behaviour. I seem to recall that
some people disagree with me. The name "Doug Gwyn" springs to mind, but I
may have disremembered.
 
Bruce Wheeler

A: Because it makes your article more difficult to read.

Q: Why shouldn't I top-post?

The relevance is intact, because the MS Visual Studio C++ .NET compiler will
also compile C.

The relevance of what to what is intact?

VS.NET will compile C and C++, depending on the settings you
provide.

However, C is not C++, just as C++ is not C.

Regards,
Bruce Wheeler
 
Kevin Easton

Joona I Palaste said:
Richard Heathfield <[email protected]> scribbled the following
[Followups set to comp.lang.c]
Falcon Kirtarania wrote:
Feel free to write a C compiler which diagnoses any and all instances of
undefined behaviour. Consider whether it should generate a diagnostic for
this code. If so, why? If not, why not?
int foo(int *a, int *b)
{
    return *a = *b++;
}

Well, I believe that the problem here could be solved by using run-time
checks for undefined behaviour. That would make the generated code quite
slow, but it would still work correctly.

Many instances of undefined behaviour are there specifically to absolve
the implementation of the responsibility of detecting them - forcing the
implementation to detect them anyway makes the reason they were
undefined in the first place moot.

- Kevin.
 
Arthur J. O'Dwyer

Then, if we want to achieve what Falcon is suggesting, let's leave the
definition of undefined behaviour as it is, and instead change the
standard from saying "such and such results in undefined behaviour" to
"such and such results in an error". Whether we *do* want to achieve it
is another question.

Simple answer: Of course we don't want to achieve it!

Complex answer: Detecting all UB at runtime (or any time) is
computationally equivalent to solving the halting problem, which
isn't something we want to foist on compiler vendors. ;)

Dumb answer: Are there any classes of UB that *are* feasible
to detect at compile-time or runtime? 'foo main', where 'foo'!='int',
looks easy to me. I can't think of any others off the top of my
head.

-Arthur
 
jacob navia

Richard Heathfield said:
Joona I Palaste wrote:



Depends on your definition of "undefined". Personally, I agree that
extensions effectively invoke undefined behaviour.

Yes. When adding extensions to lcc-win32 I have followed explicitly undefined behaviour to avoid
being incompatible with existing code.

For instance this extension is undefined behavior in standard C:

/* Define a new operator addition for a user defined type of numbers */
int operator+(Number a,Number b) { ... }

No legal program can use that, so it is a compatible extension. I took pains that

int operator = 5;

still works, of course.

As far as the standard goes, extensions are not forbidden. They just should not introduce new
keywords in contexts where they would invalidate existing code. Existing code that writes standard C
like:

int operator = 5;

should always compile what is intended.

Syntax extensions should avoid the user name space. Microsoft proposed under Windows

__try { /* guarded code block */ }
__except( /* integer expression */ ) {
    /* exception code block */
}
Since no legal C program can be written like this, this is a compatible extension. Lcc-win32
followed that proposal.

Another Microsoft extension that is necessary under windows is
__declspec(dllexport)
to indicate to the linker to export that symbol in the export table of the DLL being compiled.
Again, the user name space was preserved.

More difficult to follow was __int64 for long long. In this case lcc-win32 added
#define __int64 long long
automatically at startup to be able to compile code that uses that symbol. Since it wasn't in the
legal namespace anyway I think it is OK.

More problematic were the windows API headers. They are huge, and lcc-win32 was forced to provide an
ANSI C version, avoiding the dreaded

asm { /* a lot of assembly in microsoft syntax */}

that polluted so many headers. This has gotten better now, and many SDK headers are compliant.
Still, sometimes I wonder what this is supposed to mean:

typedef struct tagWindowsStruct {
    ...
    DWORD bitfield:2;
}

A DWORD is an unsigned long, which really doesn't fit into 2 bits... Why not use

unsigned bitfield:2;

instead ???
 
Arthur J. O'Dwyer

When adding extensions to lcc-win32 I have followed explicitly
undefined behaviour to avoid being incompatible with existing code.

For instance this extension is undefined behavior in standard C:

/* Define a new operator addition for a user defined type of numbers */
int operator+(Number a,Number b) { ... }

No legal program can use that, so it is a compatible extension. I took
pains that

int operator = 5;

still works, of course.

And, presumably,

int foo, *bar;
....
void baz()
{
    int operator=(foo *bar[5]);
    ...
}

does the Right Thing(tm) as well. I'm glad it's you implementing that
sort of thing, and not me. :)
As far as the standard goes, extensions are not forbidden. They just
should not introduce new keywords in contexts where they would
invalidate existing code. Existing code that writes standard C like:

int operator = 5;

or even more pathological cases,
should always compile what is intended.
More difficult to follow was __int64 for long long. In this case
lcc-win32 added
#define __int64 long long
automatically at startup to be able to compile code that uses that symbol.
Since it wasn't in the legal namespace anyway I think it is OK.

It is okay, AFAICT. But why would __int64 be any harder to add than
any other extension, if you don't mind my asking?
More problematic were the windows API headers. They are huge, and
lcc-win32 was forced to provide an ANSI C version, avoiding the dreaded

asm { /* a lot of assembly in microsoft syntax */}

that polluted so many headers. This has gotten better now, and many
SDK headers are compliant. Still, sometimes I wonder what this is
supposed to mean:

typedef struct tagWindowsStruct {
    ...
    DWORD bitfield:2;
}

A DWORD is an unsigned long, which really doesn't fit into 2 bits...
Why not use
unsigned bitfield:2;

No reason. This looks like yet another MS extension, and I *think*
that the code as it stands invokes undefined behavior, so it's fair
game for an extension. Presumably this allows

DWORD wide_bitfield:20;

(or any large number of bits), so the implementors just used DWORD
everywhere for consistency.

-Arthur
 
jacob navia

Dumb answer: Are there any classes of UB that *are* feasible
to detect at compile-time or runtime? 'foo main', where 'foo'!='int',
looks easy to me. I can't think of any others off the top of my
head.

Pointers

The set of all pointers in the program is initialized at startup. They are either NULL or they point
to valid addresses, established in the raw data of the program. For instance:

int a,*pint = &a;

The set of valid addresses is established by the compiler at startup. When control arrives at main()
all pointers are valid.

Undefined behavior (as far as pointers are concerned) is when any pointer is used that
1) Has not been initialized to point to an existing valid object.
or
2) Is NULL or points somewhere else than

Object start address <= p < (start address)+sizeof(Object)

We can distinguish two types of pointers:

A) Unbounded pointers, i.e. pointers where the calculation of sizeof(Object) is impossible
B) Bounded pointers where sizeof(Object) is known and can be checked at run time.

Detecting this class of UB is called bounds checking and is done in many languages.
C is notoriously lacking this facility. Worse, the machine is not used to automatically test the
programmer's assumptions and all pointers are considered unbound.

Lisp, APL, and many other languages check array accesses and avoid memory corruption. C doesn't, and
we are plagued by memory corruption and obscure bugs.

An improvement would be to encourage the automatic checking of object accesses and to discourage the
usage of unbounded pointers. Instead of writing:

void matmult(int n, int m, double *pmat);

we would write:

void matmult(int n, int m, double mat[n][m]);

Such a prototype would make it possible to check in the calling program that the buffer passed has as
much space as declared, and in the called function it would be possible to check that no index is being misused.

Much of this attitude exists because the Pascal language has this facility, and many C people see
Pascal as something quite horrible.

I think that was a good feature of Pascal. I miss it in C, and every day I see the consequences in
array overruns, obscure bugs, and many other problems. Encouraging the use of bounded pointers would
introduce some hygiene, wouldn't it?

Nobody is proposing banning unbounded pointers. They should remain for special uses or in old
software. Encouraging the use of bounded pointers will slowly make them less and less frequent,
that's all.

The sizeof calculation is very problematic in C because the standards committee refused to pass this
information along in array prototypes. Of course this has historical reasons, but I just do
not understand why in 2003 we still want to save ourselves the few machine cycles that it would cost,
rather than spare the users and the programmers the stack overruns, memory corruption and other problems!

In C, an array decays to an unbounded pointer when passed to a subroutine. None of the sizeof
information is passed along. This is (maybe) efficient but it is a problem for checking the bounds of
array indexes!

jacob
 
pete

Richard said:
Joona I Palaste wrote:



Depends on your definition of "undefined". Personally, I agree that
extensions effectively invoke undefined behaviour.
I seem to recall that some people disagree with me.
The name "Doug Gwyn" springs to mind, but I
may have disremembered.

The consensus of comp.std.c
was that a program which calls a clear screen function
exhibits behavior which is not defined by the standard,
and which is also not considered to be undefined behavior.
I don't understand the importance of the distinction.
 
Richard Heathfield

pete said:
[...] Personally, I agree that
extensions effectively invoke undefined behaviour.
I seem to recall that some people disagree with me.
The name "Doug Gwyn" springs to mind, but I
may have disremembered.

The consensus of comp.std.c
was that a program which calls a clear screen function
exhibits behavior which is not defined by the standard,
and which is also not considered to be undefined behavior.
I don't understand the importance of the distinction.

I believe you're right about the csc consensus. I'm afraid I am just as much
in the dark over the difference between undefined behaviour and behaviour
which is not defined. I think I'm correct in saying that the committee sees
the two words "undefined behaviour" as being a key term with a particular
meaning, the meaning being, of course, the "this Standard imposes no
requirements" thing. But since the Standard imposes no requirements on
extensions, either, I continue to fail to see the distinction.
 
Steve Zimmerman

Falcon said:
Theoretically, any undefined behavior should return an error (as in, the
standard should change). This might help solve the troubles of awful
programming.

Good question. Having a main () function declared as "void main ()"
invokes undefined behavior. That means that according to the C Standard,
anything could happen. And when I say anything, I mean _absolutely
anything_. If you compile and run this program:

void main () { printf ("Hello, world\n"); }

then it could happen that your computer explodes, or your hard disk gets
formatted, and you can't complain that your compiler is not a Standard C
compiler. (You still can complain that a common mistake like that
shouldn't explode your computer, but you can't complain that the
compiler is not conforming to the C Standard.)

You can check the documentation for your compiler. Maybe it defines what
will happen; if the C Standard leaves something undefined then any
compiler is allowed to define it. Maybe your compiler refuses to compile
the program; I would say that would be a very sensible approach. Maybe
your program crashes as soon as you start it, maybe it crashes just when
it finishes. Maybe the operating system puts up an alert that says:
"Warning: Program xxxx seems to be broken. Please contact the
manufacturer of this program for further advice." Anything could
happen.

So my question is this: You have Microsoft code (not conforming to the
C standard); you have Linux code (which shitheads on this place say
doesn't conform to the C standard); what _does_ conform to the C
standard? The fucking document itself? There's standard and there's
real world. Standards are important and I like them, but the standards
inform the real world code _and_ real world code informs standards.
Standards are meant to be helpful.
 
Arthur J. O'Dwyer

...


The standard just says int/unsigned int as possible types for a bit field.
I added long/unsigned long, short, and even char. This makes the compiler
more usable but strictly speaking the standard says int/unsigned.


Yes, but "unsigned" would do the trick too... And since you are specifying
the number of bits, there are no 64 bit portability issues!

I don't know, but I always assumed that the number following the colon in
a bit-field had to be less than or equal to the number of bits in an int.
20 isn't less than that on some implementations (those with a 16-bit int,
for instance). Make that example

DWORD wide_bitfield:48;

if you like; then the same thing applies (and making it
'int wide_bitfield:48' wouldn't work, I think).

-Arthur
 
Horst von Brand

Falcon Kirtarania said:
Really, all it comes down to is that unless you really don't give a shit
what is in EAX after your program executes,

My usual work machine has no EAX of any sort...

Falcon Kirtarania said:
you damn well better return int.

_You_ (the programmer) might not care, but the implementation of C you are
using (or will be using some day in the future, after you have thoroughly
forgotten about the issue, or (even worse) that you will be using to run
programs written by some moron 15 years back) might very well care a lot.

Falcon Kirtarania said:
Theoretically, within standard, it could simply not set EAX on returning
void as it does for everything else. Then you would end up with return
codes that are pseudorandom numbers.

There you are talking about _one_ way a _particular_ compiler for a
_certain_ architecture _might_ do things today within the standard.
Fine as long as you don't care about ever changing anything in this
equation. But that is true only of throwaway programs, so why bother with
C then? Do it in Perl, Python, ...
 
Louis DeFiore

And what? Make these errors on gcc instead? Not that I have any great
love for Microsoft, but that's not the problem here.
 
goose

Alexander Grigoriev said:
My favorite example of a crappy compiler is GNU ARM C/C++ compiler. We've
had so many problems with it, and it's slow as molasses.
One time, for example, it just didn't complain about a global pointer,
_defined_ in multiple compilation units, and _twice_ defined in one of them,
for example:

struct A * pa;

struct A * pa=& a;

I don't think that that is an error.
-----------------------------------
[LManickum@lee] Tue Jul 29 10:46:42 [1 bg] /usr/src/hw.c
36 ok cat hw.c
#include <stdio.h>
#include <stdlib.h>

struct A {
    char *a;
};

static struct A a;
struct A *pa;
struct A *pa = &a;

int main (void) {
    a.a = "Hello World\n";
    printf ("%s\n", pa->a);
    return EXIT_SUCCESS;
}
[LManickum@lee] Tue Jul 29 10:46:44 [1 bg] /usr/src/hw.c
37 ok make
gcc.exe -c -ansi -W -Wall -pedantic -ggdb -c -o hw.o hw.c
gcc.exe -o hw.exe hw.o
[LManickum@lee] Tue Jul 29 10:46:49 [1 bg] /usr/src/hw.c
38 ok ./hw.exe
Hello World
-----------------------------------

see ?

Alexander Grigoriev said:
Second definition just didn't have any effect,

that is just plain wrong.

Alexander Grigoriev said:
without any diagnostics. Of
course, the first definition should have been a declaration, with 'extern',
but the compiler didn't give any help to find it.

When I tried MS eVC ARM C, I thought it was a godsend.

goose,
hth
 
