help: gcc compilation difference

new

Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}
-----------------------------------
When I compiled the above program with the command "gcc poitr.c", the
compilation was successful.
But when I compiled with "gcc -Wall -o poitr poitr.c", the compiler
printed the warning message below.
-------------------------------------------------------
warning: array subscript has type 'char'
-------------------------------------------------------
Questions:
1/ What is the difference between these two compilations?
2/ Which one should I use?
3/ Why were the warnings hidden in the first compilation?
4/ Can you please explain the operation of s[p[0]]++?
------------------------------------------------------
OS used: Fedora10

Appreciate your help.

Thanks a lot.
 
OrganizedChaos

new said:
Hi C Experts,

I have the following program:

Questions:
1/ what is the difference between these two compilations?
2/ Which one I need to use?
3/ Why the warnings have been hidden with the first compilation
4/ Can you please explain the operation of s[p[0]]++.
------------------------------------------------------
OS used: Fedora10

Appreciate your help.

Thanks a lot.

The "-Wall" option enables a large set of commonly useful warnings
(not literally all of them). It is just showing a warning
that does not prevent the compiler from completing successfully. Since
this option was not enabled in the first compilation, the warning was
not displayed. As far as I can tell, the only other difference is the
output file name: "-o poitr" writes the executable to poitr instead of
the default a.out.

Hope this helps
 
Keith Thompson

new said:
Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}

Really? Where did you get it?
-----------------------------------
When I compiled the above program with the command " gcc poitr.c" the
compilation was successful.
But when I compiled with "gcc -Wall -o poitr poitr.c" compiler has
thrown below warning messages.
-------------------------------------------------------
warning: array subscript has type 'char'
-------------------------------------------------------
Questions:
1/ what is the difference between these two compilations?
2/ Which one I need to use?
3/ Why the warnings have been hidden with the first compilation
4/ Can you please explain the operation of s[p[0]]++.

The difference is the "-Wall" option; consult your gcc documentation
for details. (You asked for more warnings; you got more warnings.)

In general, using an expression of type char as an array index can be
dangerous, since plain char may be either signed or unsigned, and a
negative array index will, in most circumstances, attempt to access
memory outside the array.

In this particular case, it's not a problem, since the values of p[0]
('a') and p[2] ('c') happen to be members of the basic execution
character set, and are therefore guaranteed to be non-negative.

There are several other problems with the program, including, but not
necessarily limited to, the following:

<malloc.h> is non-standard; use <stdlib.h> instead.

"int main()" should be "int main(void)" (this isn't likely to be a
real problem).

The result of malloc() is not checked.

The elements of the allocated array are uninitialized.

It's conceivable, but vanishingly unlikely, that the values of 'a' and
'c' could exceed 255, causing the indexing operations to go past the
end of the array (this is only possible if CHAR_BIT > 8, and even then
it won't happen with any character encoding I've ever heard of).

Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }
 
Seebs

Keith Thompson said:
new said:
Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}

Really? Where did you get it?
Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }

Surely this optimisation will behave differently from the
above code since it will not call malloc?

How/why does the compiler know/assume that the call
can be optimised away?

Paul.
 
santosh

paul said:
Keith Thompson said:
new said:
Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}

Really? Where did you get it?
Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }

Surely this optimisation will behave differently from the
above code since it will not call malloc?

The net result after both programs have run would be the same
(assuming the host can reclaim the malloc'ed memory which was not
freed).
How/why does the compiler know/assume that the call
can be optimised away?

It can't and it won't, at least for a few more decades. But Keith can,
and did :) I guess he's making a point to the OP that the program as
posted seems pointless.
 
Keith Thompson

paul said:
Keith Thompson said:
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}
[...]

Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }

Surely this optimisation will behave differently from the
above code since it will not call malloc?

Calling malloc is not part of the program's behavior, which is defined
by the standard as "external appearance or action".
How/why does the compiler know/assume that the call
can be optimised away?

Because malloc is part of the standard library, the implementation
is free to assume that it behaves as the standard specifies.
If the call succeeds, then the program continues to execute and
produces no output. If the call fails, then the behavior of the
following statements is undefined -- and one possible behavior
is continuing to execute and producing no output. If the program
calls a different function with the same name (say, one declared
in the non-standard header <malloc.h> and perhaps implemented in
some non-standard library), then again, the behavior is undefined,
and the implementation is free to assume that it will produce
no output. (If <malloc.h> defines "malloc" as a macro that does
something other than calling malloc, then this doesn't apply,
but I implicitly assumed that that wasn't the case.)

If the call were to some external function that's not part of the C
standard library, the compiler wouldn't be free to perform this kind
of optimization unless it happened to know what the function does;
for example, an implementation might perform some optimizations at
link time.
 
santosh

Seebs said:
Keith Thompson said:
Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}

Really? Where did you get it?
Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }

Surely this optimisation will behave differently from the
above code since it will not call malloc?

How/why does the compiler know/assume that the call
can be optimised away?

Paul.

Seems as if your follow-up was "optimised away." ;-)
 
Noob

Richard said:
-Wall tells gcc to be a tiny bit picky about the code.

For reference.

<quote documentation>

-Wall enables all the warnings about constructions that some users
consider questionable, and that are easy to avoid (or modify to prevent
the warning), even in conjunction with macros.

Note that some warning flags are not implied by -Wall. Some of them warn
about constructions that users generally do not consider questionable,
but which occasionally you might wish to check for; others warn about
constructions that are necessary or hard to avoid in some cases, and
there is no simple way to modify the code to suppress the warning. Some
of them are enabled by -Wextra but many of them must be enabled
individually.

</quote>

http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
-Wall is rather less than the bare minimum warning level you need for
portable C programming with gcc. At the very least, I would use:

-W -Wall -ansi -pedantic

(NB: -W is the older name for -Wextra)

<quote>

-Wextra enables some extra warning flags that are not enabled by -Wall.
(This option used to be called -W. The older name is still supported,
but the newer name is more descriptive.)

</quote>

-ansi is equivalent to -std=c89,
other useful values are c99 and iso9899:199409

http://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
http://gcc.gnu.org/onlinedocs/gcc/Standards.html

Regards.
 
new

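Richard, Keith and all, thanks a ton for your replies.
I have one more question.
If I add the following line to the code:
------------------------------------
z = s[p[0]]++; // say z is declared as int
------------------------------------
what would be the value of z, and how is it evaluated?

Thanks a lot in advance.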
 
santosh

new said:
Richard,Keith and all thanks a ton for your replies.
I have one more question.
In the code if I add the following line:

Since you don't initialise the s array, its members could contain
any garbage value, normally the value that was last written to the
memory location. So by extension, we cannot say z contains any useful
value after your assignment. In standard C parlance, its value is
indeterminate, and reading an indeterminate object, as in the RHS of
your assignment statement, invokes undefined behaviour, again in the
standard's terminology. IOW, it assigns to z a garbage value at the
very least, and could cause much worse behaviour at worst.

I think others have already explained how the array indexing is
evaluated and its problems.
 
Ben Bacarisse

santosh said:
Since you don't initialise the s array, its members could contain
any garbage value, normally the value that was last written to the
memory location. So by extension, we cannot say z contains any useful
value after your assignment. In standard C parlance, its value is
indeterminate, and reading an indeterminate object, as in the RHS of
your assignment statement, invokes undefined behaviour, again in the
standard's terminology.

Is this really true? I don't think it is, at least not in the general
way that is often presented here. An indeterminate value is either a
valid value of the type or it is a trap representation. Thus, on a
system with no trap representations for objects interpreted as having
type T, accessing an indeterminate value of type T must simply be
unspecified.

Now, in this case, s was of type char. 6.2.6.1 p5 which defines and
discusses trap representations states that:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or any
part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation is
called a trap representation.

I read that as forbidding UB when a trap representation is accessed
via an lvalue expression of type char which is the case here, is it
not?

Life would be simpler if such accesses were always undefined, but I
don't think that is how C is currently defined.

<snip>
 
santosh

Richard Heathfield said:
new said:
Richard,Keith and all thanks a ton for your replies.
I have one more question.
In the code if I add the following line:


Assuming p[0] has the value 'a', and assuming s['a'] has the value
0, after the statement has been executed s['a'] will have the value
1, and so will z. Which precise element of s is described by s['a']
depends on the character set encoding on your implementation. For
example, in ASCII you'd be looking at s[97] if I remember rightly,
whereas in EBCDIC you'd be looking at s[129].

z will have the value 1 after the assignment? Won't it be zero, since
it's a post-increment?
 
Stefan Ram

Keith Thompson said:
is continuing to execute and producing no output. If the program
calls a different function with the same name (say, one declared
in the non-standard header <malloc.h> and perhaps implemented in
some non-standard library), then again, the behavior is undefined,

Couldn't this be implementation-defined in a freestanding
implementation (last sentence of #1 of 5.1.2.1 of ISO/IEC
9899:1999 (E))?
 
Keith Thompson

Stefan Ram said:
Couldn't this be implementation-defined in a freestanding
implementation (last sentence of #1 of 5.1.2.1 of ISO/IEC
9899:1999 (E))?

Yes. I was assuming a hosted implementation, which was strongly (but
not absolutely) implied by the "#include<stdio.h>" in the OP's code.
I should have made that assumption explicit, especially since
I was being painfully pedantic anyway.
 
Tim Rentsch

Ben Bacarisse said:
Is this really true? I don't think it is, at least not in the general
way that is often presented here. An indeterminate value is either a
valid value of the type or it is a trap representation. Thus, on
system with no trap representations for objects interpreted as having
type T, accessing an indeterminate value of type T must simply be
unspecified.

Now, in this case, s was of type char. 6.2.6.1 p5 which defines and
discusses trap representations states that:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or any
part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation is
called a trap representation.

I read that as forbidding UB when a trap representation is accessed
via an lvalue expression of type char which is the case here, is it
not?

I believe that's a misreading. What the passage says is that if the
access type is a non-character type then the behavior is undefined.
It does not say that if the access type is a character type then the
behavior is defined. Access through a character type interprets the
stored value (ie, the representation) according to the type used to do
the read; if the access type is (char) or (signed char) and the
representation read is a trap representation for that type, it's still
undefined behavior, because there's no (Standard-)defined way to
produce a value from a trap representation. Or if you think there
is, what section in the Standard defines it?
 
Ben Bacarisse

Tim Rentsch said:
Ben Bacarisse said:
santosh said:
Richard,Keith and all thanks a ton for your replies.
I have one more question.
In the code if I add the following line:
------------------------------------
z = s[p[0]]++; // say z is declared as int
------------------------------------
what would be the value of z and how it is evaluated?

Thanks a lot in advance.

[...]

I believe that's a misreading. What the passage says is that if the
access type is a non-character type then the behavior is undefined.
It does not say that if the access type is a character type then the
behavior is defined. Access through a character type interprets the
stored value (ie, the representation) according to the type used to do
the read; if the access type is (char) or (signed char) and the
representation read is a trap representation for that type, it's still
undefined behavior, because there's no (Standard-)defined way to
produce a value from a trap representation.

OK, that's reasonable (and was how I first read it): access via a
character type is undefined or defined depending on whether the byte
is or is not a trap representation for the character type used.

What, then, is the effect of the second sentence of the quote? It
must be to add a further blanket undefined for all accesses to one
type's trap representations when accessed via another type. I.e. that
given

union { int si; unsigned ui; } u;

access to u.ui is undefined when u.si holds a trap representation even
when unsigned int has no trap representations of its own.

If that is right there are two things that puzzled me and caused me to
over-think the clause in question. First, it seems odd to give signed
char this odd half-way position and, second, it seems at odds with the
explanation of unions in 6.5.2.3 p3. At the very least the footnote
should surely be expanded to cover the case where some other union
member is a trap representation.

Neither of these are arguments for my reading. They are there to
explain why I thought the way I did.
 
Peter Nilsson

Tim Rentsch said:
I believe that's a misreading.

If it misreads the intent, it's because the intent is not
clear. ;)
What the passage says is that if the access type is a non-
character type then the behavior is undefined. It does not say
that if the access type is a character type then the behavior
is defined. Access through a character type interprets the
stored value (ie, the representation) according to the type
used to do the read; if the access type is (char) or (signed
char) and the representation read is a trap representation for
that type, it's still undefined behavior, because

To put it another way, the last time this discussion came up,
the majority view was that trap representations are possible for
all types except unsigned char (and unsigned bit-fields).
Access to trap representations for non character types is
explicitly undefined. Access to trap representations for signed
character types is _implicitly_ undefined due to a lack of
specification!

Note that "[It is implementation-defined] whether the value
with sign bit 1 and all value bits zero (for the first two),
or with sign bit and all value bits 1 (for ones’ complement),
is a trap representation or a normal value." does not
exclude application to signed character types.

Thus, signed character types can have trap representations.
Whether they can be accessed is a separate issue.

The question remains, why does 6.2.6.1p5 _explicitly_ exclude
character types?
there's no (Standard-)defined way to
produce a value from a trap representation.

What's the standard way to produce a value from a non trap
representation for an integer type? Why wouldn't that apply?
 
Tim Rentsch

Peter Nilsson said:
If it misreads the intent, it's because the intent is not
clear. ;)

I would not presume to argue that point. :)
To put it another way, the last time this discussion came up,
the majority view was that trap representations are possible for
all types except unsigned char (and unsigned bit-fields).

Of course you mean all scalar types -- structs and unions are
exempt.
Access to trap representations for non character types is
explicitly undefined. Access to trap representations for signed
character types is _implicitly_ undefined due to a lack of
specification!
Right.

Note that "[It is implementation-defined] whether the value
with sign bit 1 and all value bits zero (for the first two),
or with sign bit and all value bits 1 (for ones' complement),
is a trap representation or a normal value." does not
exclude application to signed character types.

Thus, signed character types can have trap representations.

Yes, even in implementations that use 2's complement.
Whether they can be accessed is a separate issue.

The question remains, why does 6.2.6.1p5 _explicitly_ exclude
character types?

Probably because in most implementations the character
types don't have trap representations, and therefore they
shouldn't be included in a blanket statement of undefined
behavior. Also character types (notably "plain" char) are
typically used to get around representation issues; the
combination of how most implementations are and what was
(and is?) common usage probably accounts for the exception
being worded as it is. (Of course I'm only speculating...)
What's the standard way to produce a value from a non trap
representation for an integer type? Why wouldn't that apply?

This mapping is supplied by the required documentation giving the
implementation-defined information for representation of types
(plus the relevant sections of 6.2.6). That documentation
also defines which representations are trap representations,
ie, which representations correspond to "no value".
 
Tim Rentsch

Ben Bacarisse said:
[...]

If that is right there are two things that puzzled me and cause me to
over-think the clause in question. First, it seems odd to give signed
char this odd half-way position and, second, it seems at odds with the
explanation of unions in 6.5.2.3 p3. At the very least the footnote
should surely be expanded to cover the case where some other union
member is trap representation.

Neither of these are arguments for my reading. They are there to
explain why I thought the way I did.

Yes, that's perfectly understandable. Here are some ideas in
response to the implicit questions in your penultimate paragraph.

First, about unions. Suppose we have a union containing just a
signed integer member, eg,

union { signed int si; } siu;

where both sizeof siu == 4 and sizeof siu.si == 4 are true.

In such a case there are in fact two distinct objects, even though
they happen to occupy exactly the same area of memory -- there is
'siu', and 'siu.si'. We know these are different because the
object designated by 'siu' can never be a trap representation,
(because it's a union, which are never trap representations) even
though 'siu.si' holds a trap representation.

(Editorial side note: the language the Standard uses relating to
the term "object" in various places is among the poorest sets of
phrasings the Standard employs. At some point I might write
something more about that, but right now I'd like to gloss over
those problems.)

Similarly, in the example union mentioned above

union { int si; unsigned ui; } u;

there actually are three distinct objects -- u, u.si, and u.ui.
That at least two of these three occupy exactly the same bytes of
memory doesn't alter the number, since unions are described as
_overlapping_ objects.

What this means is that 'u.ui' and '*(unsigned*)&u.si', because
they are accessing different objects, are allowed to behave
differently.

Now for the second question -- why is (char), in the guise of
(signed char), different? Or why are types besides character
types distinguished? Here is my speculation. The character
types are different because, ever since the early days of C, the
type 'char' has been used to access memory "free form", and the
Standard didn't want to change that. The interesting question
is, why give blanket undefined behavior to all the other types?
Here is where the speculation goes a little deeper. I conjecture
that an implementation might want to use trap representations to
indicate "not yet initialized" values, doing this automatically
without being told, and furthermore that it knows this. In such
a case, we might want

unsigned u;
int i;
u = *(unsigned *) &i;

to be able to trap, because the variable 'i' hasn't been given an
explicit initial value. If the trap-representation-ness of
something depended just on what type is used for access, that
would prevent this form of error detection if some types had no
trap representations.

Even though the last part is pure speculation on my part, this
explanation seems like a plausible enough motivation for the funny
wording in 6.2.6.1#5. At least, for me it does so well enough that
my mental model can tolerate the seeming inconsistencies with
other areas of the Standard in this regard. So I offer it up here
in case it may be of help to other folks.
 
Ben Bacarisse

I agree with what you've written but want to raise a detail so
forgive me for snipping so much of a helpful reply...
First, about unions. Suppose we have a union containing just a
signed integer member, eg,

union { signed int si; } siu;

where both sizeof siu == 4 and sizeof siu.si == 4 are true.

In such a case there are in fact two distinct objects, even though
they happen to occupy exactly the same area of memory -- there is
'siu', and 'siu.si'. We know these are different because the
object designated by 'siu' can never be a trap representation,
(because it's a union, which are never trap representations) even
though 'siu.si' holds a trap representation.

(Editorial side note: the language the Standard uses relating to
the term "object" in various places is among the poorest sets of
phrasings the Standard employs. At some point I might write
something more about that, but right now I'd like to gloss over
those problems.)

Similarly, in the example union mentioned above

union { int si; unsigned ui; } u;

there actually are three distinct objects -- u, u.si, and u.ui.
That at least two of these three occupy exactly the same bytes of
memory doesn't alter the number, since unions are described as
_overlapping_ objects.

This is obviously the intent from the wording about unions but there
is a problem with the == operator. 6.5.9 p6 reads:

Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an
object and a subobject at its beginning) or function, both are
pointers to one past the last element of the same array object, or
one is a pointer to one past the end of one array object and the
other is a pointer to the start of a different array object that
happens to immediately follow the first array object in the address
space.

So unless we stretch the meaning of the parenthetical remark, we would
have to conclude that (void *)&u.si == (void *)&u.ui must be false
since these two are not the same object.

Of course, both u.si and u.ui are subobjects at the same object's
beginning, but that case is not explicitly covered.

<snip>
 
