use pointer as array, Weird

wy

This code can't be compiled without errors.
/* test.c */
#include <stdio.h>
typedef char data_str;

data_str _data_[3], *data = _data_;
void data_pointer_print(){
printf("_data_ address 1st: %p\n", _data_);
printf(" data address 2nd: %p\n", data );
}

extern data_str data[3];
int main(){
data_pointer_print();
printf(" data address 3rd: %p\n", data );
}

I understand the error "conflicting types for ‘data’".
But after splitting the code into several files,
it compiles successfully but gives an undesired result.

/* data.h */
#include <stdio.h>
typedef char data_str;
extern void data_pointer_print();

/* data.c */
#include "data.h"
data_str _data_[3], *data = _data_;
void data_pointer_print(){
printf("_data_ address in data.c: %p\n", _data_);
printf(" data address in data.c: %p\n", data );
}

/* test.c */
#include "data.h"
extern data_str data[3];
int main(){
data_pointer_print();
printf(" data address in test.c: %p\n", data );
}

And this is the result:
$ gcc test.c data.c
$ ./a.out
_data_ address in data.c: 0x6009b8
data address in data.c: 0x6009b8
data address in test.c: 0x6009a0

Here we see that it compiles with no errors or warnings,
and the data addresses in data.c and test.c are different.
Why are they different, and
why doesn't the C compiler prohibit this kind of use?
 
Shao Miller

Because you have lied to the C implementation. An array and a pointer
are different things; they are not interchangeable. test.c lies that
'data' has type 'data_str[3]'. The truth is that 'data' has type
'data_str *', as defined in data.c.
 
Bart van Ingen Schenau

This code can't be compiled without errors.
<snip - code using different types for one file-scope object called
`data`>
I understand the error "conflicting types for ‘data’". But after
splitting the code into several files, it is compiled successfully, and
have an undesired result.
Here we see, it's compiled with no errors and warnings, and the data
addresses in data.c and test.c are different. Why are they different and
why doesn't the C compiler prohibit this kind of use?

Different C files are processed completely independently of each other. As
far as the compiler and linker are concerned, the two files can be processed
on different days on different machines. Until the linker gets its hands
on the object files, there is no indication that they belong together in
one application.

For that reason, by splitting the source over two files (without a common
header file with a declaration of `data`), you have taken away all
possibilities for the compiler to notice the conflict in the two
declarations of `data`.
Additionally, because the two declarations of `data` are not compatible
with each other, you have landed yourself squarely in the land of
Undefined Behaviour. The program can do whatever it likes and there is no
right or wrong any more.

Bart van Ingen Schenau
 
glen herrmannsfeldt

(snip)
/* data.h */
#include <stdio.h>
typedef char data_str;
extern void datastr_pointer_print();

/* data.c */
#include "data.h"
data_str _data_[3], *data = _data_;
void data_pointer_print(){
printf("_data_ address in data.c: %p\n", _data_);
printf(" data address in data.c: %p\n", data );

Add here:

printf(" data address in data.c: %p\n", &data );

}

/* test.c */
#include "data.h"
extern data_str data[3];
int main(){
data_pointer_print();
printf(" data address in test.c: %p\n", data );
}
(snip)

Here we see, it's compiled with no errors and warnings,

The two are compiled separately and, separately, are
legal C. The linker doesn't test for the error for
mostly historical reasons.

and the data addresses in data.c and test.c are different.
Why are they different and
why doesn't the C compiler prohibit this kind of use?

It isn't legal C, but the compiler can't tell.

-- glen
 
wy

The two are compiled separately and, separately, are
legal C. The linker doesn't test for the error for
mostly historical reasons.

It isn't legal C, but the compiler can't tell.

I see.
It's the linker's fault.
The compiler tests (tries to compile) source code,
and the linker links object code while not testing source or object.
Thank you.
 
James Kuyper

I see.
It's the linker's fault.

No. The linker is not required by the C standard to generate a
diagnostic, so the linker is not at fault. This means that the object
file format might not even provide enough information to allow the
linker to diagnose such problems. Even if the linker does have the
needed capabilities, the standard does not require a conforming compiler
to give the linker the needed information, so the compiler is also not
at fault.

Finally, this was a deliberate decision by the C committee to allow
implementation of C even on systems with fairly simple-minded linkers.
Keep in mind that linkers were generally a lot less capable back when C
was first standardized. This decision is part of a pattern in the C
standard, to avoid imposing requirements that are hard to meet on some
platforms. As a result, C can be easily and efficiently implemented on
almost every platform, and as a further result, C has been implemented
on almost every platform. You can argue with the C committee's goals -
other languages impose more requirements on implementations, allowing
them to impose fewer requirements on developers. However, this decision
is consistent with the goals of C, and so the standard is also not at
fault. The only fault is in the source code itself.
 
James Kuyper

One side issue:

This code can't be compiled without errors.
/* test.c */
#include <stdio.h>
typedef char data_str;

data_str _data_[3], *data = _data_;

"-- All identifiers that begin with an underscore and either an
uppercase letter or another underscore are always reserved for any use.
-- All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces." (7.1.3p1).

_data_ has file scope, and is in the ordinary name space.

"If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or
defines a reserved identifier as a macro name, the behavior is
undefined." (7.1.3p2)

Therefore, your program has undefined behavior. Don't use such names.
 
Keith Thompson

James Kuyper said:
One side issue:
This code can't be compiled without errors.
/* test.c */
#include <stdio.h>
typedef char data_str;

data_str _data_[3], *data = _data_;

"-- All identifiers that begin with an underscore and either an
uppercase letter or another underscore are always reserved for any use.
-- All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces." (7.1.3p1).

_data_ has file scope, and is in the ordinary name space.

"If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or
defines a reserved identifier as a macro name, the behavior is
undefined." (7.1.3p2)

Therefore, your program has undefined behavior. Don't use such names.

Yes, but it's very unlikely that the use of a reserved identifier causes
the symptoms. (James, you know that, but the OP might not.)
 
wy

Finally, this was a deliberate decision by the C committee to allow
implementation of C even on systems with fairly simple-minded linkers.

It's surprising!
After I change the declaration of data in test.c to
'extern short data;',
still no errors or warnings occur.
And result is
'
_data_ address in data.c: 0x6009c0
data address in data.c: 0x6009c0
data address in test.c: 0x9c0
',
as expected.
Thanks for your detailed explanation.
 
Keith Thompson

To some extent, your code conflates two different things: the often
confusing relationship between arrays and pointers in C (explained
in section 6 of the comp.lang.c FAQ, <http://www.c-faq.com/>),
and conflicting definitions of external objects in different
translation units.

Here's a simpler test case that demonstrates the latter problem:

==> foo.c <==
int x = 42;

==> bar.c <==
#include <stdio.h>

extern float x;

int main(void) {
printf("x = %g\n", x);
return 0;
}

This prints the garbage value that results from treating the int object
x as if it were a float object; I get "x = 5.88545e-44".

The way to avoid that is to declare anything that needs to be visible in
multiple translation units in header files, guaranteeing that they all
see consistent declarations:

==> foo.h <==
#ifndef H_FOO_H
#define H_FOO_H
extern int x;
#endif /* H_FOO_H */

==> foo.c <==
#include "foo.h"

int x = 42;

==> bar.c <==
#include <stdio.h>
#include "foo.h"

int main(void) {
printf("x = %g\n", x);
return 0;
}

There's a single external *declaration* of x, visible to both foo.c
and bar.c. foo.c contains the *definition* of x. Since foo.c
includes foo.h, the compiler can check that the definition and
declaration are consistent.

And now -- well, actually, the program still compiles and prints
a garbage value:

x = 4.85447e-270

but only because I didn't fix the format in the printf() call (it
misbehaves for different reasons). (Why the different result?
A float argument, when passed to printf, is promoted to double.
Since int is 32 bits on my system and double is 64, the printed
result is probably composed of the 32 bits of the int value 42 plus
32 bits of stack garbage.)

At least some compilers will warn about that; gcc warns:

bar.c:5:5: warning: format ‘%g’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat]

And when we change the "%g" to "%d", the output is correct:

x = 42

When I started writing this, I intended to show how careful use
of header files addresses this problem -- and it does. I also
accidentally showed that you *still* have to be very careful
when writing C code. Headers let certain errors be caught by the
compiler rather than by the linker, but other errors still have to
be caught by the programmer (compilers won't necessarily diagnose
format string errors).
 
Shao Miller

I see.
It's the linker's fault.
The compiler tests(tries to compile) source code,
and the linker links object code while not testing source or object.
Thank you.

It is not the linker's fault because the code lies. Whatever particular
linker you're using happens to believe your code. Imagine how much
worse it could be... You could define 'int x = 13;' in one file and
declare 'extern void x(void);' in another file and try to call it. What
would happen then?

It's likely that you're getting the same results as you would with a union:

char data_[3];
union {
char * cp;
char ca[3];
} data = { data_ };

Here, 'data.ca != data.cp', because pointers and arrays are different
things, as mentioned before.
 
Eric Sosman

It's surprising!
After I change the declaration of data in test.c to
'extern short data;',
still no errors and warning occur.
And result is
'
_data_ address in data.c: 0x6009c0
data address in data.c: 0x6009c0
data address in test.c: 0x9c0
',
as expected.

There's no reason (in C) to have any particular expectation
of the consequences of undefined behavior. That's what "undefined"
means! If you expect some specific outcome when undefined behavior
is at work, you are kidding yourself.

That said, it is entirely possible that a given implementation
of C may go beyond what the Standard requires and define behaviors
that the Standard does not. An implementation may advertise that
its linker will detect certain kinds of clashes, and will react in
thus-and-such a way when they occur. If your expectation is based
on beyond-the-language documentation of this kind, then it may be
well-founded -- but you cannot hold that same expectation across
other C implementations, because they are under no obligation to
behave in the same way, nor even to document how they will behave.

(NOBODY expects Undefined Behavior! Our chief weapon is the
crash... crashes and bugs, bugs and crashes... Our two weapons
are crashes and bugs and subtle oddities... Our *three* weapons
are crashes, bugs, subtle oddities, and meeting unjustified
expectations... Our *four* -- no -- *Amongst* our weapons...
our weaponry... are such elements as crashes, bugs... I'll
come in again.)
 
James Kuyper

James Kuyper said:
One side issue:
This code can't be compiled without errors.
/* test.c */
#include <stdio.h>
typedef char data_str;

data_str _data_[3], *data = _data_;

"-- All identifiers that begin with an underscore and either an
uppercase letter or another underscore are always reserved for any use.
-- All identifiers that begin with an underscore are always reserved for
use as identifiers with file scope in both the ordinary and tag name
spaces." (7.1.3p1).

_data_ has file scope, and is in the ordinary name space.

"If the program declares or defines an identifier in a
context in which it is reserved (other than as allowed by 7.1.4), or
defines a reserved identifier as a macro name, the behavior is
undefined." (7.1.3p2)

Therefore, your program has undefined behavior. Don't use such names.

Yes, but it's very unlikely that the use of a reserved identifier causes
the symptoms. (James, you know that, but the OP might not.)

That's why I called it a side issue. Still, you're right - I should have
mentioned that fact explicitly.
 
James Kuyper

You have declared the same 'data' object of file scope, twice. Delete the second
declaration (extern data_str data[3];) and the program will compile and behave
correctly.

[After Wy split the two different definitions into separate source files:]
....
They are different because the translator creates two different 'data' objects
that the linker must treat separately.

There's no "must" about it. As far as the C standard is concerned, the
behavior is simply undefined; the linker can do anything it wants. As
far as reality goes, there have been real linkers that would place both
'data' objects in the same location; in C terms they would behave as if
they were members of an unnamed union of the two different
declarations. The compiler doesn't know that the two 'data's share the
same memory, and presumably neither does the author of the code, so the
typical result using such a linker would be disaster.

As if that's not bad enough, it's not necessarily the case that the
linker would set aside enough space for the larger of the two
declarations; it might set aside enough space for whichever one it
processed first, which could be the smaller one. There's a reason,
soundly rooted in the behavior of commonly used linkers at the time the
C standard was first written, why the standard was written to give code
like this undefined behavior.
 
Geoff

This code can't be compiled without errors.
/* test.c */
#include <stdio.h>
typedef char data_str;

data_str _data_[3], *data = _data_;
void data_pointer_print(){
printf("_data_ address 1st: %p\n", _data_);
printf(" data address 2nd: %p\n", data );
}

extern data_str data[3];
int main(){
data_pointer_print();
printf(" data address 3rd: %p\n", data );
}

I understand the error "conflicting types for ‘data’".

OK, if you think you understand the error, then why do you apply the wrong fix?

You have declared the same 'data' object of file scope, twice. Delete the second
declaration (extern data_str data[3];) and the program will compile and behave
correctly.

The first declaration defines and instantiates the object and reserves the space
for it. The second declaration declares some object, 'data', as external,
meaning the translator expects to encounter the instantiation at link time, but
your code doesn't do that.

But after splitting the code into several files,
it is compiled successfully, and have an undesired result.

Yes, because you applied a "fix" (below) that doesn't eliminate the point of the
error.

/* data.h */
#include <stdio.h>
typedef char data_str;
extern void datastr_pointer_print();

/* data.c */
#include "data.h"
data_str _data_[3], *data = _data_;
void data_pointer_print(){
printf("_data_ address in data.c: %p\n", _data_);
printf(" data address in data.c: %p\n", data );
}

/* test.c */
#include "data.h"
extern data_str data[3];
int main(){
data_pointer_print();
printf(" data address in test.c: %p\n", data );
}

And this is the result:
$ gcc test.c data.c
$ ./a.out
_data_ address in data.c: 0x6009b8
data address in data.c: 0x6009b8
data address in test.c: 0x6009a0

Here we see, it's compiled with no errors and warnings,
and the data addresses in data.c and test.c are different.
Why are they different and
why doesn't the C compiler prohibit this kind of use?

They are different because the translator creates two different 'data' objects
that the linker must treat separately.
 
BartC

wy said:
I see.
It's the linker's fault.

Not really. The linker only matches the values of symbols. 'data' in data.c
and 'data' in test.c are resolved by the linker to the same address. It
doesn't know about types.

But data.c treats its 'data' as a pointer (and writes the value of the
pointer: the contents of that address), while test.c treats 'data' as an
array (and writes the address of element 0 of the array).

To avoid problems, make sure the two files share the same declaration of
'data'.
 
Andrey Tarasevich

Here we see, it's compiled with no errors and warnings,
and the data addresses in data.c and test.c are different.
Why are they different and
why doesn't the C compiler prohibit this kind of use?

Historically, C compilers are built on the principle of "independent
translation". The compiler proper sees and compiles each translation
unit independently, without any knowledge of any other translation units.

For this reason, the compiler proper cannot detect any errors that are
caused by inconsistencies between different translation units. The
language specification explicitly gives compilers the freedom to
ignore such errors, i.e. it allows them to generate invalid code without
issuing any diagnostic messages.

The only part of the compiler that actually sees the program in its
entirety (in a typical implementation) is the linker. So the linker is
actually in a position to detect such errors. But in a typical
implementation, by the time the program gets to the linking stage, the
additional information needed for error detection is already lost
irreversibly.

Hence the responsibility for keeping declarations consistent across
translation units rests entirely with the user. If you fail to do so,
you'll typically end up with a program whose behavior is undefined.

It doesn't have to be that way. It is really a quality-of-implementation
issue. Some compilers might decide to go the extra mile and take extra
steps in order to detect such errors. However, most compilers take
advantage of the liberty provided by the language specification and leave
such errors undiagnosed.
 
Shao Miller

There's a reason,
soundly rooted in the behavior of commonly used linkers at the time the
C standard was first written, why the standard was written to give code
like this undefined behavior.

There might be a reason, but there definitely isn't a citation, here.
 
James Kuyper

On 01/29/2013 04:21 PM, glen herrmannsfeldt wrote:
....
Not that I really know "What they thought when they did it" but
it seems to me that we are stuck with what Fortran could do 50 years
ago, on much smaller computers. That, and that people don't write a
new linker for each new language.

We're not completely stuck. Nothing prevents implementations of C from
taking advantage of the capabilities of more modern linkers, and
diagnosing such problems. If linkers with the required capabilities
become common enough, and enough implementations of C choose to take
advantage of that fact, the standard might eventually be modified to
make diagnosis of such problems mandatory. However, don't expect
anything like that to happen any time soon (possibly not for several
decades).
 
Geoff

You have declared the same 'data' object of file scope, twice. Delete the second
declaration (extern data_str data[3];) and the program will compile and behave
correctly.

[After Wy split the two different definitions into separate source files:]
...
They are different because the translator creates two different 'data' objects
that the linker must treat separately.

There's no "must" about it. As far as the C standard is concerned, the
behavior is simply undefined; the linker can do anything it wants. As
far as reality goes, there have been real linkers that would place both
'data' objects in the same location; in C terms they would behave as if
they were members of an unnamed union of the two different
declarations. The compiler doesn't know that the two 'data's share the
same memory, and presumably neither does the author of the code, so the
typical result using such a linker would be disaster.

Good point. There is nothing in the C standard that defines what must happen in
the multiple translation unit, redefinition case. I should have said:

They are different because the translator creates two different 'data' objects
that your linker treats separately. This is UB in C and can cause unexpected
behavior at run time.

As if that's not bad enough, it's not necessarily the case that the
linker would set aside enough space for the larger of the two
declarations; it might set aside enough space for whichever one it
processed first, which could be the smaller one. There's a reason,
soundly rooted in the behavior of commonly used linkers at the time the
C standard was first written, why the standard was written to give code
like this undefined behavior.

His initial case, the single-file multiple definition, was a constraint
violation and the compiler correctly indicated the error. The OP eliminated the
error by splitting the file into separate translation units but this induced UB.
 
