Returning a struct from a function - strange behavior

lawrence.jones

Nick Keighley said:
so is the program incorrect?

Yes, as far as C90 is concerned (but it is correct for C99).

Nick Keighley said:
Or did a particular version of gcc have a bug?

Also yes.

Nick Keighley said:
If the program is exhibiting UB then translating it and running it to
produce the "expected" behaviour is perfectly ok!

Which is the intended result with gcc, hence the characterization as a
gcc bug.
 
lawrence.jones

Keith Thompson said:
I *think* that the behavior is undefined in both C90 and C99 (but
becomes well defined in C201X).

No, support for non-lvalue arrays was added in C99.
 
Keith Thompson

No, support for non-lvalue arrays was added in C99.

Can you cite the section of the C99 standard that does this?

I see this in the list of major changes in the Foreword:

-- conversion of array to pointer not limited to lvalues

But C99 6.3.2.1p3 says:

Except when it is the operand of the sizeof operator or the unary
& operator, or is a string literal used to initialize an array, an
expression that has type "array of _type_" is converted to an
expression with type "pointer to _type_" that points to the
initial element of the array object and is not an lvalue. If the
array object has register storage class, the behavior is
undefined.

The phrase "the array object" implies that there must be an array
object somewhere. In the case we're considering (the return value of
a function that returns a struct with an array member), there is no
array object whose initial element we can point to. Is there other
(normative) wording that clarifies this?

It seems that the new wording in n1336, creating an object with
"temporary lifetime" (6.2.4p7) is intended to avoid the need for
non-lvalue arrays.

My assumption is that you're right and that I've missed something.
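
One use of make_person().name that the quoted paragraph excludes from the conversion altogether is as the operand of sizeof. The following is a minimal sketch, reusing the Person and make_person definitions from the original post; name_capacity is a hypothetical helper that does not appear in the thread. Because sizeof neither converts the array to a pointer nor evaluates its operand here, no array object is needed, so this particular use should not depend on which reading of the standard is right.

```c
#include <assert.h>
#include <stddef.h>

typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

/* Hypothetical helper (not from the thread).
   The operand of sizeof is not evaluated here and is not converted
   to a pointer, so make_person() is never actually called and no
   array object has to exist. */
size_t name_capacity(void) {
  size_t n = sizeof make_person().name;
  assert(n == sizeof(char[40]));
  return n;
}
```

Calling name_capacity() yields the size of the name member in bytes, which is 40 for this struct.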
 
lawrence.jones

Keith Thompson said:
The phrase "the array object" implies that there must be an array
object somewhere. In the case we're considering (the return value of
a function that returns a struct with an array member), there is no
array object whose initial element we can point to. Is there other
(normative) wording that clarifies this?

Yes, in N1336. :)

In C90, the section you quoted said that only lvalues with array type
are converted to pointers; C99 relaxed that to allow all array type
expressions to be converted. As you note, that leads to a bit of
cognitive dissonance since an object suddenly appears in the midst of a
value, which doesn't otherwise happen. Nonetheless, you can safely
assume that it's magically created out of the luminiferous aether as
required and mysteriously evaporates again at the next sequence point.
The committee was loath to say anything more about such objects since
it opens a can of worms (e.g., what storage duration and lifetime they
have), but we bit the bullet for C1X.
 
Peter Nilsson

Keith Thompson said:
Peter Nilsson said:
DiAvOl said:
#include <stdio.h>
typedef struct person {
  char name[40];
  int age;
} Person;
static Person make_person(void);
int main(void) {
  printf("%s\n", make_person().name);
  return 0;
}
static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

The above small program when compiled without the
-std=c99 option (using gcc 4.2.3) gives me a warning:
"warning: format ‘%s’ expects type ‘char *’, but
argument 2 has type ‘char[40]’"
and also fails with a segmentation fault when executed.

That's a bug in gcc 4.2.3 then. [The same segfault
happens with -ansi.]

No, I don't think it's a bug.

On reflection, I concede. It's not a bug; it's the worst
compiler 'feature' I've ever seen.
 
Peter Nilsson

DiAvOl said:
Keith Thompson said:
...make_person().name reveals an anomaly in the C type
system; it should be a value of type char[40], which
should decay to a pointer to the first element of the
corresponding array object, but there is no array object,
just a value.

My question then is: if there is no array object, why does
&make_person().name[0] work?

It doesn't, since name doesn't decay to a pointer. Even
if it did, & requires an object. In other words, it works
by ub. Segfault is one example of ub, 'works just fine' is
another.

The question is why people went out of their way to cause
your sample to suddenly segfault. [Why they did a half
job is a lesser question.]
 
Keith Thompson

Yes, in N1336. :)

In C90, the section you quoted said that only lvalues with array type
are converted to pointers; C99 relaxed that to allow all array type
expressions to be converted. As you note, that leads to a bit of
cognitive dissonance since an object suddenly appears in the midst of a
value, which doesn't otherwise happen. Nonetheless, you can safely
assume that it's magically created out of the luminiferous aether as
required and mysteriously evaporates again at the next sequence point.
The committee was loath to say anything more about such objects since
it opens a can of worms (e.g., what storage duration and lifetime they
have), but we bit the bullet for C1X.

Hmm.

So in C90, where the array-to-pointer conversion is defined only for
lvalues, ``make_person().name'' is an expression of array type. In
the code sample in the original post, it's passed as an argument to
printf, corresponding to a "%s" format, invoking undefined behavior.

In theory, I suppose, you could write a variadic function that
actually extracts an array value using va_arg -- though I'm not sure
what it could do with it, and I doubt that any implementations
actually support it.

I decline to believe that the phrase "the array object" in the C99
standard actually causes such an object to be created (or, more
precisely, imposes a requirement on implementers to arrange for such
an object to be created). In particular, I see no implied guarantee
that "the array object" will continue to exist until the next sequence
point. If the called function returns the value of an object of
struct type, then "the array object" could plausibly refer to the
array member of that object, which could be local to the function and
therefore nonexistent after the function returns.

I'm glad to see this is being corrected in C1x -- and I'll just avoid
writing such code in C90 or C99.
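
One way to avoid the construct altogether is to let the caller supply the struct object, so that every array involved is a member of a named object and therefore an lvalue. This is only a sketch under that assumption: fill_person and person_is_named are hypothetical names that do not appear in the thread, and the Person type is copied from the original post.

```c
#include <assert.h>
#include <string.h>

typedef struct person {
  char name[40];
  int age;
} Person;

/* Hypothetical helper: writes the result into an object owned by the
   caller instead of returning a struct by value. */
void fill_person(Person *out) {
  static const Person initial = { "alexander", 18 };
  assert(out != NULL);
  *out = initial;
}

/* Hypothetical helper: p->name is an lvalue of array type, so the
   array-to-pointer conversion applies under C90 as well as C99. */
int person_is_named(const Person *p, const char *name) {
  assert(p != NULL && name != NULL);
  return strcmp(p->name, name) == 0;
}
```

A caller would declare a Person, pass its address to fill_person, and then use the name member of that object; no non-lvalue array ever appears.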
 
CBFalconer

DiAvOl said:
#include <stdio.h>

typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void);

int main(void) {
  printf("%s\n", make_person().name);
  return 0;
}

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

The above small program when compiled without the -std=c99 option
(using gcc 4.2.3) gives me a warning:
"warning: format ‘%s’ expects type ‘char *’, but argument 2 has
type ‘char[40]’"
and also fails with a segmentation fault when executed.

If I replace the line printf("%s\n", make_person().name); with
printf("%s\n", &make_person().name[0]); everything works as
expected.

Why does this happen? Isn't make_person().name a pointer to the
array's first element?

No. make_person() returns a struct by value, which has a field
identified by .name. That field is an array of 40 chars. It is a
portion of the return struct, which has never been put in
accessible memory.

Your alleged 'good' experience with lcc shows a bug in lcc. I
don't know if you mean lcc-win32 (which has quite a few known
insects) or lcc (which is less well known here).
 
CBFalconer

DiAvOl said:
Keith Thompson said:
[...]
An lvalue is only explicitly required in C90. But in either
case, make_person().name is an lvalue.

I don't believe it is. make_person() yields a value of type
Person. make_person().age yields a value of type int.
make_person().name reveals an anomaly in the C type system; it
should be a value of type char[40], which should decay to a
pointer to the first element of the corresponding array object,
but there is no array object, just a value.

My question then is: if there is no array object, why does
&make_person().name[0] work?

Please don't strip attributions for any material you quote. I
restored the third one above.

The point is that you have undefined behaviour. Defining why it
works on your particular system would require complete analysis of
the running system, after which you MIGHT know when it wouldn't
work, at least until you use a different version of the compiler,
library, optimizations, etc.
 
Peter Nilsson

CBFalconer said:
DiAvOl said:
#include <stdio.h>
typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void);

int main(void) {
  printf("%s\n", make_person().name);
  return 0;
}

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

The above small program when compiled without the
-std=c99 option (using gcc 4.2.3) gives me a warning:
"warning: format ‘%s’ expects type ‘char *’, but
argument 2 has type ‘char[40]’" and also fails with
a segmentation fault when executed.

If I replace the line printf("%s\n", make_person().name);
with printf("%s\n", &make_person().name[0]); everything
works as expected.

Why does this happen? Isn't make_person().name a pointer
to the array's first element?

No.  make_person() returns a struct by value, which has a
field identified by .name.  That field is an array of 40
chars.  It is a portion of the return struct, which has
never been put in accessible memory.

Your alleged 'good' experience with lcc shows a bug in
lcc.

What bug does it show?
 
CBFalconer

Keith said:
.... snip ...

But C99 6.3.2.1p3 says:

Except when it is the operand of the sizeof operator or the
unary & operator, or is a string literal used to initialize
an array, an expression that has type "array of _type_" is
converted to an expression with type "pointer to _type_"
that points to the initial element of the array object and
is not an lvalue. If the array object has register storage
class, the behavior is undefined.

The phrase "the array object" implies that there must be an array
object somewhere. In the case we're considering (the return
value of a function that returns a struct with an array member),
there is no array object whose initial element we can point to.
Is there other (normative) wording that clarifies this?

It seems that the new wording in n1336, creating an object with
"temporary lifetime" (6.2.4p7) is intended to avoid the need for
non-lvalue arrays.

My assumption is that you're right and that I've missed something.

I don't think so. Remember that n1336 is not the C99 standard, but
a draft for a new C0x system. I don't believe that any 'temporary
lifetime' storage for function results will survive - it will
involve too many ugly inefficiencies.
 
CBFalconer

Martin said:
It still surprises me, since _my_ copy of gcc 4.2.3 with -W -Wall
-std=c99 -pedantic neither reports the diagnostic nor segfaults.

Therefore it would be useful for DiAvOl to report the compiler
version he used, and on what system it was running.
 
CBFalconer

Nick said:
.... snip ...


so is the program incorrect? Or did a particular version of gcc
have a bug? If the program is exhibiting UB then translating it and
running it to produce the "expected" behaviour is perfectly ok!

The program is incorrect. There is no requirement for UB to cause
an error message.
 
Martien Verbruggen

DiAvOl wrote:

The OP wrote, which you snipped, but responded to:
Your alleged 'good' experience with lcc shows a bug in lcc. I
don't know if you mean lcc-win32

Can you explain why that is a bug in lcc or lcc-win32?

It seems to me after reading this thread -- particularly the posts by
Larry Jones -- that the behaviour under c89 for that code is undefined,
which means that producing the expected behaviour is perfectly valid. It
also appears that under c99 the code is defined, and is supposed to
produce the expected behaviour.

Since the lcc compiler that the OP used produces the expected behaviour
(printing the string "alexander"), which is valid behaviour under both
c89 and c99, how can there be a bug?

Did you mean that a diagnostic was required? If so, can you explain why?

Martien
 
CBFalconer

jacob said:
DiAvOl wrote:
.... snip ...


Yes, lcc-win compiles and executes your code correctly, as does
MSVC.

No, lcc-win's handling of the UB merely happens to hide the problem.
 
Keith Thompson

CBFalconer said:
DiAvOl said:
#include <stdio.h>

typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void);

int main(void) {
  printf("%s\n", make_person().name);
  return 0;
}

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

The above small program when compiled without the -std=c99 option
(using gcc 4.2.3) gives me a warning:
"warning: format ‘%s’ expects type ‘char *’, but argument 2 has
type ‘char[40]’"
and also fails with a segmentation fault when executed.

If I replace the line printf("%s\n", make_person().name); with
printf("%s\n", &make_person().name[0]); everything works as
expected.

Why does this happen? Isn't make_person().name a pointer to the
array's first element?

No. make_person() returns a struct by value, which has a field
identified by .name. That field is an array of 40 chars. It is a
portion of the return struct, which has never been put in
accessible memory.

Your alleged 'good' experience with lcc shows a bug in lcc. I
don't know if you mean lcc-win32 (which has quite a few known
insects) or lcc (which is less well known here).

What bug are you referring to? In another followup in this thread,
you said that the program's behavior is undefined; if so, anything
lcc-win does is permitted, and in this particular case its behavior
seems reasonable.

Larry Jones says that the stated behavior of lcc (or lcc-win) is what
was intended for C99. I'm skeptical that the C99 standard actually
states this, but N1336, the first draft for C1X, makes it explicit
that there is a temporary object. (It might arguably be a constraint
violation in C90, but lcc-win doesn't claim to support C90.)
 
Keith Thompson

CBFalconer said:
I don't think so. Remember that n1336 is not the C99 standard, but
a draft for a new C0x system.

Yes, I know what n1336 is. (It's C201X, BTW, or C1X if you want to be
terse; the final document presumably won't be ready before the end of
next year.)

CBFalconer said:
I don't believe that any 'temporary lifetime' storage for
function results will survive - it will involve too many ugly
inefficiencies.

How so? It seems to me that the most natural way to implement a
function returning a struct value involves, in effect, creating an
object of the struct type somewhere in memory and setting it to the
value to be returned. In fact, unless the struct is no bigger than a
machine word, I'm having trouble thinking of a plausible
implementation scheme that doesn't do this.

The current C standard doesn't mention such an object, so it doesn't
necessarily exist *as an object*. In the abstract machine, there's
just a struct value floating around somewhere. This causes serious
conceptual problems for this corner case, where a reference to an
array member of this struct value *needs* the object to exist.

The proposed change for C201X makes this object explicit, but only
when the struct (or union; I just noticed that) has an array member.

As far as I can tell, this only affects a function returning a struct
with an array member, something that I think is fairly rare; it's more
common to deal with structs, especially large ones, by passing
pointers around. And it's likely to mandate what many compilers
already do.

Where are the "ugly inefficiencies"? And what's your proposed
alternative to the temporary object? Would you leave the behavior of
this corner case undefined?
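
As a rough sketch of what the temporary-object wording described above would permit, assume a compiler that follows the N1336 rule that the temporary lasts until the end of the full expression containing the call. The helpers returned_name_length and copy_returned_name are hypothetical names, not from the thread, and the code is not meant to be portable to C90, where the conversion does not happen at all.

```c
#include <assert.h>
#include <string.h>

typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

/* Hypothetical helper. The pointer produced from make_person().name
   is used only inside the full expression that contains the call,
   while the temporary object would still exist. */
size_t returned_name_length(void) {
  return strlen(make_person().name);
}

/* Hypothetical helper. Copies the name into storage owned by the
   caller before the full expression (and the temporary) ends. */
void copy_returned_name(char *dst, size_t dstsize) {
  assert(dst != NULL && dstsize > 0);
  strncpy(dst, make_person().name, dstsize - 1);
  dst[dstsize - 1] = '\0';
}

/* Not safe even under that wording: saving the pointer, as in
     const char *s = make_person().name;
   and using s in a later statement, because the temporary is gone
   once the initializing expression has been evaluated. */
```

The design point is that the pointer never outlives the expression that produced it; anything that must persist is copied into an object the caller owns.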
 
Martin Ambuhl

CBFalconer said:
Therefore it would be useful for DiAvOl to report the compiler
version he used, and on what system it was running.

No, I misunderstood. He is *not* using -std=c99 and attempting to use a
c99 construct. I misread 'without' as 'with' in
> The above small program when compiled without the -std=c99 option
This reading error was probably triggered by my inability to realize
that someone might try to use a c99-only construct in a c89 environment
and then ask why it didn't work.
 
DiAvOl

No, I misunderstood.  He is *not* using -std=c99 and attempting to use a
c99 construct.  I misread 'without' as 'with' in
 > The above small program when compiled without the -std=c99 option
This reading error was probably triggered by my inability to realize
that someone might try to use a c99-only construct in a c89 environment
and then ask why it didn't work.

That is exactly the reason I asked, to understand why it works for C99
and not for C89

I won't use such a construct in C89 of course, I'm trying to
understand (and I think I do understand now) why it works (or does
not!!) this way in C89

Why is this hard to understand?
 
Keith Thompson

DiAvOl said:
That is exactly the reason I asked, to understand why it works for C99
and not for C89

I won't use such a construct in C89 of course, I'm trying to
understand (and I think i do understand now) why it works (or does
not!!) this way in C89

In C89/C90, the implicit conversion of an expression of array type to
a pointer to the array object's first element occurs only
when the array expression is an lvalue. Quoting the C90 standard:

Except when it is the operand of the sizeof operator or the unary
& operator, or is a character string literal used to initialize an
array of character type, or is a wide string literal used to
initialize an array with element type compatible with wchar_t, an
lvalue that has type "array of _type_" is converted to an
expression that has type "pointer to _type_" that points to the
initial element of the array object and is not an lvalue.

Your array expression "make_person().name" is not an lvalue, so the
conversion doesn't occur. It's unclear what happens next. I *think*
you're passing the array by value to printf, which normally isn't
possible; since printf is expecting a char* due to the "%s" format,
the behavior is undefined. But when you assign the result of
make_person() to the object p, the expression p.name *is* an lvalue,
the conversion does occur, and everything works.

C99 drops the requirement for the array expression to be an lvalue, so
the array-to-pointer conversion does occur, even for
"make_person().name". The problem, though, is that it's not at all
clear what "the array object" is. Arguably if there's no lvalue, then
there's no array object, and the standard's requirement is
meaningless.

C1X proposes to create an implicit temporary object in this case, so
"make_person().name" *is* an lvalue. It's been suggested that this
was the intent for C99, but I'm not convinced -- but perhaps the
authors of gcc were convinced.

Arrays in C are almost always treated as second-class objects. It's
almost impossible to obtain an expression of array type that doesn't
refer to an array object. I believe this can *only* occur when the
array is a member of a struct or union, and that struct or union is
returned from a function -- which is itself the only way (I think) to
obtain an expression of struct or union type that doesn't refer to an
object of struct or union type.

Whether accidentally or deliberately, you've run into an obscure
corner of the language where even the experts don't necessarily agree
on what's supposed to happen. Your best bet, if you're actually
trying to get some work done, is to avoid the issue and use an
explicit temporary.
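
The explicit temporary suggested above can be sketched as follows, reusing Person and make_person from the original post. print_person_name is a hypothetical wrapper introduced only for illustration. Once the returned value is stored in a named object, tmp.name is an lvalue, so the array-to-pointer conversion applies under the C90 wording quoted earlier as well as under C99.

```c
#include <assert.h>
#include <stdio.h>

typedef struct person {
  char name[40];
  int age;
} Person;

static Person make_person(void) {
  static Person p = { "alexander", 18 };
  return p;
}

/* Hypothetical wrapper. Returns whatever fprintf returns: the number
   of characters written, or a negative value on error. */
int print_person_name(FILE *stream) {
  Person tmp = make_person();  /* tmp is an object; tmp.name is an lvalue */
  assert(stream != NULL);
  return fprintf(stream, "%s\n", tmp.name);
}
```

Calling print_person_name(stdout) writes the name followed by a newline, at the cost of one extra struct copy that a compiler may well optimize away.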
 
