Returning a struct from a function - strange behavior

Richard Tobin

MORON: Your Honor, the compiler doesn't conform to C89!

JUDGE: What was the problem?

MORON: I compiled a program containing comments written in another
standard and the compiler did NOT complain.

JUDGE: But the generated program executed OK?

MORON: Well... yes your Honor.

JUDGE: Well, then there are no damages?

Sure, there's no damage to someone who wants a program that doesn't
conform to C89 to work.

But that's not always the situation. Typically I want to compile
a program that is supposed to conform to C89 (at least in respect
of C89/C99 differences), and it's a bug that I need to correct if
it doesn't conform to C89. And that's because I want my program
to be portable to other C89 implementations.

So here's the alternative scenario:

Programmer: Your Honour, the compiler doesn't conform to C89!

Judge: What was the problem?

Programmer: I inadvertently used a // comment and a C99-style
compound literal, and the compiler didn't produce any
diagnostics, so when I sold my program I got
complaints from customers who couldn't compile it.
Some of them demanded their money back and bought
a competing product instead.

Now the programmer here is clearly unwise, since all compilers contain
bugs, and you can't rely on one compiler to detect all errors. But
the fact remains that such a compiler wouldn't conform to C89.

Aside: C is actually quite liberal in this respect. All it requires
is a diagnostic. XML on the other hand requires that an XML processor
MUST NOT continue normal processing after a well-formedness error.
This was prompted by the "browser wars" of the mid-1990s, when each
vendor added their own extensions to HTML with the obvious
consequences for interoperability. The XML authors were determined
that this sort of thing should not happen with XML.

-- Richard
 
James Kuyper

jacob navia wrote:
....
MORON: Your Honor, the compiler doesn't conform to C89!

JUDGE: What was the problem?

MORON: I compiled a program containing comments written in another
standard and the compiler did NOT complain.

JUDGE: But the generated program executed OK?

MORON: Well... yes your Honor.

JUDGE: Well, then there are no damages?

Not until he delivered the code to his client, who's using a different
compiler which actually conforms to C90, and didn't accept the code.
Then there were damages.
 
CBFalconer

Keith said:
.... snip ...

And what's your alternative? If I write a function that returns
a struct that contains an array, and I refer to the array member
of the result of a call to that function, what should happen?
Constraint error? Undefined behavior? Mandatory nasal demons?

Create a location that can hold the function result. Write:

foo result;
...
result = getfoo(params);
/* operate on result.whatever */

No demons in sight.
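The workaround written out as a compilable C89 sketch. `foo`, `getfoo`, and the member `whatever` are hypothetical stand-ins for the names in the snippet, which leaves their definitions open. Copying the result into a named object makes `result.whatever` an ordinary lvalue array, so the array-to-pointer conversion is unproblematic in every version of the standard.

```c
/* Hypothetical stand-in for the type in the snippet above. */
typedef struct foo {
    int whatever[3];
} foo;

/* Hypothetical function returning a foo by value. */
foo getfoo(int seed)
{
    foo f;
    int i;                      /* C89: declarations come first */

    for (i = 0; i < 3; i++)
        f.whatever[i] = seed * (i + 1);
    return f;
}

/* The workaround: store the result in a named object, so that
 * result.whatever is an ordinary lvalue array in every C standard. */
int sum_whatever(int seed)
{
    foo result;
    int i, sum = 0;

    result = getfoo(seed);
    for (i = 0; i < 3; i++)
        sum += result.whatever[i];
    return sum;
}
```

For example, `sum_whatever(2)` adds 2, 4, and 6.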
 
Ben Bacarisse

CBFalconer said:
Create a location that can hold the function result. Write:

foo result;
...
result = getfoo(params);
/* operate on result.whatever */

No demons in sight.

That does not answer Keith Thompson's question. What meaning would
you give the code in question? In other words, he was asking how you
would re-word the standard, not asking how you would re-write the code to
avoid the issue.
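A minimal sketch of the construct Keith asked about, assuming a C11 (or later) compiler. `struct wrapper`, `make_wrapper`, and `third_element` are hypothetical names, not from the thread. Under C11 the call result is a temporary object that lives until the end of the enclosing full expression, so subscripting its array member is well-defined; the comments note how C90 and C99 differ.

```c
/* Hypothetical struct containing an array member. */
struct wrapper {
    int arr[4];
};

/* Returns a struct by value; the caller receives a copy. */
struct wrapper make_wrapper(int base)
{
    struct wrapper w;
    for (int i = 0; i < 4; i++)
        w.arr[i] = base + i;
    return w;
}

/* Refers to the array member of the call result directly.
 * make_wrapper(base).arr has array type and is converted to a pointer
 * to the first element of a temporary object, which [2] then indexes.
 * C11: well-defined (temporary lifetime to end of full expression).
 * C99: allowed if the access happens before the next sequence point.
 * C90: only lvalue arrays are converted to pointers, so subscripting
 *      this non-lvalue array is not valid C90. */
int third_element(int base)
{
    return make_wrapper(base).arr[2];
}
```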
 
Keith Thompson

Ben Bacarisse said:
That does not answer Keith Thompson's question. What meaning would
you give the code in question? In other words, he was asking how you
would re-word the standard, not asking how you would re-write the code to
avoid the issue.

Exactly.
 
Ben Bacarisse

CBFalconer said:
Oh, so you want special handling, depending on the type of function
result?

Why not? I certainly don't want the compiler to do the same thing
when it calls

int rand(void);

as it does when it calls

struct huge foo(void);

I want it to know how to make the best use of my machine's
architecture and that almost certainly means doing "special handling
depending on the type of the function result".
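As a rough illustration of what "special handling" can mean, here is a hand-written equivalent of a strategy many ABIs use for large struct returns: the caller provides the storage and passes a hidden pointer to it. This is a conceptual sketch, not the actual calling convention of any particular platform; `struct huge`, `foo_by_value`, `foo_lowered`, and `same_result` are hypothetical names.

```c
#include <string.h>

/* Hypothetical large struct: too big to return in registers on typical ABIs. */
struct huge {
    double data[64];
};

/* Source-level view: return the struct by value. */
struct huge foo_by_value(void)
{
    struct huge h;
    int i;
    for (i = 0; i < 64; i++)
        h.data[i] = i * 0.5;
    return h;
}

/* Roughly what a compiler may generate instead: the caller supplies
 * the storage and the callee fills it in through a pointer. */
void foo_lowered(struct huge *ret)
{
    int i;
    for (i = 0; i < 64; i++)
        ret->data[i] = i * 0.5;
}

/* Returns nonzero if both versions produce the same bytes. */
int same_result(void)
{
    struct huge a = foo_by_value();
    struct huge b;

    foo_lowered(&b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

A small scalar result such as the one from `int rand(void)`, by contrast, typically comes back in a register, which is why the two cases are handled differently.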
 
Tim Rentsch

Keith Thompson said:
Yes, in N1336. :)

In C90, the section you quoted said that only lvalues with array type
are converted to pointers; C99 relaxed that to allow all array type
expressions to be converted. As you note, that leads to a bit of
cognitive dissonance since an object suddenly appears in the midst of a
value, which doesn't otherwise happen. Nonetheless, you can safely
assume that it's magically created out of the luminiferous aether as
required and mysteriously evaporates again at the next sequence point.
The committee was loath to say anything more about such objects since
it opens a can of worms (e.g., what storage duration and lifetime they
have), but we bit the bullet for C1X.

[... C90 stuff ...]

I decline to believe that the phrase "the array object" in the C99
standard actually causes such an object to be created (or, more
precisely, imposes a requirement on implementers to arrange for such
an object to be created). In particular, I see no implied guarantee
that "the array object" will continue to exist until the next sequence
point. If the called function returns the value of an object of
struct type, then "the array object" could plausibly refer to the
array member of that object, which could be local to the function and
therefore nonexistent after the function returns.

An object is just a region of memory that can hold values.
The value returned by a function must be held somewhere;
since wherever that is can hold a value, it is an object.
The only question, as you point out, is what its lifetime is.

One interpretation (only slightly perverse) is that in C99
(and probably C90 as well) the lifetime for such an object
must be either static or automatic. (I don't mean to make
the argument, since it's not really a strong argument, but
I think it's worth noting.) The point of the language in
N1336 is not that the result is an object (because it already
must be), but to clarify what the lifetimes of such objects
are.
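One case where the lifetime question matters, sketched under C11 rules (6.2.4p8): a non-lvalue struct containing an array member refers to an object with temporary lifetime that ends when the containing full expression ends. Passing its decayed array member to another function is therefore well-defined in C11, while keeping the pointer beyond that full expression is not. `struct name_rec`, `get_name`, and `name_length` are hypothetical names.

```c
#include <string.h>

/* Hypothetical record with an array member. */
struct name_rec {
    char name[16];
};

/* Returns a copy of s (truncated to 15 chars) inside a struct. */
struct name_rec get_name(const char *s)
{
    struct name_rec r;
    strncpy(r.name, s, sizeof r.name - 1);
    r.name[sizeof r.name - 1] = '\0';
    return r;
}

/* Well-defined in C11: the temporary holding get_name(s) lives until
 * the end of this full expression, which includes the call to strlen.
 * Under C99 the sequence point before strlen is entered made this
 * arguably undefined, which is the gap the C1X wording closed. */
size_t name_length(const char *s)
{
    return strlen(get_name(s).name);
}

/* NOT valid in any version: p would point into a temporary whose
 * lifetime has already ended.
 *
 *     const char *p = get_name("x").name;
 *     puts(p);   -- undefined behavior
 */
```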
 
Tim Rentsch

CBFalconer said:
But that provision would force the storage of all those small
values. That is where the inefficiency comes in. Also consider
the interactions.

I think if you work through what's actually required, you'll
see there really is no extra burden. If f is a function
that takes a (struct s), and g is a function that returns
a (struct s), then

f( g() )

has the same storage requirements as

f( s )

presuming s is a variable of type (struct s). If we're going
to accept f(s) then it isn't any more costly to accept f(g()).
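A compilable sketch of the comparison, assuming `struct s` holds an array so that copying it is not free. The definitions of `struct s`, `f`, and `g` and the wrappers `via_call` and `via_variable` are hypothetical; in both forms the implementation needs storage for exactly one `struct s` argument.

```c
/* Hypothetical struct type shared by f and g. */
struct s {
    int vals[8];
};

/* Takes a struct s by value and sums its elements. */
int f(struct s arg)
{
    int i, total = 0;
    for (i = 0; i < 8; i++)
        total += arg.vals[i];
    return total;
}

/* Returns a struct s by value. */
struct s g(void)
{
    struct s r;
    int i;
    for (i = 0; i < 8; i++)
        r.vals[i] = i;
    return r;
}

/* Both pass a complete struct s value to f. */
int via_call(void)     { return f(g()); }
int via_variable(void) { struct s s = g(); return f(s); }
```

Naming the variable `s` is legal because struct tags live in a separate name space from ordinary identifiers.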
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
The phrase "the array object" implies that there must be an array
object somewhere. In the case we're considering (the return value of
a function that returns a struct with an array member), there is no
array object whose initial element we can point to. Is there other
(normative) wording that clarifies this?

Yes, in N1336. :)

In C90, the section you quoted said that only lvalues with array type
are converted to pointers; C99 relaxed that to allow all array type
expressions to be converted. As you note, that leads to a bit of
cognitive dissonance since an object suddenly appears in the midst of a
value, which doesn't otherwise happen. Nonetheless, you can safely
assume that it's magically created out of the luminiferous aether as
required and mysteriously evaporates again at the next sequence point.
The committee was loath to say anything more about such objects since
it opens a can of worms (e.g., what storage duration and lifetime they
have), but we bit the bullet for C1X.

[... C90 stuff ...]

I decline to believe that the phrase "the array object" in the C99
standard actually causes such an object to be created (or, more
precisely, imposes a requirement on implementers to arrange for such
an object to be created). In particular, I see no implied guarantee
that "the array object" will continue to exist until the next sequence
point. If the called function returns the value of an object of
struct type, then "the array object" could plausibly refer to the
array member of that object, which could be local to the function and
therefore nonexistent after the function returns.

An object is just a region of memory that can hold values.
The value returned by a function must be held somewhere;
since wherever that is can hold a value, it is an object.
The only question, as you point out, is what its lifetime is.

One interpretation (only slightly perverse) is that in C99
(and probably C90 as well) the lifetime for such an object
must be either static or automatic. (I don't mean to make
the argument, since it's not really a strong argument, but
I think it's worth noting.) The point of the language in
N1336 is not that the result is an object (because it already
must be), but to clarify what the lifetimes of such objects
are.

I disagree.

C, or at least the C abstract machine, distinguishes strongly between
objects and values. An object may hold a value, but a value does not
necessarily require an object to hold it.

A value is the result of evaluating an expression. Given, for
example:

int y = 2 * x + 1;

the expressions "2 * x" and "2 * x + 1" have values, but those values
don't have objects associated with them. (They might be stored in
memory, but that's an implementation detail.)

And in the following:

int func(void) { return 41; }
...
int x = func() + 1;

there is no object whose value is 41. A function result is just
another expression value. The value specified in a function's return
statement must be communicated somehow to the caller, so that it can
become the result of the function call expression, but again, there's
no implied object. Whether the value happens to be a simple scalar or
some huge structure makes no conceptual difference -- in most cases.
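Keith's examples completed into a compilable sketch. `func` is his; `compute_x`, `compute_y`, `struct pair`, `make_pair`, and `sum_pair` are hypothetical additions showing that a struct result is used as a plain value in the same way a scalar result is.

```c
int func(void) { return 41; }

/* Hypothetical struct-valued counterpart to func. */
struct pair {
    int a;
    int b;
};

struct pair make_pair(void)
{
    struct pair p = { 40, 2 };
    return p;
}

/* Neither 2 * x, 2 * x + 1, nor func() designates an object: each is
 * just a value that becomes an operand of the enclosing expression. */
int compute_y(int x) { return 2 * x + 1; }
int compute_x(void)  { return func() + 1; }

/* A struct result is also just a value; selecting a non-array member
 * from it needs no object in any version of the standard. */
int sum_pair(void)   { return make_pair().a + make_pair().b; }
```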

And it wouldn't make any difference in the case under discussion
(accessing an array member of a struct value returned from a function)
except for C's (rather odd) rule that an expression of array type is
usually converted to a pointer value. (Note that this isn't a
conversion in the usual sense; an array value consists of the values
of the array elements, and there's no way to get a pointer value from
that other than by compiler magic.)

But since an attempt to refer to an array value implies, because of
The Rule, the existence of an array object, there has to be an object
in this one bizarre case.
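A small sketch of The Rule applied to a non-lvalue, assuming C99 or later (C90 converts only lvalue arrays). As the operand of `sizeof`, the array member of a function result keeps its array type; adding 0 forces the array-to-pointer conversion. `sizeof` does not evaluate these operands, so `make4` is never actually called. `struct four`, `make4`, `array_size`, and `pointer_size` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical struct with a four-element array member. */
struct four {
    int arr[4];
};

struct four make4(void)
{
    struct four f = { { 1, 2, 3, 4 } };
    return f;
}

/* make4().arr has type int[4] here: no conversion as sizeof operand. */
size_t array_size(void)   { return sizeof make4().arr; }

/* In make4().arr + 0 the array is converted to int *, which, if it
 * were evaluated, would require an object for it to point into. */
size_t pointer_size(void) { return sizeof (make4().arr + 0); }
```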
 
jameskuyper

CBFalconer said:
Oh, so you want special handling, depending on the type of function
result?

Yes, I do want to retain the current state of affairs. Handling return
values of different types in different fashions is the norm, not the
exception. It is commonplace for functions that return small scalar
values to store the return value in a register; a strategy that is
just plain impossible for most structs that contain arrays. It is no
horrendous inefficiency to require a change in the handling of those
few array-containing structs that happen to be small enough to fit in
a register.
 
jacob navia

Yes, I do want to retain the current state of affairs. Handling return
values of different types in different fashions is the norm, not the
exception. It is commonplace for functions that return small scalar
values to store the return value in a register; a strategy that is
just plain impossible for most structs that contain arrays. It is no
horrendous inefficiency to require a change in the handling of those
few array-containing structs that happen to be small enough to fit in
a register.

This could be more difficult than you imagine.

For instance, under Windows the ABI specifies that structures
no larger than intptr_t are returned in a register
anyway. For example,

struct s { char a[sizeof(intptr_t)];};

will be returned in a register. It contains an array of chars, but
its whole size fits in EAX.

Sometimes, bigger quantities are returned in the EAX:EDX register
pair, with 64 bits worth of data.

In 64-bit mode the situation is even "worse", since structures of up to
128 bits can be returned in RDX:RAX. Lcc-win uses this for returning
a 128-bit integer, for instance.

Floating-point data is returned in the x87 FPU under 32-bit Windows,
but in register XMM0 under 64-bit Windows.

Since register files have grown enormously in the last decade,
machines like the PC can return an array of up to 32 doubles
in the floating-point register file. If memory serves, the PowerPC
architecture has 32 double-precision floating-point registers.

This is a can of worms, since specifying how to return data for every
processor could well be an impossible task.

To carry on your idea, the standard should stay at a very high level
of specifications leaving the details to be filled by the
implementation.
 
jameskuyper

jacob said:
(e-mail address removed) wrote: ....

This could be more difficult than you imagine.

Yes, I'm sure it can be. However, you yourself have already done it,
so it must be doable; and you have only a small staff, so there's an
upper limit to how difficult it must have been to do it.

More to the point, C++ already implements a more complicated version
of the same concept, so there's a large body of relevant practical
experience that can be drawn on. The C concept is more limited in both
applicability and in the lifetime of the temporary object, so it
should be, if anything, substantially easier to implement than the C++
concept.

To carry on your idea, the standard should stay at a very high level
of specifications leaving the details to be filled by the
implementation.

I agree - the proposed change only defines the behavior of a given
code construct; it's entirely up to the implementor to decide how to
make that behavior happen.
 
CBFalconer

Ben said:
That does not answer Keith Thompson's question. What meaning would
you give the code in question? In other words, he was asking how you
would re-word the standard, not asking how you would re-write the
code to avoid the issue.

Return to the specification of C99. Maybe add a comment somewhere
pointing out that a function result has to be used as a whole.
 
CBFalconer

Ben said:
.... snip ...


Why not? I certainly don't want the compiler to do the same
thing when it calls

int rand(void);

as it does when it calls

struct huge foo(void);

I want it to know how to make the best use of my machine's
architecture and that almost certainly means doing "special
handling depending on the type of the function result".

Well, I think there is no melding of our attitudes to compilers and
how and what they should translate. Note that I did not object to
things done during optimization. Don't forget that this whole
argument is about fooling around with sub-components of that
function result.
 
CBFalconer

Yes, I'm sure it can be. However, you yourself have already done
it, so it must be doable; and you have only a small staff, so
there's an upper limit to how difficult it must have been to do it.

No he hasn't. What he has done is implement it on one type of CPU
running one OS. This is much different from being able to do it on
any machine anywhere. It's still an accomplishment, but it is not
definitive.
 
CBFalconer

Tim said:
.... snip ...


I think if you work through what's actually required, you'll
see there really is no extra burden. If f is a function
that takes a (struct s), and g is a function that returns
a (struct s), then

f( g() )

has the same storage requirements as

f( s )

presuming s is a variable of type (struct s). If we're going
to accept f(s) then it isn't any more costly to accept f(g()).

But what if you have a function f that takes two (struct s)s as
parameters? You want to write:

f(g(1).a, g(2).b)

(remember that the limitation is against using sub-portions of
those returned structures). I maintain you should write:

gv1 = g(1); gv2 = g(2);
f(gv1.a, gv2.b);

and nobody is confused, including the compiler.
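The two forms written out as a compilable sketch. The snippet leaves the types open, so this assumes `a` and `b` are int members and `f` takes two ints; `struct s`, `g`, `f`, `direct`, and `via_temps` are hypothetical definitions. Because the selected members are scalars rather than arrays, both forms are well-defined in C90, C99, and C11 and produce the same result.

```c
/* Hypothetical struct with two scalar members. */
struct s {
    int a;
    int b;
};

struct s g(int n)
{
    struct s r;
    r.a = n * 10;
    r.b = n * 100;
    return r;
}

int f(int x, int y) { return x + y; }

/* Member selection directly on the function results. */
int direct(void)
{
    return f(g(1).a, g(2).b);
}

/* The preferred form above: copy each result into a named object. */
int via_temps(void)
{
    struct s gv1, gv2;

    gv1 = g(1);
    gv2 = g(2);
    return f(gv1.a, gv2.b);
}
```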
 
Nate Eldredge

CBFalconer said:
Return to the specification of C99. Maybe add a comment somewhere
pointing out that a function result has to be used in whole.

Could we rephrase this as: "forbid the application of the . operator to
a struct value returned from a function" ?

This would break backwards compatibility, of course. Code like

struct foo { int a; int b; };
struct foo blah(void);
void bar(void) {
int x = blah().a;
/* ... */
}

is perfectly legal in C90 and C99, as far as I can tell, but would be
illegal under your proposed change. You're willing to go that far?

I also think it's somewhat un-C-like to force the programmer to use data
she doesn't care about. You can discard an entire return value without
a peep from the compiler, why shouldn't you be able to discard half of
it?
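Nate's example completed with a hypothetical definition of `blah` so it compiles; `bar` is changed to return `x` (an assumption, since the original returns void) so the result can be checked. It uses only the `a` member and silently discards `b`, and it also discards a whole struct result, both of which C has always allowed.

```c
struct foo { int a; int b; };

/* Hypothetical definition; the original only declares blah. */
struct foo blah(void)
{
    struct foo r = { 7, 99 };
    return r;
}

int bar(void)
{
    int x = blah().a;   /* member b of the result is simply discarded */
    blah();             /* the entire result is discarded */
    return x;
}
```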
 
James Kuyper

CBFalconer said:
No he hasn't. What he has done is implement it on one type of CPU
running one OS. This is much different from being able to do it on
any machine anywhere. It's still an accomplishment, but it is not
definitive.

The "it" I'm talking about involves just one platform at a time; there's
no need to tackle all possible machines at the same time. Leave
something for the next person to do! He did "it" for one machine, and
that's all I was talking about.
 
