UB when flowing off end of value-returning function


Öö Tiib

So presumably a Java compiler has to make sure there is no such path.

Java and C# compilers sometimes analyse such things deeply, but
sometimes they seemingly add run-time checks that throw. The amount of
undefined behavior in these languages is very low.

C++ could do the same, but that would cost efficiency: in the compiler
if it analyses deeply, or in the resulting executable if run-time checks
are added. I can already imagine the protests.

Alternatively, C++ could add an expression type (some "never") that
guarantees (and specifies) that a function (like exit) or an expression
(like throw) never returns. It might aid optimization too in some
cases. Again, it may break backward compatibility, so there
will be protests. IOW it is hard to get rid of undefined behavior.

Currently there is plenty of code where one compiler warns about a
"non-returning control path" if the dummy return is missing and another
warns about "unreachable code" when the dummy is present. The diagnostics
can be silenced by reorganizing the code, but not always.

If VC 10 does not warn about lambdas then ... aren't you one of the
architects of that very compiler?
 

Niklas Holsti

Scott said:
I can't help wondering whether this is a technology limitation, a blast
from the past, or just laziness on the part of compiler writers. We've
been writing compilers for some 50 years now; is asking compiler writers
to track control flow arcs out of a function and make sure a value is
returned along all of them (excluding exception-based paths) really
asking too much? My guess is no, but I'm not a compiler writer.

What do other, more modern, languages do?

If you will permit me to include Ada in that category:

If an Ada function falls off the end, the Program_Error exception is
raised at the point of call.

Statically, an Ada function must contain at least one return statement,
but it need not be at the end. The GNAT compiler warns if there is a
path that falls off the end of the function, but it does not analyse the
logic very deeply to verify that the path is feasible.

A non-returning Ada procedure (void function) can be marked with the
pragma No_Return. Value-returning functions cannot be so marked.
 

Hans Bos

Scott Meyers said:
So presumably a Java compiler has to make sure there is no such path.

The Java compiler assumes that functions always return.
E.g.

void throw_error()
{
    throw new RuntimeException();
}

int f()
{
    throw_error();
}

gives a "missing return statement" error.

Note that if throw_error were in another class, its code could be changed
without recompiling this class.
Since Java loads all classes dynamically, you can never know at compile
time whether a function will never return at run time.

Greetings,
Hans.
 

Scott Meyers

If VC 10 does not warn about lambdas then ... aren't you one of the
architects of that very compiler?

No. I'm self-employed, and have never worked for Microsoft (or any other
compiler vendor).

Scott
 

gwowen

If VC 10 does not warn about lambdas then ... aren't you one of the
architects of that very compiler?

You're probably thinking of Herb Sutter. (It's even easier to mistake
the titles "Effective C++" and "Exceptional C++")
 

Victor Bazarov

You're probably thinking of Herb Sutter. (It's even easier to mistake
the titles "Effective C++" and "Exceptional C++")

AFAICT, both titles are exceptional :)

V
 

James Kanze

Both C++98/03 and draft C++0x say this:
Does anybody know why this is undefined behavior instead of
a hard error?

Probably mainly historical reasons. Way, way back, before void,
functions implicitly returned int, if you didn't specify
anything else; functions which were logically void still
implicitly returned int, and often flowed off the end. By
declaring this undefined behavior, instead of a hard error, the
C folks allowed implementations to continue supporting such
code. And C++ follows suit for reasons of C compatibility.
It has an interesting implication for lambda expressions.
Lambdas declaring a return type but returning nothing yield UB
and, with the compilers I tested, don't necessarily issue
a warning:
auto f = []()->int { std::cout << "Oops, I forgot to return something"; };
All enlightenment appreciated.

There was never any question of requiring an error from the
compiler. It's almost impossible to determine reliably. It
wouldn't be very difficult (nor add much overhead) to require
some sort of runtime error, say abort after an implementation-defined
message. For practical purposes, you can't do it for
int; some code that was written before void might still be in
use, fixed up just enough for it to pass the compiler. But it
would seem a reasonable requirement for other types.
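
The run-time check suggested above can be approximated by hand. The following is only a sketch, assuming a C++11 compiler for [[noreturn]]; the names fell_off_end and sign are hypothetical, and no compiler is claimed to insert exactly this. The function sign deliberately forgets the x == 0 case to show where such a check would fire.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: roughly what an implementation-inserted check
// could do when control reaches the end of a value-returning function.
[[noreturn]] inline void fell_off_end(const char* function_name)
{
    std::fprintf(stderr,
                 "control reached end of value-returning function '%s'\n",
                 function_name);
    std::abort();
}

// Returns -1 or 1; the x == 0 case has been forgotten on purpose.
int sign(int x)
{
    if (x < 0) return -1;
    if (x > 0) return 1;
    fell_off_end("sign");  // written by hand; stands in for the implicit check
}
```

With this in place, sign(0) terminates with a message instead of returning whatever happens to be in the return register.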
 

James Kanze

[...]
It is.
Yes Balog suggested issuing a runtime error else-thread and
said s(he) couldn't see much benefit; neither do I if I am
honest. Undefined behaviour includes crashing and one has to
expect crashes when one creates bugs. :) The argument in
favour of what you suggest is the insidious possibility of UB
silently working resulting in undiscovered bugs. I guess this
is a QoI issue.

The problem is that for return types like int, the code probably
will work, but with a random value (whatever happened to be in
EAX on an Intel, for example). Catching such errors would be
useful; they're probably more frequent (and less expensive to
catch) than a lot of errors debugging implementations currently
catch.
 

James Kanze

On 11/2/2010 12:09 PM, Pete Becker wrote:
I can't help wondering whether this is a technology
limitation, a blast from the past, or just laziness on the
part of compiler writers. We've been writing compilers for
some 50 years now; is asking compiler writers to track
control flow arcs out of a function and make sure a value is
returned along all of them (excluding exception-based paths)
really asking too much? My guess is no, but I'm not
a compiler writer.

It's not always trivial. I've gotten warnings about falling off
the end with code like:

wchar_t
getNextChar(char const* utf8Sequence)
{
    switch (*utf8Sequence & 0xC0)
    {
    case 0x00:
    case 0x40:
        return *utf8Sequence;
    case 0x80:
        throw IllegalSequenceError();
    case 0xC0:
        return getMultibyte(utf8Sequence);
    }
}

A compiler could catch this, but it would require extra
processing. In order to catch all such particular cases, the
extra processing (for this and all other possible cases) is
likely to lead to unacceptable compile times. In fact, most
compilers do almost no flow analysis unless optimization is
turned on, because it is so costly in compile time.
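
One way to keep such a switch free of warnings without relying on the compiler's flow analysis is a trailing throw. This is a sketch only: IllegalSequenceError and getMultibyte are hypothetical stand-ins for the names used in the post, and the stand-in decoder handles two-byte sequences only.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the exception type named in the post.
struct IllegalSequenceError : std::runtime_error {
    IllegalSequenceError() : std::runtime_error("illegal UTF-8 sequence") {}
};

// Hypothetical stand-in: decodes two-byte sequences (110xxxxx 10xxxxxx) only.
wchar_t getMultibyte(char const* utf8Sequence)
{
    unsigned char const lead  = static_cast<unsigned char>(utf8Sequence[0]);
    unsigned char const trail = static_cast<unsigned char>(utf8Sequence[1]);
    if ((lead & 0xE0) != 0xC0 || (trail & 0xC0) != 0x80)
        throw IllegalSequenceError();
    return static_cast<wchar_t>(((lead & 0x1F) << 6) | (trail & 0x3F));
}

wchar_t getNextChar(char const* utf8Sequence)
{
    switch (*utf8Sequence & 0xC0)
    {
    case 0x00:
    case 0x40:
        return *utf8Sequence;               // plain ASCII byte
    case 0x80:
        throw IllegalSequenceError();       // stray continuation byte
    case 0xC0:
        return getMultibyte(utf8Sequence);  // lead byte of a longer sequence
    }
    // Never reached: the mask leaves only the four values handled above.
    // The throw exists so that no path flows off the end.
    throw std::logic_error("getNextChar: unreachable");
}
```

The trailing throw costs nothing on the paths that are actually taken, though some compilers may then report it as unreachable code.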
What do other, more modern, languages do? As far as I know,
there is no UB in languages like Java and C#, so how do they
deal with this issue?

Java specifies the exact analysis the compiler is required to
do. And says it's an error if the function would flow off the
end. In cases like those described here, Java would require
a dummy return statement at the end in order to prevent
a compiler error.

This is acceptable in Java because you can only return "simple"
types: basic types or pointers. It's not acceptable in C++
because you can return very complex types, which may not have
a default constructor.
If they reject such code during compilation (and I don't know
if they do), that would suggest that tracking control flow
arcs is not an unreasonable burden to put on compiler writers.

Perfect tracking isn't possible. Java's solution is to require
some tracking (I can't find how much), and to require a dummy
return which will never be executed otherwise. Depending on the
return type, either "return 0;" or "return null;" will work.
That's not the case in C++.
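
To illustrate the point about return types without a default constructor, here is a small sketch. Account and accountFor are hypothetical names chosen for the example. There is no obvious dummy Account to return, but a throw at the end works for any return type.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical type with no default constructor, so "return Account();"
// is not available as a dummy return.
class Account {
public:
    explicit Account(std::string const& owner) : owner_(owner) {}
    std::string const& owner() const { return owner_; }
private:
    std::string owner_;
};

Account accountFor(int id)
{
    if (id == 0) return Account("root");
    if (id == 1) return Account("guest");
    // A throw needs no value of the return type, so it can close the
    // function where Java code might use "return null;".
    throw std::out_of_range("accountFor: unknown id");
}
```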
 

James Kanze

On 11/2/2010 4:34 PM, LR wrote:

[...]
And again I ask what Java and C# do with this kind of code?

Java fails to compile, with an error message "missing return
statement". You can suppress the error either by adding a dummy
return or a throw statement at the end of the function.
 

James Kanze

I can't say I know the answer, and what I'm about to say
probably will generate a lot of heat, but it seems to me that
if the compiler can't statically determine if a value will be
returned, then maybe the logic of the function is too obtuse.

Would you say that my code with the switch is "too obtuse"? Or
what about:

virtual int conditionallySupported()
{
    logAndThrowError("some message");
}

in an inheritance hierarchy, where this particular derived class
doesn't support the functionality.
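
The [[noreturn]] attribute from the C++0x draft (standardized in C++11) is aimed at exactly this pattern. The sketch below assumes a compiler that supports it; Device and LegacyDevice are hypothetical class names, and logAndThrowError is given a hypothetical body.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical body for the helper named in the post. [[noreturn]] tells
// the compiler that control never comes back to the caller.
[[noreturn]] void logAndThrowError(std::string const& message)
{
    std::cerr << "error: " << message << '\n';
    throw std::runtime_error(message);
}

struct Device {
    virtual ~Device() {}
    virtual int conditionallySupported() { return 42; }
};

// This derived class does not support the functionality.
struct LegacyDevice : Device {
    virtual int conditionallySupported()
    {
        // The end of the function is never reached, so there is no
        // undefined behavior, and the attribute lets the compiler see that.
        logAndThrowError("not supported by LegacyDevice");
    }
};
```

Without the attribute the code is still well defined, but a compiler that cannot see the body of logAndThrowError has no way to know that.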
 

Rui Maciel

Scott said:
I can't help wondering whether this is a technology limitation, a blast
from the past, or just laziness on the part of compiler writers. We've
been writing compilers for some 50 years now; is asking compiler writers
to track control flow arcs out of a function and make sure a value is
returned along all of them (excluding exception-based paths) really
asking too much? My guess is no, but I'm not a compiler writer.

That's a good question. It would be interesting to know the logic
behind the decision to leave this part of the standard as undefined
behavior. Some C++ compilers manage to detect a "flow off" apparently
without any trouble, which leads me to believe that this is doable.
Moreover, I can't imagine any situation where it is desirable to
let programmers write code without knowing what values their own
routines return. So, what exactly forced this into
the standard?


Rui Maciel
 

LR

James said:
Would you say that my code with the switch is "too obtuse"? Or
what about:

virtual int conditionallySupported()
{
    logAndThrowError("some message");
}

in an inheritance hierarchy, where this particular derived class
doesn't support the functionality.


A comment elsethread about Java loading classes at run time reminded me
that the compiler might not ever get a look at the code in
logAndThrowError if it's in a library or an object file or whatever is
analogous on a particular platform. That would make a lot of code "too
obtuse" by the standard outlined above.

Although I wouldn't begin to know how to prove it, this problem seems
almost equivalent to the halting problem to me. I doubt that a general
solution can be found.

LR
 

Dilip

And again I ask what Java and C# do with this kind of code?  I'm a programming
monoglot, so I can't test it myself.  I have an opinion about what one might
reasonably expect of a compiler given the above code, but I'd like to know what
decision was reached by people designing languages for a living.

Scott

In addition to what Mr. Balog Pal posted, here is what the C# spec
says about what can happen inside a Method Body:
http://en.csharp-online.net/ECMA-334:_17.5.8_Method_body
 

Balog Pal

Scott Meyers said:
No. I'm self-employed, and have never worked for Microsoft (or any other
compiler vendor).

Guess you're getting confused with Herb Sutter. ;-)
 

Florian Weimer

* Scott Meyers:
So presumably a Java compiler has to make sure there is no such
path.

Yes, and it interacts very poorly with enums, e.g.

switch (e) {
    case CASE_1:
        return 1;
    case CASE_2:
        return 2;
}

is not recognized as always returning a value, even if the enum contains
only two constants, CASE_1 and CASE_2. C++ would probably have to behave
in the same way, for completely different reasons. (The JVM can't pull
values out of thin air, and it does not know anything about enums, and
by extension the exhaustiveness of switch statements.)
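
One plausible reading of the "completely different reasons" for C++ is that an enum object can hold values that match no named enumerator. The sketch below assumes C++11 scoped enums; the names Case and toNumber are hypothetical. Because the underlying type is fixed as int, every int value is a valid value of Case, so the switch cannot be treated as exhaustive.

```cpp
#include <stdexcept>

// Hypothetical enum mirroring the two constants from the example above.
enum class Case : int { CASE_1, CASE_2 };

int toNumber(Case e)
{
    switch (e) {
    case Case::CASE_1:
        return 1;
    case Case::CASE_2:
        return 2;
    }
    // Reachable: static_cast<Case>(7) is a legal value of Case,
    // so control can leave the switch without hitting a return.
    throw std::invalid_argument("toNumber: not a named enumerator");
}
```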

Your initial example could be addressed by requiring that a function
not returning void must contain at least one return or asm statement.
(That's the rule used by Ada, by the way.)
 

Marcel Müller

Alf said:
C++0x currently attempts to address both issues.

However, I have no compiler that compiles the code below.

What a pity.
And Microsoft
has sort of promised to never implement C++0x attribute support.

Well, I did not expect anything else from MS. MSVC always was more or
less crap with respect to C++. We currently use 9.0, but I stopped
bothering about that around 7.1.

But hopefully it gives the idea of roughly how such C++0x code might look:

So let's wait and hope till compilers which support this are adequately
common. And of course, till the libraries are converted to support it.
(E.g. most of the math functions should be defined as constexpr.)

Btw. can constexpr also be applied to constructors?
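
For what it is worth, C++11 as published does allow constexpr constructors for literal types. A minimal sketch, assuming a C++11 compiler; Point and sample are hypothetical names.

```cpp
// Hypothetical literal type with a constexpr constructor.
struct Point {
    constexpr Point(double x, double y) : x_(x), y_(y) {}
    constexpr double x() const { return x_; }
    constexpr double y() const { return y_; }
private:
    double x_;
    double y_;
};

// The object is initialized at compile time, so its members can be
// used in constant expressions such as static_assert.
constexpr Point sample(1.5, -2.0);
static_assert(sample.x() == 1.5, "sample is a compile-time constant");
```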


Marcel
 

Marc

Marcel said:
(E.g. most of the math functions should be defined as constexpr.)

How can you handle rounding modes if the math functions are constexpr?
Not that C++ is very helpful there, as it failed to import the C99
pragmas.
 
