breaking from a loop only works when I compile with optimisation

L

lloyd

My program enumerates arrangements of a specified combinatorial
object, at different orders. It's recursive and has to go pretty deep.
If I invoke it with the "quick" flag, then instead of telling me how
many arrangements of the object there are at a particular order, it
just tells me as soon as it finds *any*, and exits--this gives me a
quick answer about existence, when I don't care how many solutions
there are (which is sometimes lengthy to compute). The way I
implemented this is by using the solution counter, which is a global
variable: I run a comparison at the head of each loop and if
num_solutions > 0 then I break, and the program surfaces from all the
recursed inner function calls that way.

At least that's how it's supposed to work. And I spent hours trying to
find why it didn't break out and end after finding a solution, as I
was sure my code was sound. In the end I happened to remember a
similar problem I'd had once, where compiling at a different
optimisation changed some detail of how the program worked, and when I
compiled it (with gcc) using -O2 it worked as it was supposed to. Very
frustrating!

I can't possibly paste the code here as it is too nasty, and it would
take a lot of work to get it down to minimum size. But I thought I'd
ask if anyone knows this issue, and whether it is likely to be a bug
in my code, or something badly phrased that -O2 somehow accidentally
fixes, or whether it's more likely to be a bug in gcc as invoked
without optimisation.

Thanks! --Lloyd
 
S

Seebs

I can't possibly paste the code here as it is too nasty, and it would
take a lot of work to get it down to minimum size. But I thought I'd
ask if anyone knows this issue, and whether it is likely to be a bug
in my code, or something badly phrased that -O2 somehow accidentally
fixes, or whether it's more likely to be a bug in gcc as invoked
without optimisation.

It is more likely that it's a bug in your code, but not guaranteed.

Work on stripping it down. Keep in mind, the reproducer doesn't
have to WORK -- it just has to produce different results at two
optimization levels. e.g. you could try dummying out a large calculation
and replacing it with "x = 1;", and so on. That should make it
possible to get rid of a TON of code.

Just... Do it on a copy, and be sure you have backups, and that
your source control system (you're using one, right?) is all happy.

-s
 
E

Eric Sosman

My program enumerates arrangements of a specified combinatorial
object, at different orders. It's recursive and has to go pretty deep.
If I invoke it with the "quick" flag, then instead of telling me how
many arrangements of the object there are at a particular order, it
just tells me as soon as it finds *any*, and exits--this gives me a
quick answer about existence, when I don't care how many solutions
there are (which is sometimes lengthy to compute). The way I
implemented this is by using the solution counter, which is a global
variable: I run a comparison at the head of each loop and if
num_solutions> 0 then I break, and the program surfaces from all the
recursed inner function calls that way.

At least that's how it's supposed to work. And I spent hours trying to
find why it didn't break out and end after finding a solution, as I
was sure my code was sound. In the end I happened to remember a
similar problem I'd had once, where compiling at a different
optimisation changed some detail of how the program worked, and when I
compiled it (with gcc) using -O2 it worked as it was supposed to. Very
frustrating!

I can't possibly paste the code here as it is too nasty, and it would
take a lot of work to get it down to minimum size. But I thought I'd
ask if anyone knows this issue, and whether it is likely to be a bug
in my code, or something badly phrased that -O2 somehow accidentally
fixes, or whether it's more likely to be a bug in gcc as invoked
without optimisation.

It's impossible to say with certainty where the problem lies, so
I'm thrown back upon generic assumptions. There are two plausible
sources for the problem:

1) gcc, a large and intricate piece of software with many chances
for subtle errors, but maintained and developed over three or so
decades by hordes of skilled developers, and now trusted as THE engine
for compiling even larger software artifacts, like all of Linux.

2) Lloyd.

I hope you will not be insulted, not even in the slightest, if
I opt for (2) on the scanty evidence available. Were more evidence
at hand I might well change my mind -- but it ain't, and I don't.

One thing you should try is to crank up the diagnostic level just
as high as you can. I routinely use "-W -Wall -ansi -pedantic" (or
"-std=c99" instead of "-ansi"), and I believe it's possible to go
higher still. If gcc starts issuing warnings about obviously correct
constructs, your first response should be to re-examine "obvious."
 
L

lloyd

Work on stripping it down.

You know, I just looked over my program, which has been accumulating
bits and pieces for several years, and it would take me days to strip
it down. So if this bug or whatever it is doesn't ring any bells for
someone who could guess what's going on, it'll just be easier for me
just to remember to compile with -O2 than to figure it out properly.
(If programming was a more central part of my work I would of course
want to figure it out properly, but C is "just" an occasional problem-
solving tool for me.)

Thanks though - Lloyd
 
S

Seebs

You know, I just looked over my program, which has been accumulating
bits and pieces for several years, and it would take me days to strip
it down.

All the more reason to do it. This is a big fat warning sign that
something is probably VERY wrong.
So if this bug or whatever it is doesn't ring any bells for
someone who could guess what's going on, it'll just be easier for me
just to remember to compile with -O2 than to figure it out properly.

NO NO NO NO NO NO NO NO!

If you do not know what is wrong, it is entirely possible that you
are getting NONSENSE RESULTS. Which you might not notice right away.

Does anyone, ever, in any way, depend on the output of this program?
If not, why does it exist? If they do, is it okay if it lies to them?
(If programming was a more central part of my work I would of course
want to figure it out properly, but C is "just" an occasional problem-
solving tool for me.)

And it's okay if the solutions it gives you are wrong, then? No problem
if every so often a few decimal places get shifted? Doesn't matter if
you sometimes get a negative number where the right answer was positive?

You *KNOW* that something is wrong. It might be the compiler. It
might be your code. Either way, *something* is wrong. You don't know
what all it can screw up, but you know it can screw up something...

-s
 
E

Eric Sosman

You know, I just looked over my program, which has been accumulating
bits and pieces for several years, and it would take me days to strip
it down. So if this bug or whatever it is doesn't ring any bells for
someone who could guess what's going on, it'll just be easier for me
just to remember to compile with -O2 than to figure it out properly.
(If programming was a more central part of my work I would of course
want to figure it out properly, but C is "just" an occasional problem-
solving tool for me.)

I'll second Seebs: That's a terrible idea. I don't know what your
field of endeavor is; let's just suppose it's astronomy. You're doing
careful measurements of magenta-shifted Z-ray emissions, and everything
seems plausible except on days when you lunch on baloney sandwiches.
Eat tuna salad, chicken salad, ham and cheese, microwaved pizza, all
measurements seem reasonable. Eat baloney, you get nonsense. Imagine
using these measurements in your PhD thesis, and imagine an examiner
at your orals saying "It's strange, Mr. Lloyd, but there's a lot less
data here than I'd have expected for the amount of telescope time you
used." And imagine yourself replying "There *was* more, but I threw
it out because of the baloney sandwiches." Now tell yourself you're
an honest man who's capable of writing a reliable report. With a
straight face, if you can manage it ...

Here's what you *know*: Something strange is happening.

Here's what you *mustn't* do: Imagine that the strangeness you've
happened to notice is the only strangeness.
 
L

luserXtrog

     One thing you should try is to crank up the diagnostic level just
as high as you can.  I routinely use "-W -Wall -ansi -pedantic" (or
"-std=c99" instead of "-ansi"), and I believe it's possible to go
higher still.  If gcc starts issuing warnings about obviously correct
constructs, your first response should be to re-examine "obvious."

And don't forget lint!
It may take sometime to feed it the right options to make it ignore
little things you *know* are ok. But once you've done that, it just
might show you where the funny stuff is.
 
H

H Vlems

You know, I just looked over my program, which has been accumulating
bits and pieces for several years, and it would take me days to strip
it down. So if this bug or whatever it is doesn't ring any bells for
someone who could guess what's going on, it'll just be easier for me
just to remember to compile with -O2 than to figure it out properly.
(If programming was a more central part of my work I would of course
want to figure it out properly, but C is "just" an occasional problem-
solving tool for me.)

Thanks though - Lloyd

As others have argued, that is not the best approach. Without even a
fragment of the source code
this is difficult to diagnose. Given that compiler optimisation seems
to play a role, is boolean expression handling may be the problem?
The execution of a boolean expression like (A and B and C) is usually
stopped as soon as one of the three booleans evaluate as false.
If one of these is, say, a function then that function may not be
guaranteed to execute.
And perhaps gcc may decide to alter the order in which A, B and C are
computed.
Compiler optimisation was intended to improve execution speeds for
flawlessly written programs, not to hide flaws in them!
Hans
 
M

Moi

My program enumerates arrangements of a specified combinatorial object,
at different orders. It's recursive and has to go pretty deep. If I
invoke it with the "quick" flag, then instead of telling me how many
arrangements of the object there are at a particular order, it just
tells me as soon as it finds *any*, and exits--this gives me a quick
answer about existence, when I don't care how many solutions there are
(which is sometimes lengthy to compute). The way I implemented this is
by using the solution counter, which is a global variable: I run a
comparison at the head of each loop and if num_solutions > 0 then I
break, and the program surfaces from all the recursed inner function
calls that way.

At least that's how it's supposed to work. And I spent hours trying to
find why it didn't break out and end after finding a solution, as I was
sure my code was sound. In the end I happened to remember a similar
problem I'd had once, where compiling at a different optimisation
changed some detail of how the program worked, and when I compiled it
(with gcc) using -O2 it worked as it was supposed to. Very frustrating!

The usual suspects:

1) stale object files (from a previous compile with different flags of #defines)

2) Memory overwrites. Since your flags is global, it may be adjacent to a global
array that is (sometimes) being addressed out of bounds. Add some slack to test.

3) Shadowed variables. Try renaming some.

4) variables that are uninitialized before the first time the loop is entered.

5) Plain thinko. Try "minimal refactoring": replace the breaks by returns or goto's

....
....
....
666) compiler error.

HTH,
AvK
 
T

Tom St Denis

And don't forget lint!
It may take sometime to feed it the right options to make it ignore
little things you *know* are ok. But once you've done that, it just
might show you where the funny stuff is.

I find no settings for lint produce useful diagnostics. Worse, I can
get both splint and PC-lint to miss an uninitialized variable bug with
ease.

Valgrind is more useful, even though it has false positives as well it
more often than not does catch the uninitialized bugs and off-by-
one's.

Tom
 
N

Nick Keighley

My program enumerates arrangements of a specified combinatorial
object, at different orders. It's recursive and has to go pretty deep.
If I invoke it with the "quick" flag, then instead of telling me how
many arrangements of the object there are at a particular order, it
just tells me as soon as it finds *any*, and exits--this gives me a
quick answer about existence, when I don't care how many solutions
there are (which is sometimes lengthy to compute). The way I
implemented this is by using the solution counter, which is a global
variable: I run a comparison at the head of each loop and if
num_solutions > 0 then I break, and the program surfaces from all the
recursed inner function calls that way.

how doe sit do this? Does it use multiple returns or something? Can
you trace it in a debugger or put printf()s in? This might give you a
clue as to which bits "work" and which bits "fail".
 
L

lloyd

Seebs said:
Does anyone, ever, in any way, depend on the output of this program?
If not, why does it exist?  If they do, is it okay if it lies to them?

Thanks to all for your righteous concern (no sarcasm intended,
honestly). My program does nothing mission-critical for anyone, it is
used to produce solutions to a mathematical problem with artistic
applications, so no buildings will fall down.

Once I get through the hell that is the beginning of a new academic
semester I will try stripping the program down to isolate the bug,
more out of sheer morbid curiosity than because the possibility of
wrong results is so awful. The program is a little convoluted in
structure but deliberately not too clever in how it does it, so I can
understand it.

To Eric: I actually have no doubt that it's Lloyd who's doing
something dodgy, and not gcc.

Thanks everyone.
 
S

Seebs

Seebs said:
Thanks to all for your righteous concern (no sarcasm intended,
honestly). My program does nothing mission-critical for anyone, it is
used to produce solutions to a mathematical problem with artistic
applications, so no buildings will fall down.

That's encouraging. :)
Once I get through the hell that is the beginning of a new academic
semester I will try stripping the program down to isolate the bug,
more out of sheer morbid curiosity than because the possibility of
wrong results is so awful. The program is a little convoluted in
structure but deliberately not too clever in how it does it, so I can
understand it.

Cool. And yes, under the circumstances, waiting for a more opportune
time might be reasonable.

If the program is in modules, note that you can sometimes narrow things
down by compiling them at different optimization levels; this will often,
but not always, tell you where the problem is.
To Eric: I actually have no doubt that it's Lloyd who's doing
something dodgy, and not gcc.

Well, that tears it, you'll have found a compiler bug. :p

-s
 
J

Jon Wild

Seebs said:
Well, that tears it, you'll have found a compiler bug.  :p

In case that wasn't clear, I meant I have no doubt (i.e. I believe it
to be true) that the dodginess is on my part, not on gcc's part.
 
L

lloyd

Seeb said:
Well, that tears it, you'll have found a compiler bug.  :p

In case I wasn't clear, I meant that I do not doubt (i.e. I believe it
to be true) that the dodginess is on my part, not on the part of gcc.
 
K

Keith Thompson

lloyd said:
In case I wasn't clear, I meant that I do not doubt (i.e. I believe it
to be true) that the dodginess is on my part, not on the part of gcc.

I think it was clear; the ":p" was a smiley.

(BTW, he's "Seebs", not "Seeb". Are you constructing attribution lines
manually?)
 
G

Gene

Once I get through the hell that is the beginning of a new academic
semester I will try stripping the program down to isolate the bug,
more out of sheer morbid curiosity than because the possibility of
wrong results is so awful. The program is a little convoluted in
structure but deliberately not too clever in how it does it, so I can
understand it.

Bravo. I join with all others on the point that an optimization-
dependent bug is a smoking gun; you need to go find the bullet. It
may be in your own foot.

After you've resolved the problem, you might want to replace the
global variable for computation-unwinding with setjmp() and a
longjmp() at the point where a complete solution is found. See your
favorite documentation.

Cheers.
 
S

Seebs

In case that wasn't clear, I meant I have no doubt (i.e. I believe it
to be true) that the dodginess is on my part, not on gcc's part.

Right. And the moment you are *sure* it's your bug, the compiler will
have a bug just to throw you off the scent.

Henry Spencer once said "Do not lie to the compiler, for it will have its
revenge." At this point, enough people have lied to compilers often enough
that they actively seek revenge on all humans as a matter of principle.

-s
 
L

lloyd

Keith Thompson was all like:
(BTW, he's "Seebs", not "Seeb".  Are you constructing attribution lines
manually?)

Heavens no, I always use the computer keyboard to type them in :p
 
L

lloyd

Gene said:
After you've resolved the problem, you might want to replace the
global variable for computation-unwinding with setjmp() and a
longjmp() at the point where a complete solution is found.  See your
favorite documentation.

Hey, I just looked into this longjmp thing. How long has this been
going on? Thanks for the hint.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,954
Messages
2,570,114
Members
46,702
Latest member
VernitaGow

Latest Threads

Top