Some sort of weird round off error -- please help!

almurph · Mar 6, 2009

Hi,

Can you help me with the following anomaly, please? I'm really at my
wits' end. I need the help of some C people like you.

I am debugging some pre-99 C code. I have a logical contradiction as
follows (btw the printf stuff is my diagnostics):

float Sum=0.0;
float N

printf("Sum before: %.15f\n", Sum);

Sum += N * Log(N); <--- this is the original line of code

printf("N: %.15f\n", N);
printf("Log(N): %.15f\n", Log(N));
printf("Product of N and Log(N): %.15f\n", N * Log(N));
printf("Sum: %.15f\n", Sum)

The results are as follows:

Sum Before: 0.000000000000000
N: 4322.00000000000000
Log(N): 12.077483366859508
Product of N and Log(N): 52198.883068346796790
Sum: 52198.882812500000000

what confuses me is the value of Sum.

When I multiply N and Log(N) by hand calculator I get:

4322 * 12.077483366859508 = 52198.883068346793576

but this value does not tally with the value of Sum which starts to
differ in the 3rd decimal place i.e.

52198.882812500000000
             ^
starts to differ here for some reason.

For the life of me I do not understand why this is occurring. That is,
why the apparent contradiction between the RHS and the LHS of the
simple line of code:

Sum += N * Log(N);

it can't be the "+=" operator can it? I've heard about round-off
error but this just stumps me. Can anyone please help me. Would
greatly appreciate any comments/code-samples/explanations/suggestions/
insight that you may be able to offer.

Thanking you,
Al.
The confused!

PS: Don't worry about Log(N) - it is just a macro that gets the
value: log(N) / log(2)

James Kuyper · Mar 6, 2009

it can't be the "+=" operator can it? I've heard about round-off
error but this just stumps me.

Apparently you haven't heard enough about it. The standard only requires
that FLT_EPSILON be smaller than 1E-5; a more typical value is
1.19209290e-7, but that still only gives you 7 significant digits.
That's not "digits after the decimal point", but "significant digits".

In this case, you can't expect 52198.882 to be accurate to better than
FLT_EPSILON*52198.882, or +/- 0.006.

There's the additional, less important issue: your code performs
multiplication and calls the log() function, each of which introduces
its own error, on top of round-off error. The C standard does not,
itself, specify the accuracy of ANY floating point operations. However,
if your implementation conforms to IEC 60559 (IEEE 754), then you get much
tighter guarantees on the accuracy, but nothing that will help you get
better than about 7 significant digits out of a single precision
floating point number. If you really need more precision, you'll need to
use doubles rather than floats.

badger · Mar 6, 2009

is it the casting? i.e. double instead of float?

Richard Tobin · Mar 6, 2009

For the life of me I do not understand why this is occurring. That is,
why the apparent contradiction between the RHS and the LHS of the
simple line of code:

Sum += N * Log(N);

The RHS is evaluated as a double, and then truncated to a float
when assigned to Sum. If you used double instead of float
everywhere, the values would be the same.

-- Richard

Mark Wooding · Mar 6, 2009

almurph said:
float Sum=0.0;
float N

Missing semicolon here. (For those following at home, N is set to 4322
at some point.)

printf("Sum before: %.15f\n", Sum);

Sum += N * Log(N); <--- this is the original line of code

(Again, for those at home, Log is a macro which computes binary logs,
often notated log_2 or lg, by computing log(x)/log(2).)

printf("N: %.15f\n", N);
printf("Log(N): %.15f\n", Log(N));
printf("Product of N and Log(N): %.15f\n", N * Log(N));
printf("Sum: %.15f\n", Sum)

The results are as follows:

Sum Before: 0.000000000000000
N: 4322.00000000000000
Log(N): 12.077483366859508
Product of N and Log(N): 52198.883068346796790
Sum: 52198.882812500000000

what confuses me is the value of Sum.

This isn't so mysterious. There are three floating-point types in C:
`float', `double' and `long double'. Despite the name, you probably
want `double' for most purposes; `float' typically uses less memory, and
is therefore suitable for large arrays, but typically has too little
precision for naïve use.

The <math.h> function `log' accepts as an argument and returns as a
result a value of type `double'. Your variable `N' has type `float',
which is therefore converted to `double'. The result of `Log(N)'
has type `double'. When we multiply this by `N', the operand `N' is
promoted to `double'. When you print the value, you see a fairly
high-precision result.

However, you now add this high-precision result onto `Sum', which is a
`float'. The addition is carried out to `double' precision, but then
demoted to `float' for storage, which is why you see the imprecise value
when you print `Sum'.

Note that `printf' arguments of type `float' get promoted to `double'
automatically, and `%f' actually prints `double'-precision numbers.

Fix: live with it because it's what you asked for, or use `double'
rather than `float'.

-- [mdw]

almurph · Mar 6, 2009

The RHS is evaluated as a double, and then truncated to a float
when assigned to Sum. If you used double instead of float
everywhere, the values would be the same.

-- Richard

Hi guys,

Thank you both for the responses - it's been an education. It worked
- I inserted the following alteration:

Sum += (float)(N * Utilities.Log(N))

which gives me the right answer (specifically the value:
52198.8828125).

But I have another slight problem (sorry!). This line of code is in a
loop and the next iteration produces the wrong result. It's just
slightly out from my C# equivalent implementation.

The values for the second iteration are as follows:

Sum Before: 52198.8828125 (correct)
N: 10132 (correct)
Log(N): 13.3066313617125 (correct)
Product of N and Log(N): 134822.788956871 (correct)
Sum: 187021.6640625 (slightly incorrect)

but the correct value of sum should be: 187021.671875000000000

I have N and Sum implemented as doubles to extract more precision
but am confused as to why I can't emulate the C behaviour.

Once again guys would appreciate any comments/suggestions/code-samples
that you may be able to provide. Really appreciate it.

Thanks a mill,
Al.

Keith Thompson · Mar 6, 2009

almurph said:
[...]

PS: Don't worry about Log(N) - it is just a macro that gets the
value: log(N) / log(2)

It would be a lot easier to help you if you posted actual code.
You've obviously re-typed the above code rather than copying and
pasting it from your actual program (note the missing semicolons on
the declaration of N and the last printf, and the fact that the
spacing of the presented output differs from what would be produced by
your printf calls). If you don't know what the problem is, then
almost by definition you don't know which part of your program we need
to see.

In any case, the problem is that the log() function returns a result
of type double, but you're storing your results in variables of type
float, discarding some of your precision. Your second and third
printf calls print double values directly; your fourth prints the
value of a float object (the value is implicitly promoted to double,
but that doesn't restore the lost precision).

It's almost always better to use double rather than float (or long
double if you need even more precision and your compiler uses a wider
representation for long double than for double). The only real reason
to use float is to save space in memory; for just a few variables, the
space saving is insignificant, and could easily be wiped out by the
increase in code size.

Keith Thompson · Mar 6, 2009

badger said:
is it the casting? i.e. double instead of float?

Is what the casting? Please quote some context when you post a
followup.

There were no casts in the article that started this thread, so no, it
isn't the casting. There were some implicit conversions between
double and float, and those are what caused the problem the original
poster was asking about. (A cast is an operator; there's no such
thing as an implicit cast. A conversion can be either explicit (i.e.,
a cast) or implicit.)

Keith Thompson · Mar 6, 2009

almurph said:
Thank you both for the responses - it's been an education. It worked
- I inserted the following alteration:

Sum += (float)(N * Utilities.Log(N))

which gives me the right answer (specifically the value:
52198.8828125).

Hmm. I'm a bit surprised that adding a cast made any difference. The
value is stored in Sum, which you've declared as float, so it's going
to be narrowed from double to float anyway. The addition shouldn't
make any difference, since the previous value of Sum was 0.0.

In any case, adding an explicit conversion to float just means that
you're explicitly throwing away some of your precision bits. A better
solution would be to declare everything as double rather than float
(unless you have some specific requirement to use float).

But it's hard to tell unless you show us your actual code (i.e.,
copy-and-paste the actual code that you fed to the compiler).
Trimming the code down to the minimal compilable example that
illustrates the problem is helpful.

But I have another slight problem (sorry!). This line of code is in a
loop and the next iteration produces the wrong result. It's just
slightly out from my C# equivalent implementation.

The values for the second iteration are as follows:
[snip]
Sum: 187021.6640625 (slightly incorrect)

but the correct value of sum should be: 187021.671875000000000

I'm not going to try to figure this out without seeing your code, but
how did you determine what the correct value should be?

[...]

almurph · Mar 6, 2009

Actual Code (hope this helps). I am trying to emulate the following C
in C#:

float Sum=0.0;
float N;
float Total;

for(v = Min; V <= Max; ++V)
{

N = A[a];

Sum += N * Log(N);

Total += N;

}

for the Log bit:

#define Log2 0.69314718055994530942

#define Log(y) ((y) <= 0 ? 0.0 : log( (float) y ) / Log2)

Keith Thompson · Mar 6, 2009

almurph said:
Actual Code (hope this helps). I am trying to emulate the following C
in C#:

float Sum=0.0;
float N;
float Total;

for(v = Min; V <= Max; ++V)
{

N = A[a];

Sum += N * Log(N);

Total += N;

}

By "actual code", I meant something that I can copy-and-paste into a
file on my own system, where I can compile and execute it. The above
is missing a main function, any output, any declarations for Min
and Max, a definition of Log, and a consistent spelling of v vs. V.

If you manually typed it into your newsreader, it's not actual code.
I could take the above fragment, copy it into a C source file, and
fix it to the point where I can compile and run it -- but since
you already have the actual code, I'm not going to waste my time
trying to reproduce it.

In any case, if your actual C code is similar to the paraphrase
you've posted, then I'd say it's bad C. There's no good reason I
can think of to use float rather than double.

If you *really* need to emulate some C code that uses float (with
implicit conversions from float to double and back to float), the
best we can do is describe the semantics of the C code (but only
if you show it to us!). I can't help you do the same thing in C#;
I don't even know whether C# has anything corresponding to C's
distinction between float and double.

Have you tried posting these questions to some C# forum? If not, why?

almurph · Mar 6, 2009

I just thought that the C forum would be a better place to get help
first, not sure why. I have implemented a C# version using doubles so
my results are probably more accurate than the author's (it's legacy
code I'm translating).
Problem is though - it does not give the exact same results. I see
the author is using the value 1E-5 in his version. This goes back to
James's post about the FLT_EPSILON - "FLT_EPSILON be smaller than
1E-5; a more typical value is 1.19209290e-7"
Perhaps I should replace that constant with the more accurate one
(1.19209290e-7), or whatever the double equivalent of this is...

Al.
Slightly less confused.

Ben Bacarisse · Mar 6, 2009

Keith Thompson said:
Hmm. I'm a bit surprised that adding a cast made any difference. The
value is stored in Sum, which you've declared as float, so it's going
to be narrowed from double to float anyway. The addition shouldn't
make any difference, since the previous value of Sum was 0.0.

It would not matter in C but I think the code fragment above is C#. I
point this out in case this happens again. In this thread, the poster
sometimes posts the original C and sometimes the attempted C#
equivalent.

I'm not going to try to figure this out without seeing your code, but
how did you determine what the correct value should be?

Again, just in case it helps in the future... It seems that the OP is
re-implementing something like a neural net simulation in C#. The
"correct value" will be the one the original C code uses. If I am
right, to get the same results the C# version will have to get very
close to the C values since such nets are often chaotic rather than
linear. The overall behaviour of the system should not depend on the
rounding errors, but testing and debugging will be easier if the two
versions behave identically at every level. I fear this is an almost
impossible goal...

jameskuyper · Mar 6, 2009

....
I just thought that the C forum would be a better place to get help
first, not sure why.

If any aspect of your question depends upon C#, this is the wrong
place to post it; you should have learned that by now from the
responses to your previous questions.

... I have implemented a C# version using doubles so
my results are probably more accurate than the author's (it's legacy
code I'm translating).
Problem is though - it does not give the exact same results. I see

If your C# version using 'double' produces different results from your
C version using doubles, and the C code is an accurate translation of
the C# code, then it may come down to minor differences in the way the
two compilers handle floating point operations. If that's the case,
you're going to need someone who is an expert in both the internals of
the C# compiler you're using and of the C compiler you're using. Good
luck on finding such a person.

If the C code is NOT an accurate translation of the C# code, then
you'll need an expert in both C and C# to look at both programs. It
sounds like Ben might be able to help you there (I'm not volunteering
his time - I don't have that authority - just noting that he seems to
have the necessary skills). However, no one can help you unless you
provide the actual text of both programs, not just small hand-typed
excerpts from it.

the author is using the value 1E-5 in his version. This goes back to
James's post about the FLT_EPSILON - "FLT_EPSILON be smaller than
1E-5; a more typical value is 1.19209290e-7"
Perhaps I should replace that constant with the more accurate one
(1.19209290e-7), or whatever the double equivalent of this is...

The most accurate value is whichever one is actually #defined in your
implementation's float.h header. You don't set FLT_EPSILON, it gets
set by <float.h>, and changing the value defined in float.h won't have
any effect on how your program actually behaves.

The double equivalent is DBL_EPSILON. The long double equivalent is
LDBL_EPSILON.

CBFalconer · Mar 7, 2009

almurph said:
Can you help me with the following anomaly, please? I'm really at
my wits' end. I need the help of some C people like you.

I am debugging some pre-99 C code. I have a logical contradiction
as follows (btw the printf stuff is my diagnostics):

All floating point values are approximations. The accuracy of the
result depends on the accuracy of those approximations.

Keith Thompson · Mar 7, 2009

CBFalconer said:
All floating point values are approximations. The accuracy of the
result depends on the accuracy of those approximations.

Yes, that's a nice generalization.

The OP's problem was analyzed in considerably more detail, and the
actual problem was diagnosed, within a couple of hours of the original
post.

*Please* read the whole thread before you post a followup.

Ben Bacarisse · Mar 7, 2009

jameskuyper said:
If the C code is NOT an accurate translation of the C# code, then
you'll need an expert in both C and C# to look at both programs. It
sounds like Ben might be able to help you there (I'm not volunteering
his time - I don't have that authority - just noting that he seems to
have the necessary skills).

Me? No. Everything I say about C# is either a wild guess or was read
a few seconds before posting. (I have a copy of the C# specification
due to having had a... disagreement with certain EN over on
comp.programming.)

Richard Tobin · Mar 7, 2009

The RHS is evaluated as a double,

What happens if FLT_EVAL_METHOD is 2?

I ignored that possibility because the OP said "pre-C99" (though he
might be using a C99 compiler anyway), and because I've never looked
at that stuff in detail.

If I understand correctly, it would be evaluated as a long double
(which may of course be the same as double, though I don't see why an
implementation would set it to 2 in that case).

In any case it would be converted to a double for the printf(), so
the inconsistent values he sees must be float and double versions.

-- Richard

Richard Tobin · Mar 7, 2009

Product of N and Log(N): 52198.883068346796790
Sum: 52198.882812500000000 ....
52198.882812500000000
             ^
starts to differ here for some reason.

A minor point: it doesn't start to differ here. (For those not using
a fixed-width font, "here" is at the first zero.) It actually starts
to differ after 52198.88. Because floating point uses binary on
almost all machines, truncation from double to float doesn't just
result in decimal digits getting replaced with zeros. But any
floating point number printed out to sufficient precision will
eventually end in zeros, because any binary fraction terminates as a
decimal (because 10 is divisible by 2).

-- Richard

Richard · Mar 7, 2009

CBFalconer said:
All floating point values are approximations. The accuracy of the
result depends on the accuracy of those approximations.

I don't think I ever heard so much nonsense from one man.

So, 4.0 is an approximation of 4 is it?
