Floating point arithmetic model of C...

almurph · Mar 16, 2009

Hi,

Is anyone familiar with the floating point arithmetic that C uses? I am
asking as I am trying to emulate this in C# and am having some
difficulty.

Comments appreciated,
Cheers,
Al.

Nate Eldredge · Mar 16, 2009

almurph said:
Hi,

Is anyone familiar with the floating point arithmetic that C uses? I am
asking as I am trying to emulate this in C# and am having some
difficulty.

Yes, someone is familiar with it. More than one person, in fact, and
some of them read this newsgroup.

What's your question? If you can state it in a self-contained way,
without reference to C#, it will be easier for people here to answer.

Ben Pfaff · Mar 16, 2009

almurph said:
Is anyone familiar with the floating point arithmetic that C uses? I am
asking as I am trying to emulate this in C# and am having some
difficulty.

If you can find a copy of the C standard, or a draft, around the
web, take a look at section 5.2.4.2.2 "Characteristics of
floating types <float.h>". It defines the C floating-point
model.

Ordinarily, I would post the proper excerpt here, but this
particular section of the standard has plenty of non-ASCII
characters and superscripts and subscripts that don't render
properly when cutting and pasting from the PDF, so I am afraid
that the excerpt would be more confusing than helpful.

osmium · Mar 16, 2009

Is anyone familiar with the floating point arithmetic that C uses? I am
asking as I am trying to emulate this in C# and am having some
difficulty.

The details depend on the compiler and the associated hardware. You can
only make some gross inferences that apply to all C implementations.

jameskuyper · Mar 16, 2009

Hi,

Is anyone familiar with the floating point arithmetic that C uses?

Insofar as that question is meaningful, the answer is "Yes, a great
many people are familiar with it". So, what do you want to know about
it?

However, while the C standard mandates support for floating point
arithmetic, it leaves a great many details about it up to each
implementation to decide. Therefore, depending upon what it is that
you mean by your question, it's entirely possible that your use of the
word "the" is inappropriate. It's quite possible to get different
results for a given calculation when you use different compilers, even
if you're using identical source code and identical inputs. Decent
implementations are likely to have only small differences between
them, but even the best implementation for one platform stands a good
chance of giving slightly different results than the best
implementation for a platform that uses a different FPU.

If you're using a C99 implementation, and you discover that the
__STDC_IEC_559__ macro has been predefined by the implementation, then
you should be able to rely upon conformance to IEEE/IEC 60559, which
imposes much stricter specifications about how floating point
arithmetic is performed. However, even IEEE/IEC 60559 gives some
freedom for different systems to give you different results.
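
A minimal sketch of how a program might ask its implementation about this. The macro __STDC_IEC_559__ (C99 Annex F) and FLT_EVAL_METHOD (C99 <float.h>) are standard; the function names claims_iec_559 and report_fp_model are made up for illustration.

```c
#include <float.h>
#include <stdio.h>

/* Returns 1 if the implementation predefines __STDC_IEC_559__
   (C99 Annex F conformance claim), 0 otherwise. */
int claims_iec_559(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Prints what the implementation reports about its floating-point model. */
void report_fp_model(void)
{
    printf("__STDC_IEC_559__ defined: %s\n",
           claims_iec_559() ? "yes" : "no");

    /* FLT_EVAL_METHOD: 0 = evaluate in the operand type,
       1 = float and double are evaluated as double,
       2 = everything is evaluated as long double,
       negative = indeterminable or implementation-defined. */
    printf("FLT_EVAL_METHOD: %d\n", (int)FLT_EVAL_METHOD);
}
```

Calling report_fp_model() from main on the machine that runs the original C program shows whether intermediate results may be held in a wider type than the source code suggests, which is one common source of small cross-platform differences.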

... I am
asking as I am trying to emulate this in C# and am having some
difficulty.

That seems quite plausible. However, you'll have to get a lot more
specific if you want to get any useful comments.

almurph · Mar 16, 2009

Thanks for the comments guys. I have reverse engineered some C99
code into C# and have noticed an unusual bug (it causes the C# to not
work properly). I have traced it down to a single method and have
compared results with the C version and noticed that small differences
in the results build up (as the calculation is inside a loop with a +=
operator working on it).

Obviously the first thing I checked was the syntax. No error there.
Next on the list was the classic roundoff error. I'm using double so I
thought that mine would be better. It is better (i.e. more accurate) but
does not produce the right results. As mentioned before, the small
differences that creep in mount up. Next was to use
System.Single, which resulted in some small improvement. So I was then
in the domain of "the floating point model in C is different *somehow*
from that of the C# model".
So I implemented a C++ DLL of the offending method (as C++ is closer
to C than C# is, or so I have been told, as regards the floating point
model). It worked a bit better but the error is still replicating.

Anyone with any ideas? I will post some code and the inputs and outputs
to show you what I mean in my next post. This is doing my head in....

Al.

Keith Thompson · Mar 16, 2009

Ben Pfaff said:
If you can find a copy of the C standard, or a draft, around the
web, take a look at section 5.2.4.2.2 "Characteristics of
floating types <float.h>". It defines the C floating-point
model.

The latest semi-official draft can be found at
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf

See the section Ben cited and also Annex F, "IEC 60559 floating-point
arithmetic"; note that support for IEC 60559 is optional.

[...]

almurph · Mar 16, 2009

Thanks guys. Here are some more details:

float Sum = 0.0;

Sum += N * Log(N);

where Log(N) is defined by a macro like:
#define Log(y) ((y) <= 0 ? 0.0 : log((float)y) / 0.69314718055994530942)

First iteration:

N = 4322;
Sum = 52198.883

Second Iteration:

N = 10131.0
Sum = 187006.92

Third Iteration (sum reset):

N = 169.00000
Sum = 1250.7487

Fourth Iteration:

N = 377
Sum = 4477.2734 (here I tend to get: 4477.2732, the beginning of the
small error! drat!)

Fifth Iteration (Sum reset again):

N = 4153.0
Sum = 49918.805

Sixth Iteration:

N = 9754.0
Sum = 179176.66 (here I tend to get 49918.80316)

So to summarise, my problem is not simple roundoff error. If it was I
would have corrected it before. Rather, it's trying to emulate the C
compiler, specifically the floating point arithmetic model, in a C#
environment.

Would appreciate any comments you may be able to offer..

Thanking you,
Al.

Keith Thompson · Mar 16, 2009

almurph said:
Thanks guys. Here are some more details:

float Sum = 0.0;

Sum += N * Log(N);

where Log(N) is defined by a macro like:
#define Log(y) ((y) <= 0 ? 0.0 : log((float)y) / 0.69314718055994530942)

So Log(y) is the base-2 logarithm of y. ("log2" or "lg" would be a
clearer name for that.)

Note that the log function takes a double argument and returns a
double result. If you really want to do your calculations in float
rather than double, see if your system has logf().

But I wonder why you don't just do the calculations in double. If the
existing program whose behavior you're trying to replicate is doing
its calculations in float, then it could probably get better results
by using double. Are you really required to replicate the relatively
imprecise results exactly? Would producing *better* results violate
your requirements?

First iteration:

N = 4322;
Sum = 52198.883

Second Iteration:

N = 10131.0
Sum = 187006.92

That looks correct.

Third Iteration (sum reset):

By "sum reset", I assume you mean that the value of Sum is set to 0.

N = 169.00000
Sum = 1250.7487

Fourth Iteration:

N = 377
Sum = 4477.2734 (here I tend to get: 4477.2732, the beginning of the
small error! drat!)

A small roundoff error isn't surprising. C's requirements for
floating-point are loose enough that either 4477.2734 or 4477.2732
could easily be a correct result.

Fifth Iteration (Sum reset again):

Which means that the preceding results are irrelevant to what follows.

N = 4153.0
Sum = 49918.805

Sixth Iteration:

N = 9754.0
Sum = 179176.66 (here I tend to get 49918.80316)

179176.66 (or thereabouts) is correct. If you're getting 49918.80316,
you're doing something very wrong.

But since you haven't shown us any actual code, it's impossible to
guess what the problem is.

If the program that's producing the incorrect results is written in
C, show it to us, preferably after trimming it down to a minimal
complete program that exhibits the problem. If it's written in C#,
post it to a C# forum.

[...]

Jens Thoms Toerring · Mar 16, 2009

Thanks guys. Here are some more details:

float Sum = 0.0;

where Log(N) is defined by a macro like:
#define Log(y) ((y) <= 0 ? 0.0 : log((float)y) / 0.69314718055994530942)

So you're calculating the logarithm to base 2 (as long as the
argument is larger than 0). What's the cast to float in the
argument to log() good for? log() takes a double, so you just
make the computer do some extra work to get a lower
precision...

Sum += N * Log(N);

What happens here in C is that (according to the definition
of the macro) 'N' will be converted to a float, i.e. a low
precision number. This is passed, after conversion back to
a double, to log(), which returns a double. The division
by log(2) is done in double and the multiplication with
'N' also. Then the result and the (float) value in 'Sum'
are added as doubles and the final result is then
rounded to float precision for storing it in 'Sum'.

If C# does e.g. the multiplication, division and addition
all in float then you are likely to end up with even larger
rounding errors than with the C program.

First iteration:

N = 4322;
Sum = 52198.883

Second Iteration:

N = 10131.0
Sum = 187006.92

Third Iteration (sum reset):

N = 169.00000
Sum = 1250.7487

Fourth Iteration:

N = 377
Sum = 4477.2734 (here I tend to get: 4477.2732, the beginning of the
small error! drat!)

Fifth Iteration (Sum reset again):

N = 4153.0
Sum = 49918.805

Sixth Iteration:

N = 9754.0
Sum = 179176.66 (here I tend to get 49918.80316)

All I can see here are small rounding errors, fully within the
minimum requirement of 6 decimal digits for C's float type. For
example the last result is off by less than 2e-2, which is a relative
error of about 1e-8. Not too bad under the circumstances.

So to summarise, my problem is not simple roundoff error.

What else is it? You're doing calculations with a rather low
precision and you thus get low precision results. And sometimes
you're lucky and rounding errors cancel each other and at other
times they add up. Nothing unusual here.

If it was I
would have corrected it before.

How would you propose to do that?

Rather, it's trying to emulate the C
compiler, specifically the floating point arithmetic model in a C#
environment.

There isn't one "model". There are minimum specifications, but
that's it. Take the same C program and run it on a machine with
a different architecture and you may get different results. And
all of them are correct as far as correctness goes with floating
point numbers. So, if you want to get the exact same results
from the C# program as from the C program then you're looking
for something that simply doesn't exist since there is not a
unique result of the C program. At best there's a unique result
for a certain combination of hardware, compiler and math library.
Getting the C# program to do exactly the same as one of all
the possible combinations would be the best you can achieve, but
is, of course, a complete waste of time since all it would do
would be to reproduce the same errors as resulting from the use
of just this combination, not a higher precision of the result.

Regards, Jens

CBFalconer · Mar 16, 2009

almurph said:
Is anyone familiar with the floating point arithmetic that C uses?
I am asking as I am trying to emulate this in C# and am having some
difficulty.

It can use various forms. For details, see the C standard (C99
below). N869_txt.bz2 is a bzipped version of the last draft of
C99, and was the last version available in text form.

Some useful references about C:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://c-faq.com/> (C-faq)
<http://benpfaff.org/writings/clc/off-topic.html>
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf> (C99)
<http://cbfalconer.home.att.net/download/n869_txt.bz2> (pre-C99)
<http://www.dinkumware.com/c99.aspx> (C-library)
<http://gcc.gnu.org/onlinedocs/> (GNU docs)
<http://clc-wiki.net/wiki/C_community:comp.lang.c:Introduction>
<http://clc-wiki.net/wiki/Introduction_to_comp.lang.c>

CBFalconer · Mar 16, 2009

almurph said:
.... snip ...

Thanks for the comments guys. I have reverse engineered some
C99 code into C# and have noticed an unusual bug (it causes the
C# to not work properly). ...

C# has nothing to do with C. It is off-topic on c.l.c.

Martin Ambuhl · Mar 16, 2009

Hi,

Is anyone familiar with the floating point arithmetic that C uses? I am
asking as I am trying to emulate this in C# and am having some
difficulty.

You really haven't asked a question to which any reasonable response can
be given. There is no way to guess what difficulty you might be having,
and of course this is not the place to ask about how to do _anything_ in
a language other than C, especially not a proprietary language like C#.

Of course someone is familiar with floating point arithmetic in C. It
could hardly be otherwise, since most of us use floating point
arithmetic. C, of course, does not have hard-and-fast rules about how
floating point arithmetic is done. It could hardly be otherwise, since
not only do different computers and different FPUs have different data
representations and different hardware instructions, but different
software implementers may use them in different ways. For example,
Microsoft seems never to have found a way to use long double to provide
either more precision or greater range than double, even though their
principal hardware would support it.

C provides limits on the seen behavior of programs using floating point
rather than decreeing a particular implementation as Microsoft can with
its proprietary language. Some of those limits are minimum requirements
to be met, and any implementation is free to do better. Implementations
will provide much information about their own floating point limits in
<float.h>, including the base radix, the rounding behavior, whether
arithmetic is done with the type expressions require or to higher rank
types (double or long double), the difference between 1.0 and the next
larger value (for each type), the minimum number of decimal digits, the
number of digits in the mantissa in the base used in the implementation,
how many decimal digits it takes to represent the largest supported
value, the smallest and largest values for each type, the smallest and
largest exponents for each type in both the implementation base and in
base 10. All of these, as long as they meet the standard's minimum
requirements, are subject to the implementer's decisions.
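
As a rough illustration of the list above, the following sketch prints a selection of those <float.h> values. The macros are the standard C99 ones; the function name print_float_characteristics is hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* Prints a selection of the <float.h> characteristics of float and double. */
void print_float_characteristics(void)
{
    printf("FLT_RADIX      = %d\n", FLT_RADIX);   /* base of the representation */
    printf("FLT_ROUNDS     = %d\n", FLT_ROUNDS);  /* 1 means round to nearest */
    printf("FLT_MANT_DIG   = %d, DBL_MANT_DIG = %d\n",
           FLT_MANT_DIG, DBL_MANT_DIG);           /* significand digits in that base */
    printf("FLT_DIG        = %d, DBL_DIG      = %d\n",
           FLT_DIG, DBL_DIG);                     /* guaranteed decimal digits */
    printf("FLT_EPSILON    = %g, DBL_EPSILON  = %g\n",
           FLT_EPSILON, DBL_EPSILON);             /* gap between 1.0 and next value */
    printf("FLT_MIN        = %g, FLT_MAX      = %g\n", FLT_MIN, FLT_MAX);
    printf("DBL_MIN        = %g, DBL_MAX      = %g\n", DBL_MIN, DBL_MAX);
    printf("FLT_MIN_EXP    = %d, FLT_MAX_EXP  = %d\n",
           FLT_MIN_EXP, FLT_MAX_EXP);             /* exponent range, implementation base */
    printf("FLT_MIN_10_EXP = %d, FLT_MAX_10_EXP = %d\n",
           FLT_MIN_10_EXP, FLT_MAX_10_EXP);       /* exponent range, base 10 */
}
```

Comparing this output with the documented ranges of the corresponding types in the other language is a quick first check that both sides are at least using the same formats.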

That having been said, many implementations choose to use the
floating-point standard in
IEC 60559:1989, Binary floating-point arithmetic for microprocessor
systems, 2nd edition.
The earlier names for this standard were
IEC 559:1989 and ANSI/IEEE 754-1985, IEEE Standard for Binary
Floating-point Arithmetic,
and
ANSI/IEEE 854-1987, IEEE Standard for Radix-Independent Floating-point
Arithmetic.
If, as implied by your question, C# does not use that standard, that is
a shocking situation.
The standard header <fenv.h> will be helpful to you.
The #pragma STDC FENV_ACCESS (resettable by the program) indicates
whether a C program will deal with control modes and status bits. The
macro FE_DFL_ENV specifies the default floating-point environment. The
header also contains macros for accessing the status flags for each
supported FP exception, more macros concerning rounding behavior, and a
pragma for whether FP expressions can be optimized (contracted) to take
advantage of fast FP operations. And there are prototypes there for
functions to use or take advantage of all this, which you can find in your
C reference manual by checking functions that have names beginning with
"fe".

And Microsoft is your problem with C#. We really don't discuss other
languages here, and especially not proprietary languages (apart from one
developer who keeps hyping his own product, which happens to be a very
good one, even though off-topic here). The people who can tell you how
to do things in C# -- although I can't imagine why anyone would want to
-- hang out in C# newsgroups or subscribe to C# mailing lists.

Martin Ambuhl · Mar 16, 2009

Keith said:
See the section Ben cited and also Annex F, "IEC 60559 floating-point
arithmetic"; note that support for IEC 60559 is optional.

Also note that the macro to be defined if IEC 60559:1989 is supported has
its name, __STDC_IEC_559__, taken from that standard's earlier
designation, IEC 559:1989.

Walter Banks · Mar 16, 2009

I think Keith is on the right track. log has a lot of implementation
variations depending on the library. Logs implemented with a Taylor
series sometimes have nasty errors.

One approach to track this down would be to replace the macro
with a call to a log function not supplied by the compiler vendor
and run the same code under C#. This might at least identify
the error source.
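
One possible shape for such a vendor-independent function is sketched below. It is an illustration only, not the algorithm of any particular library: the name portable_log2 is invented, the result is not correctly rounded, and it uses nothing but basic arithmetic so that the same sequence of operations can be transcribed line for line into another language.

```c
#include <float.h>

#define LN2 0.69314718055994530942  /* same constant as the thread's macro */

/* Base-2 logarithm built only from +, -, * and /. */
double portable_log2(double x)
{
    int e = 0;                       /* power of two factored out of x */
    int k;
    double t, t2, power, sum = 0.0;

    if (x != x || x > DBL_MAX)       /* NaN or +infinity: pass through */
        return x;
    if (x <= 0.0)                    /* the thread's macro yields 0.0 here */
        return 0.0;

    /* Scale x into [1, 2); both scalings are exact in binary floating point. */
    while (x >= 2.0) { x *= 0.5; ++e; }
    while (x < 1.0)  { x *= 2.0; --e; }

    /* ln(x) = 2 * (t + t^3/3 + t^5/5 + ...) with t = (x - 1) / (x + 1).
       For x in [1, 2), t is below 1/3, so 30 terms are more than enough. */
    t = (x - 1.0) / (x + 1.0);
    t2 = t * t;
    power = t;
    for (k = 1; k < 60; k += 2) {
        sum += power / k;
        power *= t2;
    }
    return e + 2.0 * sum / LN2;
}
```

If the C and C# programs agree once both call the same hand-written routine, the library log() is the likely source of the difference; if they still disagree, the difference lies in how the surrounding arithmetic is evaluated.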

Regards,

Ben Bacarisse · Mar 17, 2009

Thanks for the comments guys. I have reverse engineered some C99
code into C# and have noticed an unusual bug (it causes the C# to not
work properly). I have traced it down to a single method and have
compared results with the C version and noticed that small differences
in the results build up (as the calculation is inside a loop with a +=
operator working on it).

Have you posted in a C# group yet? This has been suggested many
times. Understanding the C is only half of it and the other half
(the equivalent C#) is off topic here.

I hope you don't think the += is the same in the two languages. In
particular, the rules for promoting floating types are different. To
match the arithmetic between your versions you need to understand what
both languages do.

Peter Nilsson · Mar 17, 2009

Keith Thompson said:
...
The latest semi-official draft can be found at
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>

That's the latest C99 draft (ISO/IEC 9899:TC3), but the latest
draft is ISO/IEC 9899:201x...

<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1336.pdf>

Dik T. Winter · Mar 17, 2009

> Thanks for the comments guys. I have reverse engineered some C99
> code into C# and have noticed an unusual bug (it causes the C# to not
> work properly). I have traced it down to a single method and have
> compared results with the C version and noticed that small differences
> in the results build up (as the calculation is inside a loop with a +=
> operator working on it).

Yes, that is a property of floating-point.

> Obviously the first thing I checked was the syntax. No error there.
> Next on the list was the classic roundoff error. I'm using double so I
> thought that mine would be better. It is better (i.e. more accurate) but
> does not produce the right results.

What *are* the right results when considering floating-point? Different
implementations will likely give different results, none of them being
wrong.

> As mentioned before, the small
> differences that creep in mount up. Next was to use
> System.Single, which resulted in some small improvement. So I was then
> in the domain of "the floating point model in C is different *somehow*
> from that of the C# model"

As far as I know they both use Brent's model of floating point to define
things. But even within a single model there can be differences.

> Anyone with any ideas? I will post some code and the inputs and outputs
> to show you what I mean in my next post. This is doing my head in....

With floating-point there is not a single result that is correct. Rounding
errors are inherent in it, and how rounding is done can influence the
result a lot.

Keith Thompson · Mar 17, 2009

Peter Nilsson said:
That's the latest C99 draft (ISO/IEC 9899:TC3), but the latest
draft is ISO/IEC 9899:201x...

<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1336.pdf>

Yes, but the 201x drafts (n1362.pdf is newer, BTW) have less official
standing.

The C99 standard (not available unless you pay for it) and the three
Technical Corrigenda (available at no charge) constitute the current
official ISO C standard. n1256.pdf consists of the C99 standard with
the three TCs merged into it; it's not an official ISO document, but
it was edited by a member of the committee, and it's close enough for
most purposes.

The C201X drafts are just that -- drafts. They're *very* preliminary
works in progress, consisting of n1256 plus whatever the committee has
gotten around to so far.

Guest · Mar 17, 2009

On 16 Mar, 18:20, almurph wrote:

I have reverse engineered some C99
code into C# and have noticed an unusual bug (it causes the C# to not
work properly).

Define "not work properly". How do you know the original C code "works
properly"? How did you detect the "unusual bug"?

I have traced it down to a single method and have
compared results with the C version and noticed that small differences
in the results build up (as the calculation is inside a loop with a +=
operator working on it).

If it's a small piece of code could you post the C code here?
And tell us what result you expect it to produce.
And:-

POST THE C# TO A C# NEWSGROUP!!

Obviously the first thing I checked was the syntax.

I'd have expected your compiler to do that.

No error there.
Next on the list was the classic roundoff error.

I must have been asleep in the lecture in my Numerical Analysis course
that dealt with the subject of "classic roundoff error". What *is*
CRE?

I'm using double so I
[thought] that mine would be better.

The C code uses float and the C# uses double? And you
expect to get the same result?

It is better (i.e. more accurate) but
does not produce the right results.

It is more accurate, but wrong? Sorry, what does accurate but wrong
mean?

As mentioned before, the small
differences that creep in mount up. Next was to use
System.Single

No idea what System.Single is.

which resulted in some small improvement. So I was then
in the domain of "the floating point model in C is different *somehow*
from that of the C# model"

I'm not convinced you've demonstrated this. You could have a mistake
in the C# or in the original C. Or you could have some numerically
unstable computation that is just magnifying small differences in the
environment.

So I implemented a C++ DLL of the offending method (as C++ is closer
to C than C# is, or so I have been told, as regards the floating point
model). It worked a bit better but the error is still replicating.

"Worked better": how do you know it's better? How do you know the
errors are replicating?

Anyone with any ideas? I will post some code and the inputs and outputs
to show you what I mean in my next post. This is doing my head in....

Have you asked on a C# newsgroup?
