converting float to double

Ernie Wright

christian.bau said:
> Yes, I got this wrong. I missed that f * 1000.0 was again assigned to
> a float. Which makes the situation more interesting. So here is my
> analysis:

[excellent analysis snipped]

> So any dollar values between 2^k and 1.024 * 2^k are suspicious, that
> is values 1.01 and 1.02, 2.01 to 2.04, 4.01 to 4.09 and so on.
>
> The first number where this algorithm fails is 4.01, and it fails for
> many values slightly above 4, 8, 16, 32 and so on.

Very illuminating, I wish I'd done this.

Note to the unwary: I initially tested this with something like the
following,

float f;
double d, d0;
int i;

for ( i = 1; i <= 10000; i++ ) {   /* $0.01 to $100.00 */
    d0 = i / 100.0;                /* best possible precision */

    f = i / 100.0f;                /* funky rounding method */
    f *= 1000.0;                   /* questioned by the OP */
    d = f / 1000.0;

    if ( d != d0 ) {               /* are they the same? */
        printf( "%8.2f", d );
    }
}

in Microsoft C on an x86 machine, and found that d != d0 was never true.
As it turns out, MS C never actually uses the 32-bit value of f when it
appears on the right side of the assignments. It uses the 80-bit value
still sitting at the top of the FP stack after the previous calculation,
and as both Christian and Dik point out, the rounding of f in

f *= 1000.0

is crucial to the behavior.

I found that inserting expressions involving address-of f,

f = i / 100.0f;
testf( &f ); /* here */
f *= 1000.0;
testf( &f ); /* and here */
d = f / 1000.0;

forced the compiler to reload the contents of f after each line, and
this produced the expected result: the funky *1000/1000 rounding fails
for 4.01, 4.03, 4.05, 4.07, 4.09, 8.02, 8.03, 8.06, and so on.

The moral being that writing test code often isn't enough.
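
A more portable way to get the same effect (making sure every intermediate result really is rounded to a 32-bit float) is to declare f volatile. This is a substitute for the testf( &f ) calls, not what was actually run in the test above, and the name first_failing_cents is made up for this sketch:

```c
/* Returns the first amount in cents (1..10000) for which the
   *1000 / 1000 round trip through a float differs from i / 100.0,
   or 0 if no amount fails.  `volatile` is an assumption of this
   sketch: it forces each result to be stored as a real float. */
int first_failing_cents(void)
{
    volatile float f;
    double d, d0;
    int i;

    for (i = 1; i <= 10000; i++) {
        d0 = i / 100.0;      /* best possible precision */
        f = i / 100.0f;      /* rounded to float on store */
        f = f * 1000.0;      /* product rounded to float again */
        d = f / 1000.0;
        if (d != d0)
            return i;
    }
    return 0;
}
```

By the analysis quoted above, the first failure should be at 401, i.e. $4.01.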

- Ernie http://home.comcast.net/~erniew
 
William Hughes

Dik said:
> If the period is 1/2 year, the interest per period is not one half of
> the yearly interest. Try to do it over a five year period. A yearly
> interest rate of .05 gives after five years $127.63 and a half-yearly
> interest rate of .025 gives after five years $128.01. A difference
> that some accountants worry about. There are specific rules for how to
> calculate interest over a period smaller than the base period (although
> the rules depend on the situation).
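
The two figures in this example can be reproduced with a few lines of C. This is only a sketch of plain compounding with no per-period rounding, and the function name compound is made up here:

```c
/* Value of `principal` after `periods` compounding periods
   at `rate` per period, without any intermediate rounding. */
double compound(double principal, double rate, int periods)
{
    double value = principal;
    int k;

    for (k = 0; k < periods; k++)
        value += value * rate;
    return value;
}
```

compound(100.0, 0.05, 5) comes to about 127.63 and compound(100.0, 0.025, 10) to about 128.01, which is the difference described above.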

> But let me assume that the interest is 0.025 / 6 months.
>
> I would do it as:
>     long amount = 20000; /* 2 times the amount in cents */
>     ...
>     for(i = 1; i <= n; i++) {
>         amount += amount * i_1000_per_n / 1000;
>         amount += (amount & 1);
>     }
>     amount /= 2; /* the real amount in cents after the calculations */
>
> Your formulation is wrong. When the amount is 10769 cents (3rd
> iteration), the interest calculations give an interest of 269.225 cents,
> that is rounded 269 cents. The next result is 11038 cents. Your formula
> gives 11039.
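
The arithmetic in that example (269.225 cents of interest rounding to 269) can be checked with a single scaled-integer step. This is a minimal sketch, assuming round-half-up to the nearest cent and non-negative amounts; the name add_interest is hypothetical:

```c
/* One compounding step in whole cents.
   rate_per_1000 is the interest per period in thousandths (25 = 2.5%).
   Interest is rounded to the nearest cent, halves rounding up.
   Assumes non-negative amounts small enough not to overflow long. */
long add_interest(long cents, long rate_per_1000)
{
    long scaled = cents * rate_per_1000;   /* interest in thousandths of a cent */
    long interest = scaled / 1000;         /* whole cents, truncated */

    if (scaled % 1000 >= 500)              /* remainder of half a cent or more */
        interest++;
    return cents + interest;
}
```

Starting from 10000 cents at 25 thousandths per period, three calls give 10769, and the fourth gives 11038.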

Teach me to try to program at 4 am. Both attempts
had errors. Here is a (hopefully) correct version.

#include <stdio.h>
#include <math.h>

int main (void){
    long p = 10000;           /* integer principal, in cents */
    long delta_p;
    int i_1000 = 50;          /* yearly interest, in thousandths */
    int i_1000_per_n;

    double dp = 100.0;        /* floating point principal, in dollars */
    double i = .05;
    double i_per_n;

    int n = 200;              /* number of periods */
    int j;

    int i_dollars, i_cents, d_dollars, d_cents;

    i_1000_per_n = i_1000 / 2;
    i_per_n = i/2.0;

    for(j = 1;j<=n;j++){
        /* integer version: interest rounded to the nearest cent */
        delta_p = p*i_1000_per_n/1000;
        if( (p*i_1000_per_n%1000) >= 500 ){
            delta_p++;
        }
        p += delta_p;

        /* floating point version: round dp to the nearest cent */
        dp += dp*i_per_n;
        if( fmod(dp,0.01) > 0.00495) {
            dp += .01 - fmod(dp, 0.01);
        } else {
            dp -= fmod(dp, 0.01);
        }

        /* print both results every tenth period */
        if( j%10 == 0 ) {
            i_dollars = p/100;
            i_cents = p % 100;

            d_dollars = floor(dp);
            d_cents = floor(100*dp+.05) - 100*d_dollars;

            printf("%d %d %d %d %d\n",j,i_dollars,i_cents,d_dollars,d_cents);
        }
    }
    return 0;
}


I see no clear advantage to using integer or floating
point. Yes, both can be optimized, but as calculation
speed is unlikely to be the bottleneck, optimization is
unlikely to be needed.

(I do not find the doubling trick to be very natural, and
it does not generalize to other forms of rounding)


- William Hughes
 
Dik T. Winter

> Dik T. Winter wrote:
....
Note this part of my article:
> Teach me to try to program at 4 am.

Yes, that is approximately the time that I normally write my articles.
I am a bit early now.
> i_1000_per_n = i_1000 / 2;
> i_per_n = i/2.0;

And see what I wrote above about this. But otherwise your results are
now correct. Although I can not see a guarantee from your floating
point formulation that it always will be correct, although off-hand
I can not find wrong examples. Ah, I found one. 31 cent initial
capital, interest per period 1.6%. The correct method gives zero
interest after one period, your double method gives one cent interest.
After 200 periods your double precision method gives 849 cents, while
with the correct method there has been no accumulation at all. Now
that I have this case it is easy to create other cases as well.
And I may note that interest percentages are frequently stated in
decimals. My account gives an interest of 3.6% if I remember right.
> I see no clear advantage to using integer or floating
> point. Yes, both can be optimized, but as calculation
> speed is unlikely to be the bottleneck, optimization is
> unlikely to be needed.

In the financial world calculation speed frequently *is* the bottleneck.
And as for speed, for 1000000 iterations, mine goes in 0.32 seconds,
your integer variant in 0.61 seconds and your floating point variant
in 0.096 seconds. And the advantage of integer over floating point
is that with the first you have a guarantee that the result is correct,
while with the second you cannot give that guarantee at all (and I am
stating this as a numerical mathematician, which I have been some time).
> (I do not find the doubling trick to be very natural, and
> it does not generalize to other forms of rounding)

It is not natural. But that is the way you would code when doing
fixed point arithmetic. It does generalise to some other forms of
rounding (round up, round down), it does not generalise to, e.g.,
bankers rounding. But there are other methods that can handle that.
 
William Hughes

Dik said:
> ...
> Note this part of my article:

This is irrelevant, unless the rules specified can only be implemented
in integer arithmetic.

> Yes, that is approximately the time that I normally write my articles.
> I am a bit early now.
>
> And see what I wrote above about this. But otherwise your results are
> now correct. Although I can not see a guarantee from your floating
> point formulation that it always will be correct, although off-hand
> I can not find wrong examples. Ah, I found one. 31 cent initial
> capital, interest per period 1.6%. The correct method gives zero
> interest after one period, your double method gives one cent interest.
> After 200 periods your double precision method gives 849 cents, while
> with the correct method there has been no accumulation at all. Now
> that I have this case it is easy to create other cases as well.
> And I may note that interest percentages are frequently stated in
> decimals. My account gives an interest of 3.6% if I remember right.

The problem here is that 0.00495 was used instead of the
needed 0.004995. I miscounted initially and did not update this.

The floating point calculations are almost
exact; the only possible problem occurs at the equality condition.
Given principal in hundredths, and interest in thousandths, the
minimum spacing is 10^-5. Thus, though we could get a true
value of 0.00499, we cannot get a true value of 0.004996.
Thus if our floating point answer is greater than 0.004995, we know
that the true answer was 0.00500 or greater and we need to round
up. If the true answers are generated by integer arithmetic, then
there must be discrete spacing at some scaling, so we can use
sufficiently accurate floating point arithmetic to give "exact"
answers.

- William Hughes
 
Dik T. Winter

>
> This is irrelevant, unless the rules specified can only be implemented
> in integer arithmetic.

It is extremely relevant. Dividing the interest rate for a period by two
to get the interest rate for a half period is almost certainly wrong.
I would not be surprised if there are rules in some branches that state that
a 5% yearly interest rate should be converted to a 2.4695% interest rate
for a half period.
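
A quick check of which half-period rate compounds back to 5% per year (the half-period rate equivalent to 5% per year is sqrt(1.05) - 1, roughly 2.4695%). This is a minimal sketch and the name annual_from_half is hypothetical:

```c
/* Yearly rate that results from compounding `half_rate`
   over two half-year periods. */
double annual_from_half(double half_rate)
{
    return (1.0 + half_rate) * (1.0 + half_rate) - 1.0;
}
```

annual_from_half(0.024695) is very close to 0.05, whereas simply halving the rate, annual_from_half(0.025), gives 0.050625.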
> The floating point calculations are almost
> exact, the only possible problem occurs at the equality condition.
> Given principal in hundredths, and interest in thousandths, the
> minimum spacing is 10^-5.

So it will not work for conversion of pre-euro currency to euro
and the other way around. I think you do not have any idea about the
way the financial market works. In 2001 we did a thorough analysis
of whether converting gulden to euro and the reverse worked, and whether the
back conversion resulted in the original (up to the cent). The conversion
factors were stated with four digits after the decimal point. For bankers
it was important that the back conversion indeed did give the original
value *up to a cent*. Such an analysis is not as simple as you apparently
think.

Also said:
> The problem here is that 0.00495 was used instead of the
> needed 0.004995. I miscounted initially and did not update this.

To me that makes it clear that to use floating point is much more
difficult than to use integer arithmetic. A simple miscount can
result in an error. This more so for people who do not understand
the subtleties of floating point.
 
William Hughes

Dik said:
> It is extremely relevant. Dividing the interest rate for a period by two
> to get the interest rate for a half period is almost certainly wrong.
> I would not be surprised if there are rules in some branches that state that
> a 5% yearly interest rate should be converted to a 2.4695% interest rate
> for a half period.

And as this has absolutely nothing to do with the question
of whether calculations should be done in integer or
floating point it is irrelevant.

> So it will not work for conversion of pre-euro currency to euro
> and the other way around.

No, if you change your minimum spacing you have
to change your constant. You have a different problem but it is still
discrete.
There must exist a minimum spacing. True some values have
to change, but this is also true of the integer math
(indeed going from three decimal places to four is
easier in the floating point case. Consider the difference between
going from
i_1000, to i_10000, as opposed to changing the value of i and changing
0.004995 to 0.0049995).

> I think you do not have any idea about the
> way the financial market works. In 2001 we did a thorough analysis
> of whether converting gulden to euro and the reverse worked, and whether the
> back conversion resulted in the original (up to the cent). The conversion
> factors were stated with four digits after the decimal point. For bankers
> it was important that the back conversion indeed did give the original
> value *up to a cent*. Such an analysis is not as simple as you apparently
> think.

Whatever gives you that idea? I have a very good appreciation
of approximate inverse functions and what you have to do
when f(g(x)) does not equal x as g is only an approximate inverse. If
your forward function is new = round(conversion*original), the inverse
function will probably not be simple (and may not be unique: what if
there are two values of original which lead to the same new?).
However, it can be calculated using either integer or floating point
math.

> To me that makes it clear that to use floating point is much more
> difficult than to use integer arithmetic. A simple miscount can
> result in an error. This more so for people who do not understand
> the subtleties of floating point.

I would argue that "much more" is overstating the case. Agreed, there
are some subtleties that you have to be aware of but overall
the real problems are in understanding and specifying the
accounting algorithms, which do not follow the rules
of integer or floating point math.


- William Hughes
 
Dik T. Winter

> Dik T. Winter wrote: ....
>
> No, if you change your minimum spacing you have
> to change your constant. You have a different problem but it is still
> discrete.
> There must exist a minimum spacing. True some values have
> to change, but this is also true of the integer math
> (indeed going from three decimal places to four is
> easier in the floating point case. Consider the difference between
> going from
> i_1000, to i_10000, as opposed to changing the value of i and changing
> 0.004995 to 0.0049995).

In integer math it is a difference in the scale, in f-p math it is a
different constant.
>
> Whatever gives you that idea? I have a very good appreciation
> of approximate inverse functions and what you have to do
> when f(g(x)) does not equal x as g is only an approximate inverse.

You do not appreciate it. The formal rules, as stated, did not guarantee
a round conversion exact to the cent, and precise conditions under which
that would happen had to be provided. However the error would be for such
large amounts that it was thought to be insignificant. (It had to do with
the transfer of money that was in euro to accounts that were still in
gulden.) Doing it in floating point would be much more problematical.
> If your
> forward function is new = round(conversion*original), the inverse
> function will probably not be simple (and may not be unique, what if
> there are two values of original which lead to the same new).

The inverse function was also precisely stated.
>
> I would argue that "much more" is overstating the case. Agreed, there
> are some subtleties that you have to be aware of but overall
> the real problems are in understanding and specifying the
> accounting algorithms, which do not follow the rules
> of integer or floating point math.

They follow the rules of (and are stated in) fixed point math which is
easily emulated with scaled integers, but much less easy with floating
point. For instance, the rules for gulden to euro vv. were a conversion
rate of 1 euro = 2.20371 gulden and the result should be rounded to the
nearest cent (and note: 2.20371 is exact).
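
As an illustration of the scaled-integer approach, here is a minimal sketch of the conversion at the stated rate of 1 euro = 2.20371 gulden. The function and macro names are hypothetical, amounts are assumed non-negative, and "halves round up" is an assumption of this sketch rather than a quotation of the official rule:

```c
#define GULDEN_PER_EURO_E5 220371LL   /* 2.20371 scaled by 100000 */

/* Euro cents -> gulden cents, rounded to the nearest cent (halves up). */
long long euro_to_gulden_cents(long long euro_cents)
{
    return (euro_cents * GULDEN_PER_EURO_E5 + 50000) / 100000;
}

/* Gulden cents -> euro cents, rounded to the nearest cent (halves up).
   Numerator and denominator are doubled so that adding the divisor
   once is the same as adding one half before truncating. */
long long gulden_to_euro_cents(long long gulden_cents)
{
    return (2 * gulden_cents * 100000 + GULDEN_PER_EURO_E5)
           / (2 * GULDEN_PER_EURO_E5);
}
```

With these definitions 3000.00 euro converts to 6611.13 gulden and 6611.13 gulden converts back to 3000.00 euro, with no floating point involved at any step.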
 
William Hughes

Dik said:
> In integer math it is a difference in the scale, in f-p math it is a
> different constant.

Since the scale is a constant, I fail to see a difference.

> You do not appreciate it.

You are psychic? The example I had in mind was a
transformation from slant range to ground range imaging
geometry. The forward transformation is approximated by a cubic
polynomial, and then it is necessary to round to the nearest
pixel. The backward transformation is defined similarly.
In general, the backward transformation is not an exact inverse,
but there are times when an exact inverse is needed.
The problem is very similar to the currency conversion problem,
except it is a bit more difficult. It is possible to solve this
problem using floating point math.


> The formal rules, as stated, did not guarantee
> a round conversion exact to the cent, and precise conditions under which
> that would happen had to be provided. However the error would be for such
> large amounts that it was thought to be insignificant. (It had to do with
> the transfer of money that was in euro to accounts that were still in
> gulden.) Doing it in floating point would be much more problematical.

Why? Any conversion that can be done in fixed point
math can be done in floating point.

> The inverse function was also precisely stated.

In which case it probably was not a true inverse (i.e. there
would be cases in which f(g(x)) =/= x). If it was a true inverse,
stating it was redundant.
> They follow the rules of (and are stated in) fixed point math which is
> easily emulated with scaled integers, but much less easy with floating
> point.

Why? Equality considerations are a bit more direct with integer
math, but changing scaling is easier with floating point.

> For instance, the rules for gulden to euro vv. were a conversion
> rate of 1 euro = 2.20371 gulden and the result should be rounded to the
> nearest cent (and note: 2.20371 is exact).

And a double constant of 2.20371 may not be "exact" but it is certainly
sufficiently precise that the integer calculations and the floating
point calculations give the same answer. (Your minimum spacing is 10^-7)

- William Hughes
 
Dik T. Winter

> Dik T. Winter wrote: ....
>
> Why? Any conversion that can be done in fixed point
> math can be done in floating point.

Can it? Easily?
>
> In which case it probably was not a true inverse (i.e. there
> would be cases in which f(g(x)) =/= x). If it was a true inverse,
> stating it was redundant.

Indeed, it was not a true inverse.
>
> Why? Equality considerations are a bit more direct with integer
> math, but changing scaling is easier with floating point.

You think so. Decimal scaling is *not* easy with binary floating point.
>
> And a double constant of 2.20371 may not be "exact" but it is certainly
> sufficiently precise that the integer calculations and the floating
> point calculations give the same answer. (Your minimum spacing is 10^-7)

In fixed point the calculations are straightforward, in floating point
there are quite a few surprising subtleties. The formulation you gave
gives correct results. But a slightly different formulation gives wrong
results. You wrote:
    if(fmod(dp, 0.01) > 0.00495) {
        dp += .01 - fmod(dp, 0.01);
    } else {
        dp -= fmod(dp, 0.01);
    }
changing that only slightly will give a wrong result. Like:
    if(fmod(dp, 0.01) > 0.00495) {
        dp += .01;
    }
    dp -= fmod(dp, 0.01);
and I suspect that even the formulation:
    f = fmod(dp, 0.01);
    if(f > 0.00495) {
        dp += .01;
    }
    dp -= f;
can give wrong results with different inputs. It is just those subtleties
of floating point arithmetic that mean doing it in floating point needs
quite a bit of numerical analysis to get it correct. That is not something
that the programmer of financial programs is equipped with.
 
William Hughes

Dik said:
> Can it? Easily?
>
> Indeed, it was not a true inverse.

Hmm. It is interesting to note that ceil( (x-0.00499995)/c_f ) is
an inverse to round_to_cent(c_f*x). (If there is more than one possible
inverse the smallest value is chosen). Was the inverse at least
"one sided" (e.g. g(f(x)) == x, even though f(g(x)) is not always
equal to x)?

> You think so. Decimal scaling is *not* easy with binary floating point.

Of course it is. Simply divide/multiply by 10.0. Equality constraints
are no harder (or easier). Decimal scaling is only "easy" in binary
fixed point if you are willing to accept slightly strange equality
conditions.
> In fixed point the calculations are straightforward, in floating point
> there are quite a few surprising subtleties. The formulation you gave
> gives correct results. But a slightly different formulation gives wrong
> results. You wrote:
>     if(fmod(dp, 0.01) > 0.00495) {
>         dp += .01 - fmod(dp, 0.01);
>     } else {
>         dp -= fmod(dp, 0.01);
>     }
> changing that only slightly will give a wrong result. Like:
>     if(fmod(dp, 0.01) > 0.00495) {
>         dp += .01;
>     }
>     dp -= fmod(dp, 0.01);
> and I suspect that even the formulation:
>     f = fmod(dp, 0.01);
>     if(f > 0.00495) {
>         dp += .01;
>     }
>     dp -= f;
> can give wrong results with different inputs.

No, the results may be different (at least if, as usual
and contrary to the C standard (note the desperate try
to get back on topic), extended precision results are not
rounded to double precision before further computation).
They will not be wrong
(assuming you meant if(f > 0.004995) ). If you make allowance for
the fact that floating point operations are not "exact", you need
not be concerned with the precise nature of the
"error".

> It is just those subtleties
> of floating point arithmetic that mean doing it in floating point needs
> quite a bit of numerical analysis to get it correct.

"Quite a bit" is an exaggeration. Basically, you need to
know the minimum spacing of your problem, and to be
aware that a floating point value may take on a value slightly
above or below the "true" value.

> That is not something
> that the programmer of financial programs is equipped with.

We can make provision for naive programmers by using either
a decimal fixed point system, or a very high precision
floating point system (and telling the programmers to use
0.0049999999995 when rounding to the nearest cent,
or better, providing a round_financial() function).
Both solutions incur efficiency cost. Both of the more
efficient binary fixed point, and floating point (using
a native floating point type) require some sophistication
to do correctly (for one thing, both cases require an analysis
to see if a native type is adequate).

In any case, "all my programmers know BASIC and
none of them know C, therefore this problem can be solved in
BASIC but cannot be solved in C" is not much of an argument.

- William Hughes
 
Dik T. Winter

> Dik T. Winter wrote: ....
>
> Hmm. It is interesting to note that ceil( (x-0.00499995)/c_f ) is
> an inverse to round_to_cent(c_f*x). (If there is more than one possible
> inverse the smallest value is chosen). Was the inverse at least
> "one sided" (e.g. g(f(x)) == x, even though f(g(x)) is not always
> equal to x)?

Yes, it is. Converting euro to gulden and back to euro gives the
original. That is logical because the smallest unit in gulden has
smaller value than the smallest unit in euro.
>
> No, the results may be different (at least if, as usual
> and contrary to the C standard (note the desperate try
> to get back on topic), extended precision results are not
> rounded to double precision before further computation).

If they are different, they are *wrong* (also using the proper rounding
constant). Note that converting 6611.13 euro to gulden with the first
formulation gives the proper 3000.00 gulden, with the second formulation
it gives 3000.01 gulden; which is wrong. And the topic was somebody who
was trying to do financial calculations using floating point.
> They will not be wrong
> (assuming you meant if(f > 0.004995) ).

0.004995 is not good enough. You need 0.004999995 to get it working for
the calculations I did (converting euro to gulden). It will not work
correctly with fewer 9's.
> If you make allowance for
> the fact that floating point operations are not "exact", you need
> not be concerned with the precise nature of the
> "error".

Financial institutions do not make allowance for the inexactness of
floating point. They give precise rules how to do conversions. If
your floating point implementation gives results that do not match
the results exactly with those obtained using the precise rules, the
results are wrong.
>
> We can make provision for naive programmers by using either
> a decimal fixed point system, or a very high precision
> floating point system (and telling the programmers to use
> 0.0049999999995 when rounding to the nearest cent,
> or better, providing a round_financial() function).
> Both solutions incur efficiency cost. Both of the more
> efficient binary fixed point, and floating point (using
> a native floating point type) require some sophistication
> to do correctly (for one thing, both cases require an analysis
> to see if a native type is adequate).

And I did some timing and the scaled integer method was much faster
than the floating point method. And consider adding a sequence of
amounts using floating point. How many times should you do the
rounding when you are using floating point?
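
The question about summing can be made concrete with the standard example of adding ten cents ten times. This is a minimal sketch; both function names are hypothetical, and volatile is used only to make sure each partial sum is stored as a true double:

```c
/* Add an amount given in cents `n` times, accumulating in a double
   that holds currency units (dollars, euro, ...). */
double sum_as_double(int amount_cents, int n)
{
    volatile double total = 0.0;   /* volatile: no extended-precision carry-over */
    int i;

    for (i = 0; i < n; i++)
        total = total + amount_cents / 100.0;
    return total;
}

/* The same sum accumulated as an integer number of cents. */
long sum_as_cents(int amount_cents, int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += amount_cents;
    return total;
}
```

On an IEEE 754 machine sum_as_double(10, 10) does not compare equal to 1.0, although it still rounds to 100 cents, while sum_as_cents(10, 10) is exactly 100. How often the floating point total has to be re-rounded to keep that property over long sequences is precisely the analysis being asked about.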
 
William Hughes

Dik said:
> Yes, it is. Converting euro to gulden and back to euro gives the
> original. That is logical because the smallest unit in gulden has
> smaller value than the smallest unit in euro.
>
> If they are different, they are *wrong*

No. Here we are talking about floating point values.
More than one floating point value will give the
same result after decoding to dollars and cents
(or whatever). It does not follow that if two
floating point values are different one of them
represents the wrong answer. However, PHB's
cannot be convinced of this. The solution is never
to show PHB's raw floating point results, but
only results after "rounding". In this case you
have to make sure that the rounded answers are
identical, even if the floating point numbers are not.
This differs from the financial problem only
in the fact that it is usually much more difficult.
> (also using the proper rounding
> constant). Note that converting 6611.13 euro to gulden with the first
> formulation gives the proper 3000.00 gulden, with the second formulation
> it gives 3000.01 gulden; which is wrong. And the topic was somebody who
> was trying to do financial calculations using floating point.
>
> 0.004995 is not good enough. You need 0.004999995 to get it working for
> the calculations I did (converting euro to gulden). It will not work
> correctly with fewer 9's.

The number 0.004995 was for interest calculation to the nearest cent,
when interest rates were specified to .1 percent. It may not be
sufficient if you increase the precision of the calculation.

In general, "exact" answers will be integral multiples of
smallest_step. Thus if a floating point approximation to the
correct answer is greater than (0.5 - smallest_step/2) then the correct
answer is known to be greater than or equal to 0.5.
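
Expressed in currency units rather than in units of the rounding step, this rule gives the cutoff directly. This is a minimal sketch and the name rounding_threshold is hypothetical:

```c
/* Cutoff on the fractional part when rounding to `unit`
   (0.01 for cents), given that exact results are integral
   multiples of `smallest_step`. */
double rounding_threshold(double unit, double smallest_step)
{
    return unit / 2.0 - smallest_step / 2.0;
}
```

For cents this yields about 0.004995 with a smallest step of 0.00001, and about 0.00499995 with a smallest step of 0.0000001.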

For the interest calculation example, the smallest step was
0.00001 dollars, hence 0.004995.

For the euro/gulden conversion you have stated that the conversion
factors were given to four decimal places. Your example gives a value
to five decimal places. Assuming the smallest unit is .01 of a
currency unit and the conversion factors are indeed given to five
decimal places, the smallest step is .0000001 currency units and we
would need 0.00499995.

Determination of the smallest step is exactly analogous to the
determination of the scaling factors needed for the fixed point
arithmetic. There is no advantage to either method.

> Financial institutions do not make allowance for the inexactness of
> floating point. They give precise rules how to do conversions. If
> your floating point implementation gives results that do not match
> the results exactly with those obtained using the precise rules, the
> results are wrong.

Yes, but the results the financial institutions see are the results
after the floating point values are converted back into currency
units. These results conform to the precise rules on how to
do conversions.

- William Hughes
 
Richard Bos

Dik T. Winter said:
> It is extremely relevant. Dividing the interest rate for a period by two
> to get the interest rate for a half period is almost certainly wrong.
> I would not be surprised if there are rules in some branches that state that
> a 5% yearly interest rate should be converted to a 2.4695% interest rate
> for a half period.

You, being a scientific person, would say so, but... I wouldn't be
surprised if the interest rate for a half period is exactly half the
full period rate. Remember, it's bankers we're talking about. If they
can find another way to screw you over, they will, oh yes, they will.

Richard
 
Dik T. Winter

> Dik T. Winter wrote: ....
>
> No. Here we are talking about floating point values.

No. If I use the second formulation the results are *wrong*. Rounding
the resulting floating point values will give *different* results.

Look at this, and try it.
 
Dik T. Winter

>
> You, being a scientific person, would say so, but... I wouldn't be
> surprised if the interest rate for a half period is exactly half the
> full period rate. Remember, it's bankers we're talking about. If they
> can find another way to screw you over, they will, oh yes, they will.

If it is the interest you have to pay to the bank they might do that.
If it is the interest you receive from the bank, they will almost
certainly not do that.
 
William Hughes

Dik said:
> No. If I use the second formulation the results are *wrong*. Rounding
> the resulting floating point values will give *different* results.

My apologies. I thought you were referring to the possible errors
due to register change. However, you have also made the assumption
that fmod(dp, 0.01) equals fmod(dp+.01, 0.01). This, as you correctly
note, is wrong. Yes, if you use fmod to round a floating point number you
have to be aware of a couple of quirks, but I do not see that this
has any real significance. For practical work you would use
a round_to_n_places(dp,n) function or macro which would hide
the ugly details and use the (machine dependent) most
efficient form. The general point still applies. Except for equality
comparisons, the small differences between floating point and true
mathematical answers are not important. And equality comparisons can be
done "exactly", if you keep the differences to less than half
of your smallest step.
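
One possible shape for such a helper, specialised to cents, is sketched below. It avoids fmod entirely by scaling to cents and truncating. The name round_to_cents, its signature, and the restriction to non-negative amounts are assumptions of this sketch, not something specified in the thread:

```c
/* Round a non-negative amount in currency units to whole cents.
   Exact results are assumed to be multiples of `smallest_step`
   (in currency units), and the accumulated floating point error
   is assumed to be well under half a step. */
long round_to_cents(double amount, double smallest_step)
{
    double cents = amount * 100.0;
    double half_step = 50.0 * smallest_step;   /* half a step, in cents */

    /* Adding half a step pushes a true x.5 that was computed slightly
       low back over the boundary; truncation equals floor here because
       the value is non-negative. */
    return (long)(cents + 0.5 + half_step);
}
```

For example, round_to_cents(0.31 + 0.31*0.016, 0.00001) gives 31, and round_to_cents(500.0 * 2.20371, 0.0000001), where the exact product 1101.855 lies on a half-cent boundary, gives 110186.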

- William Hughes
 
