Precision issue

  • Thread starter Ladvánszky Károly
  • Start date

Ladvánszky Károly

Entering 3.4 in Python yields 3.3999999999999999.
I know it is due to the fact that 3.4 cannot be precisely expressed in
powers of 2. Can the float-handling rules of the underlying layers be set
from Python so that 3.4 yields 3.4?

Thanks,

Károly
 

Gerhard Häring

Ladvánszky Károly said:
Entering 3.4 in Python yields 3.3999999999999999.
I know it is due to the fact that 3.4 cannot be precisely expressed in
powers of 2. Can the float-handling rules of the underlying layers be set
from Python so that 3.4 yields 3.4?

A float is a float is a float ;)

What can be done is to change the formatting of floats in print
statements, for example. IIRC there was some magic in Python to that
effect that was removed somewhere in the 2.x line.

If you're concerned about the output, why don't you just explicitly
format your float numbers? Something like:

>>> print "%.2f" % 3.4
3.40

-- Gerhard
 

Alex Martelli

Ladvánszky Károly said:
Entering 3.4 in Python yields 3.3999999999999999.
I know it is due to the fact that 3.4 cannot be precisely expressed in
powers of 2. Can the float-handling rules of the underlying layers be set
from Python so that 3.4 yields 3.4?

It seems, from the question, that you might not have entirely understood
and grasped the explanations you can find at:
http://www.python.org/doc/current/tut/node14.html
and I quote, in particular:
"""
no matter how many base 2 digits you're willing to use, the decimal value
0.1 cannot be represented exactly as a base 2 fraction.
"""
and the same holds for 3.4 for exactly the same reason. As long as
binary is used -- and today's machines don't offer options -- that's it.

Only by using Decimal or Rational fractional numbers would that be possible,
and today's hardware doesn't really support them, so you would need to do
everything in software. If you don't mind the resulting huge slowdown in
computation speed (many apps don't really do many computations, so don't
care) there are quite a few packages on the net, though none, AFAIK, which
is considered "ready for production use". The speediest way to do Rational
arithmetic is, I suspect, with gmpy (the mpq type) -- but "speedy" is in
the eye of the beholder. Let me give you an example...:

according to timeit.py, after x=3.4 (a native float), int(x*10) takes
2.46 microseconds; but after x=mpq(3.4) [having imported mpq from gmpy],
int(x*10) takes 9.72 microseconds! That's FOUR times slower...

Also, mpq(3.4)'s default representation is as a fraction, 17/5; so,
you would still need some formatting work to display it as 3.4 instead.
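
Something like the following reproduces that measurement (a minimal
sketch, assuming gmpy is installed and importable as below; absolute
timings will of course differ by machine):

import timeit

# time int(x*10) on a native float
print timeit.Timer("int(x*10)", "x = 3.4").timeit()

# time the same operation on a gmpy rational
print timeit.Timer("int(x*10)", "from gmpy import mpq; x = mpq(3.4)").timeit()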


Alex
 

Duncan Booth

It seems, from the question, that you might not have entirely
understood and grasped the explanations you can find at:
http://www.python.org/doc/current/tut/node14.html
and I quote, in particular:

I know this is an FAQ, but the one thing I've never seen explained
satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather than
'3.4'?

Surely the important thing is that the equality eval(repr(x))==x has to
hold for floating point numbers, and that holds just as true for the short
3.4 as it does for the 17 digit version?

Microsoft .Net has a numeric format "R" which does a similar job. The R
specifier guarantees that a floating point numeric value converted to a
string will be parsed back into the same numeric value. It does this by
first trying a general format with 15 digits of precision then parsing that
back to a number. If the result is not the same as the original it then
falls back to the 17 digit value. There's no reason why Python couldn't do
the same:

def float_repr(x):
    s = "%.15g" % x
    if float(s) == x: return s
    return "%.17g" % x

This would be MUCH friendlier for newcomers to the language.
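
For example (an illustrative session; the fallback output assumes
Python 2.x, where repr uses 17 significant digits):

>>> float_repr(3.4)
'3.4'
>>> float_repr(3.333 * 30)   # no 15-digit form round-trips here
'99.990000000000009'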
 

Michael Hudson

Duncan Booth said:
I know this is an FAQ, but the one thing I've never seen explained
satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather than
'3.4'?

I believe "computational and code complexity" is the main answer to
that one.

Start here

http://citeseer.nj.nec.com/gay90correctly.html

Surely the important thing is that the equality eval(repr(x))==x has to
hold for floating point numbers, and that holds just as true for the short
3.4 as it does for the 17 digit version?

Microsoft .Net has a numeric format "R" which does a similar job. The R
specifier guarantees that a floating point numeric value converted to a
string will be parsed back into the same numeric value. It does this by
first trying a general format with 15 digits of precision then parsing that
back to a number. If the result is not the same as the original it then
falls back to the 17 digit value. There's no reason why Python couldn't do
the same:

def float_repr(x):
    s = "%.15g" % x
    if float(s) == x: return s
    return "%.17g" % x

This would be MUCH friendlier for newcomers to the language.

It would be nice, but I think it's pretty hard to do efficiently. Tim
Peters would be more certain than me :)

"Patches welcome" might apply, too. I don't think your suggested
float repr will fly, I'm afraid...

Cheers,
mwh
 

Stephen Horne

It seems, from the question, that you might not have entirely understood
and grasped the explanations you can find at:
http://www.python.org/doc/current/tut/node14.html
and I quote, in particular:
"""
no matter how many base 2 digits you're willing to use, the decimal value
0.1 cannot be represented exactly as a base 2 fraction.
"""

There are simple workarounds for this, though. For instance, if
someone needs one or two decimal digits of precision, they can simply
hold all values scaled by 10 or 100 - while neither 0.01 nor 0.1 can
be precisely represented as a binary value, 1 can be.

Actually, scaling by 100 is overkill - the nearest power of two is
128, and 100/128 is equivalent to 25/32, so a scale factor of 25
should be sufficient to allow two decimal digits of precision.
However, there is probably no advantage to scaling by 25 instead of
100 - just the disadvantage that the purpose of the scaling is less
obvious.
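
As a minimal sketch of the idea (hypothetical amounts, scaled by 100):

>>> 0.1 + 0.2 == 0.3                # binary floats miss
False
>>> 10 + 20 == 30                   # the same amounts scaled by 100 are exact
True
>>> "%d.%02d" % divmod(1234, 100)   # format a scaled value for display
'12.34'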

Anyway, this could be what Ladvánszky Károly meant, I suppose, by
'float handling rules of the underlying layers'. Of course this can't
be done using the existing float class as Python doesn't define the
float handling rules - they are presumably defined in most cases by
the floating point logic built into the CPU.

Perhaps Ladvánszky Károly has used Ada, where you can request a fixed
point or floating point type with particular properties and it is up
to the compiler to find or create one to suit. Though IIRC its floats
are still always binary floats - only its fixed point values can
handle decimals as Ladvánszky Károly has requested.

There are also, of course, languages which support different numeric
types such as a decimal type - Java has BigDecimal and C# has Decimal
(the C# one works using a fixed point scaling where the scaling must
be a power of 10, Java BigDecimal is IIRC more powerful - arbitrary
scale and precision, I think).

The issue of alternate numeric representations does get raised from
time to time, as I'm sure Alex knows better than me. There are
packages around. One key problem is that different people want
different things. A person who wants a fixed-point number class, for
instance, is not going to want the additional overhead from a rational
number class. Even a symbolic expression class has been suggested in
the past.

One common need for decimals is for currency values. This need can be
avoided by simply storing currency values in pence/cents rather than
pounds/dollars. Similarly, percentages can be handled using integer
calculations. For example, adding 17.5% (for UK VAT, perhaps) can be
handled using floats as follows...

result = value * 1.175

or using integers as follows...

result = (value * 1175) / 1000

In the example above, the parentheses are unnecessary but included to
emphasise the order of the calculations, which is important.

In my experience, this method handles most cases where results need to
be consistent with decimal arithmetic - store values using appropriate
units and the problem usually goes away.
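
For instance, with a hypothetical price held in pence (Python 2's
integer '/' truncates, so real code would round to the nearest penny):

>>> price = 1099              # 10.99 pounds
>>> (price * 1175) / 1000     # add 17.5% VAT
1291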
 

Duncan Booth

I believe "computational and code complexity" is the main answer to
that one.

Start here

http://citeseer.nj.nec.com/gay90correctly.html
<snip>

The code I gave isn't exactly complex, even when you rewrite it in C.

It would be nice, but I think it's pretty hard to do efficiently. Tim
Peters would be more certain than me :)

"Patches welcome" might apply, too. I don't think your suggested
float repr will fly, I'm afraid...

I'm happy to do a patch if there is any chance of it being accepted.
Obviously doing the conversion twice makes the code slower, but I'm not
sure how critical it would be given that it's a pretty fast operation to
begin with:

C:\Pythonsrc\python\dist\src\PCbuild>python ..\lib\timeit.py "repr(3.4)"
100000 loops, best of 3: 10.5 usec per loop

C:\Pythonsrc\python\dist\src\PCbuild>\python23\python ..\lib\timeit.py
"repr(3.4)"
100000 loops, best of 3: 7.58 usec per loop

So it's about a third slower, but you have to do 300,000 reprs before you
lose 1 second of CPU time.
 

Ben Finney

I know this is an FAQ, but the one thing I've never seen explained
satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
than '3.4'?

Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
value stored, what function will you leave us to discover the actual
value?

The str() function is for getting the working output of the value. The
repr() function is for discovering, as precisely as possible, the actual
value.
 

Duncan Booth

Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
value stored, what function will you leave us to discover the actual
value?

In what way is 3.3999999999999999 any more the value than 3.4?

>>> 3.3999999999999999 == 3.4
True

The exact value stored is neither of these, it is somewhere in between the
two (perhaps 3.399999999999999911182158029987476766109466552734375 if I
counted it right). repr gives a representation of the float which is
guaranteed to convert back to the same sequence of bits, 3.4 will do just
as well for this case as the longer value.

Try a different value, say 3.333*30. Repr gives you 99.990000000000009, str
gives you 99.99. I'm not proposing that should change because
99.990000000000009 != 99.99.

The str() function is for getting the working output of the value. The
repr() function is for discovering, as precisely as possible, the actual
value.

It doesn't do that. It currently shows you the value to sufficient
precision to allow you to reconstruct the bits exactly.
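
Concretely, under Python 2.x:

>>> x = 3.333 * 30
>>> repr(x)
'99.990000000000009'
>>> eval(repr(x)) == x
True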

Documentation on repr:

    repr(...)
        repr(object) -> string

        Return the canonical string representation of the object.
        For most object types, eval(repr(object)) == object.
 

Stephen Horne

Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
value stored, what function will you leave us to discover the actual
value?

The str() function is for getting the working output of the value. The
repr() function is for discovering, as precisely as possible, the actual
value.

Is there a basis for that claim?

My impression has always been that 'repr' gives a representation of
the value which, when parsed (using 'eval', for instance),
reconstructs the original value. In this respect, '3.4' is just as
good as '3.3999999999999999'.

IIRC, a binary float can always be given a precise decimal
representation - it simply tends to take a lot of digits. The fact
that repr doesn't give a perfect representation of the binary float
value suggests that it is not 'for discovering, as precisely as
possible, the actual value'.

Out of curiosity, I wrote the function at the bottom of this post to
convert a Python float into two string representations - a rational
and a decimal - both having precisely the same value as the float. I
got the following results starting with 3.4...

Rational : 7656119366529843/2251799813685248
Decimal : 3.399999999999999911182158029987476766109466552734375

I don't guarantee that the code is bug free - it may well be very
fragile, depending on platform specific float handling - but I believe
these results are accurate. For the record, I'm running Python 2.3
under Windows 2000 on a Pentium 4.

I am not aware of a Python standard function which will give this
rather impractical level of precision. But if Python's repr function
was intended 'for discovering, as precisely as possible, the actual
value', it really should give the decimal value from above which it is
clearly possible to discover. The truth is, however, that such
discovery is rarely if ever useful - floats are inherently approximate
values.

Converting float values to decimal is almost always either for the
benefit of human readers, or for creating text representations that
will be converted back to floats at some point. str serves the first
purpose well. For the second, the important identity is that
eval(repr(x)) == x (or at least a sufficiently close approximation -
I'm not sure if repr currently preserves the full precision of the
float).


Here's the code...

def perfect(val):
    # Convert to rational

    num = 0
    denom = 1

    # handle integer part

    num = int(val)
    val -= num

    # handle fractional part

    while val != 0:
        val *= 2
        num *= 2
        denom *= 2

        if val >= 1:
            num += 1
            val -= 1

    rat = str(num) + "/" + str(denom)

    # convert to decimal form (note: '/' is integer division in Python 2)

    dec = str(num / denom) + "."
    num = num % denom

    while num > 0:
        num *= 10
        dec += str(num / denom)
        num = num % denom

    return (rat, dec)
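
Calling it on 3.4 reproduces the results quoted above (same Python 2.3
setup as before):

>>> rat, dec = perfect(3.4)
>>> rat
'7656119366529843/2251799813685248'
>>> dec
'3.399999999999999911182158029987476766109466552734375'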
 

Terry Reedy

Duncan Booth said:
In what way is 3.3999999999999999 any more the value than 3.4?

In the same way that 0 is a better approximation of .3 than 1, and
vice versa for .7. repr(<float>) attempts to return the closest 17-digit
decimal, or perhaps the closest that will yield the same binary when
evaled. Sometimes adding or subtracting 1 to or from the last digits will
give a decimal that also evals to the same, sometimes not.

Let's turn the question around. Suppose you started with
a=3.3999999999999999
Now, would you want repr(a) to be the number entered, or the less
accurate 3.4?

Or suppose 'a' resulted from a calculation rather than an entered literal.
Why should repr() do anything but report the closest approximation
possible? Especially given that one can explicitly choose any level
of rounding one wants. As Ben said, if repr() fudged its output, then we
would need another function to replace it. But we already have
round(), formats, and str() to do the fudging.
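
For instance (an illustrative Python 2.x session):

>>> str(3.3999999999999999)
'3.4'
>>> "%.2f" % 3.3999999999999999
'3.40'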

Terry J. Reedy
 

Terry Reedy

Stephen Horne said:
My impression has always been that 'repr' gives a representation of
the value which, when parsed (using 'eval', for instance),
reconstructs the original value. In this respect, '3.4' is just as
good as '3.3999999999999999'.

Not just *a* representation, but the *most accurate*. '3.4' is (as
you show below) less accurate, or that would have been chosen instead.
The internal value is what it is, regardless of whether it results
from this literal or that literal or from calculation. Why the
opposition to having a way to get the closest 17-digit decimal
approximation?
IIRC, a binary float can always be given a precise decimal
representation - it simply tends to take a lot of digits. The fact
that repr doesn't give a perfect representation of the binary float
value suggests that it is not 'for discovering, as precisely as
possible, the actual value'.

It is for decimally representing, as precisely as possible *with 17
digits*, the actual value. I presume that 17 is the minimum necessary
to guarantee a unique, back-convertible representation for every
float.
Out of curiosity, I wrote the function at the bottem of this post to
convert a Python float into two string representations - a rational
and a decimal - both having precisely the same value as the float. I
got the following results starting with 3.4...

Rational : 7656119366529843/2251799813685248
Decimal : 3.399999999999999911182158029987476766109466552734375

If this is correct, then rounding up to 3.4 would be like rounding .11
to 1 instead of 0.

Terry J. Reedy
 

Alex Martelli

Duncan said:
I know this is an FAQ, but the one thing I've never seen explained
satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather than
'3.4'?

Surely the important thing is that the equality eval(repr(x))==x has to
hold for floating point numbers, and that holds just as true for the short
3.4 as it does for the 17 digit version?

Microsoft .Net has a numeric format "R" which does a similar job. The R
specifier guarantees that a floating point numeric value converted to a
string will be parsed back into the same numeric value. It does this by
first trying a general format with 15 digits of precision then parsing
that back to a number. If the result is not the same as the original it
then falls back to the 17 digit value. There's no reason why Python
couldn't do the same:

def float_repr(x):
    s = "%.15g" % x
    if float(s) == x: return s
    return "%.17g" % x

This would be MUCH friendlier for newcomers to the language.

I like this idea, actually. Care to try your hand at a patch for
2.4 ...?


Alex
 

Cameron Laird

<snip>
It is for decimally representing, as precisely as possible *with 17
digits*, the actual value. I presume that 17 is the minimum necessary
to guarantee a unique, back-convertible representation for every
float.
<snip>
Stuff in this area is difficult to express precisely. I'm
not sure what your "I presume that ..." means. Here's one
way to think about that magic number: there are "floats"
which are distinct, but agree to sixteen (decimal-)digits
of accuracy.

Some (seventeen-digit) decimals canNOT be achieved through
a round trip.
 

Stephen Horne

Not just *a* representation, but the *most accurate*. '3.4' is (as
you show below) less accurate, or that would have been chosen instead.
The internal value is what it is, regardless of whether it results
from this literal or that literal or from calculation. Why the
opposition to having a way to get the closest 17-digit decimal
approximation?

I'm not strongly opposed - in fact, I'm not really opposed at all. I
didn't start the discussion, I just countered an argument which I
still believe is simply wrong.

Even so, what is so advantageous about using the closest 17-digit
decimal approximation? That doesn't seem to me to be particularly
suited to the purpose of repr - alternative schemes for choosing the
repr may potentially be better suited.

Certainly it is *not* the most accurate representation possible.
I presume that 17 is the minimum necessary
to guarantee a unique, back-convertible representation for every
float.

In other words, the choice of 17 digits of precision is supporting the
goal of a sufficient (ie not overkill) back-convertible
representation.

The given result is not the optimum in either sufficiency or
precision. If precision is the goal, the result should be
'3.399999999999999911182158029987476766109466552734375'. If
sufficiency is the goal, the result should be '3.4'.

This isn't a criticism of the current system - a balance between
extremes is often appropriate, and in this case the key advantage is
presumably a simpler and faster algorithm. But it may well be valid to
discuss alternate schemes and their rationales.
If this is correct, then rounding up to 3.4 would be like rounding .11
to 1 instead of 0.

Yes, if the logic *must* be about rounding. But that isn't necessarily
the best scheme given the purpose of repr. As I said, there are other
possible rationales that give different best representations - the
ones relevant here being 'most precise possible' (which Ben Finney
wrongly seemed to think repr provides - the whole point of my reply)
or 'sufficient'.

Using the representation '3.4' instead of '3.399999...' has advantages
both for human readers and for use in files/data packets - in the
latter case, for instance, it saves bytes. 'Sufficient' does not mean
providing 17 digits of precision when two will do.

Of course, I wouldn't mind a function which could give me the exact
level of precision I want. At present, the '%' operator gives the
closest thing to this, but even that refuses to give more digits of
precision than those 17 (or whatever) that repr gives - extra digits
just get filled in as zeros irrespective of the precise value.

Whether there is a need for this, of course, is a different thing.

If I were to argue against, my argument would be that there is the
risk of introducing bugs - either in the repr function itself
(conversion to decimal can be more fiddly than some people realise,
especially when optimised) or in code which relies on the way the repr
function currently works (which I believe has been fixed since Python
prehistory).

The truth is, however, that I really don't care much either way. Just
because I disagree with an argument made by one clan, that doesn't
automatically mean I've joined the other clan. I was simply pointing
out what I see as an error - not taking sides.
 

Terry Reedy

Stuff in this area is difficult to express precisely. I'm
not sure what your "I presume that ..." means. Here's one
way to think about that magic number: there are "floats"
which are distinct, but agree to sixteen (decimal-)digits
of accuracy.

That is what I meant. 16 digits is not enough for the binary float =>
decimal rep mapping to be one-to-one.
Some (seventeen-digit) decimals canNOT be achieved through
a round trip.

If you mean that sometimes s != repr(eval(s)), of course; there are (I
believe) fewer than 10**17 floats (ignoring exponents), so mapping in
that direction cannot be onto. This is the fundamental problem; for
any positive number of bits and decimals, the two sets have different
sizes.
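
Concretely, a quick sanity check of the counting argument (assuming
IEEE-754 doubles, which carry 53-bit significands):

>>> 2**53    # distinct significands of a double
9007199254740992

That is about 9.0e15, consistent with "fewer than 10**17 floats
(ignoring exponents)".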

Terry J. Reedy
 
