Comparing float and decimal

D

D'Arcy J.M. Cain

I'm not sure I follow this logic. Can someone explain why float and
integer can be compared with each other and decimal can be compared to
integer but decimal can't be compared to float?
False

This seems to break the rule that if A is equal to B and B is equal to
C then A is equal to C.
 
R

Robert Lehmann

I'm not sure I follow this logic. Can someone explain why float and
integer can be compared with each other and decimal can be compared to
integer but decimal can't be compared to float?

In comparisons, `Decimal` tries to convert the other type to a `Decimal`.
If this fails -- and it does for floats -- the equality comparison
renders to False. For ordering comparisons, eg. ``D("10") < 10.0``, it
fails more verbosely::

TypeError: unorderable types: Decimal() < float()

The `decimal tutorial`_ states:

"To create a Decimal from a float, first convert it to a string. This
serves as an explicit reminder of the details of the conversion
(including representation error)."

See the `decimal FAQ`_ for a way to convert floats to Decimals.

False

This seems to break the rule that if A is equal to B and B is equal to C
then A is equal to C.

I don't see why transitivity should apply to Python objects in general.

HTH,

... _decimal tutorial: http://docs.python.org/lib/decimal-tutorial.html
... _decimal FAQ: http://docs.python.org/lib/decimal-faq.html
 
M

Michael Palmer

I don't see why transitivity should apply to Python objects in general.

Well, for numbers it surely would be a nice touch, wouldn't it.
May be the reason for Decimal to accept float arguments is that
irrational numbers or very long rational numbers cannot be converted
to a Decimal without rounding error, and Decimal doesn't want any part
of it. Seems pointless to me, though.
 
M

Marc 'BlackJack' Rintsch

Well, for numbers it surely would be a nice touch, wouldn't it. May be
the reason for Decimal to accept float arguments is that irrational
numbers or very long rational numbers cannot be converted to a Decimal
without rounding error, and Decimal doesn't want any part of it. Seems
pointless to me, though.

Is 0.1 a very long number? Would you expect ``0.1 == Decimal('0.1')`` to
be `True` or `False` given that 0.1 actually is

In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'

?

Ciao,
Marc 'BlackJack' Rintsch
 
T

Tim Roberts

Marc 'BlackJack' Rintsch said:
Well, for numbers it surely would be a nice touch, wouldn't it. May be
the reason for Decimal to accept float arguments is that irrational
numbers or very long rational numbers cannot be converted to a Decimal
without rounding error, and Decimal doesn't want any part of it. Seems
pointless to me, though.

Is 0.1 a very long number? Would you expect ``0.1 == Decimal('0.1')`` to
be `True` or `False` given that 0.1 actually is

In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'
?

Actually, it's not. Your C run-time library is generating random digits
after it runs out of useful information (which is the first 16 or 17
digits). 0.1 in an IEEE 784 double is this:

0.100000000000000088817841970012523233890533447265625
 
M

Mark Dickinson

Marc 'BlackJack' Rintsch said:
0.1 actually is
In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'
?

Actually, it's not.  Your C run-time library is generating random digits
after it runs out of useful information (which is the first 16 or 17
digits).  0.1 in an IEEE 784 double is this:

     0.100000000000000088817841970012523233890533447265625

I get (using Python 2.6):
Decimal('0.1000000000000000055511151231257827021181583404541015625')

which is a lot closer to Marc's answer. Looks like your float
approximation to 0.1 is 6 ulps out. :)

Mark
 
M

Mark Dickinson

I don't see why transitivity should apply to Python objects in general.

Hmmm. Lack of transitivity does produce some, um, interesting
results when playing with sets and dicts. Here are sets s and
t such that the unions s | t and t | s have different sizes:
from decimal import Decimal
s = set([Decimal(2), 2.0])
t = set([2])
len(s | t) 2
len(t | s)
1

This opens up some wonderful possibilities for hard-to-find bugs...

Mark
 
T

Tim Roberts

Mark Dickinson said:
Marc 'BlackJack' Rintsch said:
0.1 actually is
In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'
?

Actually, it's not.  Your C run-time library is generating random digits
after it runs out of useful information (which is the first 16 or 17
digits).  0.1 in an IEEE 784 double is this:

     0.100000000000000088817841970012523233890533447265625

I get (using Python 2.6):
Decimal('0.1000000000000000055511151231257827021181583404541015625')

which is a lot closer to Marc's answer. Looks like your float
approximation to 0.1 is 6 ulps out. :)

Hmmph, that makes the vote 3 to 1 against me. I need to go re-examine my
"extreme float converter".
 
T

Tim Roberts

Mark Dickinson said:
Marc 'BlackJack' Rintsch said:
0.1 actually is
In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'
?

....  0.1 in an IEEE 784 double is this:

     0.100000000000000088817841970012523233890533447265625

I get (using Python 2.6):
Decimal('0.1000000000000000055511151231257827021181583404541015625')

which is a lot closer to Marc's answer. Looks like your float
approximation to 0.1 is 6 ulps out. :)

Yes, foolishness on my part. The hex is 3FB99999_9999999A,
so we're looking at 19999_9999999A / 2^56 or
7205759403792794
-------------------
72057594037927936

which is the number that Marc, Nick, and you all describe. Apologies all
around. I actually dropped one 9 the first time around.

Adding one more weird data point, here's what I get trying Marc's original
sample on my Windows box:

C:\tmp>python
Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.I assume this is the Microsoft C run-time library at work.
 
G

Gabriel Genellina

I don't see why transitivity should apply to Python objects in general.

Hmmm. Lack of transitivity does produce some, um, interesting
results when playing with sets and dicts. Here are sets s and
t such that the unions s | t and t | s have different sizes:
from decimal import Decimal
s = set([Decimal(2), 2.0])
t = set([2])
len(s | t) 2
len(t | s)
1

Ouch!

This opens up some wonderful possibilities for hard-to-find bugs...

And I was thinking all this thread was just a theoretical question without
practical consequences...
 
T

Terry Reedy

Gabriel said:
I don't see why transitivity should apply to Python objects in general.

Hmmm. Lack of transitivity does produce some, um, interesting
results when playing with sets and dicts. Here are sets s and
t such that the unions s | t and t | s have different sizes:
from decimal import Decimal
s = set([Decimal(2), 2.0])
t = set([2])
len(s | t) 2
len(t | s) 1

Ouch!

This opens up some wonderful possibilities for hard-to-find bugs...

And I was thinking all this thread was just a theoretical question
without practical consequences...

To explain this anomaly more clearly, here is a recursive definition of
set union.

if b: a|b = a.add(x)|(b-x) where x is arbitrary member of b
else: a|b = a

Since Python only defines set-set and not set-ob, we would have to
subtract {x} to directly implement the above. But b.pop() subtracts an
arbitrary members and returns it so we can add it. So here is a Python
implementation of the definition.

def union(a,b):
a = set(a) # copy to preserve original
b = set(b) # ditto
while b:
a.add(b.pop())
return a

from decimal import Decimal
d1 = Decimal(1)
fd = set((1.0, d1))
i = set((1,))
print(union(fd,i))
print(union(i,fd))

# prints

{1.0, Decimal('1')}
{1}

This is a bug in relation to the manual:
"union(other, ...)
set | other | ...
Return a new set with elements from both sets."


Transitivity is basic to logical deduction:
equations: a == b == c ... == z implies a == z
implications: (a implies b) and (b implies c)implies (a implies c)
The latter covers syllogism and other deduction rules.

The induction part of an induction proof of set union commutivity is a
typical equality chain:

if b:
a | b
= a.add(x)| b-x for x in b # definition for non-empty b
= b-x | a.add(x) # induction hypothesis
= (b-x).add(x) | a.add(x)-x # definition for non-empty a
= b | a.add(x)-x # definitions of - and .add
if x not in a:
= b | a # .add and -
if x in a:
= b | a-x # .add and -
= b.add(x) | a-x # definition of .add for x in b
= b | a # definition for non-empty a
= b | a # in either case, by case analysis

By transitivity of =, a | b = b | a !

So where does this go wrong for our example? This shows the problems.set()

This pretty much says that 2-1=0, or that 2=1. Not good.

The fundamental idea of a set is that it only contains something once.
This definition assumes that equality is defined sanely, with the usual
properties. So, while fd having two members implies d1 != 1.0, the fact
that f1 == 1 and 1.0 == 1 implies that they are really the same thing,
so that d1 == 1.0, a contradiction.

To put this another way: The rule of substitution is that if E, F, and G
are expressions and E == F and E is a subexpression of G and we
substitute F for E in G to get H, then G == H. Again, this rule, which
is a premise of all formal expressional systems I know of, assumes the
normal definition of =. When we apply this,

fd == {f1, 1.0} == {1,1.0} == {1} == i

But Python saysFalse

Conclusion: fd is not a mathematical set.

Yet another anomaly:
False

So much for "adding the same thing to equals yields equals", which is a
special case of "doing the same thing to equals, where the thing done
only depends on the properties that make the things equal, yields equals."


And another
False

Manual: "set <= other
Test whether every element in the set is in other"

I bet Python first tests the sizes because the implementer *assumed*
that every member of a larger set could not be in a smaller set. I
presume the same assumption is used for equality testing.

Or

Manual: "symmetric_difference(other)
set ^ other
Return a new set with elements in either the set or other but not both."
{Decimal('1')}

If no one beats me to it, I will probably file a bug report or two, but
I am still thinking about what to say and to suggest.

Terry Jan Reedy
 
M

Mark Dickinson

If no one beats me to it, I will probably file a bug report or two, but
I am still thinking about what to say and to suggest.

I can't see many good options here. Some possibilities:

(0) Do nothing besides documenting the problem
somewhere (perhaps in a manual section entitled
'Infrequently Asked Questions', or
'Uncommon Python Pitfalls'). I guess the rule is
simply that Decimals don't mix well with other
numeric types besides integers: if you put both
floats and Decimals into a set, or compare a
Decimal with a Fraction, you're asking for
trouble. I suppose the obvious place for such
a note would be in the decimal documentation,
since non-users of decimal are unlikely to encounter
these problems.

(1) 'Fix' the Decimal type to do numerical comparisons
with other numeric types correctly, and fix up the
Decimal hash appropriately.

(2) I wonder whether there's a way to make Decimals
and floats incomparable, so that an (in)equality check
between them always raises an exception, and any
attempt to have both Decimals and floats in the same
set (or as keys in the same dict) also gives an error.
(Decimals and integers should still be allowed to
mix happily, of course.) But I can't see how this could
be done without adversely affecting set performance.

Option (1) is certainly technically feasible, but I
don't like it much: it means adding a whole load
of code to the Decimal module that benefits few users
but slows down hash computations for everyone.
And then any new numeric type that wants to fit in
with Python's rules had better worry about hashing
equal to ints, floats, Fractions, complexes, *and*
Decimals...

Option (2) appeals to me, but I can't see how to
implement it.

So I guess that just leaves updating the docs.
Other thoughts?

Mark
 
T

Terry Reedy

Mark said:
I can't see many good options here. Some possibilities:

Thanks for responding. Agreeing on a fix would make it more likely to
happen sooner ;-)
(0) Do nothing besides documenting the problem
somewhere (perhaps in a manual section entitled
'Infrequently Asked Questions', or
'Uncommon Python Pitfalls'). I guess the rule is
simply that Decimals don't mix well with other
numeric types besides integers: if you put both
floats and Decimals into a set, or compare a
Decimal with a Fraction, you're asking for
trouble. I suppose the obvious place for such
a note would be in the decimal documentation,
since non-users of decimal are unlikely to encounter
these problems.

Documenting the problem properly would mean changing the set
documentation to change at least the definitions of union (|), issubset
(<=), issuperset (>=), and symmetric_difference (^) from their current
math set based definitions to implementation based definitions that
describe what they actually do instead of what they intend to do. I do
not like this option.
(1) 'Fix' the Decimal type to do numerical comparisons
with other numeric types correctly, and fix up the
Decimal hash appropriately.

(1A) All that is needed for fix equality transitivity corruption and the
consequent set/dictview problems is to correctly compare integral
values. For this, Decimal hash seems fine already. For the int i I
tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==
hash(Fraction(i)) == i.

It is fine for transitivity that all fractional decimals are unequal to
all fractional floats (and all fractions) since there is no integer (or
fraction) that either is equal to, let alone both.

This is what I would choose unless there is some 'hidden' problem. But
it seem to me this should work: when a float and decimal are both
integral (easy to determine) convert either to an int and use the
current int-whichever comparison.
(2) I wonder whether there's a way to make Decimals
and floats incomparable, so that an (in)equality check
between them always raises an exception, and any
attempt to have both Decimals and floats in the same
set (or as keys in the same dict) also gives an error.
(Decimals and integers should still be allowed to
mix happily, of course.) But I can't see how this could
be done without adversely affecting set performance.

I pretty strongly believe that equality checks should always work (at
least in Python as delivered) just as boolean checks should (and do).
Option (1) is certainly technically feasible, but I
don't like it much: it means adding a whole load
of code to the Decimal module that benefits few users
but slows down hash computations for everyone.
And then any new numeric type that wants to fit in
with Python's rules had better worry about hashing
equal to ints, floats, Fractions, complexes, *and*
Decimals...

I believe (1A) would be much easier both to implement and for new
numeric types.
Option (2) appeals to me, but I can't see how to
implement it.

So I guess that just leaves updating the docs.
Other thoughts?

(3) Further isolate decimals by making decimals also unequal to all
ints. Like (1A), this would easily fix transitivity breakage, but I
would consider the result less desirable.

My ranking: 1A > 3 > 0 > 2. I might put 1 between 1A and 3, but I am
not sure.

Terry Jan Reedy
 
M

Mark Dickinson

Documenting the problem properly would mean changing the set
documentation to change at least the definitions of union (|), issubset
(<=), issuperset (>=), and symmetric_difference (^) from their current
math set based definitions to implementation based definitions that
describe what they actually do instead of what they intend to do.  I do
not like this option.

I was thinking more of a single-line warning in the set documentation
to the effect that funny things happen in the absence of transitivity
of equality, perhaps pointing the finger at Decimal as the most
obvious troublemaker; the Decimal documentation could elaborate on
this.
That is, rather than documenting exactly what the set operations do,
document what they're supposed to do (just as now) and declare that
behaviour is undefined for sets of elements for which transitivity
fails.
(1A) All that is needed for fix equality transitivity corruption and the
consequent set/dictview problems is to correctly compare integral
values.  For this, Decimal hash seems fine already.  For the int i I
tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==
hash(Fraction(i)) == i.

Good point. Though I'd be a bit uncomfortable with having
Decimal(1) == 1.0 return True, but Decimal('0.5') == 0.5 return False.
Not sure what the source of my discomfort is; partly I think it's
that I want to be able to explain the comparison rules at the
level of types; having some floats behave one way and some behave
another feels odd. And explaining to confused users exactly
why Decimal behaves this way could be fun.

I think I'd prefer option 1 to option 1a.
(3) Further isolate decimals by making decimals also unequal to all
ints.  Like (1A), this would easily fix transitivity breakage, but I
would consider the result less desirable.

I'd oppose this. I think having decimals play nicely with integers
is important, both practically and theoretically. There's probably
also already existing code that depends on comparisons between
integers and Decimals working as expected.

So I guess my ranking is 0 > 1 > 1a > 3, though I could live
with any of 0, 1, or 1a.

It's really the decimal module that's breaking the rules here;
I feel it's the decimal module's responsibility to either
fix or document the resulting problems.

It would also be nice if it were made more obvious somewhere
in the docs that transitivity of equality is important
for correct set and dict behaviour.

Mark
 
G

greg

Mark said:
Option (2) appeals to me, but I can't see how to
implement it.

It could be implemented for the special case of floats
and Decimals by keeping flags in each set indicating
whether any elements of those types have been added.

But doing this just for those two types would be
rather hackish, and wouldn't do anything for any
other incomparable types that might come along.
 
G

greg

Terry said:
Documenting the problem properly would mean changing the set
documentation ... from their current
math set based definitions to implementation based definitions

It could be documented that the mathematical definitions
hold only if the equality relations between all the elements
involved are transitive, and leave the semantics in other
cases undefined.

Then in the Decimal module it could be warned that the
equality relations between int-float and int-Decimal are
not transitive, perhaps noting that this can cause
problems with sets and dicts.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,736
Latest member
zacharyharris

Latest Threads

Top