I intended to reply to this yesterday, but circumstances
(see timeit results) prevented it.
But there are no exceptions.
<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types
and different string types, never compare equal;
X == Y tests for equality. If it returns
True, then the objects are equal by definition. That's what equal means in
Python.
One can abuse the technology to give nonsensical results:
class EqualToEverything(object):
def __eq__(self, other):
return True
x = EqualToEverything()
x == 1.0 True
x == [2.9, "hello world"]
True
but that's no different from any language that allows you to override
operators.
No, it's not, as I show further down.
But you show no such thing.
Or, to put it another way:
Did! Did not! Did! Did not! Did! Did not! ...
a = 1
b = (1,)
c = [1]
d = gmpy.mpz(1)
[snip]
a==d
True
See, a and d are equal.
No, they are not "equal".
Of course they are. It says so right there: "a equals d" is true.
Ok, but they are an exception to the rule "different types compare
False".
Why ever not? If you need an mpz value in order to do something, and no
other data type will do, what would you suggest? Just give up and say
"Don't do this, because it is Bad, m'kay?"
It's not the mpzs you shouldn't use, its the ints. I also stessed
"in loops". Replacing an integer literal with a variable still
requires a coercion, so it doesn't matter if n + 1 occurs outside
a loop.
Really? Just how much time?
Can't say, had to abort the following.
Returns the count of n/2 and 3n+1 operations [1531812, 854697].
import gmpy
def collatz(a):
ONE = gmpy.mpz(1)
TWO = gmpy.mpz(2)
TWE = gmpy.mpz(3)
a = gmpy.mpz(a)
t = 0
u = 0
done = 0
while done==0:
f = gmpy.scan1(a,0)
if f>0:
a = a >> f
u += f
else:
if a==1:
done = 1
else:
a = a*TWE + ONE
t += 1
return [u,t]
def collatz2(a):
t = 0
u = 0
done = 0
while done==0:
f = gmpy.scan1(a,0)
if f>0:
a = a >> f
u += f
else:
if a==1:
done = 1
else:
a = a*3 + 1
t += 1
return [u,t]
def test():
collatz(2**177149-1)
def test2():
collatz2(2**177149-1)
if __name__=='__main__':
from timeit import Timer
t = Timer("a = test()", "from __main__ import test")
u = Timer("b = test2()", "from __main__ import test2")
print t.timeit(10)
print u.timeit(10)
## 723.430377542
## *ABORTED after 20 hours*
timeit.Timer("x == y", "import gmpy; x = 1; y = gmpy.mpz(1)").repeat()
timeit.Timer("x == y", "x = 1; y = 1").repeat()
I don't have gmpy installed here,
Good Lord! How do you solve a Linear Congruence?
so I can't time it, but I look forward
to seeing the results, if you would be so kind.
I had a lot of trouble with this, but I think I finally got a
handle on it. I had to abort the previous test after 20+ hours
and abort a second test (once I figured out to do your example)
on another machine after 14+ hours. I had forgotten just how
significant the difference is.
import timeit
## t = timeit.Timer("a == b", "a = 1; b = 1")
## u = timeit.Timer("c == d", "import gmpy; c = 1; d =
gmpy.mpz(1)")
## t.repeat()
## [0.22317417437132372, 0.22519314605627253, 0.22474588250741367]
## u.repeat()
## [0.59943819675405763, 0.5962260566636246, 0.60122920650529466]
Unfortunately, this is not a very useful test, since mpz
coercion appears to vary ny the size of the number involved.
Although changing t to
## t = timeit.Timer("a == b", "a = 2**177149-1; b = 2**177149-1")
still produces tractable results
## t.repeat()
## [36.323597552202841, 34.727026758987506, 34.574566320579862]
the same can't be said for mpz coercion:
## u = timeit.Timer("c == d", "import gmpy; c = 2**177149-1; d =
gmpy.mpz(2**177149-1)")
## u.repeat()
## *ABORTED after 14 hours*
So I changed it to (using yet a third machine)
for i in xrange(8):
e = 2*i*100
n = 2**e-1
r = 'a = %d; b = %d' % (n,n)
s = 'import gmpy; a = %d; b = gmpy.mpz(%d)' % (n,n)
print 'For 2**e-1',e
t = timeit.Timer("a == b",r)
u = timeit.Timer("a == b",s)
print t.repeat()
print u.repeat()
print
which clearly shows the growth rate of the mpz coercion.
## int==int vs. int==mpz
##
## For 2**e-1 0
## [0.054264941118974445, 0.054553378257723141,
0.054355515455681791]
## [0.16161957500399435, 0.16188363643198839, 0.16197491752897064]
##
## For 2**e-1 200
## [0.093393746299376912, 0.093660961833065492,
0.092977494572419439]
## [1.0425794607193544, 1.0436544844503342, 1.0451038279715417]
##
## For 2**e-1 400
## [0.10496130299527184, 0.10528292779203152, 0.10497603593951155]
## [2.2687503839249636, 2.2685411490493506, 2.2691453463783233]
##
## For 2**e-1 600
## [0.11724617625774236, 0.11701867087715279, 0.11747874550051129]
## [3.616420796797021, 3.617562537946073, 3.6152373342355801]
##
## For 2**e-1 800
## [0.13156379733273482, 0.1310266632832402, 0.13168082630802047]
## [5.2398534562645089, 5.2389728893525458, 5.2353889230364388]
##
## For 2**e-1 1000
## [0.153719968797283, 0.15383679852633492, 0.15352625633217798]
## [6.967458038928207, 6.9640038947002409, 6.9675019294931388]
##
## For 2**e-1 1200
## [0.16716219584402126, 0.16743472335786436, 0.16782637005291434]
## [11.603391791430532, 11.601063020084396, 11.603106936964878]
##
## For 2**e-1 1400
## [0.179120966908215, 0.17908259508838853, 0.17934175430681876]
## [14.753954507946347, 14.755623642634944, 14.756064585859164]
And, just for laughs, I compared mpzs to mpzs,
s = 'import gmpy; a = gmpy.mpz(%d); b = gmpy.mpz(%d)' % (n,n)
which ended up faster than comparing ints to ints.
## int==int vs. mpz==mpz
##
## For 2**e-1 0
## [0.054301433257206225, 0.054502401293220933,
0.054274144039999611]
## [0.12487657446828507, 0.099130500653189346,
0.094799646619862565]
##
## For 2**e-1 200
## [0.10013419046813476, 0.10156139134030695, 0.10151083166511599]
## [0.091683807483012414, 0.091326269489948375,
0.091261281378934411]
##
## For 2**e-1 400
## [0.10716937998703036, 0.10704403530042028, 0.10705119312788414]
## [0.099165500324245093, 0.097540568227742153,
0.10131808159697742]
##
## For 2**e-1 600
## [0.12060785142996777, 0.11720683828159517, 0.11800506010281886]
## [0.11328210449149934, 0.1146064679843235, 0.11307050873582014]
##
## For 2**e-1 800
## [0.12996358680839437, 0.13021352430898236, 0.12973684081916526]
## [0.12344120825932059, 0.11454960385710677, 0.12339954699673861]
##
## For 2**e-1 1000
## [0.15328649918703752, 0.15362917265815135, 0.15313422618208516]
## [0.12753811336359666, 0.12534907002753748, 0.12588097104350471]
##
## For 2**e-1 1200
## [0.16756264696760326, 0.16747118166182684, 0.167885034915086]
## [0.12162660501311073, 0.13368267591470051, 0.13387503876843265]
##
## For 2**e-1 1400
## [0.17867761017283623, 0.17829534684824377, 0.17826312158720281]
## [0.13718761665773815, 0.13779106963280441, 0.13708166276632738]
Even if it is terribly slow, that's just an implementation detail. What
happens when Python 2.7 comes out (or Python 3.0 or Python 99.78) and
coercion from int to mpz is lightning fast? Would you then say "Well,
int(1) and mpz(1) used to be unequal, but now they are equal?".
Are you saying I should be unconcerned about implementation details?
That it's silly of me to be concerned about implementation side
effects
due to mis-matched types?
Me, I'd say they always were equal, but previously it used to be slow to
coerce one to the other.
So, when you're giving advice to the OP you don't feel any need to
point
this out? That's all I'm trying to do, supply some "yes, but you
should
be aware of..." commentary.
The absolute stupidest
thing you can do (assuming n is an mpz) is:
while n >1:
if n % 2 == 0:
n = n/2
else:
n = 3*n + 1
Oh, I can think of much stupider things to do.
while len([math.sin(random.random()) for i in range(n)[:]][:]) > 1:
if len( "+" * \
int(len([math.cos(time.time()) for i in \
range(1000, n+1000)[:]][:])/2.0)) == 0:
n = len([math.pi**100/i for i in range(n) if i % 2 == 1][:])
else:
s = '+'
for i in range(n - 1):
s += '+'
s += s[:] + ''.join(reversed(s[:]))
s += s[:].replace('+', '-')[0:1]
n = s[:].count('+') + s[:].count('-')
ZED = gmpy.mpz(0)
ONE = gmpy.mpz(1)
TWO = gmpy.mpz(2)
TWE = gmpy.mpz(3)
while n >ONE:
if n % TWO == ZED:
n = n/TWO
else:
n = TWE*n + ONE
This way, no coercion is performed.
I know that algorithm, but I don't remember what it is called...
The Collatz Conjecture. If true, it means the while loop
terminates for any n.
In any case, what you describe is a local optimization. Its probably a
good optimization, but in no way, shape or form does it imply that mpz(1)
is not equal to 1.
It's a different type. It is an exception to the "different types
compare
False" rule. That exception is not without cost, the type mis-match
causes coercion.
Why should the int 1 return True when compared to mpz(1)?
Because they both represent the same mathematical number, where as a list
containing 1 and a tuple containing 1 are different containers. Even if
the contents are the same, lists aren't equal to tuples.
That's because both are the same kind of container, and they both have the
same contents.
After all, it returns false if b is [2],
so it looks at the content in this case. So for numerics,
it's the value that matters, not the type. And this creates
a false sense of "equality" when a==d returns True.
There's nothing false about it. Ask any mathematician, does 1 equal 1.0,
and they will say "of course".
And if you ask any mathematician, he'll say that (1,) is equal to [1].
That's the difference between a mathematician and a programmer.
A programmer will say "of course not, the int has to be coered."
Ints are not containers. An int doesn't contain values, an int is the
value.
Numeric values are automatically coerced because that's more practical.
That's a design decision, and it works well.
And I'm not saying it shouldn't be that way. But when I wrote my
Collatz Functions library, I wasn't aware of the performance issues
when doing millions of loop cycles with numbers having millions
of digits. I only found that out later. Would I have gotten a
proper answer on this newgroup had I asked here? Sure doesn't look
like it.
BTW, in reviewing my Collatz Functions library, I noticed a coercion
I had overlooked, so as a result of this discussion, my library is
now slightly faster. So some good comes out of this argument after
all.
As for gmpy.mpz, since equality tests are completely under the control of
the class author, the gmpy authors obviously wanted mpz values to compare
equal with ints.
And they chose to do a silent coercion rather than raise a type
exception.
It says right in the gmpy documentation that this coercion will be
performed.
What it DOESN'T say is what the implications of this silent coercion
are.
It means they are mathematically equivalent, which is not the same as
being programatically equivalent. Mathematical equivalency is what
most
people want most of the time. Not all of the people all of the time,
however. For example, I can calculate my Hailstone Function
parameters
using either a list or a tuple:
import collatz_functions as cf
print cf.calc_xyz([1,2]) (mpz(8), mpz(9), mpz(5))
print cf.calc_xyz((1,2))
(mpz(8), mpz(9), mpz(5))
But [1,2]==(1,2) yields False, so although they are not equal,
they ARE interchangeable in this application because they are
mathematically equivalent.
Since they are mathematical values, what other sense is meaningful?
I never said that there was no efficiency differences. Comparing X with Y
might take 0.02ms or it could take 2ms depending on how much work needs
to be done. I just don't understand why you think that has a bearing on
whether they are equal or not.
The bearing it has matters when you're writing a function library that
you want to execute efficiently.