Interesting list Validity (True/False)


mensanator

I should point out that only applies to built-in types, not custom classes.


That still doesn't make sense. However, using my incredible psychic
ability to read between the lines, I think what Mensanator is trying (but
failing) to say is that "if arg==True" first tests whether arg is of type
bool, and if it is not, it knows they can't be equal. That's not actually
correct. We can check this:


>>> import dis
>>> def test(arg):
...     return arg == True
...
>>> dis.dis(test)

  2           0 LOAD_FAST                0 (arg)
              3 LOAD_GLOBAL              0 (True)
              6 COMPARE_OP               2 (==)
              9 RETURN_VALUE

As you can see, there is no explicit type test. (There may or may not be
an _implicit_ type test, buried deep in the Python implementation of the
COMPARE_OP operation, but that is neither here nor there.)

Also, since bool is a subclass of int, we can do this:
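The example that followed this sentence did not survive in this copy of the thread. What follows is not the original code, only a minimal sketch of the kind of thing the subclass relationship allows; all variable names are made up for illustration.

```python
# bool really is a subclass of int.
is_subclass = issubclass(bool, int)

# So True behaves like the integer 1 in comparisons and arithmetic.
true_equals_one = (True == 1)
total = True + True

# It can even be used as a sequence index.
picked = ["zero", "one"][True]

print(is_subclass, true_equals_one, total, picked)
```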



Not at all, it makes perfect sense. X == Y always tests whether the
argument X is equal to the object Y regardless of what X and Y are.

Except for the exceptions, that's why the statement is wrong.
That's actually wrong, as you show further down.

No, it's not, as I show further down.
a = 1
b = (1,)
c = [1]
d = gmpy.mpz(1)
type(a)
a==b
False
b==c
False
a==d
True

See, a and d are equal.

No, they are not "equal". Ints and mpzs should NEVER
be used together in loops, even though it's legal. The ints
ALWAYS have to be coerced to mpzs to perform arithmetic
and this takes time...LOTS of it. The absolute stupidest
thing you can do (assuming n is an mpz) is:

while n > 1:
    if n % 2 == 0:
        n = n/2
    else:
        n = 3*n + 1

You should ALWAYS do:

ZED = gmpy.mpz(0)
ONE = gmpy.mpz(1)
TWO = gmpy.mpz(2)
TWE = gmpy.mpz(3)

while n > ONE:
    if n % TWO == ZED:
        n = n/TWO
    else:
        n = TWE*n + ONE

This way, no coercion is performed.
Why should they return true just because the contents are the same?

Why should the int 1 return True when compared to mpz(1)?

a = [1]
b = [1]

returns True for a==b? After all, it returns false if b is [2],
so it looks at the content in this case. So for numerics,
it's the value that matters, not the type. And this creates
a false sense of "equality" when a==d returns True.
A bag
of shoes is not the same as a box of shoes, even if they are the same
shoes.

Exactly. For the very reason I show above. The fact that the int
has the same shoes as the mpz doesn't mean the int should be
used, it has to be coerced.
Since both lists and tuples are containers, neither are strings or
numeric types, so the earlier rule applies: they are different types, so
they can't be equal.

But you can't trust a==d returning True to mean a and d are
"equal". To say the comparison means the two objects are
equal is misleading, in other words, wrong. It only takes one
turd to spoil the whole punchbowl.
gmpy.mpz(1) on the other hand, is both a numeric type and a custom class.
It is free to define equal any way that makes sense, and it treats itself
as a numeric type and therefore says that it is equal to 1, just like 1.0
and 1+0j are equal to 1.

They are equal in the mathematical sense, but not otherwise.
And to think that makes no difference is to be naive.
 

Steven D'Aprano

Except for the exceptions, that's why the statement is wrong.

But there are no exceptions. X == Y tests for equality. If it returns
True, then the objects are equal by definition. That's what equal means in
Python.

One can abuse the technology to give nonsensical results:

>>> class EqualToEverything(object):
...     def __eq__(self, other):
...         return True
...
>>> x = EqualToEverything()
>>> x == 1.0
True
>>> x == [2.9, "hello world"]
True

but that's no different from any language that allows you to override
operators.


No, it's not, as I show further down.

But you show no such thing.

Or, to put it another way:

Did! Did not! Did! Did not! Did! Did not! ...

a = 1
b = (1,)
c = [1]
d = gmpy.mpz(1)
[snip]
See, a and d are equal.

No, they are not "equal".

Of course they are. It says so right there: "a equals d" is true.

Ints and mpzs should NEVER
be used together in loops, even though it's legal.

Why ever not? If you need an mpz value in order to do something, and no
other data type will do, what would you suggest? Just give up and say
"Don't do this, because it is Bad, m'kay?"
The ints
ALWAYS have to be coerced to mpzs to perform arithmetic
and this takes time...LOTS of it.

Really? Just how much time?

timeit.Timer("x == y", "import gmpy; x = 1; y = gmpy.mpz(1)").repeat()
timeit.Timer("x == y", "x = 1; y = 1").repeat()

I don't have gmpy installed here, so I can't time it, but I look forward
to seeing the results, if you would be so kind.
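For anyone else without gmpy to hand, the same kind of measurement can be sketched with a standard-library stand-in. This uses `fractions.Fraction` in place of `gmpy.mpz` purely as an assumption for illustration; the numbers it prints say nothing about gmpy itself. The helper name `time_equality` is hypothetical.

```python
import timeit


def time_equality(setup, repeat=3, number=20000):
    """Return the best of `repeat` timings for evaluating `x == y`."""
    return min(timeit.repeat("x == y", setup=setup, repeat=repeat, number=number))


# Same-type comparison versus a mixed-type comparison.
same_type = time_equality("x = 1; y = 1")
mixed_type = time_equality("from fractions import Fraction; x = 1; y = Fraction(1)")

print("int == int     :", same_type)
print("int == Fraction:", mixed_type)
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.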

Even if it is terribly slow, that's just an implementation detail. What
happens when Python 2.7 comes out (or Python 3.0 or Python 99.78) and
coercion from int to mpz is lightning fast? Would you then say "Well,
int(1) and mpz(1) used to be unequal, but now they are equal?".

Me, I'd say they always were equal, but previously it used to be slow to
coerce one to the other.

The absolute stupidest
thing you can do (assuming n is an mpz) is:

while n > 1:
    if n % 2 == 0:
        n = n/2
    else:
        n = 3*n + 1

Oh, I can think of much stupider things to do.

while len([math.sin(random.random()) for i in range(n)[:]][:]) > 1:
    if len( "+" * \
            int(len([math.cos(time.time()) for i in \
            range(1000, n+1000)[:]][:])/2.0)) == 0:
        n = len([math.pi**100/i for i in range(n) if i % 2 == 1][:])
    else:
        s = '+'
        for i in range(n - 1):
            s += '+'
        s += s[:] + ''.join(reversed(s[:]))
        s += s[:].replace('+', '-')[0:1]
        n = s[:].count('+') + s[:].count('-')


You should ALWAYS do:

ZED = gmpy.mpz(0)
ONE = gmpy.mpz(1)
TWO = gmpy.mpz(2)
TWE = gmpy.mpz(3)

while n > ONE:
    if n % TWO == ZED:
        n = n/TWO
    else:
        n = TWE*n + ONE

This way, no coercion is performed.

I know that algorithm, but I don't remember what it is called...

In any case, what you describe is a local optimization. It's probably a
good optimization, but in no way, shape or form does it imply that mpz(1)
is not equal to 1.

Why should the int 1 return True when compared to mpz(1)?

Because they both represent the same mathematical number, whereas a list
containing 1 and a tuple containing 1 are different containers. Even if
the contents are the same, lists aren't equal to tuples.
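A short sketch of that distinction, using only built-in and standard-library types (`fractions.Fraction` stands in here for a third-party numeric class such as mpz; the variable names are illustrative):

```python
from fractions import Fraction

# Different numeric types that all represent the number one.
one_variants = [1, 1.0, 1 + 0j, Fraction(1)]
all_numbers_equal = all(x == 1 for x in one_variants)

# Containers: same type and same contents compare equal...
same_container = ([1] == [1])
# ...different contents do not...
different_contents = ([1] == [2])
# ...and a list never equals a tuple, whatever they hold.
list_vs_tuple = ([1] == (1,))

print(all_numbers_equal, same_container, different_contents, list_vs_tuple)
```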

a = [1]
b = [1]

returns True for a==b?

That's because both are the same kind of container, and they both have the
same contents.

After all, it returns false if b is [2],
so it looks at the content in this case. So for numerics,
it's the value that matters, not the type. And this creates
a false sense of "equality" when a==d returns True.

There's nothing false about it. Ask any mathematician, does 1 equal 1.0,
and they will say "of course".

Exactly. For the very reason I show above. The fact that the int
has the same shoes as the mpz doesn't mean the int should be
used, it has to be coerced.

Ints are not containers. An int doesn't contain values, an int is the
value.

Numeric values are automatically coerced because that's more practical.
That's a design decision, and it works well.

As for gmpy.mpz, since equality tests are completely under the control of
the class author, the gmpy authors obviously wanted mpz values to compare
equal with ints.


But you can't trust a==d returning True to mean a and d are
"equal".

What does it mean then?

To say the comparison means the two objects are
equal is misleading, in other words, wrong. It only takes one
turd to spoil the whole punchbowl.


They are equal in the mathematical sense, but not otherwise.

Since they are mathematical values, what other sense is meaningful?
And to think that makes no difference is to be naive.

I never said that there was no efficiency differences. Comparing X with Y
might take 0.02ms or it could take 2ms depending on how much work needs
to be done. I just don't understand why you think that has a bearing on
whether they are equal or not.
 

Carsten Haese

<type 'bool'>

All right, so what you meant was "Assuming that arg is a list, 'if
arg==True' will always fail because lists never compare equal to any
boolean."
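To make the difference concrete, here is a small sketch contrasting the equality test with a plain truth test. The helper names `equals_true` and `is_truthy` are invented for this illustration.

```python
def equals_true(arg):
    # Deliberately the comparison under discussion, not a truth test.
    return arg == True


def is_truthy(arg):
    # What "if arg:" actually checks.
    return bool(arg)


for value in ([1], [], 1, 1.0, 2):
    print(repr(value), equals_true(value), is_truthy(value))
```

A non-empty list is truthy but never equal to True, while 1 and 1.0 are both; 2 is truthy yet not equal to True.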
Actually, it's this statement that's non-sensical.

<quote>
"if arg==True" tests whether the object known as arg is equal to the
object known as True.
</quote>

[snip examples of "surprising" equality tests...]

The statement I made is simply the meaning of "if arg==True" by
definition, so I don't see how it can be nonsensical.

The problem is that you consider equality tests in Python to be
nonsensical because they don't fit with your opinion of what equality
should mean.

Regards,
 

mensanator

<type 'bool'>

All right, so what you meant was "Assuming that arg is a list, 'if
arg==True' will always fail because lists never compare equal to any
boolean."
Actually, it's this statement that's non-sensical.
<quote>
"if arg==True" tests whether the object known as arg is equal to the
object known as True.
</quote>
[snip examples of "surprising" equality tests...]

The statement I made is simply the meaning of "if arg==True" by
definition, so I don't see how it can be nonsensical.

Because you didn't allow for exceptions, which are
prominently pointed out in the Python docs.
The problem is that you consider equality tests in Python to be
nonsensical because they don't fit with your opinion of what equality
should mean.

No, it has nothing to do with what it means. 1, [1], (1,)
and mpz(1) are all different types and all mathematically
the same. Yet 1 and mpz(1) compare equal but (1,) and
[1] do not. The latter fails due to type mismatch, the
former does not despite type mismatch due to the fact
they are the same mathematically.

I'm not saying the situation is wrong, what I'm saying
is that someone who doesn't understand why arg==True
is failing should be told ALL the rules, not just the easy
ones.
 

Carsten Haese

Because you didn't allow for exceptions, which are
prominently pointed out in the Python docs.

I said: "if arg==True" tests whether the object known as arg is equal to
the object known as True. There are no exceptions. "==" means "equal",
period! Your problem is that Python's notion of "equal" is different
from your notion of "equal".
The problem is that you consider equality tests in Python to be
nonsensical because they don't fit with your opinion of what equality
should mean.

No, it has nothing to do with what it means. 1, [1], (1,)
and mpz(1) are all different types and all mathematically
the same. Yet 1 and mpz(1) compare equal but (1,) and
[1] do not.

And that just proves my point. You insist on the notion that equality
means "mathematically the same". Python's equality tests sometimes work
out that way, but that's not how equality actually works, nor how it is
actually defined in Python.

Regards,
 

Gabriel Genellina

En Sun, 13 May 2007 23:45:22 -0300, (e-mail address removed)
"...and when I say none, I mean there is a certain amount."

One of the beautiful things about Python that I like, is how few
exceptions it has; most things are rather regular.
 

mensanator

I intended to reply to this yesterday, but circumstances
(see timeit results) prevented it.
But there are no exceptions.

<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types
and different string types, never compare equal;
X == Y tests for equality. If it returns
True, then the objects are equal by definition. That's what equal means in
Python.

One can abuse the technology to give nonsensical results:

>>> class EqualToEverything(object):
...     def __eq__(self, other):
...         return True
...
>>> x = EqualToEverything()
>>> x == 1.0
True
>>> x == [2.9, "hello world"]
True

but that's no different from any language that allows you to override
operators.
No, it's not, as I show further down.

But you show no such thing.

Or, to put it another way:

Did! Did not! Did! Did not! Did! Did not! ...
a = 1
b = (1,)
c = [1]
d = gmpy.mpz(1)
[snip]
a==d
True
See, a and d are equal.
No, they are not "equal".

Of course they are. It says so right there: "a equals d" is true.

Ok, but they are an exception to the rule "different types compare
False".
Why ever not? If you need an mpz value in order to do something, and no
other data type will do, what would you suggest? Just give up and say
"Don't do this, because it is Bad, m'kay?"

It's not the mpzs you shouldn't use, it's the ints. I also stressed
"in loops". Replacing an integer literal with a variable still
requires a coercion, so it doesn't matter if n + 1 occurs outside
a loop.
Really? Just how much time?

Can't say, had to abort the following.
Returns the count of n/2 and 3n+1 operations [1531812, 854697].

import gmpy

def collatz(a):
    ONE = gmpy.mpz(1)
    TWO = gmpy.mpz(2)
    TWE = gmpy.mpz(3)
    a = gmpy.mpz(a)
    t = 0
    u = 0
    done = 0
    while done==0:
        f = gmpy.scan1(a,0)
        if f>0:
            a = a >> f
            u += f
        else:
            if a==1:
                done = 1
            else:
                a = a*TWE + ONE
                t += 1
    return [u,t]

def collatz2(a):
    t = 0
    u = 0
    done = 0
    while done==0:
        f = gmpy.scan1(a,0)
        if f>0:
            a = a >> f
            u += f
        else:
            if a==1:
                done = 1
            else:
                a = a*3 + 1
                t += 1
    return [u,t]

def test():
    collatz(2**177149-1)

def test2():
    collatz2(2**177149-1)

if __name__=='__main__':
    from timeit import Timer
    t = Timer("a = test()", "from __main__ import test")
    u = Timer("b = test2()", "from __main__ import test2")
    print t.timeit(10)
    print u.timeit(10)

## 723.430377542
## *ABORTED after 20 hours*

timeit.Timer("x == y", "import gmpy; x = 1; y = gmpy.mpz(1)").repeat()
timeit.Timer("x == y", "x = 1; y = 1").repeat()

I don't have gmpy installed here,

Good Lord! How do you solve a Linear Congruence? :)
so I can't time it, but I look forward
to seeing the results, if you would be so kind.

I had a lot of trouble with this, but I think I finally got a
handle on it. I had to abort the previous test after 20+ hours
and abort a second test (once I figured out to do your example)
on another machine after 14+ hours. I had forgotten just how
significant the difference is.

import timeit

## t = timeit.Timer("a == b", "a = 1; b = 1")
## u = timeit.Timer("c == d", "import gmpy; c = 1; d = gmpy.mpz(1)")
## t.repeat()
## [0.22317417437132372, 0.22519314605627253, 0.22474588250741367]
## u.repeat()
## [0.59943819675405763, 0.5962260566636246, 0.60122920650529466]

Unfortunately, this is not a very useful test, since mpz
coercion appears to vary by the size of the number involved.
Although changing t to

## t = timeit.Timer("a == b", "a = 2**177149-1; b = 2**177149-1")

still produces tractable results
## t.repeat()
## [36.323597552202841, 34.727026758987506, 34.574566320579862]

the same can't be said for mpz coercion:

## u = timeit.Timer("c == d", "import gmpy; c = 2**177149-1; d = gmpy.mpz(2**177149-1)")
## u.repeat()
## *ABORTED after 14 hours*

So I changed it to (using yet a third machine)

for i in xrange(8):
    e = 2*i*100
    n = 2**e-1
    r = 'a = %d; b = %d' % (n,n)
    s = 'import gmpy; a = %d; b = gmpy.mpz(%d)' % (n,n)
    print 'For 2**e-1',e
    t = timeit.Timer("a == b",r)
    u = timeit.Timer("a == b",s)
    print t.repeat()
    print u.repeat()
    print

which clearly shows the growth rate of the mpz coercion.

## int==int vs. int==mpz
##
## For 2**e-1 0
## [0.054264941118974445, 0.054553378257723141, 0.054355515455681791]
## [0.16161957500399435, 0.16188363643198839, 0.16197491752897064]
##
## For 2**e-1 200
## [0.093393746299376912, 0.093660961833065492, 0.092977494572419439]
## [1.0425794607193544, 1.0436544844503342, 1.0451038279715417]
##
## For 2**e-1 400
## [0.10496130299527184, 0.10528292779203152, 0.10497603593951155]
## [2.2687503839249636, 2.2685411490493506, 2.2691453463783233]
##
## For 2**e-1 600
## [0.11724617625774236, 0.11701867087715279, 0.11747874550051129]
## [3.616420796797021, 3.617562537946073, 3.6152373342355801]
##
## For 2**e-1 800
## [0.13156379733273482, 0.1310266632832402, 0.13168082630802047]
## [5.2398534562645089, 5.2389728893525458, 5.2353889230364388]
##
## For 2**e-1 1000
## [0.153719968797283, 0.15383679852633492, 0.15352625633217798]
## [6.967458038928207, 6.9640038947002409, 6.9675019294931388]
##
## For 2**e-1 1200
## [0.16716219584402126, 0.16743472335786436, 0.16782637005291434]
## [11.603391791430532, 11.601063020084396, 11.603106936964878]
##
## For 2**e-1 1400
## [0.179120966908215, 0.17908259508838853, 0.17934175430681876]
## [14.753954507946347, 14.755623642634944, 14.756064585859164]

And, just for laughs, I compared mpzs to mpzs,

s = 'import gmpy; a = gmpy.mpz(%d); b = gmpy.mpz(%d)' % (n,n)

which ended up faster than comparing ints to ints.

## int==int vs. mpz==mpz
##
## For 2**e-1 0
## [0.054301433257206225, 0.054502401293220933, 0.054274144039999611]
## [0.12487657446828507, 0.099130500653189346, 0.094799646619862565]
##
## For 2**e-1 200
## [0.10013419046813476, 0.10156139134030695, 0.10151083166511599]
## [0.091683807483012414, 0.091326269489948375, 0.091261281378934411]
##
## For 2**e-1 400
## [0.10716937998703036, 0.10704403530042028, 0.10705119312788414]
## [0.099165500324245093, 0.097540568227742153, 0.10131808159697742]
##
## For 2**e-1 600
## [0.12060785142996777, 0.11720683828159517, 0.11800506010281886]
## [0.11328210449149934, 0.1146064679843235, 0.11307050873582014]
##
## For 2**e-1 800
## [0.12996358680839437, 0.13021352430898236, 0.12973684081916526]
## [0.12344120825932059, 0.11454960385710677, 0.12339954699673861]
##
## For 2**e-1 1000
## [0.15328649918703752, 0.15362917265815135, 0.15313422618208516]
## [0.12753811336359666, 0.12534907002753748, 0.12588097104350471]
##
## For 2**e-1 1200
## [0.16756264696760326, 0.16747118166182684, 0.167885034915086]
## [0.12162660501311073, 0.13368267591470051, 0.13387503876843265]
##
## For 2**e-1 1400
## [0.17867761017283623, 0.17829534684824377, 0.17826312158720281]
## [0.13718761665773815, 0.13779106963280441, 0.13708166276632738]

Even if it is terribly slow, that's just an implementation detail. What
happens when Python 2.7 comes out (or Python 3.0 or Python 99.78) and
coercion from int to mpz is lightning fast? Would you then say "Well,
int(1) and mpz(1) used to be unequal, but now they are equal?".

Are you saying I should be unconcerned about implementation details?
That it's silly of me to be concerned about implementation side effects
due to mis-matched types?
Me, I'd say they always were equal, but previously it used to be slow to
coerce one to the other.

So, when you're giving advice to the OP you don't feel any need to point
this out? That's all I'm trying to do, supply some "yes, but you should
be aware of..." commentary.
The absolute stupidest
thing you can do (assuming n is an mpz) is:
while n > 1:
    if n % 2 == 0:
        n = n/2
    else:
        n = 3*n + 1

Oh, I can think of much stupider things to do.

while len([math.sin(random.random()) for i in range(n)[:]][:]) > 1:
    if len( "+" * \
            int(len([math.cos(time.time()) for i in \
            range(1000, n+1000)[:]][:])/2.0)) == 0:
        n = len([math.pi**100/i for i in range(n) if i % 2 == 1][:])
    else:
        s = '+'
        for i in range(n - 1):
            s += '+'
        s += s[:] + ''.join(reversed(s[:]))
        s += s[:].replace('+', '-')[0:1]
        n = s[:].count('+') + s[:].count('-')
You should ALWAYS do:
ZED = gmpy.mpz(0)
ONE = gmpy.mpz(1)
TWO = gmpy.mpz(2)
TWE = gmpy.mpz(3)
while n > ONE:
    if n % TWO == ZED:
        n = n/TWO
    else:
        n = TWE*n + ONE
This way, no coercion is performed.

I know that algorithm, but I don't remember what it is called...

The Collatz Conjecture. If true, it means the while loop
terminates for any n.
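For reference, the loop can be sketched with plain Python ints only, so no gmpy is needed and no coercion arises. This is not the library code discussed in this thread; the function name `collatz_counts` is invented here, and the bit trick is only assumed to play the same role as `gmpy.scan1(a, 0)` in the code posted earlier (finding the lowest set bit).

```python
def collatz_counts(n):
    """Return (halvings, triplings) needed to take n down to 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    halvings = 0
    triplings = 0
    while n != 1:
        # Number of trailing zero bits, i.e. how many times 2 divides n.
        shift = (n & -n).bit_length() - 1
        if shift:
            # Do all the pending n/2 steps in one go.
            n >>= shift
            halvings += shift
        else:
            # n is odd: apply the 3n+1 step.
            n = 3 * n + 1
            triplings += 1
    return halvings, triplings


print(collatz_counts(6))
print(collatz_counts(7))
```

Whether the loop ends for every starting value is exactly what the conjecture asserts, so the function is only known to return for inputs that have been checked.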
In any case, what you describe is a local optimization. Its probably a
good optimization, but in no way, shape or form does it imply that mpz(1)
is not equal to 1.

It's a different type. It is an exception to the "different types
compare False" rule. That exception is not without cost, the type
mis-match causes coercion.
Why should the int 1 return True when compared to mpz(1)?

Because they both represent the same mathematical number, where as a list
containing 1 and a tuple containing 1 are different containers. Even if
the contents are the same, lists aren't equal to tuples.
a = [1]
b = [1]
returns True for a==b?

That's because both are the same kind of container, and they both have the
same contents.
After all, it returns false if b is [2],
so it looks at the content in this case. So for numerics,
it's the value that matters, not the type. And this creates
a false sense of "equality" when a==d returns True.

There's nothing false about it. Ask any mathematician, does 1 equal 1.0,
and they will say "of course".

And if you ask any mathematician, he'll say that (1,) is equal to [1].
That's the difference between a mathematician and a programmer.
A programmer will say "of course not, the int has to be coerced."
Ints are not containers. An int doesn't contain values, an int is the
value.

Numeric values are automatically coerced because that's more practical.
That's a design decision, and it works well.

And I'm not saying it shouldn't be that way. But when I wrote my
Collatz Functions library, I wasn't aware of the performance issues
when doing millions of loop cycles with numbers having millions
of digits. I only found that out later. Would I have gotten a
proper answer on this newsgroup had I asked here? Sure doesn't look
like it.

BTW, in reviewing my Collatz Functions library, I noticed a coercion
I had overlooked, so as a result of this discussion, my library is
now slightly faster. So some good comes out of this argument after
all.
As for gmpy.mpz, since equality tests are completely under the control of
the class author, the gmpy authors obviously wanted mpz values to compare
equal with ints.

And they chose to do a silent coercion rather than raise a type
exception. It says right in the gmpy documentation that this coercion
will be performed. What it DOESN'T say is what the implications of this
silent coercion are.
What does it mean then?

It means they are mathematically equivalent, which is not the same as
being programmatically equivalent. Mathematical equivalence is what most
people want most of the time. Not all of the people all of the time,
however. For example, I can calculate my Hailstone Function parameters
using either a list or a tuple:
import collatz_functions as cf
print cf.calc_xyz([1,2])
(mpz(8), mpz(9), mpz(5))
print cf.calc_xyz((1,2))
(mpz(8), mpz(9), mpz(5))

But [1,2]==(1,2) yields False, so although they are not equal,
they ARE interchangeable in this application because they are
mathematically equivalent.
Since they are mathematical values, what other sense is meaningful?


I never said that there was no efficiency differences. Comparing X with Y
might take 0.02ms or it could take 2ms depending on how much work needs
to be done. I just don't understand why you think that has a bearing on
whether they are equal or not.

The bearing it has matters when you're writing a function library that
you want to execute efficiently.
 

Carsten Haese

<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types
and different string types, never compare equal;
</quote>

The exceptions you mean are not exceptions to "'X==Y' means 'X equals
Y'". They are exceptions to "'X equals Y' means 'X is mathematically the
same as Y'," but that is not how equality is actually defined.
 

mensanator

The exceptions you mean are not exceptions to "'X==Y' means 'X equals
Y'".

I never said they were. I said they were exceptions to
"Objects of different types never compare equal".
They are exceptions to "'X equals Y' means 'X is mathematically the
same as Y',"

Who's "they"? (1,2) and [1,2] are mathematically equal but
the == comparison returns False. They are not an exception
to "mathematically equal", neither are they exceptions to
"different types never compare equal".

1 and mpz(1) compare equal so aren't an exception to
"mathematically equal" although they are an exception
to "different types never compare equal".

You need to be more explicit about what you're
talking about, as this last argument makes no sense.
but that is not how equality is actually defined.

Ok, I'll bite. How is "equality" defined?

Are you implying that I can interchange 1 and mpz(1)
because the == comparison returns True?

Are you implying that I can't interchange (1,2) and [1,2]
because the == comparison returns False?

Please make sure your definition deals with these cases.
 

Gabriel Genellina

En Tue, 15 May 2007 01:37:07 -0300, (e-mail address removed)
I never said they were. I said they were exceptions to
"Objects of different types never compare equal".

This is an unfortunate wording, and perhaps should read: "For most builtin
types, objects of different types never compare equal; such objects are
ordered consistently but arbitrarily (so that sorting a heterogeneous
sequence yields a consistent result). The exceptions being different
numeric types and different string types, that have a special treatment;
see section 5.9 in the Reference Manual for details."

And said section 5.9 should be updated too: "The objects need not have the
same type. If both are numbers or strings, they are converted to a common
type. Otherwise, objects of different builtin types always compare
unequal, and are ordered consistently but arbitrarily. You can control
comparison behavior of objects of non-builtin types by defining a __cmp__
method or rich comparison methods like __gt__, described in section 3.4."

I hope this helps a bit. Your performance issues don't have to do with the
*definition* of equal or not equal, only with how someone decided to write
the mpz class.
 

mensanator

En Tue, 15 May 2007 01:37:07 -0300, (e-mail address removed)



This is an unfortunate wording, and perhaps should read: "For most builtin
types, objects of different types never compare equal; such objects are
ordered consistently but arbitrarily (so that sorting a heterogeneous
sequence yields a consistent result). The exceptions being different
numeric types and different string types, that have a special treatment;
see section 5.9 in the Reference Manual for details."

And said section 5.9 should be updated too: "The objects need not have the
same type. If both are numbers or strings, they are converted to a common
type.

Except when they aren't.

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print '%d' % (b)
TypeError: int argument required

So although the comparison operator is smart enough to realize
the equivalency of numeric types and do the type conversion,
the print statement isn't so smart.
Otherwise, objects of different builtin types always compare
unequal, and are ordered consistently but arbitrarily. You can control
comparison behavior of objects of non-builtin types by defining a __cmp__
method or rich comparison methods like __gt__, described in section 3.4."

I hope this helps a bit. Your performance issues don't have to do with the
*definition* of equal or not equal,

I didn't say that, I said the performance issues were related
to type conversion. Can you explain how the "definition" of
equal does not involve type conversion?
only with how someone decided to write the mpz class.

I'm beginning to think there's a problem there.
 

Gabriel Genellina

En Tue, 15 May 2007 14:01:20 -0300, (e-mail address removed)
Except when they aren't.

I think you don't get the difference between a builtin object, fully under
the Python developers' control, and a user-defined class that can behave
arbitrarily at the wish of its writer and about which the Python
documentation can barely say a word.
The docs say how the Python interpreter will try to compare objects
(invoke the rich comparison methods, invoke __cmp__, etc) and how the
*builtin* objects behave. For other objects, it's up to the object
*writer* to provide such methods, and he can do whatever he wishes:

py> class Reversed(int):
...     def __lt__(self, other): return cmp(int(self),other)>0
...     def __gt__(self, other): return cmp(int(self),other)<0
...     def __le__(self, other): return cmp(int(self),other)>=0
...     def __ge__(self, other): return cmp(int(self),other)<=0
...
py>
py> j=Reversed(6)
py> j==6
True
py> j>5
False
py> j>10
True
py> j<=5
True

You can't blame Python for this.
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print '%d' % (b)
TypeError: int argument required

So although the comparison operator is smart enough to realize
the equivalency of numeric types and do the type conversion,
the print statement isn't so smart.

This is up to the gmpy designers/writers/maintainers. Anyone writing a
class chooses which features to implement, which ones to omit, how to
implement them, etc. The code may contain bugs, may not be efficient, may
not behave exactly as the users expect, may not have anticipated all usage
scenarios, and so on. In this case, the gmpy writers have probably chosen
not to allow conversion to int, and they may have good reasons not to do
that (I don't know which platform you are working on, but I feel that your
b object is somewhat larger than sys.maxint...).
I didn't say that, I said the performance issues were related
to type conversion. Can you explain how the "definition" of
equal does not involve type conversion?

There is no type conversion involved for user defined classes, *unless*
the class writer chooses to do so.
Let's invent some new class Number; they can be added and have basic
str/repr support

py> class Number(object):
...     def __init__(self, value): self.value=value
...     def __add__(self, other): return Number(self.value+other.value)
...     def __str__(self): return str(self.value)
...     def __repr__(self): return 'Number(%s)' % self.value
...
py> x = Number(2)
py> y = Number(3)
py> z = x+y
py> z
Number(5)
py> z == 5
False
py> 5 == z
False
py> z == Number(5)
False
py> int(z)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: int() argument must be a string or a number
py> "%d" % z
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: int argument required

You can't compare them to anything, convert to integer, still nothing.
Let's add "int conversion" first:

py> Number.__int__ = lambda self: int(self.value)
py> int(z)
5
py> "%d" % z
'5'
py> z == 5
False
py> 5 == z
False

Ok, a Number knows how to convert itself to integer, but still can't be
compared successfully to anything. (Perhaps another language would try to
convert automagically z to int, to compare against 5, but not Python).
Let's add basic comparison support:

py> Number.__cmp__ = lambda self, other: cmp(self.value, other.value)
py> z == Number(5)
True
py> z > Number(7)
False
py> z == z
True
py> z == 5
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 1, in <lambda>
AttributeError: 'int' object has no attribute 'value'

Now, a Number can be compared to another Number, but still not compared to
integers. Let's make the comparison a bit smarter (uhm, I'll write it as a
regular function because it's getting long...)

py> def NumberCmp(self, other):
...     if isinstance(other, Number): return cmp(self.value, other.value)
...     else: return cmp(self.value, other)
...
py> Number.__cmp__ = NumberCmp
py> z == 5
True
py> z == 6
False
py> 5 == z
True

As you can see, until I wrote some code explicitly to do the comparison,
and allow other types of comparands, Python would not "convert" anything.
If you find that some class appears to do a type conversion when comparing
instances, it's because the class writer has explicitly coded it that
way, not because Python does the conversion automagically.
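The session above relies on `__cmp__` and `cmp()`, which no longer exist in Python 3. The same idea can be sketched there with `__eq__`; this is an adaptation for illustration, not the code from the post, and restricting the non-Number case to `int` is an assumption made here to keep the example small.

```python
class Number:
    """Toy numeric wrapper whose author decides what it is equal to."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Number):
            return self.value == other.value
        if isinstance(other, int):
            # Explicit choice by the class author: compare against plain ints.
            return self.value == other
        # Let Python try the other operand, then fall back to "not equal".
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one
        # that is consistent with equality.
        return hash(self.value)

    def __repr__(self):
        return "Number(%r)" % self.value


z = Number(5)
print(z == 5, 5 == z, z == Number(5), z == 6, z == "5")
```

`5 == z` works because `int.__eq__` returns NotImplemented for an unknown type and Python then tries the reflected `z.__eq__(5)`. Nothing is converted unless the class itself asks for it.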
I'm beginning to think there's a problem there.

Yes: you don't recognize that gmpy is not a builtin package, it's an
external package, and its designers/writers/implementors/coders/whatever
decide how it will behave, not Python itself nor the Python developers.
 

Steve Holden

On May 12, 11:02 pm, Steven D'Aprano [ ... ]

But you can't trust a==d returning True to mean a and d are
"equal". To say the comparison means the two objects are
equal is misleading, in other words, wrong. It only takes one
turd to spoil the whole punchbowl.
Unfortunately that is the very *definition* of "equal".
They are equal in the mathematical sense, but not otherwise.
And to think that makes no difference is to be naive.
Perhaps so, but you are a long way from the original question now!

regards
Steve
 

Steven D'Aprano

I intended to reply to this yesterday, but circumstances (see timeit
results) prevented it.


<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types and
different string types, never compare equal; </quote>

Yes, and all swans are white, except for the black swans from Australia,
but we're not talking about swans, nor are we talking about objects of
different type comparing unequal, we're talking about whether X == Y
means X is equal to Y.

THERE ARE NO EXCEPTIONS TO THIS, BECAUSE IT IS TRUE BY DEFINITION.

In Python, the meaning of "equal" is nothing more and nothing less than
"does X == Y return True?". End of story, there is nothing more to
discuss. If it returns True, they are equal. If it doesn't, they aren't.
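
A small illustration of that definition, written for Python 3 and using only standard-library types (so `Fraction` and `complex` stand in here for the `1L` and `gmpy.mpz` values discussed in the thread, which is an assumption of this sketch):

```python
from fractions import Fraction

# Different numeric types: every pair compares equal, because the types
# involved implement == to say so.
numeric_values = [1, 1.0, Fraction(1, 1), 1 + 0j, True]
all_numeric_equal = all(a == b for a in numeric_values for b in numeric_values)

# Same contents, different container types: a list and a tuple
# do not compare equal.
list_equals_tuple = ([1] == (1,))

print(all_numeric_equal, list_equals_tuple)
```

Whatever `==` returns is, by this definition, the answer to "are they equal?".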

If you want to drag in non-Python meanings of "equal", you are wrong to
do so. "Lizzie Windsor", "Queen Elizabeth the Second", "the Queen of
England" and "Her Royal Majesty, Queen Elizabeth II" are all equal in the
sense that they refer to the same person, but it would be crazy to expect
Python to compare those strings equal.

If you want to complain that lists and tuples should compare equal if
their contents are the same, that's a different issue. I don't believe
you'll have much support for that.

If you want to complain that numeric types shouldn't compare equal, so
that 1.0 != 1 != 1L != gmpy.mpz(1), that's also a different issue. I
believe you'll have even less support for that suggestion.

[snip]

Ok, but they are an exception to the rule "different types compare
False".

You are only quoting part of the rule. The rule says that numeric types
and strings are not included in the "different types" clause. If you
quote the full rule, you will see that it is not an exception to the
rule, it matches perfectly.

Although, the rule as given is actually incomplete, because it only
applies to built-in types. It does not apply to classes, because the
class designer has complete control over the behaviour of his class. If
the designer wants his class to compare equal to lists on Wednesdays and
unequal on other days, he can. (That would be a stupid thing to do, but
possible.)
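
To make the Wednesday example concrete, here is a deliberately silly sketch for Python 3. The class name `WednesdayList` and its injectable `today` argument are hypothetical and exist only for this illustration.

```python
import datetime


class WednesdayList(object):
    """Hypothetical class whose equality with lists depends on the weekday."""

    def __init__(self, items, today=None):
        self.items = list(items)
        # `today` is injectable so the behaviour can be shown on any day.
        self.today = today or datetime.date.today()

    def __eq__(self, other):
        if isinstance(other, list):
            # date.weekday(): Monday is 0, so Wednesday is 2.
            return self.today.weekday() == 2 and self.items == other
        return NotImplemented


wednesday = datetime.date(2007, 5, 16)  # a Wednesday
thursday = datetime.date(2007, 5, 17)   # the day after

print(WednesdayList([1, 2], wednesday) == [1, 2])
print(WednesdayList([1, 2], thursday) == [1, 2])
```

The language has no objection to either result; the class designer alone decides.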


[snip]
Really? Just how much time?

Can't say, had to abort the following. Returns the count of n/2 and 3n+1
operations [1531812, 854697].

Maybe you should use a test function that isn't so insane then. Honestly,
if you want to time something, time something that actually completes!
You don't gain any accuracy by running a program for twenty hours instead
of twenty minutes.

[snip functions generating the Collatz sequence]

Good Lord! How do you solve a Linear Congruence? :)

In my head of course. Don't you?

*wink*

so I can't time it, but I look forward to seeing the results, if you
would be so kind.

I had a lot of trouble with this, but I think I finally got a handle on
it. I had to abort the previous test after 20+ hours and abort a second
test (once I figured out to do your example) on another machine after
14+ hours. I had forgotten just how significant the difference is.

import timeit

## t = timeit.Timer("a == b", "a = 1; b = 1")
## u = timeit.Timer("c == d", "import gmpy; c = 1; d = gmpy.mpz(1)")
## t.repeat()
## [0.22317417437132372, 0.22519314605627253, 0.22474588250741367]
## u.repeat()
## [0.59943819675405763, 0.5962260566636246, 0.60122920650529466]


Comparisons between ints take about 0.2 microseconds, compared to about
0.6 microseconds for small gmpy.mpz values. That's an optimization worth
considering, but certainly not justifying your claim that one should
NEVER compare an int and a mpz "in a loop". If the rest of the loop takes
five milliseconds, who cares about a fraction of a microsecond difference?

Unfortunately, this is not a very useful test, since mpz coercion
appears to vary by the size of the number involved.

No, it is a very useful test. It's not an EXHAUSTIVE test.

(By the way, you're not testing coercion. You're testing the time it
takes to compare the two. There may or may not be any coercion involved.)
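
For anyone wanting to repeat this kind of measurement without waiting hours, a scaled-down sketch follows. It assumes gmpy may not be installed, so `fractions.Fraction` is used purely as a stand-in for a non-int numeric type; the numbers it prints say nothing about mpz itself. The names `NUMBER`, `same_type` and `mixed_type` are made up for the example.

```python
import timeit

NUMBER = 100000  # calls per measurement (timeit's default is 1,000,000)

# Same-type comparison: int == int.
same_type = timeit.repeat("a == b", setup="a = 1; b = 1",
                          repeat=3, number=NUMBER)

# Mixed-type comparison; Fraction is only a stand-in for gmpy.mpz here.
mixed_type = timeit.repeat(
    "c == d",
    setup="from fractions import Fraction; c = 1; d = Fraction(1)",
    repeat=3, number=NUMBER)

# Each list entry is the total time for NUMBER calls, in seconds,
# so divide by NUMBER to get the cost of a single comparison.
same_per_call = min(same_type) / NUMBER
mixed_per_call = min(mixed_type) / NUMBER

print("int == int:      %.3f microseconds" % (same_per_call * 1e6))
print("int == Fraction: %.3f microseconds" % (mixed_per_call * 1e6))
```

Passing an explicit, small `number` keeps the run short, and taking the minimum of the repeats is the usual way to reduce noise from other processes.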

Although changing t to

## t = timeit.Timer("a == b", "a = 2**177149-1; b = 2**177149-1")

still produces tractable results
## t.repeat()
## [36.323597552202841, 34.727026758987506, 34.574566320579862]

About 36 microseconds per comparison, for rather large longints.

the same can't be said for mpz coercion:

## u = timeit.Timer("c == d", "import gmpy; c = 2**177149-1; d = gmpy.mpz(2**177149-1)")
## u.repeat()
## *ABORTED after 14 hours*

This tells us that a comparison between large longints and large gmpy.mpz
values takes a minimum of 14 hours divided by three million, or roughly 17
milliseconds each. That's horribly expensive if you have a lot of them.
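
The arithmetic behind that estimate, spelled out as a sketch. It assumes the defaults that `Timer.repeat()` had at the time (3 repeats of 1,000,000 executions each); the constant names are invented for the example.

```python
ABORTED_AFTER_HOURS = 14
REPEATS = 3        # default `repeat` at the time (newer Pythons default to 5)
NUMBER = 1000000   # default `number` of executions per repeat

elapsed_seconds = ABORTED_AFTER_HOURS * 3600
total_comparisons = REPEATS * NUMBER

# The run never finished, so this is only a lower bound per comparison.
lower_bound_ms = elapsed_seconds * 1000.0 / total_comparisons
print("at least %.1f ms per comparison" % lower_bound_ms)
```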

It isn't clear _why_ the comparison takes so long.


[snip]
And, just for laughs, I compared mpzs to mpzs,

s = 'import gmpy; a = gmpy.mpz(%d); b = gmpy.mpz(%d)' % (n,n)

which ended up faster than comparing ints to ints.

I'm hardly surprised. If speed is critical, gmpy is likely to be faster
than anything you can do in pure Python.



[snip]

Are you saying I should be unconcerned about implementation details?
That it's silly of me to be concerned about implementation side effects
due to mis-matched types?

Of course not. But the discussion isn't about optimization, that's just
an irrelevant side-track.

So, when you're giving advice to the OP you don't feel any need to point
this out? That's all I'm trying to do, supply some "yes, but you should
be aware of..." commentary.

Why on Earth would I need to mention gmpy.mpz()? Does the OP even use
gmpy? You were the one who brought gmpy into the discussion, not him. Why
not give him a lecture about not repeatedly adding strings together, or
using << instead of multiplication by two, or any other completely
irrelevant optimization? My favorite, by the way, is that you can save
anything up to an hour of driving time by avoiding Hoddle Street during
peak hour and using the back-streets through Abbotsford, next to Yarra
Bend Park and going under the Eastern Freeway. Perhaps I should have
raised that as well?

It's a different type. It is an exception to the "different types
compare False" rule.

What does this have to do with your ridiculous claim that mpz(1) is not
equal to 1? It clearly is equal.

That exception is not without cost, the type mis-match
causes coercion.

Any comparison has a cost. Sometimes it's a lot, sometimes a little. That
has nothing to do with equality.


There's nothing false about it. Ask any mathematician, does 1 equal
1.0, and they will say "of course".

And if you ask any mathematician, he'll say that (1,) is equal to [1].

I'd like to find the mathematician who says that. The first thing he'd
say is "what is this (1,) notation you are using?" and the second thing
he'd ask is "equal in what sense?".

Perhaps you should ask a mathematician if the set {1, 2} and the vector
[1, 2] are equal, and if either of them are equal to the coordinate pair
(1, 2).

That's the difference between a mathematician and a programmer. A
programmer will say "of course not, the int has to be coerced."

A C programmer maybe.

[snip]
And I'm not saying it shouldn't be that way. But when I wrote my Collatz
Functions library, I wasn't aware of the performance issues when doing
millions of loop cycles with numbers having millions of digits. I only
found that out later. Would I have gotten a proper answer on this
newsgroup had I asked here? Sure doesn't look like it.

If you had asked _what_? Unless you tell me what question you asked, how
can anyone guess what answer you would have received?

If you had asked a question about optimization, you surely would have
received an answer about optimization.

If you had asked about string concatenation, you would have received an
answer about string concatenation.

If you had asked a question about inheritance, you would have received an
answer about inheritance.

See the pattern?

BTW, in reviewing my Collatz Functions library, I noticed a coercion I
had overlooked, so as a result of this discussion, my library is now
slightly faster. So some good comes out of this argument after all.



And they chose to do a silent coercion rather than raise a type
exception.
It says right in the gmpy documentation that this coercion will be
performed.
What it DOESN'T say is what the implications of this silent coercion
are.

OF COURSE a coercion takes time. This is Python, where everything is a
rich object, not some other language where a coercion merely tells the
compiler to consider bytes to be some other type. If you need your hand-
held to the point that you need somebody to tell you that operations take
time, maybe you need to think about changing professions.

The right way to do this is to measure first, then worry about
optimizations. The wrong way is to try to guess the bottlenecks ahead of
time. The worse way is to expect other people to tell you where your
bottlenecks are ahead of time.


It means they are mathematically equivalent, which is not the same as
being programmatically equivalent. Mathematical equivalency is what most
people want most of the time.

I think that by "most people", you mean you.

Not all of the people all of the time,
however. For example, I can calculate my Hailstone Function parameters
using either a list or a tuple:
import collatz_functions as cf
print cf.calc_xyz([1,2])
(mpz(8), mpz(9), mpz(5))
print cf.calc_xyz((1,2))
(mpz(8), mpz(9), mpz(5))

But [1,2]==(1,2) yields False, so although they are not equal, they ARE
interchangeable in this application because they are mathematically
equivalent.

No, they aren't mathematically equivalent, because Python data structures
aren't mathematical entities. (They may be _similar to_ mathematical
entities, but they aren't the same. Just ask a mathematician about the
difference between a Real number and a float.)

They are, however, both sequences, and so if your function expects any
sequence, they will both work.
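
A minimal sketch of that point. `sum_of_powers` is a hypothetical stand-in, not the `calc_xyz` function from the Collatz library, whose source is not shown in this thread.

```python
def sum_of_powers(exponents):
    """Hypothetical helper: accepts any sequence of ints, list or tuple alike."""
    return sum(2 ** e for e in exponents)


as_list = [1, 2]
as_tuple = (1, 2)

# The two containers are not equal...
print(as_list == as_tuple)

# ...but a function that only iterates over its argument cannot tell them apart.
print(sum_of_powers(as_list) == sum_of_powers(as_tuple))
```

Being interchangeable as arguments is a statement about the function, not about whether the two objects are equal.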


[snip]
The bearing it has matters when you're writing a function library that
you want to execute efficiently.

Which is true, but entirely irrelevant to the question in hand, which is
"are they equal?".
 

mensanator

I intended to reply to this yesterday, but circumstances (see timeit
results) prevented it.
<quote emphasis added>
Sec 2.2.3:
Objects of different types, *--->except<---* different numeric types and
different string types, never compare equal; </quote>

Yes, and all swans are white, except for the black swans from Australia,
but we're not talking about swans, nor are we talking about objects of
different type comparing unequal, we're talking about whether X == Y
means X is equal to Y.

THERE ARE NO EXCEPTIONS TO THIS, BECAUSE IT IS TRUE BY DEFINITION.

In Python, the meaning of "equal" is nothing more and nothing less than
"does X == Y return True?". End of story, there is nothing more to
discuss. If it returns True, they are equal. If it doesn't, they aren't.

If you want to drag in non-Python meanings of "equal", you are wrong to
do so. "Lizzie Windsor", "Queen Elizabeth the Second", "the Queen of
England" and "Her Royal Majesty, Queen Elizabeth II" are all equal in the
sense that they refer to the same person, but it would be crazy to expect
Python to compare those strings equal.

If you want to complain that lists and tuples should compare equal if
their contents are the same, that's a different issue. I don't believe
you'll have much support for that.

If you want to complain that numeric types shouldn't compare equal, so
that 1.0 != 1 != 1L != gmpy.mpz(1), that's also a different issue. I
believe you'll have even less support for that suggestion.

[snip]
Ok, but they are an exception to the rule "different types compare
False".

You are only quoting part of the rule. The rule says that numeric types
and strings are not included in the "different types" clause. If you
quote the full rule, you will see that it is not an exception to the
rule, it matches perfectly.

Uh...ok, I get it...I think.

I always thought that when someone said "all primes are
odd except 2" it meant that 2 was an exception.
But since the rule specifically says 2 is an exception,
it's not an exception.
Although, the rule as given is actually incomplete, because it only
applies to built-in types. It does not apply to classes, because the
class designer has complete control over the behaviour of his class. If
the designer wants his class to compare equal to lists on Wednesdays and
unequal on other days, he can. (That would be a stupid thing to do, but
possible.)

[snip]
Can't say, had to abort the following. Returns the count of n/2 and 3n+1
operations [1531812, 854697].

Maybe you should use a test function that isn't so insane then. Honestly,
if you want to time something, time something that actually completes!
You don't gain any accuracy by running a program for twenty hours instead
of twenty minutes.

Actually, I misunderstood the timeit tests, didn't quite realize the
difference between .timeit() and .repeat(). And although that number
may look insane, it's one I'm quite familiar with so I can tell that
everything's working right. My Collatz research tends to be on the
fringe, in places where angels fear to tread.
[snip functions generating the Collatz sequence]
Good Lord! How do you solve a Linear Congruence? :)

In my head of course. Don't you?

*wink*
I had a lot of trouble with this, but I think I finally got a handle on
it. I had to abort the previous test after 20+ hours and abort a second
test (once I figured out to do your example) on another machine after
14+ hours. I had forgotten just how significant the difference is.
import timeit
##    t = timeit.Timer("a == b", "a = 1; b = 1")
##    u = timeit.Timer("c == d", "import gmpy; c = 1; d = gmpy.mpz(1)")
##    t.repeat()
##    [0.22317417437132372, 0.22519314605627253, 0.22474588250741367]
##    u.repeat()
##    [0.59943819675405763, 0.5962260566636246, 0.60122920650529466]

Comparisons between ints take about 0.2 microseconds, compared to about
0.6 microseconds for small gmpy.mpz values. That's an optimization worth
considering, but certainly not justifying your claim that one should
NEVER compare an int and a mpz "in a loop". If the rest of the loop takes
five milliseconds, who cares about a fraction of a microsecond difference?
Unfortunately, this is not a very useful test, since mpz coercion
appears to vary by the size of the number involved.

No, it is a very useful test. It's not an EXHAUSTIVE test.

(By the way, you're not testing coercion. You're testing the time it
takes to compare the two. There may or may not be any coercion involved.)

But isn't the difference between t.repeat() and u.repeat() due to
coercion?
Although changing t to
##    t = timeit.Timer("a == b", "a = 2**177149-1; b = 2**177149-1")
still produces tractable results
##    t.repeat()
##    [36.323597552202841, 34.727026758987506, 34.574566320579862]

About 36 microseconds per comparison, for rather large longints.
the same can't be said for mpz coercion:
##    u = timeit.Timer("c == d", "import gmpy; c = 2**177149-1; d = gmpy.mpz(2**177149-1)")
##    u.repeat()
##    *ABORTED after 14 hours*

This tells us that a comparison between large longints and large gmpy.mpz
values takes a minimum of 14 hours divided by three million,

I thought it was 14 hours divided by 3. I said I didn't quite
understand how timeit worked.
or roughly 17
milliseconds each. That's horribly expensive if you have a lot of them.

Yeah, and that will be the case for large numbers which is
why I chose that insane number. In the Collatz test, that
works out to about 1.7 million loop cycles. Run time is
logarithmic to number size, so truly insane values still have
tractable run times. Provided you don't mistakenly ask for
3 million tests thinking it's 3.
It isn't clear _why_ the comparison takes so long.

I'm thinking there may be something wrong.
[snip]
And, just for laughs, I compared mpzs to mpzs,
    s = 'import gmpy; a = gmpy.mpz(%d); b = gmpy.mpz(%d)' % (n,n)
which ended up faster than comparing ints to ints.

I'm hardly surprised. If speed is critical, gmpy is likely to be faster
than anything you can do in pure Python.

[snip]
Are you saying I should be unconcerned about implementation details?
That it's silly of me to be concerned about implementation side effects
due to mis-matched types?

Of course not. But the discussion isn't about optimization, that's just
an irrelevant side-track.
So, when you're giving advice to the OP you don't feel any need to point
this out? That's all I'm trying to do, supply some "yes, but you should
be aware of..." commentary.

Why on Earth would I need to mention gmpy.mpz()? Does the OP even use
gmpy? You were the one who brought gmpy into the discussion, not him. Why
not give him a lecture about not repeatedly adding strings together, or
using << instead of multiplication by two, or any other completely
irrelevant optimization? My favorite, by the way, is that you can save
anything up to an hour of driving time by avoiding Hoddle Street during
peak hour and using the back-streets through Abbotsford, next to Yarra
Bend Park and going under the Eastern Freeway. Perhaps I should have
raised that as well?
It's a different type. It is an exception to the "different types
compare False" rule.

What does this have to do with your ridiculous claim that mpz(1) is not
equal to 1? It clearly is equal.
That exception is not without cost, the type mis-match
causes coercion.

Any comparison has a cost. Sometimes it's a lot, sometimes a little. That
has nothing to do with equality.
And if you ask any mathematician, he'll say that (1,) is equal to [1].

I'd like to find the mathematician who says that. The first thing he'd
say is "what is this (1,) notation you are using?" and the second thing
he'd ask is "equal in what sense?".

Perhaps you should ask a mathematician if the set {1, 2} and the vector
[1, 2] are equal, and if either of them are equal to the coordinate pair
(1, 2).
That's the difference between a mathematician and a programmer. A
programmer will say "of course not, the int has to be coerced."

A C programmer maybe.

[snip]
And I'm not saying it shouldn't be that way. But when I wrote my Collatz
Functions library, I wasn't aware of the performance issues when doing
millions of loop cycles with numbers having millions of digits. I only
found that out later. Would I have gotten a proper answer on this
newsgroup had I asked here? Sure doesn't look like it.

If you had asked _what_? Unless you tell me what question you asked, how
can anyone guess what answer you would have received?

If you had asked a question about optimization, you surely would have
received an answer about optimization.

If you had asked about string concatenation, you would have received an
answer about string concatenation.

If you had asked a question about inheritance, you would have received an
answer about inheritance.

See the pattern?
BTW, in reviewing my Collatz Functions library, I noticed a coercion I
had overlooked, so as a result of this discussion, my library is now
slightly faster. So some good comes out of this argument after all.
And they chose to do a silent coercion rather than raise a type
exception.
It says right in the gmpy documentation that this coercion will be
performed.
What it DOESN'T say is what the implications of this silent coercion
are.

OF COURSE a coercion takes time. This is Python, where everything is a
rich object, not some other language where a coercion merely tells the
compiler to consider bytes to be some other type. If you need your hand-
held to the point that you need somebody to tell you that operations take
time, maybe you need to think about changing professions.

The right way to do this is to measure first, then worry about
optimizations. The wrong way is to try to guess the bottlenecks ahead of
time. The worse way is to expect other people to tell you where your
bottlenecks are ahead of time.


It means they are mathematically equivalent, which is not the same as
being programmatically equivalent. Mathematical equivalency is what most
people want most of the time.

I think that by "most people", you mean you.
Not all of the people all of the time,
however. For example, I can calculate my Hailstone Function parameters
using either a list or a tuple:
import collatz_functions as cf
print cf.calc_xyz([1,2])
(mpz(8), mpz(9), mpz(5))
print cf.calc_xyz((1,2))
(mpz(8), mpz(9), mpz(5))
But [1,2]==(1,2) yields False, so although they are not equal, they ARE
interchangeable in this application because they are mathematically
equivalent.

No, they aren't mathematically equivalent, because Python data structures
aren't mathematical entities. (They may be _similar to_ mathematical
entities, but they aren't the same. Just ask a mathematician about the
difference between a Real number and a float.)

They are, however, both sequences, and so if your function expects any
sequence, they will both work.

[snip]
The bearing it has matters when you're writing a function library that
you want to execute efficiently.

Which is true, but entirely irrelevant to the question in hand, which is
"are they equal?".

Hey, here's an idea...let's forget the whole thing.
 

mensanator

On Tue, 15 May 2007 14:01:20 -0300, (e-mail address removed)



I think you don't get the difference between a builtin object, fully under
the Python developers' control, and a user defined class that can behave
arbitrarily at wish of its writer and for which the Python documentation
barely can say a word.
The docs say how will the Python interpreter try to compare objects
(invoke the rich comparison methods, invoke __cmp__, etc) and how the
*builtin* objects behave. For other objects, it's up to the object
*writer* to provide such methods, and he can do whatever he wishes:

py> class Reversed(int):
...     def __lt__(self, other): return cmp(int(self),other)>0
...     def __gt__(self, other): return cmp(int(self),other)<0
...     def __le__(self, other): return cmp(int(self),other)>=0
...     def __ge__(self, other): return cmp(int(self),other)<=0
...
py>
py> j=Reversed(6)
py> j==6
True
py> j>5
False
py> j>10
True
py> j<=5
True

You can't blame Python for this.
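
Since `cmp` is gone in Python 3, the same trick might be written there roughly like this. It is a sketch of an equivalent class, not the code from the session above.

```python
class Reversed(int):
    """An int whose ordering comparisons are deliberately inverted."""

    def __lt__(self, other):
        return int(self) > other

    def __gt__(self, other):
        return int(self) < other

    def __le__(self, other):
        return int(self) >= other

    def __ge__(self, other):
        return int(self) <= other


j = Reversed(6)
# Equality is inherited from int; only the ordering is reversed.
print(j == 6, j > 5, j > 10, j <= 5)
```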




This is up to the gmpy designers/writers/maintainers. Anyone writing a
class chooses which features to implement, which ones to omit, how to
implement them, etc. The code may contain bugs, may not be efficient, may
not behave exactly as the users expect, may not have anticipated all usage
scenarios, a long etc. In this case, probably the gmpy writers have chosen
not to allow to convert to int, and they may have good reasons to not do
that (I don't know what platform are you working in, but I feel that your
b object is somewhat larger than sys.maxint...).

Then how does this work?
1454...<53320 digits snipped>...3311

I honestly don't understand why there's a problem here.
If print can handle arbitrary precision longs without
a problem, why does it fail on mpzs > sys.maxint?
If the gmpy writers are not allowing the conversion,
then why do small mpz values work? Something smells
inconsistent here.

How is it that
1

doesn't make a type mismatch? Obviously, the float
got changed to an int and this had nothing to do with
gmpy. Is it the print process responsible for doing
the conversion? Maybe I should say invoking the
conversion? Maybe the gmpy call tries to literally
convert to an integer rather than sneakily substitute
a long?

How else can this phenomenon be explained?
There is no type conversion involved for user defined classes, *unless*
the class writer chooses to do so.
Let's invent some new class Number; they can be added and have basic
str/repr support

py> class Number(object):
...     def __init__(self, value): self.value=value
...     def __add__(self, other): return Number(self.value+other.value)
...     def __str__(self): return str(self.value)
...     def __repr__(self): return 'Number(%s)' % self.value
...
py> x = Number(2)
py> y = Number(3)
py> z = x+y
py> z
Number(5)
py> z == 5
False
py> 5 == z
False
py> z == Number(5)
False
py> int(z)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: int() argument must be a string or a number
py> "%d" % z
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: int argument required

You can't compare them to anything, convert to integer, still nothing.
Let's add "int conversion" first:

py> Number.__int__ = lambda self: int(self.value)
py> int(z)
5
py> "%d" % z
'5'
py> z == 5
False
py> 5 == z
False

Ok, a Number knows how to convert itself to integer, but still can't be
compared successfully to anything. (Perhaps another language would try to
convert automagically z to int, to compare against 5, but not Python).
Let's add basic comparison support:

py> Number.__cmp__ = lambda self, other: cmp(self.value, other.value)
py> z == Number(5)
True
py> z > Number(7)
False
py> z == z
True
py> z == 5
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 1, in <lambda>
AttributeError: 'int' object has no attribute 'value'

Now, a Number can be compared to another Number, but still not compared to
integers. Let's make the comparison a bit smarter (uhm, I'll write it as a
regular function because it's getting long...)

py> def NumberCmp(self, other):
...     if isinstance(other, Number): return cmp(self.value, other.value)
...     else: return cmp(self.value, other)
...
py> Number.__cmp__ = NumberCmp
py> z == 5
True
py> z == 6
False
py> 5 == z
True

As you can see, until I wrote some code explicitly to do the comparison,
and allow other types of comparands, Python did not "convert" anything.
If you find that some class appears to do a type conversion when comparing
instances, it's because the class writer has explicitly coded it that
way, not because Python does the conversion automagically.

Ok, ok. But how does the subroutine that the class
writer created to do the actual conversion get invoked?
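
One way to see how such a method gets invoked is to record the calls. The sketch below uses Python 3's `__eq__` rather than the `__cmp__` of the session above (Python 2 follows a similar try-one-side-then-the-other idea), and the `Traced` class is hypothetical.

```python
class Traced(object):
    """Hypothetical class that records every call to its __eq__."""

    def __init__(self, value):
        self.value = value
        self.calls = []

    def __eq__(self, other):
        self.calls.append(other)
        if isinstance(other, Traced):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        return NotImplemented  # let Python try the other operand


z = Traced(5)

# Left operand is ours: the == operator calls type(z).__eq__(z, 5).
left_result = (z == 5)

# Left operand is an int: int.__eq__(5, z) returns NotImplemented,
# so the operator falls back to the reflected call type(z).__eq__(z, 5).
right_result = (5 == z)

print(left_result, right_result, z.calls)
```

So the operator itself does the dispatching: it looks up the special method on the operands' types and calls whatever the class writer put there.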
 

Gabriel Genellina

On Wed, 16 May 2007 03:16:59 -0300, (e-mail address removed)
On May 15, 7:07 pm, "Gabriel Genellina"
wrote:

Then how does this work?

1454...<53320 digits snipped>...3311

I honestly don't understand why there's a problem here.
If print can handle arbitrary precision longs without
a problem, why does it fail on mpzs > sys.maxint?
If the gmpy writers are not allowing the conversion,
then why do small mpz values work? Something smells
inconsistent here.

Python (builtin) "integral numbers" come in two flavors: int and long.
ints correspond to the C `long` type usually, and have a limited range, at
least from -2**31 to 2**31-1; most operations have hardware support (or at
least it's up to the C compiler). Long integers are a totally different
type, they have unlimited range but are a lot slower, and all operations
must be done "by hand". See http://docs.python.org/ref/types.html

If you say "%d" % something, Python first tries to see if `something` is a
long integer - not to *convert* it to a long integer, just to see if the
object *is* a long integer. If it's a long, it's formatted accordingly.
If not, Python sees if `something` is a plain integer. If not, it sees if
it's a number (in this context, that means that the structure describing
its type contains a non-NULL tp_as_number member) and tries to *convert*
it to an integer. Notice that if the object was not originally a long
integer, no attempt is made to convert it to a long using the nb_long
member - just a plain integer conversion is attempted.
It's at this stage that a large mpz object may fail - when its value can't
fit in a plain integer, it raises an OverflowError and the "%d" formatting
fails.
If you force a conversion to long integer, using long(mpz(...)) as above,
the % operator sees a long integer from start and it can be formatted
without problems.
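
The decision procedure described above can be modelled in a few lines. This is a toy model of the description, written for Python 3, and not CPython's actual code: `PyLong`, `MpzLike`, `format_d` and the 32-bit `PLAIN_INT_MAX` limit are all assumptions made for the illustration, and the exact exception the real interpreter raises may differ.

```python
PLAIN_INT_MAX = 2 ** 31 - 1  # assumed range of a Python 2 plain int


class PyLong(int):
    """Stand-in for Python 2's separate long type (hypothetical)."""


class MpzLike(object):
    """Stand-in for an mpz-style number: neither int nor long, but convertible."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


def format_d(something):
    if isinstance(something, PyLong):   # 1. already a long? format it as is
        return str(int(something))
    if type(something) is int:          # 2. a plain integer?
        return str(something)
    if hasattr(something, "__int__"):   # 3. a number? plain int conversion only
        value = int(something)
        if not -PLAIN_INT_MAX - 1 <= value <= PLAIN_INT_MAX:
            raise OverflowError("value does not fit in a plain integer")
        return str(value)
    raise TypeError("int argument required")


print(format_d(MpzLike(5)))
print(format_d(PyLong(int(MpzLike(2 ** 100)))))  # the long(...) workaround
try:
    format_d(MpzLike(2 ** 100))
except OverflowError as exc:
    print("OverflowError:", exc)
```

The asymmetry is visible in step 3: a number that is not already a long only ever gets the plain-int conversion, so large values fail unless the caller converts first.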

I don't know if this asymmetric behavior is a design decision, a historic
relic, a change in protocol (is nb_int allowed now to return a
PyLongObject, but not before?), a "nobody cares" issue, or just a bug.
Perhaps someone else can give an opinion - and certainly I may be wrong, I
had never looked at the PyString_Format function internal details before
(thanks for providing an excuse!).

As a workaround you can always write "%d" % long(mpznumber) when you want
to print them (or perhaps "%s" % mpznumber, which might be faster).
How is it that

1

doesn't make a type mismatch? Obviously, the float
got changed to an int and this had nothing to do with
gmpy. Is it the print process responsible for doing
the conversion? Maybe I should say invoking the
conversion? Maybe the gmpy call tries to literally
convert to an integer rather than sneakily substitute
a long?

Same as above: is the argument a long integer? no. is it a number? yes.
Convert to int. No errors? Apply format.
 

mensanator

On Wed, 16 May 2007 03:16:59 -0300, (e-mail address removed) wrote:

Python (builtin) "integral numbers" come in two flavors: int and long.
ints correspond to the C `long` type usually, and have a limited range, at
least from -2**31 to 2**31-1; most operations have hardware support (or at
least it's up to the C compiler). Long integers are a totally different
type, they have unlimited range but are a lot slower, and all operations
must be done "by hand". See http://docs.python.org/ref/types.html

If you say "%d" % something, Python first tries to see if `something` is a
long integer - not to *convert* it to a long integer, just to see if the
object *is* a long integer. If it's a long, it's formatted accordingly.
If not, Python sees if `something` is a plain integer. If not, it sees if
it's a number (in this context, that means that the structure describing
its type contains a non-NULL tp_as_number member) and tries to *convert*
it to an integer. Notice that if the object was not originally a long
integer, no attempt is made to convert it to a long using the nb_long
member - just a plain integer conversion is attempted.
It's at this stage that a large mpz object may fail - when its value can't
fit in a plain integer, it raises an OverflowError and the "%d" formatting
fails.
If you force a conversion to long integer, using long(mpz(...)) as above,
the % operator sees a long integer from start and it can be formatted
without problems.

I don't know if this asymmetric behavior is a design decision, a historic
relic, a change in protocol (is nb_int allowed now to return a
PyLongObject, but not before?), a "nobody cares" issue, or just a bug.
Perhaps someone else can give an opinion - and certainly I may be wrong, I
had never looked at the PyString_Format function internal details before
(thanks for providing an excuse!).

Ah, thanks for the info, I know nothing about Python internals.

That implies that although this works:
1234567890

this does not:

Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print '%d' %(12345678901234567890.0)
TypeError: int argument required

So we can work around it by doing the long conversion
ourselves since print only knows how to invoke int conversion.
12345678901234567168

which demonstrates the problem is not with gmpy.
As a workaround you can always write "%d" % long(mpznumber) when you want
to print them (or perhaps "%s" % mpznumber, which might be faster).


Same as above: is the argument a long integer? no. is it a number? yes.
Convert to int. No errors? Apply format.

Thanks again, as long as I know why the behaviour is strange,
I know how to work around it.
 

Alex Martelli

Gabriel Genellina said:
This is up to the gmpy designers/writers/maintainers. Anyone writing a
class chooses which features to implement, which ones to omit, how to
implement them, etc. The code may contain bugs, may not be efficient, may
not behave exactly as the users expect, may not have anticipated all usage
scenarios, a long etc. In this case, probably the gmpy writers have chosen
not to allow to convert to int, and they may have good reasons to not do
that (I don't know what platform are you working in, but I feel that your
b object is somewhat larger than sys.maxint...).

The gmpy designer, writer and maintainer (all in the singular -- that's
me) has NOT chosen anything of the sort. gmpy.mpz does implement
__int__ and __long__ -- but '%d'%somempzinstance chooses not to call
either of them. sys.maxint has nothing to do with the case:
'%d'%somelonginstance DOES work just fine -- hey, even a *float*
instance formats just fine here (it gets truncated). I personally
consider this a bug in %d-formatting, definitely NOT in gmpy.


Alex
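
As a footnote for Python 3 readers: int and long were merged there, and as far as I can tell `%d` formatting now goes through the object's `__int__`, so the asymmetry discussed in this thread should not appear. A quick check, assuming a Python 3 interpreter; `BigValue` is a hypothetical class and says nothing about gmpy itself.

```python
class BigValue(object):
    """Hypothetical number-like class that only defines __int__."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


# A value far beyond any machine-word integer.
formatted = "%d" % BigValue(2 ** 100)
print(formatted)
```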
 
