checking if an object IS in a list

nicolas.pourcelot

Hi,

I want to test if an object IS in a list (identity and not equality
test).
I can of course write something like this:

test = False
myobject = MyCustomClass(*args, **kw)
for element in mylist:
    if element is myobject:
        test = True
        break

and I can even write an isinlist(elt, mylist) function.

But most of the time, when I need some basic feature in python, I
discover later it is in fact already implemented. ;-)

So, is there already something like that in python ?
I tried to write :
'element is in mylist'
but this appeared to be incorrect syntax...

All objects involved all have an '__eq__' method.


Thanks,

N. P.

PS: Btw, how is set element comparison implemented? My first
impression was that 'a' and 'b' members are considered equal if and
only if hash(a) == hash(b), but I was obviously wrong:

>>> class A(object):
...     def __eq__(self,y):
...         return False
...     def __hash__(self):
...         return 5
...
>>> a=A();b=A()
>>> a==b
False
>>> hash(b)==hash(a)
True
>>> b in set([a])
False
>>> S=set([a])
>>> S.difference()
set([<__main__.A object at 0xb7a91dac>])

So there is some equality check also, maybe only if '__eq__' is
implemented ?
 
Peter Otten

There is no "is in" operator in Python, but you can write your test more
concisely as

any(myobject is element for element in mylist)
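
To make the difference from the built-in `in` concrete, here is a minimal sketch written for Python 3. The `Point` class and the `is_in` helper are hypothetical names used only for illustration; they are not part of the original poster's code.

```python
class Point:
    """Hypothetical class whose __eq__ compares coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


def is_in(obj, items):
    """Return True if obj itself (same identity) is one of items."""
    return any(obj is item for item in items)


a = Point(0, 0)
b = Point(0, 0)           # equal to a, but a different object
points = [a]

print(b in points)        # True:  `in` falls back to ==
print(is_in(b, points))   # False: b is not the object stored in the list
print(is_in(a, points))   # True
```

The generator expression stops at the first match, just like the explicit loop with `break`.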

> So there is some equality check also, maybe only if '__eq__' is
> implemented ?

In general equality is determined by __eq__() or __cmp__(). By default
object equality checks for identity.

Some containers (like the built-in set and dict) assume that a==b implies
hash(a) == hash(b).
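
A small sketch of what that assumption means in practice, written for Python 3. It relies on the documented rule that membership in built-in containers tries identity before `==`; the class name `SameHash` is made up for the illustration.

```python
class SameHash:
    """Every instance hashes to 5, but no two instances compare equal."""

    def __eq__(self, other):
        return False

    def __hash__(self):
        return 5


a = SameHash()
b = SameHash()
s = {a}

print(hash(a) == hash(b))  # True:  the hashes collide
print(b in s)              # False: hash matches, but b == a is False
print(a in s)              # True:  identity is checked before __eq__
```

So an equal hash only selects the candidates; the final decision is made by identity or by `__eq__`.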

Peter
 
nicolas.pourcelot

> There is no "is in" operator in Python, but you can write your test more
> concisely as
>
> any(myobject is element for element in mylist)

Thanks a lot.
However, any() is only available if the Python version is >= 2.5, but I
may define an any() function on initialisation if the Python version is < 2.5.

I think something like

id(myobject) in (id(element) for element in mylist)

would also work, although it's not so readable, and maybe not so fast
(?)...

An "is in" operator would be nice...
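
The fallback for interpreters older than 2.5 mentioned above could look roughly like this sketch. The name `any_fallback` is hypothetical; the function follows the documented behaviour of the built-in `any()` (True on the first true element, False for an empty iterable).

```python
def any_fallback(iterable):
    """Pure-Python stand-in for the built-in any()."""
    for element in iterable:
        if element:
            return True
    return False


try:
    any                      # built-in since Python 2.5
except NameError:
    any = any_fallback       # only reached on older interpreters


mylist = [object(), object()]
myobject = mylist[1]
print(any_fallback(myobject is element for element in mylist))   # True
print(any_fallback(object() is element for element in mylist))   # False
```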

> Some containers (like the built-in set and dict) assume that a==b implies
> hash(a) == hash(b).

So, precisely, you mean that if hash(a) != hash(b), a and b are
considered distinct, and else [ie. if hash(a) == hash(b)], a and b are
the same if and only if a == b ?
 
Peter Otten

> An "is in" operator would be nice...

And rarely used. Probably even less than the (also missing)

< in, | in, you-name-it

operators...

> So, precisely, you mean that if hash(a) != hash(b), a and b are
> considered distinct, and else [ie. if hash(a) == hash(b)], a and b are
> the same if and only if a == b ?

Correct for set, dict. For lists etc. the hash doesn't matter:
>>> class A(object):
...     def __hash__(self):
...         return nexthash()
...     def __eq__(self, other):
...         return True
...
>>> from itertools import count
>>> nexthash = count().next
>>> A() in [A() for _ in range(3)]
True
>>> d = dict.fromkeys([A() for a in range(3)])
>>> d.keys()[0] in d
False

Peter
 
nicolas.pourcelot

In fact, 'any(myobject is element for element in mylist)' is 2 times
slower than using a for loop, and 'id(myobject) in (id(element) for
element in mylist)' is 2.4 times slower.
 
Peter Otten

This is not a meaningful statement unless you at least qualify it with the
number of items that are actually checked. For sufficiently long sequences
both any() and the for loop take roughly the same amount of time over here.

$ python -m timeit -s"items=range(1000); x = 1000" "any(x is item for item in items)"
1000 loops, best of 3: 249 usec per loop
$ python -m timeit -s"items=range(1000); x = 1000" "for item in items:" "  if x is item: break"
1000 loops, best of 3: 276 usec per loop

$ python -m timeit -s"items=range(1000); x = 0" "any(x is item for item in items)"
100000 loops, best of 3: 3 usec per loop
$ python -m timeit -s"items=range(1000); x = 0" "for item in items:" "  if x is item: break"
1000000 loops, best of 3: 0.317 usec per loop


Peter

PS: Take these numbers with a grain of salt, they vary a lot between runs.
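
The same comparison can be scripted with the `timeit` module. This is a rough sketch for Python 3 (where `range()` is lazy, hence the `list()` call); the helper names `with_any` and `with_loop` are made up, and the absolute numbers depend on the machine and interpreter, so none are shown here.

```python
import timeit

items = list(range(1000))


def with_any(x, items):
    return any(x is item for item in items)


def with_loop(x, items):
    for item in items:
        if x is item:
            return True
    return False


# A hit on the first element versus a miss that scans the whole list.
first = items[0]
missing = object()

for label, x in (("hit at index 0", first), ("miss", missing)):
    t_any = timeit.timeit(lambda: with_any(x, items), number=2000)
    t_loop = timeit.timeit(lambda: with_loop(x, items), number=2000)
    print("%-15s any(): %.4fs   loop: %.4fs" % (label, t_any, t_loop))
```

Measuring both the early hit and the full scan separates the fixed cost of setting up the generator from the per-item cost.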
 
nicolas.pourcelot

> This is not a meaningful statement unless you at least qualify with the
> number of item that are actually checked. For sufficently long sequences
> both any() and the for loop take roughly the same amount of time over here.

Sorry. I used short lists (a list of 20 floats) and the element
checked was not in the list.
(That was the case I usually deal with in my code.)
 
Peter Otten

What is your (concrete) use case, by the way?

If you want efficiency you should use a dictionary instead of the list
anyway:

$ python -m timeit -s"d=dict((id(i), i) for i in range(1000)); x = 1000" "id(x) in d"
1000000 loops, best of 3: 0.275 usec per loop
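
Spelled out as ordinary code, the idea is an index keyed by `id()`. This is a sketch under the assumption that the objects stay alive for as long as the index is used (an `id()` may be reused once an object has been garbage-collected); `build_identity_index` is a hypothetical helper name.

```python
def build_identity_index(items):
    """Map id(obj) -> obj; storing obj as the value keeps it alive."""
    return dict((id(obj), obj) for obj in items)


items = [[1, 2], [1, 2], [3]]      # the first two are equal but distinct
index = build_identity_index(items)

probe = [1, 2]                     # equal to items[0], but a new object
print(probe in items)              # True  (equality)
print(id(probe) in index)          # False (identity)
print(id(items[0]) in index)       # True
```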

Peter
 
bearophileHUGS

Peter Otten:
> PS: Take these numbers with a grain of salt, they vary a lot between runs.

Another possibility :)
from itertools import imap
id(x) in imap(id, items)

> If you want efficiency you should use a dictionary instead of the list anyway:

I agree, but sometimes you have few items to look for, so building the
whole dict (that requires memory too) may be a waste of time.

In theory this may be faster to build, but in practice you need a
benchmark:
ids = set(imap(id, items))
followed by:
id(x) in ids
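
On Python 3, `itertools.imap` no longer exists because the built-in `map()` is already lazy, so the same two ideas can be sketched as follows (the variable names are illustrative only):

```python
items = [object() for _ in range(5)]
x = items[3]

# Lazy scan: stops at the first matching id and builds no container.
found_lazy = id(x) in map(id, items)

# Precomputed set: one pass up front, then constant-time lookups on average.
ids = set(map(id, items))
found_set = id(x) in ids

print(found_lazy, found_set)
```

The lazy form suits a single lookup; the set pays off only when many lookups are made against the same, unchanged list.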

Bye,
bearophile
 
nicolas.pourcelot

> What is your (concrete) use case, by the way?

I'll try to keep it simple (there are almost 25000 lines of code...)
I have a sheet with geometrical objects (points, lines, polygons,
etc.)
The sheet has an object manager.

So, to simplify :

Then we have :
True

since A and B have the same coordinates.
But of course the A and B objects are not the same Python objects.
In certain cases, some geometrical objects are automatically
referenced in the sheet, without being defined by the user.
(Edges for polygons, for example...)
But they must not be referenced twice. So if the edge of the polygon
is already referenced (because the polygon uses an already referenced
object for its construction...), it must not be referenced again.
However, if there is an object which accidentally has the same
coordinates, it must be referenced with a different name.

So, I use something like this in 'sheet.objects.__setattr__(self,
name, value)':
if type(value) == Polygon:
    for edge in value.edges:
        if edge is_in sheet.objects.__dict__.itervalues():
            object.__setattr__(self, self.__new_name(), edge)

Ok, I suppose it's confused, but it's difficult to sum up. ;-)
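
One way to read that requirement, as a sketch only: keep the registry in a dict and skip an edge when that very object is already one of its values. All names here (`ObjectRegistry`, `add_auto`, the `auto_N` naming scheme) are hypothetical, and `Segment` merely stands in for the real geometry classes.

```python
class Segment:
    """Stand-in geometry class: equal when the endpoints match."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __eq__(self, other):
        return (self.start, self.end) == (other.start, other.end)

    __hash__ = None  # mutable-style object, deliberately unhashable


class ObjectRegistry:
    def __init__(self):
        self._objects = {}
        self._counter = 0

    def _new_name(self):
        self._counter += 1
        return "auto_%d" % self._counter

    def add_auto(self, obj):
        """Register obj under a fresh name unless that same object is known."""
        if any(obj is known for known in self._objects.values()):
            return False
        self._objects[self._new_name()] = obj
        return True


registry = ObjectRegistry()
edge = Segment((0, 0), (1, 0))
twin = Segment((0, 0), (1, 0))    # same coordinates, different object

print(registry.add_auto(edge))    # True:  first registration
print(registry.add_auto(edge))    # False: already referenced
print(registry.add_auto(twin))    # True:  equal, but a distinct object
```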

> Another possibility :)
> from itertools import imap
> id(x) in imap(id, items)

I didn't know itertools.
Thanks :)
 
Marc 'BlackJack' Rintsch

> So, I use something like this in 'sheet.objects.__setattr__(self,
> name, value)':
> if type(value) == Polygon:
>     for edge in value.edges:
>         if edge is_in sheet.objects.__dict__.itervalues():
>             object.__setattr__(self, self.__new_name(), edge)
>
> Ok, I suppose it's confused, but it's difficult to sum up. ;-)

You are setting attributes with computed names? How do you access them?
Always with `getattr()` or via the `__dict__`? If the answer is yes, why
don't you put the objects into a dictionary instead of the extra
redirection of an object's `__dict__`?

Oh and the `type()` test smells like you are implementing polymorphism
in a way that should be replaced by OOP techniques.

Ciao,
Marc 'BlackJack' Rintsch
 
Peter Otten

I won't pretend I understand ;)

If you make Point immutable you might be able to drop the "must not be
referenced twice" requirement.

Peter
 
Terry Reedy

Peter said:
>> So, precisely, you mean that if hash(a) != hash(b), a and b are
>> considered distinct, and else [ie. if hash(a) == hash(b)], a and b are
>> the same if and only if a == b ?
>
> Correct for set, dict. For lists etc. the hash doesn't matter:

Since CPython saves string hashes as part of the string object (last I
read, as part of internal string caching), it does something similar
when comparing strings: compare lengths, then hashes, then the C array.
 
nicolas.pourcelot

> You are setting attributes with computed names?  How do you access them?
> Always with `getattr()` or via the `__dict__`?  If the answer is yes, why
> don't you put the objects into a dictionary instead of the extra
> redirection of an object's `__dict__`?

Yes, I may subclass dict, and change its __getitem__ and __setitem__
methods, instead of changing the object's __setattr__ and __getattr__... But
I prefer
    sheet.objects.A = Point(0, 0)
to
    sheet.objects["A"] = Point(0, 0)

> Oh and the `type()` test smells like you are implementing polymorphism
> in a way that should be replaced by OOP techniques.

I wrote 'type' here by mistake, but I used 'isinstance' in my
code. ;-)

> If you make Point immutable you might be able to drop the "must not be
> referenced twice" requirement.

Yes, but unfortunately I can't (or it would require complete
redesign...)
 
John Machin

>> If you make Point immutable you might be able to drop the "must not be
>> referenced twice" requirement.
>
> Yes, but unfortunately I can't (or it would require complete
> redesign...)

(1) You are searching through lists to find float objects by identity,
not by value
(2) Peter says he doesn't understand
(3) Marc thinks it smells

IOW, the indications are that it *already* requires complete redesign.
 
Marc 'BlackJack' Rintsch

> Yes, I may subclass dict, and change its __getitem__ and __setitem__
> methods, instead of changing objets __setattr__ and __getattr__... But
> I prefer
> sheet.objects.A = Point(0, 0) than
> sheet.objects["A"] = Point(0, 0)

But with computed names isn't the difference more like

    setattr(sheet.objects, name, Point(0, 0))
vs.
    sheet.objects[name] = Point(0, 0)

and

    getattr(sheet.objects, name)
vs.
    sheet.objects[name]

Or do you really have ``sheet.objects.A`` in your code, in the hope that
an attribute named 'A' exists?

> I wrote 'type' here by mistake, but I used 'isinstance' in my code. ;-)

Doesn't change the "code smell". The OOP approach would be a method on the
geometric objects that knows what to do, instead of a type test to decide
what to do with each type of geometric object.
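
As an illustration of that suggestion (the class and method names are invented for the example, not taken from the poster's code), each geometric class can report the objects it implies, so the caller needs no type test:

```python
class GeoObject:
    def implied_objects(self):
        """Objects that must be referenced along with this one."""
        return []


class Point(GeoObject):
    pass


class Polygon(GeoObject):
    def __init__(self, edges):
        self.edges = list(edges)

    def implied_objects(self):
        return self.edges


def objects_to_reference(value):
    # No isinstance()/type() check: every class answers for itself.
    return [value] + value.implied_objects()


e1, e2 = GeoObject(), GeoObject()
poly = Polygon([e1, e2])
print(len(objects_to_reference(poly)))     # 3
print(len(objects_to_reference(Point())))  # 1
```

A new kind of object then only has to override `implied_objects()`; the code that references objects stays unchanged.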

Ciao,
Marc 'BlackJack' Rintsch
 
John Machin


You wrote """
I used short lists (a list of 20 floats) and the element
checked was not in the list.
(That was the case I usually deal with in my code.)
"""
 
nicolas.pourcelot

:-D
 
