Py2.3: Feedback on Sets

Christos TZOTZIOY Georgiou · Aug 20, 2003

[replying only to those that I have something substantial to say]

* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

I have used sets in:
- Unix sysadm tasks (comparing usernames between passwd and shadow,
finding common files in sync requests et al)
- a hangman game (when the computer guesses words, to continuously
restrict the possibilities based on the human input)
- an image recognition program (comparing haar coefficients)

These come to mind at the moment, but I have used them even in the
python command line; and mostly I care about intersections.

* Does the performance meet your expectations?

In the game and image recognition programs I could use more power;-)

* Are sets helpful in your daily work or does the need arise
only rarely?

I use them often, it's a very helpful construct.

User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

Reimplementation in C sounds appropriate, and supporting language syntax
would be nice.

A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values. That is, given dicts a and b, I'd like:

to mean

dict([x, a[x] for x in sets.Set(a) & sets.Set(b)]) # real

Click to expand...

Click to expand...

You may notice that a&b wouldn't be equivalent to b&a.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.

Another thought: it is unfortunate that an intersection *has* to be
through continuous lookups (talking about the ordering of dict keys re
their hash values, I'll have to delve into dictobject.c it seems), even
taking into account the great speed of key lookups... although building
the result dict should account for more processing cycles than the
comparisons; and in some cases doing a dict.copy() and then removing the
uncommon elements would be faster. Hm, food for thought, and no more
than two hours to sleep now.

Another slogan: Python keeps your mind awake (and c.l.py keeps your body
away from bed

Christos TZOTZIOY Georgiou · Aug 20, 2003

A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values. That is, given dicts a and b, I'd like:

to mean

dict([x, a[x] for x in sets.Set(a) & sets.Set(b)]) # real

Click to expand...

Click to expand...

You may notice that a&b wouldn't be equivalent to b&a.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.

I implemented dict.intersect(), and it is *quite* faster than the
equivalent Python code.

**********************************************************************

Python 2.4a0 (#3, Aug 20 2003, 16:31:22)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.Help on method_descriptor:

intersect(...)
D.intersect(E) -> a subset of D having common keys with E
{'a': 1, 'c': 5, 'b': 3, 'e': 9, 'd': 7, 'g': 13, 'f': 11, 'i': 17, 'h':
15, 'k': 21, 'j': 19, 'm': 25, 'l': 23, 'n': 27}

evens {'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14, 's': 4}

dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)]) {'a': 1, 'd': 7, 'g': 13, 'f': 11, 'h': 15, 'j': 19}
odds.intersect(evens) {'a': 1, 'h': 15, 'j': 19, 'd': 7, 'g': 13, 'f': 11}
dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)]) {'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14}
evens.intersect(odds) {'a': 2, 'h': 12, 'j': 14, 'd': 6, 'g': 10, 'f': 8}

my_setup= 'import sets; odds=dict(zip("abcdefghijklmn", range(1, 55, 2))); evens=dict(zip("asdfghj", range(2, 55, 2)))'
from timeit import Timer

Timer(stmt="odds.intersect(evens)", setup=my_setup).repeat() [1.3545670509338379, 1.3367550373077393, 1.3366960287094116]
Timer(stmt="evens.intersect(odds)", setup=my_setup).repeat() [1.321492075920105, 1.2869999408721924, 1.320341944694519]
Timer(stmt="dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat() [63.413245916366577, 63.526772975921631, 63.503224968910217]
Timer(stmt="dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()

Click to expand...

Click to expand...

[63.498296976089478, 63.49311900138855, 63.425426959991455]

**********************************************************************

A substantial difference, over 50x on an Athlon XP 1700. Also note the
difference in the key order of the results.

I believe that dicts should grow such a method, perhaps with another
name.

Attached is the diff -u for dictobject.c compared to the one in last
night's python-latest.tgz

Feedback on Sets, and Partitions	7	Apr 30, 2004
Request for feedback on API design	4	Dec 9, 2010
More user feedback on Sets.py	11	Nov 7, 2003
Feedback on a design decision?	15	Oct 5, 2006
Request for Feedback; a module making it easier to use regular expressions.	1	Jan 31, 2005
suggestions on intelligent processing of data sets in a file	2	May 9, 2007
ANN: eGenix mxODBC Connect 2.1.0 - Python ODBC Database Interface	0	May 28, 2014
set, dict and other structures	8	Jan 31, 2005

Py2.3: Feedback on Sets

Christos TZOTZIOY Georgiou

Christos TZOTZIOY Georgiou

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads