Py2.3: Feedback on Sets

C

Christos TZOTZIOY Georgiou

[replying only to those that I have something substantial to say]
* Is the support for sets of sets necessary for your work
and, if so, then is the implementation sufficiently
powerful?

I have used sets in:
- Unix sysadm tasks (comparing usernames between passwd and shadow,
finding common files in sync requests et al)
- a hangman game (when the computer guesses words, to continuously
restrict the possibilities based on the human input)
- an image recognition program (comparing haar coefficients)

These come to mind at the moment, but I have used them even in the
python command line; and mostly I care about intersections.
* Does the performance meet your expectations?

In the game and image recognition programs I could use more power;-)
* Are sets helpful in your daily work or does the need arise
only rarely?

I use them often, it's a very helpful construct.
User feedback is essential to determining the future direction
of sets (whether it will be implemented in C, change API,
and/or be given supporting language syntax).

Reimplementation in C sounds appropriate, and supporting language syntax
would be nice.


A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values. That is, given dicts a and b, I'd like:

to mean
dict([x, a[x] for x in sets.Set(a) & sets.Set(b)]) # real

You may notice that a&b wouldn't be equivalent to b&a.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.

Another thought: it is unfortunate that an intersection *has* to be
through continuous lookups (talking about the ordering of dict keys re
their hash values, I'll have to delve into dictobject.c it seems), even
taking into account the great speed of key lookups... although building
the result dict should account for more processing cycles than the
comparisons; and in some cases doing a dict.copy() and then removing the
uncommon elements would be faster. Hm, food for thought, and no more
than two hours to sleep now.

Another slogan: Python keeps your mind awake (and c.l.py keeps your body
away from bed :)
 
C

Christos TZOTZIOY Georgiou

A quick thought, in the spirit of C implementation: there are cases
where I would like to get the intersection of dicts (based on the keys),
without having to create sets from the dict keys and then getting the
relevant values. That is, given dicts a and b, I'd like:

to mean
dict([x, a[x] for x in sets.Set(a) & sets.Set(b)]) # real

You may notice that a&b wouldn't be equivalent to b&a.
Perhaps the speed difference would not be much; I'll grow a function in
dictobject.c, run some benchmarks and come back with results for you.

I implemented dict.intersect(), and it is *quite* faster than the
equivalent Python code.

**********************************************************************

Python 2.4a0 (#3, Aug 20 2003, 16:31:22)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.Help on method_descriptor:

intersect(...)
D.intersect(E) -> a subset of D having common keys with E
{'a': 1, 'c': 5, 'b': 3, 'e': 9, 'd': 7, 'g': 13, 'f': 11, 'i': 17, 'h':
15, 'k': 21, 'j': 19, 'm': 25, 'l': 23, 'n': 27}
evens {'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14, 's': 4}


dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)]) {'a': 1, 'd': 7, 'g': 13, 'f': 11, 'h': 15, 'j': 19}
odds.intersect(evens) {'a': 1, 'h': 15, 'j': 19, 'd': 7, 'g': 13, 'f': 11}
dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)]) {'a': 2, 'd': 6, 'g': 10, 'f': 8, 'h': 12, 'j': 14}
evens.intersect(odds) {'a': 2, 'h': 12, 'j': 14, 'd': 6, 'g': 10, 'f': 8}


my_setup= 'import sets; odds=dict(zip("abcdefghijklmn", range(1, 55, 2))); evens=dict(zip("asdfghj", range(2, 55, 2)))'
from timeit import Timer

Timer(stmt="odds.intersect(evens)", setup=my_setup).repeat() [1.3545670509338379, 1.3367550373077393, 1.3366960287094116]
Timer(stmt="evens.intersect(odds)", setup=my_setup).repeat() [1.321492075920105, 1.2869999408721924, 1.320341944694519]
Timer(stmt="dict([(k, odds[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat() [63.413245916366577, 63.526772975921631, 63.503224968910217]
Timer(stmt="dict([(k, evens[k]) for k in sets.Set(odds) & sets.Set(evens)])", setup=my_setup).repeat()
[63.498296976089478, 63.49311900138855, 63.425426959991455]

**********************************************************************

A substantial difference, over 50x on an Athlon XP 1700. Also note the
difference in the key order of the results.

I believe that dicts should grow such a method, perhaps with another
name.

Attached is the diff -u for dictobject.c compared to the one in last
night's python-latest.tgz
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,086
Messages
2,570,599
Members
47,222
Latest member
jspanther

Latest Threads

Top