filter()ing a dict

R

Robin Cull

Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False

I then called filter(f, myDict).

I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.

I'm guessing that filter() does not work on a dict in the way I think
it should.

I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.

What are the side effects of making filter() work this way?

Thanks all.

Regards,

Robin
 
J

John Roth

According to the doc, filter() processes sequences,
iterators and objects that support the iterator protocol.
A dict is none of those things, unless it supports the iterator
protocol, which is not mentioned in the documentation
as of 2.2.3. Possibly it does in 2.3.0.

If you want to use filter() to process a dict, use the .items()
method. That will get you a list of tuples, where each tuple
contains the key and value. .iteritems() will also work,
of course.

What is returned will be a list of tuples, which you can
use to build a dict.

I'm surprised that what you tried worked at all. The
2.2.3 documentation on dictionaries doesn't suggest that
a dictionary supports the iteration protocol, or what it
returns in that context.

John Roth

Robin Cull said:
Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False

I then called filter(f, myDict).

I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.

I'm guessing that filter() does not work on a dict in the way I think
it should.

I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.

What are the side effects of making filter() work this way?

Thanks all.

Regards,

Robin
 
S

Sean Ross

d = {"key 1": ["value 1", "value 2", "value 3", "value 4"],
"key 2": ["value 1", "value 2"],
"key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

# using list comprehension
fd = dict([(k,v) for k,v in d.items() if len(v)==4])
print fd

# using filter
value = 1
fd = dict(filter(lambda item:len(item[value])==4, d.items()))
print fd
 
D

Duncan Booth

According to the doc, filter() processes sequences,
iterators and objects that support the iterator protocol.
A dict is none of those things, unless it supports the iterator
protocol, which is not mentioned in the documentation
as of 2.2.3. Possibly it does in 2.3.0.

It looks to me like there is a hole in the Python documentation. The
"what's new in Python 2.2" said:
Iterator support has been added to some of Python's basic types.
Calling iter() on a dictionary will return an iterator which loops
over its keys:

However, the library reference documentation for mapping types doesn't seem
to mention that dictionaries support the iterator protocol directly,
although it does mention the explicit ways to get iterators out of
dictionaries (such as iteritems()).
 
A

Alex Martelli

Robin said:
Imagine I have a dict looking something like this:

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"], "key
2": ["value 1", "value 2"], "key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}

That is, a set of keys which have a variable length list of associated
values after them. What I want to do is filter out a subset of this
dict to produce another dict that satisfies a set of criteria (in this
case whether it contains all four values) to end up with something
like this:

{"key 1": ["value 1", "value 2", "value 3", "value 4"], "key 4":
["value 1", "value 2", "value 3", "value 4"]}

I've tried using the filter() function so I wrote a small function
that returned True or False depending on whether the length of the
list was what I wanted e.g.

def f(item):
if len(item) == 4:
return True
else:
return False

This function expects to be passed the value (not the item).
I then called filter(f, myDict).

While this passes the keys (not the items, but not the values either).
I was expecting to be returned a dict which was the subset of myDict
that satisfied f.

This did not happen; the item passed to f was each key of the dict,
not each value list. The call to filter returned an empty list (as
obviously none of the keys satisfied the criteria). If anything I
would have expected an empty dict back as according to the docs:

""filter(function, sequence)" returns a sequence (of the same
type, if possible) consisting of those items from the sequence for
which function(item) is true."

The important bit here is "of the same type, if possible" which I
understood to mean if passed a dict it would return a dict.

Please check the current docs at
http://www.python.org/doc/current/lib/built-in-funcs.html
and let us know if you consider them clearer now. Specifically,
the sentence:
"""
If <i>list</i> is a string or a tuple, the result also has that type;
otherwise it is always a list.
"""
should help (despite the unfortunate name chosen for the argument).

I've tried calling filter(f, myDict.values()) which works and I get a
list like this returned:

[["value 1", "value 2", "value 3", "value 4"], ["value 1", "value 2",
"value 3", "value 4"]]

Sort of what I want but I've lost the associativity with the key which
was kind of the point in using a dict in the first place.

Exactly. It's impossible to reconstruct the right dict from this one.
I'm guessing that filter() does not work on a dict in the way I think
it should.

Doesn't and never will.
I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).

You need to filter on items, which are key-value pairs; therefore:

def f(item):
return len(item[1]) == 4

the value is the 1-th element of the 'item' pair (tuple) while the
key is the 0-th element. I've also simplified the logical structure,
since the == already returns True or False just as you wish, there
is no reason to keep an if/else (though it's just redundant, not
directly harmful).

Then, you need to provide to filter the items, explicitly:

filter(f, myDict.items())

this gives you a list of items (pairs of key-value). Finally you
can pass that to dict to obtain what you want:

dict(filter(f, myDict.items()))

You could equivalently use myDict.iteritems() [may be faster, but
I doubt that matters for a reasonably small dictionary anyway].

Note that filter, these days, is often shunned in favour of a
list-comprehension, which saves the need to write a function as
the predicate for the filtering; with a list comprehension you
might write it all on one line as

dict([ (key,myDict[key]) for key in myDict if len(myDict[key])==4 ])

or, if you prefer:

dict([ (key,value) for key,value in myDict.items() if len(value)==4 ])

and other equivalent ways yet.

My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?

According to "Ugol's Law", on Usenet, if you ever ask whether you
are the only one who "X" (where "X" is "thinks this", "does that",
"likes the other", etc, etc), the answer is invariably "no".

However, the Ugolian certainty that there must be another person
who shares your opinion in this matter isn't all that relevant.
More interesting is the next question you ask:
Are there reasons for it not to work this way? I must admit I've not
looked into this carefully, just in the context of my script it'd be
useful to.

'filter' unfortunately carries around the complication (for backwards
compatibility) of returning a string when passed a string, a tuple
when passed a tuple, a list in all other cases. It would obviously
be much simpler if it always returned a list. You, on the other hand,
would want to make it even _more_ complicated -- and for reasons that
escape me, but perhaps connected at looking just at the context of
this one script, would want the filtering function to be passed just
the values, rather than (as happens now if you pass filter a dict)
just the keys. Since you CAN recover the value from the key (if you
have access to the dictionary, e.g. as a global variable) but not
vice-versa, I think it's undisputable that such a hypothetical change
(hypothetical because it would break backwards compatibility, and
thus it's mathematically certain that it will not happen) would be
worse than the current situation. Passing a (key,value) pair, i.e
a whole item, would on the other hand not be just as unambiguously
bad -- it would "just" be a complication whose cost (complicating
things) needs to be weighed against its advantage (making it a tad
handier to perform dictionary-filtering in certain cases) - again
hypothetically speaking, of course.

What are the side effects of making filter() work this way?

Presumably that filter then would need to know about one more
special case (besides the two unfortunate, historical ones of
strings and tuples) rather than just being able to iterate on
its second argument. The incompatibility of the filtering function
needing to be passed the whole item [ (key,value) pair ] is the
definite reason this will never happen; whether it should have
been so in the first place is therefore moot (I opine filter's
very existence is unjustified in today's Python, except for
backwards compatibility, but others do differ, of course).

Python 3.0's main thrust (when it finally happens) will be to
*SIMPLIFY* Python by removing a lot of currently existing
basically redundant ways to do things -- the BDFL has said so
at this year's Python conferences. Thus, I expect the filter
function to go away, rather than gaining bells and whistles.

If you can make substantial use cases for the "filtering a
dictionary" functionality, for very large dictionaries where
dict([ (k,d[k]) for k in d if len(d[k])==4 ]) has intolerable
overhead, then one addition that _might_ have some (small)
chance of making it into future releases might be to add to
the dict class a filter method (taking just the predicate as
its argument). I'm still rather dubious that this would
actually stand any real chance, but can't be certain of that.


Alex
 
A

Aahz

I'm surprised that what you tried worked at all. The 2.2.3
documentation on dictionaries doesn't suggest that a dictionary
supports the iteration protocol, or what it returns in that context.

Check the What's New doc on iterators.
 
J

John Hunter

Robin> My question is, does anybody else think that when calling
Robin> filter(f, dict), it should return a dict of the keys and
Robin> values of the former dict where the values satisfy the
Robin> conditions defined by f()?

The problem is not with filter. When you do

d = {'k1':1, 'k2':2}
for x in d: print x

It prints the keys. That is: by default, iterating over a dictionary
iterates over the keys. So when you filter a dict, it iterates over
the dictionary which means iterating over the keys. SO filter and
dict are behaving as advertised. As you noted, you can call
dictionary functions to iterate over the keys, values, or key, value
pairs.

It is easy to do what you want with list comprehensions

myDict = {"key 1": ["value 1", "value 2", "value 3", "value 4"],
"key 2": ["value 1", "value 2"],
"key 3": ["value2", "value 3", "value 4"],
"key 4": ["value 1", "value 2", "value 3", "value 4"]}


len4Dict = dict([ (k,v) for k,v in myDict.items() if len(v)==4])
print len4Dict


Cheers,
John Hunter
 
C

Carl Banks

Robin said:
My question is, does anybody else think that when calling filter(f,
dict), it should return a dict of the keys and values of the former
dict where the values satisfy the conditions defined by f()?


Why shouldn't filter apply to the keys? You want it to apply to
values, but don't you think someone else might want it for the keys?
Who's to say what it should be? Better to just leave it take care of
sequences.
 
R

Robin Cull

[email protected] (Robin Cull) wrote in message news: said:
I can write something to get around the behaviour in this particular
case (although suggestions would be welcome).
<SNIP>

Thanks all for your explanations and examples. I quite like the list
comprehension methods. I didn't comprehend (pun intended) what list
comprehensions were until I saw these examples so I've learnt
something new! :)

For my particular script, which is likely to be used by pythonistas
even less experienced than me, I decided to solve the problem this
way:

def filterDict(testFunction, dictToFilter):
newDict = {}
for k, v in dictToFilter.items():
if testFunction(v):
newDict[k] = v
return newDict

Seems to work for what I want it to do and since it's a standalone
defined function which mimics filter() should be nice and easy to
understand for people who have to look at it later.

I appreciate all your replies.

Regards,

Robin
 
S

Sean Ross

Robin Cull said:
(e-mail address removed) (Robin Cull) wrote in message
For my particular script, which is likely to be used by pythonistas
even less experienced than me, I decided to solve the problem this
way:

def filterDict(testFunction, dictToFilter):
newDict = {}
for k, v in dictToFilter.items():
if testFunction(v):
newDict[k] = v
return newDict

Seems to work for what I want it to do and since it's a standalone
defined function which mimics filter() should be nice and easy to
understand for people who have to look at it later.


I actually find the following version atleast as easy to understand as the
version above, but I perhaps I lack perspective.

def filterdict(predicate, mapping):
"predicate() must take one (key,value) tuple as argument and return a
boolean"
return dict(filter(predicate, mapping.iteritems()))

# useage:
fd = filterdict(lambda item: len(item[1])==4, d)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,083
Messages
2,570,591
Members
47,212
Latest member
RobynWiley

Latest Threads

Top