Grouping items by a key?

Michael Fogleman

I feel like Python ought to have a built-in to do this. Take a list of items and turn them into a dictionary mapping keys to a list of items with that key in common.

It's easy enough to do:

# using defaultdict
import collections

lookup = collections.defaultdict(list)
for item in items:
    lookup[key(item)].append(item)

# or, using plain dict
lookup = {}
for item in items:
    lookup.setdefault(key(item), []).append(item)
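One practical difference between the two loops: the defaultdict version leaves you holding a defaultdict, so later reading a missing key silently inserts an empty list instead of raising KeyError. A minimal sketch, with made-up sample data and a hypothetical first-letter key function, of converting back to a plain dict once grouping is done:

```python
import collections

# Hypothetical sample data and key function, purely for illustration.
items = ["apple", "avocado", "banana", "blueberry", "cherry"]

def key(item):
    return item[0]  # group words by their first letter

lookup = collections.defaultdict(list)
for item in items:
    lookup[key(item)].append(item)

# Convert to a plain dict before handing the result to other code.
result = dict(lookup)
print(result)         # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print("z" in result)  # False

# By contrast, merely reading a missing key from the defaultdict inserts it.
lookup["z"]
print("z" in lookup)  # True
```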

But this is a common enough use case that a built-in function would be nice. I could implement it myself, like so:

def grouped(iterable, key):
    result = {}
    for item in iterable:
        result.setdefault(key(item), []).append(item)
    return result

lookup = grouped(items, key)

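This is different from `itertools.groupby` in a few important ways: `groupby` yields (key, group) pairs rather than building a dict, and it only groups consecutive items, so the input has to be sorted by the key first. To get the same result from `groupby`, you'd have to do this, which is a little ugly: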

from itertools import groupby
lookup = dict((k, list(v)) for k, v in groupby(sorted(items, key=key), key))
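To make the difference concrete, here is a small sketch comparing the two approaches (the sample words and the use of len as the key are assumptions for illustration). With the sort, both give the same dict; without it, groupby starts a new group each time the key changes, and converting to a dict lets later runs overwrite earlier ones. Note also that the sort requires keys that can be ordered against each other, which the dict-based version does not.

```python
from itertools import groupby

def grouped(iterable, key):
    """Accumulate items into a dict of lists keyed by key(item)."""
    result = {}
    for item in iterable:
        result.setdefault(key(item), []).append(item)
    return result

# Hypothetical sample data; len is used as the grouping key.
words = ["hello", "how", "are", "you", "stack", "overflow"]

via_dict = grouped(words, len)
# sorted() is stable, so items keep their original order within each group.
via_groupby = dict((k, list(v)) for k, v in groupby(sorted(words, key=len), len))
print(via_dict == via_groupby)  # True

# Skipping the sort: "hello" is lost because the later run of 5s replaces it.
unsorted = dict((k, list(v)) for k, v in groupby(words, len))
print(unsorted)  # {5: ['stack'], 3: ['how', 'are', 'you'], 8: ['overflow']}
```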

Some examples:

{0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
{8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}
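The calls that produced these outputs aren't shown. The inputs below are guesses that reproduce the same groupings; since dicts preserve insertion order from Python 3.7 on, the second result prints its keys in first-seen order rather than the order shown above.

```python
def grouped(iterable, key):
    """Accumulate items into a dict of lists keyed by key(item)."""
    result = {}
    for item in iterable:
        result.setdefault(key(item), []).append(item)
    return result

# Assumed inputs: the original calls were not preserved.
by_parity = grouped(range(10), lambda n: n % 2)
by_length = grouped("hello how are you stack overflow".split(), len)

print(by_parity)  # {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
print(by_length)  # {5: ['hello', 'stack'], 3: ['how', 'are', 'you'], 8: ['overflow']}
```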

Is there a better way?
 
Steven D'Aprano

> I feel like Python ought to have a built-in to do this. Take a list of
> items and turn them into a dictionary mapping keys to a list of items
> with that key in common.

> It's easy enough to do:
>
> # using defaultdict
> lookup = collections.defaultdict(list)
> for item in items:
>     lookup[key(item)].append(item)
>
> # or, using plain dict
> lookup = {}
> for item in items:
>     lookup.setdefault(key(item), []).append(item)

That's pretty much the reason setdefault was invented. So, in a sense,
there is a built-in for this.
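For reference, a short sketch of what setdefault does (the keys and values are made up): it returns the value already stored under the key, inserting the default first if the key is missing, which is why the returned list can be appended to in place.

```python
lookup = {}

# Key missing: the default [] is stored and that same list is returned.
bucket = lookup.setdefault("a", [])
bucket.append(1)
print(lookup)  # {'a': [1]}

# Key present: the existing list is returned and the new default is ignored.
lookup.setdefault("a", []).append(2)
print(lookup)  # {'a': [1, 2]}

# A fresh empty list is still built on every call, even when it goes unused;
# collections.defaultdict(list) avoids that by calling list() only on a miss.
```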

> But this is a common enough use case that a built-in function would
> be nice.

I'm not so sure I agree it's a frequent use case. I don't think I've ever
needed to do it, or if I did, it was so rare and so long ago that I've
forgotten it.


> I could implement it myself, like so:
>
> def grouped(iterable, key):
>     result = {}
>     for item in iterable:
>         result.setdefault(key(item), []).append(item)
>     return result
>
> lookup = grouped(items, key)

> This is different from `itertools.groupby` in a few important ways.

Why do you care about itertools.groupby? That does something completely
different. It groups items that occur in *contiguous* groups, e.g.

[1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

will be grouped into the following runs, with the 2s split across three
separate groups:

[1], [2], [3], [2, 2, 2], [3, 3], [4], [5, 5], [2, 2], [5]
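You can confirm this at the interactive prompt. A small sketch using the list above; with no key function, groupby compares the items themselves:

```python
from itertools import groupby

data = [1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

# groupby starts a new group whenever the value changes, so equal values
# that are not adjacent end up in separate groups.
runs = [list(group) for _, group in groupby(data)]
print(runs)
# [[1], [2], [3], [2, 2, 2], [3, 3], [4], [5, 5], [2, 2], [5]]
```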

This is a feature of groupby. If you want to accumulate items regardless
of where they occur, e.g. for the above:

[1], [2, 2, 2, 2, 2, 2], [3, 3, 3], [4], [5, 5, 5]

then there's no need to use groupby.
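The setdefault loop from the original post already does this accumulation. A minimal sketch on the same list, using each value as its own key:

```python
data = [1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

# Accumulate every occurrence under its value, wherever it appears.
buckets = {}
for x in data:
    buckets.setdefault(x, []).append(x)

print(list(buckets.values()))
# [[1], [2, 2, 2, 2, 2, 2], [3, 3, 3], [4], [5, 5, 5]]
```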

> Some examples:
>
> {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
> {8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}

> Is there a better way?

Looks perfectly fine to me. It's a five-line helper function; it's
readable, simple and clear. The only improvement I would make would
be to give it a docstring describing what it does and showing some
examples:

def grouped(items, key):
    """Return a dict with items accumulated by key.

    {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
    {8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}

    """
    result = {}
    for item in items:  # iterate over the parameter actually passed in
        result.setdefault(key(item), []).append(item)
    return result

Now you have a nice, descriptive help string for when you call
help(grouped).
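The examples in that docstring have lost their input lines, so they can't be checked automatically. A sketch of the same function with the examples in doctest form; the inputs are assumptions chosen to reproduce the outputs shown, and the key order matches Python 3.7+ insertion order:

```python
def grouped(items, key):
    """Return a dict with items accumulated by key.

    >>> grouped(range(10), lambda n: n % 2)
    {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
    >>> grouped("hello how are you stack overflow".split(), len)
    {5: ['hello', 'stack'], 3: ['how', 'are', 'you'], 8: ['overflow']}
    """
    result = {}
    for item in items:
        result.setdefault(key(item), []).append(item)
    return result


if __name__ == "__main__":
    # Runs the examples in the docstring and reports any mismatches.
    import doctest
    doctest.testmod()
```

Running the module directly then checks the examples, and help(grouped) shows them with their inputs.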
 
