Why defaultdict?

Steven D'Aprano · Jul 2, 2010

I would like to better understand some of the design choices made in
collections.defaultdict.

Firstly, to initialise a defaultdict, you do this:

from collections import defaultdict
d = defaultdict(callable, *args)

which sets an attribute of d "default_factory" which is called on key
lookups when the key is missing. If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported.
That is:

d = dict(*args)
d.default_factory = callable

If you failed to explicitly set the dict's default_factory, it would
behave precisely as dicts do now. So why create a new class that needs to
be imported, rather than just add the functionality to dict?

Is it just an aesthetic choice to support passing the factory function as
the first argument? I would think that the advantage of having it
built-in would far outweigh the cost of an explicit attribute assignment.
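
A minimal sketch of the behaviour described above, assuming Python 3 (the variable names are illustrative, not from the post). It shows that a defaultdict built with a factory of None raises KeyError on a missing key, as a plain dict does, and that default_factory is a writable attribute that can be assigned afterwards:

```python
from collections import defaultdict

# With no factory, a missing key raises KeyError, just as with a plain dict.
d = defaultdict(None, {"a": 1})
try:
    d["missing"]
    raised = False
except KeyError:
    raised = True

# default_factory is an ordinary writable attribute, so it can be set later.
d.default_factory = list
d["missing"].append(42)  # the factory is called with no arguments
```

After the assignment, the same lookup that raised KeyError creates and stores a fresh list instead.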

Second, why is the factory function not called with key? There are three
obvious kinds of "default values" a dict might want, in order of
more-to-less general:

(1) The default value depends on the key in some way: return factory(key)
(2) The default value doesn't depend on the key: return factory()
(3) The default value is a constant: return C

defaultdict supports (2) and (3):

defaultdict(factory, *args)
defaultdict(lambda: C, *args)

but it doesn't support (1). If key were passed to the factory function,
it would be easy to support all three use-cases, at the cost of a
slightly more complex factory function. E.g. the current idiom:

defaultdict(factory, *args)

would become:

defaultdict(lambda key: factory(), *args)

(There is a zeroth case as well, where the default value depends on the
key and what else is in the dict: factory(d, key). But I suspect that's
well and truly YAGNI territory.)
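
To make the three cases concrete, here is a small sketch assuming Python 3 (the data and names are made up for illustration). Cases (2) and (3) work with defaultdict as it stands, while a one-argument factory fails on a miss because defaultdict calls the factory with no arguments:

```python
from collections import defaultdict

# Case (2): the factory is called with no arguments on each miss.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# Case (3): a constant default, wrapped in a zero-argument lambda.
scores = defaultdict(lambda: -1)
unknown = scores["nobody"]

# Case (1) is not supported: a one-argument factory fails on a miss,
# because defaultdict calls it as factory(), not factory(key).
upper = defaultdict(lambda key: key.upper())
try:
    upper["x"]
    failed = False
except TypeError:
    failed = True
```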

Thanks in advance,

Chris Rebert · Jul 2, 2010

I would like to better understand some of the design choices made in
collections.defaultdict.

Perhaps python-dev should've been CC-ed...

Firstly, to initialise a defaultdict, you do this:

from collections import defaultdict
d = defaultdict(callable, *args)

which sets an attribute of d "default_factory" which is called on key
lookups when the key is missing. If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported.
That is:

d = dict(*args)
d.default_factory = callable

If you failed to explicitly set the dict's default_factory, it would
behave precisely as dicts do now. So why create a new class that needs to
be imported, rather than just add the functionality to dict?

Don't know personally, but here's one thought: If it was done that
way, passing around a dict could result in it getting a
default_factory set where there wasn't one before, which could lead to
strange results if you weren't anticipating that. The defaultdict
solution avoids this.
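
The concern can be illustrated by analogy with defaultdict itself, whose default_factory attribute is already writable. In this sketch (Python 3 assumed; count_words is a hypothetical helper, not a real library function), a function quietly installs a factory on the mapping it receives, and the caller's later lookups change behaviour:

```python
from collections import defaultdict

def count_words(counts, words):
    # Hypothetical helper that quietly installs a factory on its argument.
    counts.default_factory = int
    for w in words:
        counts[w] += 1

shared = defaultdict(None)  # behaves like a plain dict so far
count_words(shared, ["a", "b", "a"])

# Back in the caller: a lookup that used to raise KeyError now succeeds
# and inserts a new key as a side effect.
value = shared["typo"]
```

If every dict had such an attribute, any code holding a reference to the dict could cause this kind of surprise.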

Second, why is the factory function not called with key?

Agree, I've never understood this. Ruby's Hash::new does it better
(http://ruby-doc.org/core/classes/Hash.html), and even supports your
case 0; it calls the equivalent of default_factory(d, key) when
generating a default value.
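
A rough Python analogue of that behaviour, assuming Python 3. The class name ContextDefaultDict and its factory attribute are hypothetical, and unlike Ruby (where the block decides whether to store the value) this sketch always stores the generated default:

```python
class ContextDefaultDict(dict):
    """Hypothetical dict subclass whose factory receives (mapping, key)."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # The factory sees both the dict and the missing key ("case 0").
        value = self.factory(self, key)
        self[key] = value
        return value

# A default that depends on the key and on other entries: memoised Fibonacci.
fib = ContextDefaultDict(lambda d, n: n if n < 2 else d[n - 1] + d[n - 2])
tenth = fib[10]
```

Looking up fib[10] fills in every smaller entry along the way, since each miss consults the dict for the two preceding keys.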

There are three
obvious kinds of "default values" a dict might want, in order of
more-to-less general:

(1) The default value depends on the key in some way: return factory(key)
(2) The default value doesn't depend on the key: return factory()
(3) The default value is a constant: return C

defaultdict supports (2) and (3):

defaultdict(factory, *args)
defaultdict(lambda: C, *args)

but it doesn't support (1). If key were passed to the factory function,
it would be easy to support all three use-cases, at the cost of a
slightly more complex factory function.

(There is a zeroth case as well, where the default value depends on the
key and what else is in the dict: factory(d, key). But I suspect that's
well and truly YAGNI territory.)

Cheers,
Chris

Raymond Hettinger · Jul 2, 2010

I would like to better understand some of the design choices made in
collections.defaultdict. . . .
If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported. . . .
Second, why is the factory function not called with key? There are three
obvious kinds of "default values" a dict might want, in order of
more-to-less general:

(1) The default value depends on the key in some way: return factory(key)
(2) The default value doesn't depend on the key: return factory()
(3) The default value is a constant: return C

The __missing__() magic method lets you provide a factory with a key.
That method is supported by dict subclasses, making it easy to
create almost any desired behavior. A defaultdict is an example.
It is a dict subclass that calls a zero argument factory function.
But with __missing__() you can roll your own dict subclass to meet your
other needs. A defaultdict was provided to meet one commonly
requested set of use cases (mostly ones using int() and list()
as factory functions).
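
As a sketch of what such a hand-rolled subclass might look like for case (1), assuming Python 3 (KeyDefaultDict and its factory attribute are illustrative names, not part of the standard library):

```python
class KeyDefaultDict(dict):
    """Hypothetical dict subclass that passes the missing key to its factory."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Called by d[key] only when key is absent.
        value = self.factory(key)
        self[key] = value  # store it, mirroring what defaultdict does
        return value

squares = KeyDefaultDict(lambda n: n * n)
result = squares[7]
```

Whether to store the computed value is a design choice; returning it without assigning to self[key] would recompute it on every miss.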

From the docs at http://docs.python.org/library/stdtypes.html#mapping-types-dict
:

'''New in version 2.5: If a subclass of dict defines a method
__missing__(), if the key key is not present, the d[key] operation
calls that method with the key key as argument. The d[key] operation
then returns or raises whatever is returned or raised by the
__missing__(key) call if the key is not present. No other operations
or methods invoke __missing__(). If __missing__() is not defined,
KeyError is raised. __missing__() must be a method; it cannot be an
instance variable. For an example, see collections.defaultdict.'''
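
The point that no other operations invoke __missing__() can be checked with a short sketch, assuming Python 3 (the Echo class is made up for illustration and deliberately does not store anything):

```python
class Echo(dict):
    """Hypothetical subclass: a missing key is returned as its own value."""

    def __missing__(self, key):
        return key  # returned, but not stored in the dict

e = Echo()
via_index = e["x"]    # d[key] goes through __missing__
via_get = e.get("x")  # get() does not
present = "x" in e    # neither does a membership test
```

Only the subscript lookup sees the default; get() still returns None and the key is still reported as absent.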

Raymond

Thomas Jollans · Jul 2, 2010

I would like to better understand some of the design choices made in
collections.defaultdict.

Firstly, to initialise a defaultdict, you do this:

from collections import defaultdict
d = defaultdict(callable, *args)

which sets an attribute of d "default_factory" which is called on key
lookups when the key is missing. If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported.
That is:

d = dict(*args)
d.default_factory = callable

That's just not what dicts, a very simple and elementary data type, do.
I know this isn't really a good reason. In addition to what Chris said,
I expect this would complicate the dict code a great deal.

If you failed to explicitly set the dict's default_factory, it would
behave precisely as dicts do now. So why create a new class that needs to
be imported, rather than just add the functionality to dict?

Is it just an aesthetic choice to support passing the factory function as
the first argument? I would think that the advantage of having it
built-in would far outweigh the cost of an explicit attribute assignment.

The cost of this feature would be over-complication of the built-in dict
type when a subclass would do just as well.

Second, why is the factory function not called with key? There are three
obvious kinds of "default values" a dict might want, in order of
more-to-less general:

(1) The default value depends on the key in some way: return factory(key)

I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:

class mydict(defaultdict):
    def __missing__(self, key):
        # ...

the __missing__ method is really the more useful bit the defaultdict
class adds, by the looks of it.

-- Thomas

Chris Rebert · Jul 2, 2010

I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:

class mydict(defaultdict):
    def __missing__(self, key):
        # ...

the __missing__ method is really the more useful bit the defaultdict
class adds, by the looks of it.

Nitpick: You only need to subclass dict, not defaultdict, to use
__missing__(). See the part of the docs Raymond Hettinger quoted.

Cheers,
Chris

Thomas Jollans · Jul 2, 2010

Nitpick: You only need to subclass dict, not defaultdict, to use
__missing__(). See the part of the docs Raymond Hettinger quoted.

Sorry Raymond, I didn't see you.

This is where I cancel my "filter out google groups users" experiment.

Steven D'Aprano · Jul 3, 2010

I would like to better understand some of the design choices made in
collections.defaultdict.

[...]

Thanks to all who replied.
