Why defaultdict?

  • Thread starter Steven D'Aprano

Steven D'Aprano

I would like to better understand some of the design choices made in
collections.defaultdict.

Firstly, to initialise a defaultdict, you do this:

from collections import defaultdict
d = defaultdict(callable, *args)

which sets an attribute of d "default_factory" which is called on key
lookups when the key is missing. If callable is None, defaultdicts are
*exactly* equivalent to built-in dicts, so I wonder why the API wasn't
added on to dict rather than a separate class that needed to be imported.
That is:

d = dict(*args)
d.default_factory = callable

If you failed to explicitly set the dict's default_factory, it would
behave precisely as dicts do now. So why create a new class that needs to
be imported, rather than just add the functionality to dict?

Is it just an aesthetic choice to support passing the factory function as
the first argument? I would think that the advantage of having it built-
in would far outweigh the cost of an explicit attribute assignment.
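
The None-factory behaviour described above can be checked directly. This is a minimal sketch with illustrative variable names. It shows only that lookups behave like a plain dict and that default_factory is writable after construction; the type and repr of the object still differ from dict.

```python
from collections import defaultdict

# With default_factory=None, a missing key raises KeyError, as in a plain dict.
d = defaultdict(None, {"a": 1})
try:
    d["missing"]
    raised = False
except KeyError:
    raised = True

# default_factory is an ordinary writable attribute, so it can be set later.
d.default_factory = list
d["missing"].append(42)   # the factory is called with no arguments
result = dict(d)          # {"a": 1, "missing": [42]}
```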



Second, why is the factory function not called with key? There are three
obvious kinds of "default values" a dict might want, in order of more-to-
less general:

(1) The default value depends on the key in some way: return factory(key)
(2) The default value doesn't depend on the key: return factory()
(3) The default value is a constant: return C

defaultdict supports (2) and (3):

defaultdict(factory, *args)
defaultdict(lambda: C, *args)

but it doesn't support (1). If key were passed to the factory function,
it would be easy to support all three use-cases, at the cost of a
slightly more complex factory function. E.g. the current idiom:

defaultdict(factory, *args)

would become:

defaultdict(lambda key: factory(), *args)


(There is a zeroth case as well, where the default value depends on the
key and what else is in the dict: factory(d, key). But I suspect that's
well and truly YAGNI territory.)
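
For reference, a small sketch of the three cases against the current API. The names (groups, scores, shout) are made up for illustration. The last part shows what happens today if the factory expects a key: defaultdict calls it with no arguments, so the lookup fails.

```python
from collections import defaultdict

# Case (2): the default is built fresh by a zero-argument factory.
groups = defaultdict(list)
groups["evens"].append(2)

# Case (3): a constant default, wrapped in a lambda.
scores = defaultdict(lambda: 100)
initial = scores["alice"]

# Case (1) is not supported: the factory is called with no arguments,
# so a factory that expects the key fails with TypeError.
shout = defaultdict(lambda key: key.upper())
try:
    shout["hello"]
    error = None
except TypeError as exc:
    error = exc
```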

Thanks in advance,
 

Chris Rebert

> I would like to better understand some of the design choices made in
> collections.defaultdict.

Perhaps python-dev should've been CC-ed...

> Firstly, to initialise a defaultdict, you do this:
>
> from collections import defaultdict
> d = defaultdict(callable, *args)
>
> which sets an attribute of d "default_factory" which is called on key
> lookups when the key is missing. If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
> That is:
>
> d = dict(*args)
> d.default_factory = callable
>
> If you failed to explicitly set the dict's default_factory, it would
> behave precisely as dicts do now. So why create a new class that needs to
> be imported, rather than just add the functionality to dict?

Don't know personally, but here's one thought: If it was done that
way, passing around a dict could result in it getting a
default_factory set where there wasn't one before, which could lead to
strange results if you weren't anticipating that. The defaultdict
solution avoids this.
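
A rough illustration of that concern, using defaultdict to stand in for a hypothetical dict that has had a factory attached. The helper lookup_or_none is invented for this example; it represents any code that relies on KeyError for missing keys.

```python
from collections import defaultdict

def lookup_or_none(mapping, key):
    """Hypothetical helper that relies on KeyError for missing keys."""
    try:
        return mapping[key]
    except KeyError:
        return None

plain = {"a": 1}
with_factory = defaultdict(int, {"a": 1})

plain_result = lookup_or_none(plain, "b")           # None, dict unchanged
factory_result = lookup_or_none(with_factory, "b")  # 0, and "b" is now stored
```

The same call gives a different answer and mutates the mapping as a side effect once a factory is present, which is the kind of surprise a separate class makes visible.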

> Second, why is the factory function not called with key?

Agree, I've never understood this. Ruby's Hash::new does it better
(http://ruby-doc.org/core/classes/Hash.html), and even supports your
case 0; it calls the equivalent of default_factory(d, key) when
generating a default value.
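
Something similar can be sketched in Python with a dict subclass. This is only an illustration: ContextDefaultDict and its factory attribute are made-up names, not anything in the standard library.

```python
class ContextDefaultDict(dict):
    """Hypothetical dict subclass whose factory receives (mapping, key)."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        # Called by d[key] only when key is absent.
        value = self.factory(self, key)
        self[key] = value
        return value

# Memoised Fibonacci: each default depends on the key and on other entries.
fib = ContextDefaultDict(lambda d, n: d[n - 1] + d[n - 2], {0: 0, 1: 1})
fib_10 = fib[10]   # 55, with the intermediate values cached in fib
```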

> There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)
> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C
>
> defaultdict supports (2) and (3):
>
> defaultdict(factory, *args)
> defaultdict(lambda: C, *args)
>
> but it doesn't support (1). If key were passed to the factory function,
> it would be easy to support all three use-cases, at the cost of a
> slightly more complex factory function.
> (There is a zeroth case as well, where the default value depends on the
> key and what else is in the dict: factory(d, key). But I suspect that's
> well and truly YAGNI territory.)

Cheers,
Chris
 

Raymond Hettinger

> I would like to better understand some of the design choices made in
> collections.defaultdict. . . .
> If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported. . . .
> Second, why is the factory function not called with key? There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)
> (2) The default value doesn't depend on the key: return factory()
> (3) The default value is a constant: return C

The __missing__() magic method lets you provide a factory with a key.
That method is supported by dict subclasses, making it easy to
create almost any desired behavior. A defaultdict is an example.
It is a dict subclass that calls a zero argument factory function.
But with __missing__() you can roll your own dict subclass to meet your
other needs. A defaultdict was provided to meet one commonly
requested set of use cases (mostly ones using int() and list()
as factory functions).
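
A short sketch of those two common cases, counting with int() and grouping with list(). The word list is invented sample data.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "apple"]

# int() returns 0, so missing keys start as a zero count.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# list() returns [], so missing keys start as an empty group.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
```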

From the docs (http://docs.python.org/library/stdtypes.html#mapping-types-dict):

'''New in version 2.5: If a subclass of dict defines a method
__missing__(), if the key key is not present, the d[key] operation
calls that method with the key key as argument. The d[key] operation
then returns or raises whatever is returned or raised by the
__missing__(key) call if the key is not present. No other operations
or methods invoke __missing__(). If __missing__() is not defined,
KeyError is raised. __missing__() must be a method; it cannot be an
instance variable. For an example, see collections.defaultdict.'''
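
As a sketch of what the quoted passage allows, here is a dict subclass whose factory is called with the missing key. KeyDefaultDict and its factory attribute are hypothetical names; storing the computed value is a choice made here to mirror defaultdict, not something __missing__ requires.

```python
class KeyDefaultDict(dict):
    """Hypothetical dict subclass: the factory is called with the missing key."""

    def __init__(self, factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.factory = factory

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value   # store it, mirroring what defaultdict does
        return value

lengths = KeyDefaultDict(len)
via_getitem = lengths["hello"]      # __missing__ runs and stores 5
via_get = lengths.get("absent")     # get() does not call __missing__
has_absent = "absent" in lengths    # neither does a membership test
```

Only the d[key] lookup triggers the hook, which matches the "No other operations or methods invoke __missing__()" sentence in the docs.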

Raymond
 

Thomas Jollans

> I would like to better understand some of the design choices made in
> collections.defaultdict.
>
> Firstly, to initialise a defaultdict, you do this:
>
> from collections import defaultdict
> d = defaultdict(callable, *args)
>
> which sets an attribute of d "default_factory" which is called on key
> lookups when the key is missing. If callable is None, defaultdicts are
> *exactly* equivalent to built-in dicts, so I wonder why the API wasn't
> added on to dict rather than a separate class that needed to be imported.
> That is:
>
> d = dict(*args)
> d.default_factory = callable

That's just not what dicts, a very simple and elementary data type, do.
I know this isn't really a good reason. In addition to what Chris said,
I expect this would complicate the dict code a great deal.

> If you failed to explicitly set the dict's default_factory, it would
> behave precisely as dicts do now. So why create a new class that needs to
> be imported, rather than just add the functionality to dict?
>
> Is it just an aesthetic choice to support passing the factory function as
> the first argument? I would think that the advantage of having it built-
> in would far outweigh the cost of an explicit attribute assignment.

The cost of this feature would be over-complication of the built-in dict
type when a subclass would do just as well.

> Second, why is the factory function not called with key? There are three
> obvious kinds of "default values" a dict might want, in order of more-to-
> less general:
>
> (1) The default value depends on the key in some way: return factory(key)

I agree, this is a strange choice. However, nothing's stopping you from
being a bit verbose about what you want and just doing it:

class mydict(defaultdict):
    def __missing__(self, key):
        # ...

the __missing__ method is really the more useful bit the defaultdict
class adds, by the looks of it.

-- Thomas
 

Chris Rebert

> I agree, this is a strange choice. However, nothing's stopping you from
> being a bit verbose about what you want and just doing it:
>
> class mydict(defaultdict):
>    def __missing__(self, key):
>        # ...
>
> the __missing__ method is really the more useful bit the defaultdict
> class adds, by the looks of it.

Nitpick: You only need to subclass dict, not defaultdict, to use
__missing__(). See the part of the docs Raymond Hettinger quoted.
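
A quick sketch of the difference, with made-up class names. Both subclasses behave the same on a missing key, and in the defaultdict version the override means default_factory is never consulted.

```python
from collections import defaultdict

class FromDict(dict):
    def __missing__(self, key):
        return key * 2

class FromDefaultDict(defaultdict):
    def __missing__(self, key):
        # Overrides defaultdict's own __missing__; default_factory is never consulted.
        return key * 2

a = FromDict()
b = FromDefaultDict(list)
from_dict = a[3]          # 6
from_defaultdict = b[3]   # 6, not a new list
```

Neither version stores the value here, since this __missing__ only returns it.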

Cheers,
Chris
 

Thomas Jollans

> Nitpick: You only need to subclass dict, not defaultdict, to use
> __missing__(). See the part of the docs Raymond Hettinger quoted.

Sorry Raymond, I didn't see you.

This is where I cancel my "filter out google groups users" experiment.
 
