Pre-PEP: Dictionary accumulator methods

Raymond Hettinger · Mar 19, 2005

[Jeff Epler]

Maybe something for sets like 'appendlist' ('unionset'?)

While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also imagine a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.

Raymond
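
A minimal sketch of the "plain Python" alternative mentioned above, assuming Python 3. The helper name unionset and the edge list are hypothetical; the function simply mirrors the try/except shape of the proposed appendlist() for a graph stored as a dictionary of sets:

```python
def unionset(d, key, *values):
    """Add values to the set stored under key, creating the set if needed."""
    try:
        d[key].update(values)
    except KeyError:
        d[key] = set(values)

# Hypothetical directed edges (source, destination); duplicates collapse.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

graph = {}
for src, dst in edges:
    unionset(graph, src, dst)
```

The same effect is available without a helper through graph.setdefault(src, set()).add(dst).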

Raymond Hettinger · Mar 19, 2005

[Dan Sommers]

Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

Raymond

Denis S. Otkidach · Mar 19, 2005

On 18 Mar 2005 21:03:52 -0800 Michele Simionato wrote:

MS> +1 for inc instead of count.
MS> appendlist seems a bit too specific (I do not use dictionaries of
MS> lists that often).

inc is too specific too.

MS> The problem with setdefault is the name, not the functionality.

The problem is with the functionality: d.setdefault(k, v) can't be used as
an lvalue. If it could, we wouldn't need a count/inc/add/tally method.

MS> get_or_set would be a better name: we could use it as an alias for
MS> setdefault and then remove setdefault in Python 3000.

What about a d.get(k, setdefault=v) alternative? I'm not sure whether it's
a good idea to overload the get() method; it's just an idea.
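
The lvalue point can be illustrated with a short sketch, assuming Python 3 (the variable names are illustrative only). setdefault() returns the stored object, which is enough for a mutable list but not for an immutable int:

```python
positions = {}
lost = {}
counts = {}
for i, word in enumerate("a b c b a".split()):
    # Mutable value: the returned list is the stored list, so append() sticks.
    positions.setdefault(word, []).append(i)

    # Immutable value: the sum is a new int that is never written back.
    lost.setdefault(word, 0) + 1

    # Counting therefore needs an explicit assignment.
    counts[word] = counts.get(word, 0) + 1
```

After the loop, lost still maps every word to 0, while counts holds the real totals.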

Ivan Van Laningham · Mar 19, 2005

Hi All--

Raymond said:
Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

I still prefer tally(), despite perceived political connotations.
They're only connotations, after all, and tally() comprises both
positive and negative incrementing, whereas add() and addup() will tease
users into thinking they are only for incrementing.

What about adding another method, "setincrement()"?

d={}
d.setincrement(-1)
for word in text.split():
    d.tally(word,1)
    if word.lower() in ["a","an","the"]:
        d.tally(word)

Not that there's any real utility in that.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours

El Pitonero · Mar 19, 2005

Dan said:
Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Is it even necessary to use a method name?

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]

Raymond Hettinger · Mar 19, 2005

[Ivan Van Laningham]

What about adding another method, "setincrement()"?

. . .

Not that there's any real utility in that.

That was a short lived suggestion ;-)

Also, it would entail storing an extra value in the dictionary header. That
alone would be a killer.

Raymond

Paul McGuire · Mar 19, 2005

-1 on set increment.
I think this makes your intent much clearer:

..d={}
..for word in text.split():
.. d.tally(word)
.. if word.lower() in ["a","an","the"]:
.. d.tally(word,-1)

or perhaps simplest:

..d={}
..for word in text.split():
.. if word.lower() not in ["a","an","the"]:
.. d.tally(word)

Personally, I'm +1 for tally(), and possibly tallyList() and tallySet()
to complete the thought for the cumulative container cases. I think
there is something to be gained if these methods get named in some
similar manner.

For those dead set against tally() and its ilk, how about accum(),
accumList() and accumSet()?

-- Paul

Aahz · Mar 19, 2005

The proposed names could possibly be improved (perhaps tally() is more active
and clear than count()).

+1 tally()
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

El Pitonero · Mar 19, 2005

Raymond said:
Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        if not self.has_key(key):
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

text = 'a b c b a'
words = text.split()
counts = safedict(0)
positions = safedict([])
for i, word in enumerate(words):
    counts[word] += 1
    positions[word].append(i)

print counts, positions

Aahz · Mar 19, 2005

How about countkey() or tabulate()?

Those rank roughly equal to tally() for me, with a slight edge to these
two for clarity and a slight edge to tally() for conciseness.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR

Raymond Hettinger · Mar 19, 2005

[El Pitonero]

Is it even necessary to use a method name?

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]

safedict() and variants have been previously proposed with the name defaultdict
or some such.

For the most part, adding methods is much less disruptive than introducing a new
type.

As written out above, the += syntax works fine but does not work with append().

As written, the copy.copy() approach is dog slow but can be optimized for lists
and ints while retaining its type flexibility.

BTW, there is no need to make the same post three times.

Raymond
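
The difference between += and append() with this variant can be sketched as follows, assuming Python 3 (super().__init__() and the list literal replace the Python 2 details of the quoted code; the keys 5 and 7 are arbitrary):

```python
import copy

class safedict(dict):
    """Variant that returns a copy of the default without storing it."""
    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

y = safedict([])
# Augmented assignment reads y[5], extends the copy, then assigns it back,
# so the result ends up stored in the dictionary.
y[5] += [0, 1, 2]
# append() only mutates the temporary copy; nothing is assigned back.
y[7].append(99)
```

After these two statements the dictionary contains key 5 but not key 7.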

Do Re Mi chel La Si Do · Mar 19, 2005

Hi

if key not in d:
    d[key] = {subkey:value}
else:
    d[key][subkey] = value

and

d[(key,subkey)] = value

?

Michel Claveau
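
One way to compare the two layouts, as a rough sketch in Python 3 with made-up keys: both store the same values, but they differ in how easily the subkeys of a single key can be listed.

```python
value = 1
nested = {}
flat = {}
for key, subkey in [("x", "a"), ("x", "b"), ("y", "a")]:
    # Nested layout: one inner dictionary per key.
    nested.setdefault(key, {})[subkey] = value
    # Flat layout: a single dictionary keyed by (key, subkey) tuples.
    flat[(key, subkey)] = value

# The nested form exposes the subkeys of "x" directly ...
subkeys_nested = sorted(nested["x"])
# ... while the flat form has to scan every tuple key.
subkeys_flat = sorted(s for (k, s) in flat if k == "x")
```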

Brian van den Broek · Mar 19, 2005

Kent Johnson said unto the world upon 2005-03-19 07:19:

Brian said:

Raymond Hettinger said unto the world upon 2005-03-18 20:24:

I would like to get everyone's thoughts on two new dictionary methods:

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)


For appendlist, I would have expected

def appendlist(self, key, sequence):
    try:
        self[key].extend(sequence)
    except KeyError:
        self[key] = list(sequence)


The original proposal reads better at the point of call when values is a
single item. In my experience this will be the typical usage:
d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
d.appendlist(key, ['some value'])

The original allows values to be a sequence using
d.appendlist(key, *value_list)

Kent

Right. I did try the alternatives out and get the issue you point to.

But:

1) In my own code, cases where I'd use the proposed appendlist method
are typically cases where I'd want to add multiple items that have
already been collected in a sequence. But, since I've little coding
under my belt, I concede that considerations drawn from my experience
are not terribly probative.

2) Much more important, IMHO, is that the method name `appendlist'
really does suggest it's a list that will be appended. Hence my stated
expectation. While it would make the method name longer, given the
original interface Raymond posted, I would find appendtolist more
transparent.

out-of-my-depth-ly y'rs,

Brian vdB
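
The two interfaces discussed in this exchange can be put side by side in a small Python 3 sketch. The class name ListDict and the method name extendlist are hypothetical; extendlist stands in for the sequence-taking signature so that both can live on one class:

```python
class ListDict(dict):
    def appendlist(self, key, *values):
        """Interface from the original proposal: items as separate arguments."""
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

    def extendlist(self, key, sequence):
        """Alternative interface: a single sequence argument."""
        try:
            self[key].extend(sequence)
        except KeyError:
            self[key] = list(sequence)

d = ListDict()
d.appendlist("k", "some value")    # single item, no wrapping needed
d.appendlist("k", *["x", "y"])     # existing sequence must be unpacked
d.extendlist("k", ["z"])           # single item must be wrapped
d.extendlist("k", "ab")            # a bare string is split into characters
```

The last call shows the usual hazard of the sequence form: passing a string where a list of strings was meant.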

George Sakkis · Mar 19, 2005

Raymond Hettinger said:
[Jeff Epler]

Maybe something for sets like 'appendlist' ('unionset'?)


While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also imagine a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.

Good example. I actually have a directed graph and multigraph module that uses dictionary of sets
internally. It turns out I've used setdefault 8 times in this module alone !

George

Roose · Mar 19, 2005

Ah OK, I stand corrected. Whoops. I just read the web page and thought the
wrong thing, that makes sense.

Think about it. A key= function is quite a different thing. It provides a
*temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:

>>> L = ['the', 'quick', 'brownish', 'toad']
>>> max(L, key=len)
'brownish'
>>> max(len(x) for x in L)
8

Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().

Raymond Hettinger

El Pitonero · Mar 19, 2005

Raymond said:
As written out above, the += syntax works fine but does not work with append().
...
BTW, there is no need to make the same post three times.

The append() syntax works, if you use the other definition of safedict
(*). There is more than one way of defining safedict; see the subtle
differences between the two versions of safedict, and you'll be glad
more than one version has been posted. At any rate, what has been
presented is a general idea, so nitpicking details is kind of out of
place. Programmers know how to modify a general recipe to suit their
actual needs, right?

(*) In some cases, people do not want to create a dictionary entry when
an inquiry is made on a missing item. In some cases, they do. A general
recipe cannot cater to the needs of everybody.
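
The side effect described in the footnote can be seen in a Python 3 sketch of the storing variant. The class is renamed storing_safedict here (a hypothetical name) and has_key() is replaced by the `in` operator, which is the Python 3 spelling:

```python
import copy

class storing_safedict(dict):
    """Variant that stores a copy of the default on the first lookup."""
    def __init__(self, default=None):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        if key not in self:
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

positions = storing_safedict([])
positions["a"].append(0)      # append() now works: the list was stored first
probe = positions["missing"]  # a plain lookup also creates an entry
```

The standard library's collections.defaultdict, added later in Python 2.5, behaves like this storing variant: indexing a missing key inserts the default.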

George Sakkis · Mar 19, 2005

Aahz said:
+1 tally()

-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native english speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable, how about
update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a scalar or appending to
a list -- is that essential; a self-respecting program should make this obvious by the name of the
parameter anyway ("dictionary.accumulate('hello', words)" vs "a.accumulate('hello', b)").

George

El Pitonero · Mar 19, 2005

George said:
-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native english speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable, how about
update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a
scalar or appending to a list -- is that essential; a self-respecting program
should make this obvious by the name of the parameter anyway
("dictionary.accumulate('hello', words)" vs "a.accumulate('hello', b)").

What about no name at all for the scalar case:

a['hello'] += 1
a['bye'] -= 2

and append() (or augmented assignment) for the list case:

a['hello'].append(word)
a['bye'] += [word]

?

Dan Sommers · Mar 19, 2005

[Dan Sommers]

Curious that in this lengthy discussion, a method name of
"accumulate" never came up. I'm not sure how to separate the two
cases (accumulating scalars vs. accumulating a list), though.


Separating the two cases is essential. Also, the wording should
contain strong cues that remind you of addition and of building a
list.

Agreed, with a slight hedge towards accumulation or tabulation rather
than addition. I don't think "summation" gets us anywhere, either.

Are the use cases for qty != 1 for weighted averages (that's the only
one I can think of off the top of my head)? Is something like this:

def accumulate( self, key, *values ):
    if values == ( ):
        values = 1
    try:
        self[ key ] += values
    except KeyError:
        if type( key ) == int:
            self[ key ] = 1
        else:
            self[ key ] = *values

possible? It's more "klunky" than I thought it would be before I
started typing it out.

Then we'd have these two use cases:

histogram = { }
for word in text.split( ):
histogram.accumulate( word )

and

org_chart = { }
for employee in employees:
org_chart.accumulate( employee.manager, employee.name )

Regards,
Dan
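
One possible reading of the combined accumulate() idea, as a runnable Python 3 sketch. The class name AccumDict, the dispatch rule (no values means count, any values means collect into a list) and the sample data are all assumptions, not something settled in the thread:

```python
class AccumDict(dict):
    def accumulate(self, key, *values):
        if not values:
            # No values given: count one more occurrence of key.
            self[key] = self.get(key, 0) + 1
        else:
            # Values given: collect them in a list stored under key.
            self.setdefault(key, []).extend(values)

histogram = AccumDict()
for word in "a b a".split():
    histogram.accumulate(word)

org_chart = AccumDict()
# Hypothetical (manager, employee name) pairs.
for manager, name in [("ann", "bob"), ("ann", "cy"), ("bob", "di")]:
    org_chart.accumulate(manager, name)
```

Mixing both call styles on the same key fails at run time, which is one argument for keeping the two cases as separate methods.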

Kay Schluehr · Mar 19, 2005

Raymond said:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

-1 from me.

I'm not very happy with either of them (not a naming issue), because I
think that the dict type should offer only methods that apply to every
dict, whatever it contains. count() specializes to dict values that are
addable, and appendlist() to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm not a speed fetishist, and destroying the clarity of a
very fundamental data structure to speed up rather arbitrary
accumulations seems to be a bad idea. I would move this stuff into a
subclass.

Regards Kay
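
The subclass route suggested here might look like the following Python 3 sketch. The class name CountingDict is hypothetical, and only count() is shown; a list-building method could be added to the same subclass in the same way:

```python
class CountingDict(dict):
    """dict subclass carrying the accumulator method; plain dict is untouched."""
    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

tally = CountingDict()
for word in "the cat and the hat".split():
    tally.count(word)
tally.count("the", -1)   # qty may be negative as well
```

The built-in dict type gains nothing from this, so dictionaries whose values are not addable are unaffected.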
