Pre-PEP: Dictionary accumulator methods

Raymond Hettinger

[Jeff Epler]
Maybe something for sets like 'appendlist' ('unionset'?)

While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also imagine a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.
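
For reference, the plain-Python spelling of the dictionary-of-sets case might look like the sketch below. The edge list and the names `edges` and `graph` are made up for illustration; a hypothetical unionset() method would simply fold the try/except into one call.

```python
# Hypothetical edge list for a small directed graph (illustration only).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]

graph = {}
for src, dst in edges:
    try:
        graph[src].add(dst)      # node already has a successor set
    except KeyError:
        graph[src] = {dst}       # first edge seen from this node

print(graph)
```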


Raymond
 
Raymond Hettinger

[Dan Sommers]
Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)


Raymond
 
Denis S. Otkidach

On 18 Mar 2005 21:03:52 -0800 Michele Simionato wrote:

MS> +1 for inc instead of count.
MS> appendlist seems a bit too specific (I do not use dictionaries of
MS> lists that often).

inc is too specific too.

MS> The problem with setdefault is the name, not the functionality.

The problem with functionality: d.setdefault(k, v) can't be used as
lvalue. If it could, we wouldn't need count/inc/add/tally method.
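
To make the lvalue point concrete: the result of a call cannot be the target of an augmented assignment, so setdefault() only helps when the stored value is mutable. A minimal sketch in current Python syntax (the names `counts` and `groups` are illustrative):

```python
counts = {}
groups = {}

# Mutable values: setdefault() returns the stored list, so mutating it in
# place updates the dictionary.
groups.setdefault("x", []).append(1)
groups.setdefault("x", []).append(2)

# Immutable values: the result of a call is not an assignment target, so
# this statement is rejected at compile time, before anything runs.
try:
    compile("counts.setdefault('x', 0) += 1", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

# The usual workaround spells the key twice.
counts["x"] = counts.get("x", 0) + 1

print(groups, counts, rejected)
```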

MS> get_or_set would be a better name: we could use it as an alias for
MS> setdefault and then remove setdefault in Python 3000.

What about d.get(k, setdefault=v) alternative? Not sure whether it's
good idea to overload get() method, just an idea.
 
Ivan Van Laningham

Hi All--

Raymond said:
Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

I still prefer tally(), despite perceived political connotations.
They're only connotations, after all, and tally() comprises both
positive and negative incrementing, whereas add() and addup() will tease
users into thinking they are only for incrementing.

What about adding another method, "setincrement()"?

d={}
d.setincrement(-1)
for word in text.split():
    d.tally(word,1)
    if word.lower() in ["a","an","the"]:
        d.tally(word)

Not that there's any real utility in that.

Metta,
Ivan
----------------------------------------------
Ivan Van Laningham
God N Locomotive Works
http://www.pauahtun.org/
http://www.andi-holmes.com/
Army Signal Corps: Cu Chi, Class of '70
Author: Teach Yourself Python in 24 Hours
 
El Pitonero

Dan said:
Curious that in this lengthy discussion, a method name of "accumulate"
never came up. I'm not sure how to separate the two cases (accumulating
scalars vs. accumulating a list), though.

Is it even necessary to use a method name?

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)


x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]
 
Raymond Hettinger

[Ivan Van Laningham]
What about adding another method, "setincrement()"?
. . .
Not that there's any real utility in that.

That was a short lived suggestion ;-)

Also, it would entail storing an extra value in the dictionary header. That
alone would be a killer.


Raymond
 
Paul McGuire

-1 on set increment.
I think this makes your intent much clearer:

d={}
for word in text.split():
    d.tally(word)
    if word.lower() in ["a","an","the"]:
        d.tally(word,-1)

or perhaps simplest:

d={}
for word in text.split():
    if word.lower() not in ["a","an","the"]:
        d.tally(word)

Personally, I'm +1 for tally(), and possibly tallyList() and tallySet()
to complete the thought for the cumulative container cases. I think
there is something to be gained if these methods get named in some
similar manner.

For those dead set against tally() and its ilk, how about accum(),
accumList() and accumSet()?

-- Paul
 
El Pitonero

Raymond said:
Separating the two cases is essential. Also, the wording should contain strong
cues that remind you of addition and of building a list.

For the first, how about addup():

d = {}
for word in text.split():
    d.addup(word)

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        if not self.has_key(key):
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

text = 'a b c b a'
words = text.split()
counts = safedict(0)
positions = safedict([])
for i, word in enumerate(words):
    counts[word] += 1
    positions[word].append(i)

print counts, positions
 
Aahz

How about countkey() or tabulate()?

Those rank roughly equal to tally() for me, with a slight edge to these
two for clarity and a slight edge to tally() for conciseness.
--
Aahz <*> http://www.pythoncraft.com/

"The joy of coding Python should be in seeing short, concise, readable
classes that express a lot of action in a small amount of clear code --
not in reams of trivial code that bores the reader to death." --GvR
 
Raymond Hettinger

[El Pitonero]
Is it even necessary to use a method name?

import copy
class safedict(dict):
    def __init__(self, default=None):
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)


x = safedict(0)
x[3] += 1
y = safedict([])
y[5] += range(3)
print x, y
print x[123], y[234]



safedict() and variants have been previously proposed with the name defaultdict
or some such.

For the most part, adding methods is much less disruptive than introducing a new
type.

As written out above, the += syntax works fine but does not work with append().
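
The difference comes from how that version of safedict handles a missing key: `x[k] += 1` reads the default and then assigns the result back, while `y[k].append(v)` only mutates a throwaway copy that is never stored. A sketch ported to current Python syntax:

```python
import copy

class safedict(dict):
    """Lookup-only variant: a missing key returns a copy of the default."""
    def __init__(self, default=None):
        super().__init__()
        self.default = default
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return copy.copy(self.default)

x = safedict(0)
x[3] += 1            # __getitem__ then __setitem__: the result is stored

y = safedict([])
y[5].append(1)       # appends to a temporary copy; nothing is stored
y[6] += [1]          # augmented assignment stores the extended copy

print(dict(x), dict(y))
```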

As written, the copy.copy() approach is dog slow but can be optimized for lists
and ints while retaining its type flexibility.
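
One way to avoid the copy altogether is to store a factory callable instead of a prototype value. That is the approach taken by collections.defaultdict, which was added to the standard library later, in Python 2.5. A sketch using it for the two use cases in this thread; the sample text is arbitrary:

```python
from collections import defaultdict

text = "a b c b a"

counts = defaultdict(int)       # int() -> 0 for a missing key
positions = defaultdict(list)   # list() -> [] for a missing key

for i, word in enumerate(text.split()):
    counts[word] += 1
    positions[word].append(i)

print(dict(counts))
print(dict(positions))
```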

BTW, there is no need to make the same post three times.


Raymond
 
Do Re Mi chel La Si Do

Hi


if key not in d:
    d[key] = {subkey:value}
else:
    d[key][subkey] = value


and

d[(key,subkey)] = value

?
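
The two spellings store the same information but support different lookups. A small sketch with made-up keys: the nested form can list every subkey under a key directly, while the tuple-keyed form needs a scan.

```python
nested = {}
flat = {}

# Made-up sample data: (key, subkey, value) triples.
records = [("fruit", "apple", 1), ("fruit", "pear", 2), ("veg", "leek", 3)]

for key, subkey, value in records:
    # Nested form: create the inner dict on first use.
    if key not in nested:
        nested[key] = {subkey: value}
    else:
        nested[key][subkey] = value
    # Flat form: a single assignment, no existence check needed.
    flat[(key, subkey)] = value

# All subkeys under "fruit": direct in the nested form, a scan in the flat one.
fruit_nested = sorted(nested["fruit"])
fruit_flat = sorted(sub for (k, sub) in flat if k == "fruit")
print(fruit_nested, fruit_flat)
```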


Michel Claveau
 
Brian van den Broek

Kent Johnson said unto the world upon 2005-03-19 07:19:
Brian said:
Raymond Hettinger said unto the world upon 2005-03-18 20:24:
I would like to get everyone's thoughts on two new dictionary methods:

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

For appendlist, I would have expected

def appendlist(self, key, sequence):
    try:
        self[key].extend(sequence)
    except KeyError:
        self[key] = list(sequence)


The original proposal reads better at the point of call when values is a
single item. In my experience this will be the typical usage:
d.appendlist(key, 'some value')

as opposed to your proposal which has to be written
d.appendlist(key, ['some value'])

The original allows values to be a sequence using
d.appendlist(key, *value_list)
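
The two signatures can be compared side by side in a sketch. The subclass and the method names `appendlist_star` and `appendlist_seq` are hypothetical, introduced only to keep both variants in one class:

```python
class ListDict(dict):
    # Original proposal: values are passed as separate arguments.
    def appendlist_star(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

    # Alternative: values are passed as one sequence.
    def appendlist_seq(self, key, sequence):
        try:
            self[key].extend(sequence)
        except KeyError:
            self[key] = list(sequence)

d = ListDict()
d.appendlist_star("k", "some value")        # single item, no brackets
d.appendlist_star("k", *["x", "y"])         # an existing list must be unpacked
d.appendlist_seq("k", ["another value"])    # single item needs wrapping
d.appendlist_seq("k", "ab")                 # pitfall: a bare string is split
print(d)
```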

Kent

Right. I did try the alternatives out and get the issue you point to.

But:

1) In my own code, cases where I'd use the proposed appendlist method
are typically cases where I'd want to add multiple items that have
already been collected in a sequence. But, since I've little coding
under my belt, I concede that considerations drawn from my experience
are not terribly probative. :)

2) Much more important, IMHO, is that the method name `appendlist'
really does suggest it's a list that will be appended. Hence my stated
expectation. While it would make the method name longer, given the
original interface Raymond posted, I would find appendtolist more
transparent.

out-of-my-depth-ly y'rs,

Brian vdB
 
George Sakkis

Raymond Hettinger said:
[Jeff Epler]
Maybe something for sets like 'appendlist' ('unionset'?)

While this could work and potentially be useful, I think it is better to keep
the proposal focused on the two common use cases. Adding a third would reduce
the chance of acceptance.

Also, in all of my code base, I've not run across a single opportunity to use
something like unionset(). This is surprising because I'm the set() author and
frequently use set based algorithms. Your example was a good one and I can
also imagine a graph represented as a dictionary of sets. Still, I don't mind
writing out the plain Python for this one if it only comes up once in a blue
moon.

Good example. I actually have a directed graph and multigraph module that uses dictionary of sets
internally. It turns out I've used setdefault 8 times in this module alone !

George
 
Roose

Ah OK, I stand corrected. Whoops. I just read the web page and thought the
wrong thing, that makes sense.
Think about it. A key= function is quite a different thing. It provides a
*temporary* comparison key while retaining the original value. IOW, your
re-write is incorrect:
>>> L = ['the', 'quick', 'brownish', 'toad']
>>> max(L, key=len)
'brownish'
>>> max(len(x) for x in L)
8


Remain calm. Keep the faith. Guido's design works fine.

No important use cases were left unserved by any() and all().



Raymond Hettinger
 
El Pitonero

Raymond said:
As written out above, the += syntax works fine but does not work with append().
...
BTW, there is no need to make the same post three times.

The append() syntax works, if you use the other definition of safedict
(*). There is more than one way of defining safedict; see the subtle
differences between the two versions of safedict, and you'll be glad
more than one version has been posted. At any rate, what has been
presented is a general idea; nitpicking details is kind of out of
place. Programmers know how to modify a general recipe to suit their
actual needs, right?

(*) In some cases, people do not want to create a dictionary entry when
an inquiry is done on a missing item. In some cases, they do. A general
recipe cannot cater to the needs of everybody.
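
The behavioural difference between the two versions shows up on a plain lookup. A sketch of the store-on-miss version in current Python syntax (`has_key` replaced by `in`), with the side effect made visible:

```python
import copy

class safedict(dict):
    """Store-on-miss variant: a lookup of a missing key inserts the default."""
    def __init__(self, default=None):
        super().__init__()
        self.default = default
    def __getitem__(self, key):
        if key not in self:
            self[key] = copy.copy(self.default)
        return dict.__getitem__(self, key)

positions = safedict([])
positions["a"].append(0)     # works: the list was stored before being returned
probe = positions["zzz"]     # a mere inquiry also creates an entry

print(dict(positions))
```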
 
George Sakkis

Aahz said:
+1 tally()

-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native English speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable, how about
update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a scalar or appending to
a list -- is that essential; a self-respecting program should make this obvious by the name of the
parameter anyway ("dictionary.accumulate('hello', words)" vs "a.accumulate('hello', b)").

George
 
El Pitonero

George said:
-1 for count(): Implies an accessor, not a mutator.
-1 for tally(): Unfriendly to non-native English speakers.
+0.5 for add, increment. If incrementing a negative is unacceptable, how about
update/updateby/updateBy ?
+1 for accumulate. I don't think that separating the two cases -- adding to a
scalar or appending to a list -- is that essential; a self-respecting program
should make this obvious by the name of the parameter anyway
("dictionary.accumulate('hello', words)" vs "a.accumulate('hello', b)").

What about no name at all for the scalar case:

a['hello'] += 1
a['bye'] -= 2

and append() (or augmented assignment) for the list case:

a['hello'].append(word)
a['bye'] += [word]

?
 
Dan Sommers

[Dan Sommers]
Curious that in this lengthy discussion, a method name of
"accumulate" never came up. I'm not sure how to separate the two
cases (accumulating scalars vs. accumulating a list), though.
Separating the two cases is essential. Also, the wording should
contain strong cues that remind you of addition and of building a
list.

Agreed, with a slight hedge towards accumulation or tabulation rather
than addition. I don't think "summation" gets us anywhere, either.

Are the use cases for qty != 1 for weighted averages (that's the only
one I can think of off the top of my head)? Is something like this:

def accumulate( self, key, *values ):
    if values == ( ):
        values = 1                      # no values given: count one occurrence
    try:
        self[ key ] += values
    except KeyError:
        if type( values ) == int:
            self[ key ] = 1
        else:
            self[ key ] = list( values )

possible? It's more "klunky" than I thought it would be before I
started typing it out.

Then we'd have these two use cases:

histogram = { }
for word in text.split( ):
    histogram.accumulate( word )

and

org_chart = { }
for employee in employees:
    org_chart.accumulate( employee.manager, employee.name )

Regards,
Dan
 
Kay Schluehr

Raymond said:
I would like to get everyone's thoughts on two new dictionary methods:

def count(self, key, qty=1):
    try:
        self[key] += qty
    except KeyError:
        self[key] = qty

def appendlist(self, key, *values):
    try:
        self[key].extend(values)
    except KeyError:
        self[key] = list(values)

-1 from me.

I'm not very happy with either of them (not a naming issue) because I
think that the dict type should offer only methods that apply to every
dict, whatever it contains. count() specializes to dict values that are
addable and appendlist() to those that are extendable. Why not
subtractable, dividable or right-shiftable? Because of majority
approval? I'm not a speed fetishist, and destroying the clarity of a
very fundamental data structure to speed up rather arbitrary
accumulations seems to be a bad idea. I would move this stuff into a
subclass.
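
A sketch of what moving the two methods into a subclass might look like, in current Python syntax. The class name `AccumulatorDict` is hypothetical; the method bodies follow the definitions quoted at the top of this post.

```python
class AccumulatorDict(dict):
    """Hypothetical dict subclass carrying the two proposed methods."""

    def count(self, key, qty=1):
        try:
            self[key] += qty
        except KeyError:
            self[key] = qty

    def appendlist(self, key, *values):
        try:
            self[key].extend(values)
        except KeyError:
            self[key] = list(values)

words = AccumulatorDict()
index = AccumulatorDict()
for i, word in enumerate("a b c b a".split()):
    words.count(word)            # tally occurrences
    index.appendlist(word, i)    # collect positions

print(words, index)
```

Plain dict instances stay untouched, at the cost of having to construct the subclass explicitly wherever accumulation is wanted.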

Regards Kay
 
