Duplicate keys in dict?

vsoler · Mar 7, 2010

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

.................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.

What is the best approach for this problem? Can anybody help?

News123 · Mar 7, 2010

vsoler said:
Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

Normall dicts are used if you want to access your data at a later point
in time by the key name.

Do you want to be able to do this?

Then what would you expect to receive for d[('cd','A')] ?

The first value? the second value? both values?

Could you perhaps change further occurences of 'cd' with 'cd1' , 'cd2' ,
'cd3', ... ?

Not knowing your exact context makes it difficult to suggest solutions?

perhaps you could switch to a list containing a tuple of (rowname,rowdict)

l = [ ('ab', { 'A': 3 , 'B': 5 } ),
'cd', { 'A': 7 , 'B': 2 } ),
'cd', { 'A': 9 , 'B': 1 } ),
'ac', { ... }
]

bye

N

Steven D'Aprano · Mar 7, 2010

Hello,

My code snippet reads data from excel ranges. First row and first column
are column headers and row headers respectively. After reding the range
I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

One of the difficulties I find here is that I want to be able to easily
sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

vsoler · Mar 7, 2010

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

Steven,

What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

I apologize for not having made it clear.

Dennis Lee Bieber · Mar 7, 2010

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

First, I would not key using ("ab", "A")... I'd key just off "ab"...
The A and B columns are just that -- column positions -- the data would
look more like

{ "ab" : (3, 5),
"cd" : (7, 2)... }

Now... for the duplicate key processing...

If you initialize each key with a list, you can then append the data
tuple...

data = {}
for key, cA, cB in sourcedata:
if key in data:
data[key].append( (cA, cB) )
else:
data[key] = [ (cA, cB) ]

This way, what you end up with

"cd" : [ (7, 2), (9, 1) ]

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

If you really are using Excel -- it might be better to find out how
to command Excel to compute your results... Basically it sounds like you
are creating a grouped sum on the columns, with groups defined by the
value of the first column.

Would be child's play with a regular SQL database -- something like
(this is off the top of my head so may have some syntax errors)

select key, sum(A), sum(B) from aTable
group by key
order by key

Or just use a list of tuples, sorting, and create a report writer
break handler...

'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

data = [ ("ab", 3, 5),
("cd", 7, 2),
("cd", 9, 1),
("ac", 7, 2) ]

skey = None
sA = 0
sB = 0

data.sort() #ensures same keys are adjacent

for (k, a, b) in data:
if k != skey: #key changed, report group output
if skey:
print "%s\t%s\t%s" % (skey, sA, sB)
skey = k #initialize next group
sA = a
sB = b
else:
sA += a #sum to current group
sB += b
if skey: #need to output the last key in the data at end
print "%s\t%s\t%s" % (skey, sA, sB)

Tim Chase · Mar 7, 2010

vsoler said:
What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like

import csv
f = file(INFILE, 'rb')
r = csv.reader(f, ...)
headers = r.next() # discard the headers
d = defaultdict(list)
for (label, a, b) in r:
d[(label, 'a')].append(int(a))
d[(label, 'b')].append(int(b))
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', sum(value)

Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:

d = defaultdict(int)
for (label, a, b) in r:
d[(label, 'a')] += int(a)
d[(label, 'b')] += int(b)
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', value

Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.

-tkc

Yinon Ehrlich · Mar 9, 2010

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

What is the best approach for this problem? Can anybody help?

Have you tried xlread ? (http://www.python-excel.org/)
Best,
-- Yinon

How to multiply two matrices of size in using inline assembly in C++	3	Mar 3, 2024
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Concatenating dictionary values and keys, and further operations	11	Jun 5, 2006
python memory leak only in x86 64 linux	0	Apr 16, 2008
How find all childrens values of a nested dictionary, fast!	8	Nov 4, 2010
What is the Most Efficient Way of Printing A Dict's Contents Out In Columns?	6	Jun 14, 2011
Adding 'download' column to existing 'visitors' table (as requested)	18	Nov 6, 2013
efficient way to process data	0	Jan 12, 2014

Duplicate keys in dict?

vsoler

News123

Steven D'Aprano

vsoler

Dennis Lee Bieber

Tim Chase

Yinon Ehrlich

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads