Duplicate keys in dict?

V

vsoler

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

.................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

One of the difficulties I find here is that I want to be able to
easily sum all the values for each row key: 'ab', 'cd' and 'ac'.
However, using lists inside dicts makes it a difficult issue for me.

What is the best approach for this problem? Can anybody help?
 
N

News123

vsoler said:
Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.

Normall dicts are used if you want to access your data at a later point
in time by the key name.

Do you want to be able to do this?


Then what would you expect to receive for d[('cd','A')] ?

The first value? the second value? both values?

Could you perhaps change further occurences of 'cd' with 'cd1' , 'cd2' ,
'cd3', ... ?

Not knowing your exact context makes it difficult to suggest solutions?

perhaps you could switch to a list containing a tuple of (rowname,rowdict)


l = [ ('ab', { 'A': 3 , 'B': 5 } ),
'cd', { 'A': 7 , 'B': 2 } ),
'cd', { 'A': 9 , 'B': 1 } ),
'ac', { ... }
]


bye

N
 
S

Steven D'Aprano

Hello,

My code snippet reads data from excel ranges. First row and first column
are column headers and row headers respectively. After reding the range
I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.
One of the difficulties I find here is that I want to be able to easily
sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
using lists inside dicts makes it a difficult issue for me.

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.
 
V

vsoler

Given the sample above, what answer do you expect for summing the 'cd'
row? There are four reasonable answers:

7 + 2 = 9
9 + 1 = 10
7 + 2 + 9 + 1 = 19
Error

You need to decide what you want to do before asking how to do it.

Steven,

What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

I apologize for not having made it clear.
 
D

Dennis Lee Bieber

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

................'A'..............'B'
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2

d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...

However, as you can see there are two rows that start with 'cd', and
dicts, AFAIK do not accept duplicates.
First, I would not key using ("ab", "A")... I'd key just off "ab"...
The A and B columns are just that -- column positions -- the data would
look more like

{ "ab" : (3, 5),
"cd" : (7, 2)... }

Now... for the duplicate key processing...

If you initialize each key with a list, you can then append the data
tuple...

data = {}
for key, cA, cB in sourcedata:
if key in data:
data[key].append( (cA, cB) )
else:
data[key] = [ (cA, cB) ]

This way, what you end up with

"cd" : [ (7, 2), (9, 1) ]
What is the best workaround for this? Should I discard dicts? Should I
somehow have under 'cd'... a list of values?

If you really are using Excel -- it might be better to find out how
to command Excel to compute your results... Basically it sounds like you
are creating a grouped sum on the columns, with groups defined by the
value of the first column.

Would be child's play with a regular SQL database -- something like
(this is off the top of my head so may have some syntax errors)

select key, sum(A), sum(B) from aTable
group by key
order by key


Or just use a list of tuples, sorting, and create a report writer
break handler...
'ab'............3................5
'cd'............7................2
'cd'............9................1
'ac'............7................2
data = [ ("ab", 3, 5),
("cd", 7, 2),
("cd", 9, 1),
("ac", 7, 2) ]

skey = None
sA = 0
sB = 0

data.sort() #ensures same keys are adjacent

for (k, a, b) in data:
if k != skey: #key changed, report group output
if skey:
print "%s\t%s\t%s" % (skey, sA, sB)
skey = k #initialize next group
sA = a
sB = b
else:
sA += a #sum to current group
sB += b
if skey: #need to output the last key in the data at end
print "%s\t%s\t%s" % (skey, sA, sB)
 
T

Tim Chase

vsoler said:
What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
me 3.

But you really *do* want lists inside the dict if you want to be
able to call sum() on them. You want to map the tuple ('cd','A')
to the list [7,9] so you can sum the results. And if you plan to
sum the results, it's far easier to have one-element lists and
just sum them, instead of having to special case "if it's a list,
sum it, otherwise, return the value". So I'd use something like

import csv
f = file(INFILE, 'rb')
r = csv.reader(f, ...)
headers = r.next() # discard the headers
d = defaultdict(list)
for (label, a, b) in r:
d[(label, 'a')].append(int(a))
d[(label, 'b')].append(int(b))
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', sum(value)

Alternatively, if you don't need to store the intermediate
values, and just want to store the sums, you can accrue them as
you go along:

d = defaultdict(int)
for (label, a, b) in r:
d[(label, 'a')] += int(a)
d[(label, 'b')] += int(b)
# ...
for (label, col), value in d.iteritems():
print label, col, 'sum =', value

Both are untested, but I'm pretty sure they're both viable,
modulo my sleep-deprived eyes.

-tkc
 
Y

Yinon Ehrlich

Hello,

My code snippet reads data from excel ranges. First row and first
column are column headers and row headers respectively. After reding
the range I build a dict.

What is the best approach for this problem? Can anybody help?

Have you tried xlread ? (http://www.python-excel.org/)
Best,
-- Yinon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top