Dictionaries and loops

Mike P

Hi All
i have a CSV file that i'm reading in and each line has the look of
the below

{None: ['User-ID', 'Count']}
{None: ['576460847178667334', '1']}
{None: ['576460847178632334', '8']}

i want to make a dictionary of items in the form
{576460847178667334:1, 576460847178632334:8, ..... } for all rows in
the datafile

my code so far is thus:

dict1={}
j=1
for row in reader1:
    if j==1:
        j+=1
        continue #thus allowing me to skip the first row
    if j>1:
        for element in row.values():
            for item in element:
                if int(item)%2==0:
                    dict1[int(item)] = int(item)+1
                    # i know this is the problem line as it's not picking the second item
                    # up just finding the first and increasing it, but i can't figure out
                    # how to correct this?
    j+=1

I get one dictionary from this but not the correct data inside, can
anyone help?
 
Bruno Desthuilliers

Mike P wrote:
> Hi All
> i have a CSV file that i'm reading in and each line has the look of
> the below
>
> {None: ['User-ID', 'Count']}
> {None: ['576460847178667334', '1']}
> {None: ['576460847178632334', '8']}

This doesn't look like a CSV file at all... Is that what you actually
have in the file, or what you get from the csv.reader?

> i want to make a dictionary of items in the form
> {576460847178667334:1, 576460847178632334:8, ..... } for all rows in
> the datafile

> my code so far is thus:
>
> dict1={}
> j=1
> for row in reader1:
>     if j==1:
>         j+=1
>         continue #thus allowing me to skip the first row
>     if j>1:

Drop this, and call reader1.next() before entering the loop.

>         for element in row.values():
>             for item in element:
>                 if int(item)%2==0:
>                     dict1[int(item)] = int(item)+1

You're repeating the same operation (building an int from a string)
three times, where once would be enough:

for item in element:
    item = int(item)
    if item % 2 == 0:  # or: if not item % 2:
        dict1[item] = item + 1

But this code is not going to yield the expected result...

> # i know this is the problem line as it's not picking the second item
> up just finding the first and increasing it, but i can't figure out
> how to correct this?

Mmm... What about learning Python instead of trying random code?
Programming by accident won't take you very far, and you can't expect
this newsgroup to do your own work.


Ok, assuming your CSV file looks like this - and you never have
duplicate values for the User-id column:

# source.csv
"User-ID", "Count"
576460847178667334, 1
576460847178632334, 8

Here's a possible solution:

import csv

result = {}
src = open("source.csv", "rb")
try:
    reader = csv.reader(src)
    reader.next()  # skip the header row
    for row in reader:
        user_id, count = int(row[0]), int(row[1])
        result[user_id] = count
finally:
    src.close()

or more tersely:

src = open("source.csv", "rb")
try:
    reader = csv.reader(src)
    reader.next()
    result = dict(map(int, row) for row in reader)
finally:
    src.close()
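
For readers on Python 3, here is a minimal sketch of the same idea. It is not part of the original reply: the helper name load_counts and the in-memory sample are assumptions made for illustration, and skipinitialspace=True is used only because the sample file above has a space after each comma.

```python
import csv
import io

def load_counts(lines):
    """Build {user_id: count} from an iterable of CSV lines with a header row."""
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row (Python 3 spelling of reader.next())
    # Each remaining row is expected to hold exactly two fields.
    return {int(user_id): int(count) for user_id, count in reader}

# In-memory stand-in for source.csv (hypothetical data copied from the sample).
# With a real file, open it with: open("source.csv", newline="")
sample = io.StringIO(
    '"User-ID", "Count"\n'
    "576460847178667334, 1\n"
    "576460847178632334, 8\n"
)
result = load_counts(sample)
print(result)
```

As in the reply above, this assumes the User-ID column never repeats; a later duplicate would silently overwrite the earlier count.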
 
bearophileHUGS

Few solutions, not much tested:

data = """{None: ['User-ID', 'Count']}
{None: ['576460847178667334', '1']}
{None: ['576460847178632334', '8']}"""

lines = iter(data.splitlines())
lines.next()

identity_table = "".join(map(chr, xrange(256)))
result = {}
for line in lines:
    parts = line.translate(identity_table, "'[]{},").split()
    key, val = map(int, parts[1:])
    assert key not in result
    result[key] = val
print result

(With Python 3 finally that identity_table can be replaced by None)

# --------------------------------------

import re

patt = re.compile(r"(\d+).+?(\d+)")

lines = iter(data.splitlines())
lines.next()

result = {}
for line in lines:
    key, val = map(int, patt.search(line).groups())
    assert key not in result
    result[key] = val
print result

# --------------------------------------

from itertools import groupby

lines = iter(data.splitlines())
lines.next()

result = {}
for line in lines:
    key, val = (int("".join(g)) for h, g in groupby(line,
                key=str.isdigit) if h)
    assert key not in result
    result[key] = val
print result
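
# --------------------------------------

A Python 3 sketch of the regular-expression variant, added here for readers who cannot run the Python 2 code above. The function name parse_lines is an assumption for illustration, and the assert is swapped for an explicit ValueError; otherwise it follows the same pattern and the same sample data.

```python
import re

data = """{None: ['User-ID', 'Count']}
{None: ['576460847178667334', '1']}
{None: ['576460847178632334', '8']}"""

# First run of digits, anything in between, then the next run of digits.
patt = re.compile(r"(\d+).+?(\d+)")

def parse_lines(text):
    """Return {key: value} parsed from lines shaped like the sample above."""
    lines = iter(text.splitlines())
    next(lines)  # skip the header line
    result = {}
    for line in lines:
        key, val = map(int, patt.search(line).groups())
        if key in result:
            raise ValueError("duplicate key: %d" % key)
        result[key] = val
    return result

result = parse_lines(data)
print(result)
```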

Bye,
bearophile
 
bearophileHUGS

Bruno Desthuilliers:
> This doesn't look like a CSV file at all... Is that what you actually
> have in the file, or what you get from the csv.reader ???

I presume you are right, the file probably doesn't contain that stuff
like I have assumed in my silly/useless solutions :)

Bye,
bearophile
 
Bruno Desthuilliers

(e-mail address removed) wrote:
> Bruno Desthuilliers:
>
> I presume you are right, the file probably doesn't contain that stuff
> like I have assumed in my silly/useless solutions :)

Yeps. I suspect the OP found a very creative way to misuse
csv.DictReader, but I couldn't figure out how he managed to get such a mess.
 
Mike P

Thanks for the solution above,

The raw data looked like
User-ID,COUNTS
576460840144207854,6
576460821700280307,2
576460783848259584,1
576460809027715074,3
576460825909089607,1
576460817407934470,1

and i used

CSV_INPUT1 = "C:/Example work/Attr_model/Activity_test.csv"
fin1 = open(CSV_INPUT1, "rb")
reader1 = csv.DictReader((fin1), [], delimiter=",")
for row in reader1:
    print row

with the following outcome.
{None: ['User-ID', 'COUNTS']}
{None: ['576460840144207854', '6']}
{None: ['576460821700280307', '2']}
{None: ['576460783848259584', '1']}
{None: ['576460809027715074', '3']}
{None: ['576460825909089607', '1']}

So i can see csv.reader is what i should have been using

Thanks for the help
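
A small Python 3 sketch of why that call produced {None: [...]} rows. The second positional argument of csv.DictReader is fieldnames; with an empty list there are no names to pair with the values, so every value in a row is collected under restkey, which defaults to None, and the header line is treated as ordinary data. The in-memory sample and the variable names below are assumptions for illustration.

```python
import csv
import io

raw = "User-ID,COUNTS\n576460840144207854,6\n576460821700280307,2\n"

# Empty fieldnames list, as in the call above: nothing to zip the values
# with, so each whole row ends up under restkey (None by default).
rows_bad = list(csv.DictReader(io.StringIO(raw), [], delimiter=","))

# fieldnames omitted: the first line is consumed as the header.
rows_ok = list(csv.DictReader(io.StringIO(raw)))

print(rows_bad[0])
print(rows_ok[0])
```

On Python 3.8 or later the first print shows {None: ['User-ID', 'COUNTS']} and the second shows {'User-ID': '576460840144207854', 'COUNTS': '6'}.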
 
Bruno Desthuilliers

Mike P wrote:
> Thanks for the solution above,
>
> The raw data looked like
> User-ID,COUNTS
> 576460840144207854,6
> 576460821700280307,2
> 576460783848259584,1
> 576460809027715074,3
> 576460825909089607,1
> 576460817407934470,1
>
> and i used
>
> CSV_INPUT1 = "C:/Example work/Attr_model/Activity_test.csv"
> fin1 = open(CSV_INPUT1, "rb")
> reader1 = csv.DictReader((fin1), [], delimiter=",")

This should have been:
reader1 = csv.DictReader(fin1, delimiter=",")

or even just csv.DictReader(fin1), since IIRC ',' is the default
delimiter (I'll let you check this by yourself...).

with which you would have:
[
    {'User-ID':'576460840144207854', 'COUNTS':'6'},
    {'User-ID':'576460821700280307', 'COUNTS':'2'},
    # etc...
]

> with the following outcome.
> {None: ['User-ID', 'COUNTS']}
> {None: ['576460840144207854', '6']}
> {None: ['576460821700280307', '2']}
> {None: ['576460783848259584', '1']}
> {None: ['576460809027715074', '3']}
> {None: ['576460825909089607', '1']}

And you didn't notice anything strange?

> So i can see csv.reader is what i should have been using

With only 2 values, DictReader is probably a bit overkill, yes.

> Thanks for the help

You're welcome. But do yourself a favour: take time to *learn* Python -
at least the very basic (no pun) stuff like iterating over a sequence.
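
For completeness, a Python 3 sketch of building the requested dictionary with the corrected DictReader call. The column names come from the raw data quoted above; the in-memory sample and the variable name counts are assumptions for illustration.

```python
import csv
import io

# Stand-in for Activity_test.csv, using the first rows of the raw data.
raw = (
    "User-ID,COUNTS\n"
    "576460840144207854,6\n"
    "576460821700280307,2\n"
    "576460783848259584,1\n"
)

reader = csv.DictReader(io.StringIO(raw))  # header row supplies the keys
counts = {int(row["User-ID"]): int(row["COUNTS"]) for row in reader}
print(counts)
```

With only two columns this does the same job as the plain csv.reader version; DictReader mainly pays off when columns are looked up by name in wider files.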
 
