[NEWB] Dictionary instantiation?

Matt_D

Hello there, this is my first post to the list. Only been working with
Python for a few days. Basically a complete newbie to programming.

I'm working with the csv module as an exercise to parse out a spreadsheet
I use for work. (I am an editor for a military journalism unit.) Not
trying to do anything useful, just trying to manipulate the data.
Anyway, here's the code I've got so far:

import csv
import string
import os

#Open the appropriate .csv file
csv_file = csv.reader(open("D:\\Python25\\BNSR.csv"))

#Create blank dictionary to hold {[author]:[no. of stories]} data
story_per_author = {}

def author_to_dict(): #Function to add each author to the dictionary once to get initial entry for that author
    for row in csv_file:
        author_count = row[-1]
        story_per_author[author_count] = 1

#Fetch author names
def rem_blank_authors(): #Function to remove entries with '' in the AUTHOR field of the .csv
    csv_list = list(csv_file) #Convert the open file to list format for e-z mode editing
    for row in csv_list:
        author_name = row[-1]
        if author_name == '': #Find entries where no author is listed
            csv_list.remove(row) #Remove those entries from the list

def assign_author_to_title(): #Assign an author to every title
    author_of_title = {}
    for row in csv_file:
        title = row[3]
        author = row[-1]
        author_of_title[title] = author


assign_author_to_title()
print author_of_title

--

Ok, the last two lines are kind of my "test the last function" test.
Now when I run these two lines I get the error:

Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "D:\Python25\csv_read.py", line 33, in <module>
    print author_of_title
NameError: name 'author_of_title' is not defined

I am guessing that the author_of_title dict does not exist outside of
the function in which it is created? The concept of instantiation is
sort of foreign to me so I'm having some trouble predicting when it
happens.

If I call the assign_author_to_title function later, am I going to be
able to work with the author_of_title dictionary? Or is it best if I
create author_of_title outside of my function definitions?

Clearly I'm just stepping through my thought process right now,
creating functions as I see a need for them. I'm sure the code is
sloppy and terrible but please be gentle!
 
Virgil Dupras

As you said, author_of_title doesn't exist outside of
assign_author_to_title() because it has been instantiated in the
function, and thus belongs to the local scope. You could instantiate
your dictionary outside of the function, but the nicest way to handle
this would be to add a line "return author_of_title" at the end of
assign_author_to_title() and have "print assign_author_to_title()"
instead of the last 2 lines.
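
A minimal sketch of the "return the dictionary" approach described above. It is written for Python 3 (so print is a function, unlike the Python 2.5 code in the thread), and it assumes a small in-memory list of rows in place of BNSR.csv. The rows parameter and the sample titles and authors are invented for illustration.

```python
def assign_author_to_title(rows):
    """Build a {title: author} dictionary and hand it back to the caller."""
    author_of_title = {}
    for row in rows:
        title = row[3]
        author = row[-1]
        author_of_title[title] = author
    return author_of_title  # without this line the caller would receive None


# Hypothetical rows: title in column 3, author in the last column.
sample_rows = [
    ["1", "2006-12-01", "news", "Base opens gym", "Smith"],
    ["2", "2006-12-02", "news", "Unit returns home", "Jones"],
]

result = assign_author_to_title(sample_rows)
print(result)
```

The dictionary only becomes usable outside the function because the caller binds the returned object to its own name (result here).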
 
Bruno Desthuilliers

Matt_D wrote:
> import csv
> import string
> import os
>
> #Open the appropriate .csv file
> csv_file = csv.reader(open("D:\\Python25\\BNSR.csv"))
>
> #Create blank dictionary to hold {[author]:[no. of stories]} data
> story_per_author = {}
>
> def author_to_dict(): #Function to add each author to the dictionary
> once to get initial entry for that author

First point: your comment would be better in a docstring - and that
would make the code more readable:

def author_to_dict():
    """Function to add each author to the dictionary
    once to get initial entry for that author
    """

>     for row in csv_file:
>         author_count = row[-1]
>         story_per_author[author_count] = 1

Second point: you're using 2 global variables. This is something to
avoid whenever possible (that is: almost always). Here you're in the
very typical situation of a function that produces output
(story_per_author) depending only on its input (csv_file) - so the
correct implementation is to pass the input as an argument and return
the output:

def author_to_dict(csv_file):
    story_per_author = {}
    for row in csv_file:
        author_count = row[-1]
        story_per_author[author_count] = 1
    return story_per_author

Now take care: the object returned by csv.reader is not a sequence, it's
an iterator. Once you've looped over all its content, it's exhausted.
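
To make the exhaustion point concrete, here is a small sketch in Python 3. It assumes a two-row CSV held in an io.StringIO object rather than the real file, and the row contents are invented.

```python
import csv
import io

# Hypothetical two-row CSV held in memory instead of BNSR.csv.
sample = io.StringIO("Base opens gym,Smith\nUnit returns home,Jones\n")
reader = csv.reader(sample)

first_pass = list(reader)   # consumes every row of the iterator
second_pass = list(reader)  # nothing is left to read

print(len(first_pass))
print(len(second_pass))

# One way around it: keep the rows in a list and loop over the list
# as many times as needed.
rows = first_pass
authors = [row[-1] for row in rows]
titles = [row[0] for row in rows]
```

A list can be traversed repeatedly, so any function that needs the rows more than once can take the list instead of the reader.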
> #Fetch author names
> def rem_blank_authors():

Same remark wrt/ comments:

    #Function to remove entries with '' in the
    AUTHOR field of the .csv

    # Convert the open file to list format
    # for e-z mode editing
    csv_list = list(csv_file)

Yet another useless global.

>     for row in csv_list:
>         author_name = row[-1]
>         if author_name == '': #Find entries where no author is listed
>             csv_list.remove(row) #Remove those entries from the list

Since you don't return anything from this function, the only effect is
to consume the whole global csv_file iterator - the csv_list object is
discarded after function execution.

> def assign_author_to_title(): #Assign an author to every title
>     author_of_title = {}
>     for row in csv_file:
>         title = row[3]
>         author = row[-1]
>         author_of_title[title] = author

Same remarks here.

> assign_author_to_title()
> print author_of_title

author_of_title is local to the assign_author_to_title function. You
cannot access it from outside.

> I am guessing that the author_of_title dict does not exist outside of
> the function in which it is created?

Bingo.

> The concept of instantiation is
> sort of foreign to me so I'm having some trouble predicting when it
> happens.

It has nothing to do with instantiation, it's about scoping rules. A
name defined in a function is local to that function. If you create an
object in a function and want to make it available to the outside world,
you have to return it from the function - as I did in the rewrite of
author_to_dict - and of course assign this return value to another name
in the caller's scope.
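
A short Python 3 sketch of that scoping rule. The function make_mapping and its contents are hypothetical. The try/except is only there so the script keeps running after the NameError.

```python
def make_mapping():
    mapping = {"Base opens gym": "Smith"}  # 'mapping' is local to make_mapping
    return mapping


try:
    print(mapping)  # the local name is not visible at module level
except NameError as exc:
    error_message = str(exc)
    print("NameError:", error_message)

# Bind the returned object to a name in the calling scope instead.
author_of_title = make_mapping()
print(author_of_title)
```

The object survives the function call only because the caller keeps a reference to it. The name mapping itself never exists outside make_mapping.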
> If I call the assign_author_to_title function later, am I going to be
> able to work with the author_of_title dictionary? Or is it best if I
> create author_of_title outside of my function definitions?

By all means avoid global variables. In all the above code, there's not a
single reason to use them. Remember that functions take params and
return values. Please take a little time to read more material about
functions and scoping rules.

> Clearly I'm just stepping through my thought process right now,
> creating functions as I see a need for them. I'm sure the code is
> sloppy and terrible

Well... It might be better !-)

Ok, there are the usual CS-newbie-struggling-with-new-concepts errors.
The cure is well-known: read more material (tutorials and code),
experiment (Python is great for this - read about the '-i' option of the
python interpreter), and post here when you run into trouble.

> but please be gentle!

Hope I haven't been too rude !-)
 
Matt_D

Another newb question, same project:

#Fetch author names
def rem_blank_authors(): #Function to remove entries with '' in the AUTHOR field of the .csv
    csv_list = list(csv_file) #Convert the open file to list format for e-z mode editing
    for row in csv_list:
        author_name = row[-1]
        if author_name == '': #Find entries where no author is listed
            csv_list.remove(row) #Remove those entries from the list
        return csv_list

def author_to_dict(): #Function to add each author to the dictionary once to get initial entry for that author
    #rem_blank_authors() #Call this function to remove blank author fields before building the main dictionary
    for row in csv_file:
        author_count = row[-1]
        if author_count in story_per_author:
            story_per_author[author_count] += 1
        else:
            story_per_author[author_count] = 1
    return story_per_author

def assign_author_to_title(): #Assign an author to every title
    author_of_title = {}
    for row in csv_file:
        title = row[3]
        author = row[-1]
        author_of_title[title] = author

author_to_dict()
print story_per_author

--

The solution provided for my previous post worked out. Now I'm testing
the author_to_dict function, modified to get an accurate count of
stories each author has written. Now, if I call rem_blank_authors,
story_per_author == {}. But if I #comment out that line, it returns
the expected key values in story_per_author. What is happening in
rem_blank_authors that is returning no keys in the dictionary?

I'm afraid I don't really understand the mechanics of "return" and
searching the docs hasn't yielded too much help since "return" is such
a common word (in both the Python 2.5 docs and Dive Into Python). I
realize I should probably RTFM, but honestly, I have tried and can't
find a good answer. Can I get a down and dirty explanation of exactly
what "return" does? And why it's sometimes "return" and sometimes it
has an argument? (i.e. "return" vs. "return author_of_title")
 
Matt_D

Oop, made this last post before seeing Bruno's. Still have the same
questions but haven't implemented his suggestions.
 
Matt_D

Wow, list spam.

Sorry about this.

Anyway, disregard my last two. I get it now. When a variable is
defined inside a function, and modifications are made to it, that
variable must be passed to module. In this specific example when I:
return csv_file

in the rem_blank_authors() function it goes *back* to author_to_dict
using the now-available csv_file variable as its parameter. The
problem is that I was forgetting that each row in the csv file is a
discrete item in the list generated by csv.reader. I think I got it
figured now. Thanks for all the help.
 
Bruno Desthuilliers

Matt_D wrote:
(snip)
> Another newb question, same project:
>
> #Fetch author names
> def rem_blank_authors(): #Function to remove entries with '' in the
> AUTHOR field of the .csv
>     csv_list = list(csv_file) #Convert the open file to list format
> for e-z mode editing
>     for row in csv_list:
>         author_name = row[-1]
>         if author_name == '': #Find entries where no author is listed
>             csv_list.remove(row) #Remove those entries from the list
>         return csv_list

The return statement being in the for block, this will always return at
the end of the first iteration. Which is obviously not what you want !-)

Hint : the return statement should be at the same indentation level as
the for statement.
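
A small Python 3 sketch of the difference the indentation of return makes. The function names first_only and all_items are made up for the illustration.

```python
def first_only(items):
    collected = []
    for item in items:
        collected.append(item)
        return collected  # inside the loop: exits during the first iteration


def all_items(items):
    collected = []
    for item in items:
        collected.append(item)
    return collected  # same level as 'for': runs after the loop finishes


print(first_only(["a", "b", "c"]))  # ['a']
print(all_items(["a", "b", "c"]))   # ['a', 'b', 'c']
```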

Also, modifying a sequence in place while iterating over it is a *very*
bad idea. The canonical solution is to build a new sequence based on the
original or, if working on an iterator, to write a filtering iterator, i.e.:

def rem_blank_authors(csv_file):
    for row in csv_file:
        author_name = row[-1]
        if author_name.strip() != '':
            yield row


Now you can use it as:

for row in rem_blank_authors(csv_file):
    assert(row[-1].strip() != '')
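
The following Python 3 sketch contrasts in-place removal with the two alternatives mentioned above: a filtering generator and a new list. The sample rows are invented, with two consecutive blank authors to make the problem visible.

```python
def rem_blank_authors(rows):
    """Yield only the rows whose last field (the author) is not blank."""
    for row in rows:
        if row[-1].strip() != '':
            yield row


# Hypothetical rows; two consecutive entries have no author.
rows = [
    ["Story A", "Smith"],
    ["Story B", ""],
    ["Story C", ""],
    ["Story D", "Jones"],
]

# Removing in place while looping: after "Story B" is removed, the list
# shifts left and the loop steps over "Story C" without checking it.
in_place = list(rows)
for row in in_place:
    if row[-1] == '':
        in_place.remove(row)

# Filtering generator: the original list is left untouched.
filtered = list(rem_blank_authors(rows))

# Building a new list directly with a list comprehension.
also_filtered = [row for row in rows if row[-1].strip() != '']

print(in_place)
print(filtered)
```

Here in_place still contains the "Story C" row with its blank author, while filtered and also_filtered contain only the rows for Smith and Jones.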

> def author_to_dict():

def author_to_dict(csv_file):

>     for row in csv_file:

    for row in rem_blank_authors(csv_file):

>         author_count = row[-1]

Why name it 'author_count' when it's the author name?

        author_name = row[-1]

You're inside a function, which has its own namespace, so this won't
clash with local names of other functions !-)

>         if author_count in story_per_author:
>             story_per_author[author_count] += 1
>         else:
>             story_per_author[author_count] = 1

        if author_name in story_per_author:
            story_per_author[author_name] += 1
        else:
            story_per_author[author_name] = 1

Not very important but might be good to know: if the general case is
that each author is found several times in a csv file, you might get
better results using a try/except block:

        try:
            story_per_author[author_name] += 1
        except KeyError:
            story_per_author[author_name] = 1

>     return story_per_author
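
For comparison, a Python 3 sketch of three ways to do the same counting. The list of authors is invented. The dict.get and collections.Counter variants are alternatives not mentioned in the thread, and Counter is not available in the Python 2.5 used by the original poster (it was added in 2.7).

```python
from collections import Counter

authors = ["Smith", "Jones", "Smith", "Smith"]  # hypothetical author column

# try/except version, as described above
story_per_author = {}
for author_name in authors:
    try:
        story_per_author[author_name] += 1
    except KeyError:
        story_per_author[author_name] = 1

# dict.get with a default of 0 for authors not seen yet
with_get = {}
for author_name in authors:
    with_get[author_name] = with_get.get(author_name, 0) + 1

# collections.Counter does the counting itself
with_counter = Counter(authors)

print(story_per_author)
```

All three produce the same counts. Which one reads best is largely a matter of taste and of the Python version at hand.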
> The solution provided for my previous post worked out. Now I'm testing
> the author_to_dict function, modified to get an accurate count of
> stories each author has written. Now, if I call rem_blank_authors,
> story_per_author == {}. But if I #comment out that line, it returns
> the expected key values in story_per_author. What is happening in
> rem_blank_authors that is returning no keys in the dictionary?

You already consumed the whole csv_file iterator.

> I'm afraid I don't really understand the mechanics of "return"

It's an entirely different problem.

> and
> searching the docs hasn't yielded too much help since "return" is such
> a common word (in both the Python 2.5 docs and Dive Into Python).

Heck, not only in Python - this is really CS101.

For short: the return statement terminates the function execution.
Also, the object (if any) following the return statement will be the
"return value" of the function, that is the value returned to the
caller. If no value is given, the return value will be the None object
(which is Python's generic 'null' value). If the function ends without
an explicit return, its return value will also be None.

> I
> realize I should probably RTFM, but honestly, I have tried and can't
> find a good answer. Can I get a down and dirty explanation of exactly
> what "return" does? And why it's sometimes "return" and sometimes it
> has an argument? (i.e. "return" vs. "return author_of_title")

cf above.

Dummy examples:

def add(num1, num2):
    return num1 + num2

x = add(1, 2)
print x

def say_hello_to_jim(name):
    if name == 'jim':
        return "Hello Jim"
    # implicit 'return None' here

names = ['bob', 'will', 'jim', 'al']
for name in names:
    print say_hello_to_jim(name)

HTH
 
Bruno Desthuilliers

Matt_D wrote:
(snip whole posts)

Please Matt do the world a favour: don't repost the whole damn thing
just to add a couple lines !-)
 
Tim Golden

Matt_D wrote:

[... snip loads ...]
> Wow, list spam.

Indeed.

> Sorry about this.

Good.

> Anyway, disregard my last two. I get it now.

Glad to hear it.

<friendly advice>
Might I suggest, though, that it's not necessary to repeat
the entire history of the thread on every email. Unless
you have reason to reproduce the older code and comments,
just elide them, optionally using some kind of [snip] marker
as I have above to indicate that something's been left out.
</friendly advice>

TJG
 
Shane Geiger

#!/usr/bin/python
"""
EXAMPLE USAGE OF PYTHON'S CSV.DICTREADER FOR PEOPLE NEW TO PYTHON

Python - Batteries Included(tm)

This file will demonstrate that when you use the python CSV module, you
don't have to remove the newline characters, as between "acorp_ Ac" and
"orp Foundation" and other parts of the data below.

It also demonstrates python's csv.DictReader, which allows you to read a
CSV record into a dictionary.

This will also demonstrate the use of lists ([]s) and dicts ({}s).

If this doesn't whet your appetite for getting ahold of a powertool
instead of sed for managing CSV data, I don't know what will.

"""

#### FIRST: CREATE A TEMPORARY CSV FILE FOR DEMONSTRATION PURPOSES
mycsvdata = """
"Category","0","acorp_ Ac
orp Foundation","","","Acorp Co","(480) 905-1906","877-462-5267 toll
free","800-367-2228","800-367-2228","(e-mail address removed)
g","7895 East Drive","Scottsdale","AZ","85260-6916","","","","","","Pres
Fred & Linda ","0","0","1","3","4","1"

"Category","0","acorp_ Bob and Margaret Schwartz","","","","317-321-6030
her","317-352-0844","","","","321 North Butler Ave.","In
dianapolis","IN","46219","","","","","","Refrigeration
man","0","1","2","3","4","0"

"Category","0","acorp_ Elschlager,
Bob","","","","","702-248-4556","","","(e-mail address removed)","7950 W.
Flamingo Rd. #2032","Las Vega
s","NV","89117","","","","","","guy I met","0","1","2","3","4","1"

"""

## NOTE: IF YOU HAVE A RECORD SEPARATOR WITHIN QUOTES, IT WILL NOT BE
## TREATED LIKE A RECORD SEPARATOR!
## Beef|"P|otatos"|Dinner Roll|Ice Cream


import os, sys
def writefile(filename, filedata, perms=750):
    f = open(filename, "w")
    f.write(filedata)
    os.system("chmod "+str(perms)+" "+filename)
    f.close()

file2write = 'mycsvdata.txt'
writefile(file2write,mycsvdata)

# Check that the file exists
if not os.path.exists(file2write):
    print "ERROR: unable to write file:", file2write," Exiting now!"
    sys.exit()

# ...so everything down to this point merely creates the
# temporary CSV file for the code to test (below).



#### SECOND: READ IN THE CSV FILE TO CREATE A LIST OF PYTHON DICTIONARIES,
# WHERE EACH DICTIONARY CONTAINS THE DATA FROM ONE ROW. THE KEYS OF THE
# DICTIONARY WILL BE THE FIELD NAMES AND THE VALUES OF THE DICTIONARY WILL
# BE THE VALUES CONTAINED WITHIN THE CSV FILE'S ROW.

import csv

### NOTE: Modify this list to match the fields of the CSV file.
header_flds = ['cat','num','name','blank1','blank2','company','phone1','phone2',
               'phone3','phone4','email','addr1','city','state','zip','blank3',
               'blank4','blank5','blank6','blank7','title','misc1','misc2','misc3',
               'mics4','misc5','misc6']

file2open = 'mycsvdata.txt'

# Pass the field names in directly, since the file has no header row
reader = csv.DictReader(open(file2open), header_flds, delimiter=",")
data = []
for rdr in reader:
    data.append(rdr)


def splitjoin(x):
    """ This removes any nasty newline that might exist in a field
    (of course, if you want that in the field, don't use this)
    """
    return ''.join(x.split('\n'))


#### THIRD: ITERATE OVER THE LIST OF DICTS (IN WHICH EACH DICT IS A
# ROW/RECORD FROM THE CSV FILE)

# example of accessing all the dictionaries once they are in the list 'data':
for rec in data: # for each CSV record
    itmz = rec.items() # get the items from the dictionary
    print "- = " * 20
    for key,val in itmz:
        # Note: splitjoin() allows a record to contain fields
        # with newline characters
        print key.upper()+": \t\t",splitjoin(val)
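
The reading step above can be condensed into a short sketch. This is only an illustration, not the script's own code: it is written for Python 3, reads from an in-memory string instead of a temporary file, and the sample data, the `FIELDS` names and the `read_rows` helper are all made up for the example. It also shows the point from the NOTE earlier in the script: a delimiter inside quotes stays part of the field.

```python
import csv
import io

# Hypothetical sample data: no header row, '|' as the delimiter, and one
# field that contains the delimiter inside quotes.
SAMPLE = 'Beef|"P|otatos"|Dinner Roll|Ice Cream\nFish|Rice|Bread|Pie\n'

# Hypothetical field names, supplied by hand because the data has no header row.
FIELDS = ['main', 'side', 'bread', 'dessert']


def read_rows(text, fieldnames, delimiter='|'):
    """Return a list of dicts, one per record in the delimited text."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames,
                            delimiter=delimiter)
    return [dict(row) for row in reader]


rows = read_rows(SAMPLE, FIELDS)
for row in rows:
    print(row)
```

Because the field names are passed to `csv.DictReader` directly, the first line of the data is treated as a record rather than as a header.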




Matt_D said:
Hello there, this is my first post to the list. Only been working with
Python for a few days. Basically a complete newbie to programming.

I'm working with csv module as an exercise to parse out a spreadsheet
I use for work.(I am an editor for a military journalism unit) Not
trying to do anything useful, just trying to manipulate the data.
Anyway, here's the code I've got so far:

import csv
import string
import os

#Open the appropriate .csv file
csv_file = csv.reader(open("D:\\Python25\\BNSR.csv"))

#Create blank dictionary to hold {[author]:[no. of stories]} data
story_per_author = {}

#Function to add each author to the dictionary once to get initial
#entry for that author
def author_to_dict():
    for row in csv_file:
        author_count = row[-1]
        story_per_author[author_count] = 1

#Fetch author names
#Function to remove entries with '' in the AUTHOR field of the .csv
def rem_blank_authors():
    csv_list = list(csv_file) #Convert the open file to list format for e-z mode editing
    for row in csv_list:
        author_name = row[-1]
        if author_name == '': #Find entries where no author is listed
            csv_list.remove(row) #Remove those entries from the list

def assign_author_to_title(): #Assign an author to every title
    author_of_title = {}
    for row in csv_file:
        title = row[3]
        author = row[-1]
        author_of_title[title] = author


assign_author_to_title()
print author_of_title

--

Ok, the last two lines are kind of my "test the last function" test.
Now when I run these two lines I get the error:

Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "D:\Python25\csv_read.py", line 33, in <module>
    print author_of_title
NameError: name 'author_of_title' is not defined

I am guessing that the author_of_title dict does not exist outside of
the function in which it is created? The concept of instantiation is
sort of foreign to me so I'm having some trouble predicting when it
happens.

If I call the assign_author_to_title function later, am I going to be
able to work with the author_of_title dictionary? Or is it best if I
create author_of_title outside of my function definitions?

Clearly I'm just stepping through my thought process right now,
creating functions as I see a need for them. I'm sure the code is
sloppy and terrible but please be gentle!
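
One way to get at the dictionary from outside the function is to have the function return it. The sketch below is written for Python 3 and is only an illustration: the `rows` parameter and the `sample_rows` data are assumptions standing in for the contents of the .csv file, and the column positions (`row[3]` for the title, `row[-1]` for the author) are taken from the quoted code.

```python
def assign_author_to_title(rows):
    """Build and return a {title: author} dict from an iterable of rows."""
    author_of_title = {}          # local name: exists only during this call
    for row in rows:
        title = row[3]
        author = row[-1]
        author_of_title[title] = author
    return author_of_title        # hand the dict back to the caller


# Hypothetical rows standing in for the contents of the .csv file.
sample_rows = [
    ['1', 'x', 'y', 'Story A', 'Smith'],
    ['2', 'x', 'y', 'Story B', 'Jones'],
]

titles = assign_author_to_title(sample_rows)
print(titles)
```

Passing the rows in as an argument also avoids having several functions share one `csv_file` reader, which can only be iterated over once.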


--
Shane Geiger
IT Director
National Council on Economic Education
(e-mail address removed) | 402-438-8958 | http://www.ncee.net

Leading the Campaign for Economic and Financial Literacy



 

Steven D'Aprano

Also, modifying a sequence in place while iterating over it is a *very*
bad idea.

That's somewhat of an exaggeration, surely. Some sorts of modifications
are more error-prone than others and deserve your warning, e.g.
inserting or deleting items (as the OP was doing). But modifying the
items themselves isn't risky at all. E.g. instead of this:

L = []
for x in xrange(1000000):
    L.append(result_of_some_calculation(x))


a perfectly reasonable optimization technique for large lists is:

L = [None]*1000000
for i in xrange(len(L)):
    L[i] = result_of_some_calculation(i)


The second avoids needing to re-size the list repeatedly, which is quite
slow.
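
A small runnable sketch of the two approaches, written for Python 3 (so `range` instead of `xrange`). The body of `result_of_some_calculation` is a made-up stand-in, and the list is kept short so the results are easy to compare. It only shows that both loops produce the same list; any speed difference is best measured with `timeit` rather than assumed.

```python
def result_of_some_calculation(x):
    # Hypothetical stand-in for the real computation.
    return x * x


n = 10

# Grow the list one append at a time.
appended = []
for x in range(n):
    appended.append(result_of_some_calculation(x))

# Preallocate, then overwrite each slot by index while looping.
# Assigning to prefilled[i] changes an item, not the length of the list,
# so the iteration itself is unaffected.
prefilled = [None] * n
for i, _ in enumerate(prefilled):
    prefilled[i] = result_of_some_calculation(i)

print(appended == prefilled)
```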
 

Bruno Desthuilliers

Steven D'Aprano wrote:
That's somewhat of an exaggeration, surely.

Somewhat of a shortcut if you want - given the context, I obviously
meant adding/removing items to/from the sequence while iterating over it.
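
To make the risky case concrete, here is a minimal sketch for Python 3. The `rows` data is invented for the example, with the author in the last column as in the OP's code. Removing items from the list being looped over makes the loop skip the element that follows each removal, while building a new list does not have that problem.

```python
# Hypothetical rows: [title, author], with two blank authors in a row.
rows = [['t1', 'Ann'], ['t2', ''], ['t3', ''], ['t4', 'Bob']]

# Risky: removing from the list being iterated shifts later items left,
# so the element right after each removed one is never examined.
risky = list(rows)
for row in risky:
    if row[-1] == '':
        risky.remove(row)

# Safer: build a new list containing only the rows to keep.
kept = [row for row in rows if row[-1] != '']

print(risky)   # still contains the 't3' row with a blank author
print(kept)
```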
 
