Word count from file help.

jester.dev

Hello,

I'm learning Python from Python Bible, and having some
problems with this code below. When I run it, I get nothing. It
should open the file poem.txt (which exists in the current
directory) and count the number of times any given word appears
in the text.

#!/usr/bin/python


# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
    "Count how many times each word appears in Text"
    # A string (above) after a def statement is a -
    # "docstring" - a comment intended for documentation.
    WordCount={}
    # We will build up (and return) a dictionary whose keys
    # are the words, and whose values are the corresponding
    # number of occurrences.

    CountWords=""
    # To make the job cleaner, add a period at the end of the
    # text; that way, we are guaranteed to be finished with
    # the current word when we run out of letters:
    Text=Text+"."

    # We assume that ' and - don't break words, but any other
    # nonalphabetic character does. This assumption isn't
    # entirely accurate, but it's close enough for us.
    # string.letters is a string of all alphabetic characters.
    PiecesOfWords=string.letters+"'-"

    # Iterate over each character in the text. The function
    # len() returns the length of a sequence.
    for CharacterIndex in range(0,len(Text)):
        CurrentCharacter=Text[CharacterIndex]

        # The find() method of a string finds the starting
        # index of the first occurrence of a substring within
        # a string, or returns -1 if it doesn't find a substring.
        # The next line of code tests to see whether CurrentCharacter
        # is part of a word:
        if(PiecesOfWords.find(CurrentCharacter)!=-1):
            # Append this letter to the current word.
            CurrentWord=CurrentWord+CurrentCharacter
        else:
            # This character is not a letter.
            if(CurrentWord!=""):
                # We just finished a word.
                # Convert to lowercase, so "The" and "the"
                # fall in the same bucket...
                CurrentWord=string.lower(CurrentWord)

                # Now increment this word's count.
                CurrentCount=WordCount.get(CurrentWord,0)
                WordCount[CurrentWord]=CurrentCount+1

                # Start a new word.
                CurrentWord=""
    return(WordCount)

if (__name__=="__main__"):
    # Read the text from the file poem.txt.
    TextFile=open("poem.txt","r")
    Text=TextFile.read()
    TextFile.close()

    # Count the words in the text.
    WordCount=CountWords(Text)
    # Alphabetize the word list, and print them all out.
    SortedWords=WordCount.keys()
    SortedWords.sort()
    for Word in SortedWords:
        print Word.WordCount[Word]
 
Ben Finney

> I'm learning Python from Python Bible

Welcome, I hope you're enjoying learning the language.

> problems with this code below. When I run it, I get nothing.

More information required:

How are you invoking it (what command do you type)? Does the program
appear to do something, then exit?

You've told us what you expect the program to do (thanks!):

> It should open the file poem.txt (which exists in the current
> directory) and count the number of times any given word appears in the
> text.

Diagnostics:

When you encounter unexpected behaviour in a complex piece of code, it's
best to test some assumptions.

What happens when the file "poem.txt" is not there? (Rename the file to
a different name.) This will tell you whether the program is even
attempting to read the file.
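
The reason this check is informative: opening a file that does not exist fails loudly, so a run that stays silent with poem.txt renamed never reached the open() call at all. A minimal sketch, assuming Python 3 (in the Python 2 used in this thread the exception would be IOError); the helper name try_open and the file name are made up for the illustration:

```python
def try_open(path):
    """Report whether open() was reached and what happened."""
    try:
        with open(path, "r") as text_file:
            text_file.read()
        return "opened"
    except FileNotFoundError:
        # Raised by open() itself when the file is missing.
        return "missing"

# A file name that is assumed not to exist in the current directory.
result = try_open("no-such-poem-for-this-test.txt")
print(result)
```

If the real program printed no error with the file renamed, the lines that open the file were never executed.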

What happens when you import this into the interactive Python prompt,
then call CountWords on some text? This will tell you whether the
function is performing as expected.

And so on.


One possible problem that may be a mistake in the way you pasted the
text into your newsgroup message:
#!/usr/bin/python
[...]
import string

def CountWords(Text):
    [...]
    for CharacterIndex in range(0,len(Text)):
        [...]
        if(PiecesOfWords.find(CurrentCharacter)!=-1):
            [...]
        else:
            if(CurrentWord!=""):
                [...]
            if (__name__=="__main__"):
                [...]

Indentation defines structural language blocks in Python. The "def",
"for", "if" structures above will encompass *all* lines below them until
the next line at their own indentation level or less.

In other words, if the code looks the way you've pasted it here, the
"def" encompasses everything below it; the "for" encompasses everything
below it; and the "if(PiecesOfWords...):" encompasses everything below
it. Including the "if( __name__ == "__main__" ):" line.

Thus, as you've posted it here, the file imports the string module,
defines a function -- then does nothing with it.
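
A small sketch of that effect, in Python 3 syntax. The names log, misindented and fixed are invented for the illustration, and a plain statement stands in for the "if (__name__...)" block so the behaviour is easy to observe:

```python
log = []

def misindented():
    log.append("function body")
    # Indented under the def, so this line is part of the function.
    # It runs only if something calls misindented(), and nothing does.
    log.append("main block, indented by mistake")

def fixed():
    log.append("function body")

# Back at the left margin: this runs as soon as the file is executed.
log.append("main block at top level")
fixed()

print(log)
```

Only the entries added at the top level and inside fixed() end up in the list; the lines inside misindented() never run, which matches a program that "does nothing".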

Please be sure to paste the text literally in messages; or, if you've
pasted the text exactly as it is in the program, learn how Python
interprets indentation:

<http://www.python.org/doc/current/ref/indentation.html>
 
Paul McGuire

jester.dev said:
> Hello,
>
> I'm learning Python from Python Bible, and having some
> problems with this code below. When I run it, I get nothing. It
> should open the file poem.txt (which exists in the current
> directory) and count the number of times any given word appears
> in the text.

Try this:

# wordCount.py
#
# invoke using: python wordCount.py <filename>
#
from pyparsing import Word, alphas
import sys

# modify this word definition as you wish - whitespace is implicit separator
wordSpec = Word(alphas)

if len(sys.argv) > 1:
    infile = sys.argv[1]

wordDict = {}
filetext = "\n".join( file(infile).readlines() )
for wd,locstart,locend in wordSpec.scanString(filetext):
    #~ curWord = string.lower(wd[0])
    curWord = wd[0].lower()
    if wordDict.has_key( curWord ):
        wordDict[curWord] += 1
    else:
        wordDict[curWord] = 1

print "%s has %d different words." % ( infile, len(wordDict.keys()) )
keylist = wordDict.keys()
# sort by descending count, then alphabetically
keylist.sort( lambda a,b:
    ( wordDict[b] - wordDict[a] ) or
    ( ( ( a > b ) and 1 ) or ( ( a < b ) and -1 ) or 0 ) )
for k in keylist:
    print k, ":", wordDict[k]
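
For readers without pyparsing, here is a rough standard-library sketch of the same idea in Python 3 syntax. It is not Paul's code: re.findall with [A-Za-z]+ roughly stands in for Word(alphas), a key function replaces the comparison lambda, and the names count_words, ordered_words and sample are made up. The ordering is meant to match the sort above: most frequent first, ties broken alphabetically.

```python
import re
from collections import Counter

def count_words(text):
    """Count runs of ASCII letters, ignoring case."""
    return Counter(word.lower() for word in re.findall(r"[A-Za-z]+", text))

def ordered_words(counts):
    """Most frequent first; ties in alphabetical order."""
    return sorted(counts, key=lambda word: (-counts[word], word))

sample = "The cat saw the hat, and the hat saw the cat."
counts = count_words(sample)
for word in ordered_words(counts):
    print(word, ":", counts[word])
```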
 
Paul McGuire

Oops, sorry, forgot to mention that this requires downloading pyparsing at
http://pyparsing.sourceforge.net.

 
jester.dev

See inline.

Ben said:
> How are you invoking it (what command do you type)? Does the program
> appear to do something, then exit?

I made it executable: chmod 755 word_count.py
I also tried: python word_count.py

> Diagnostics:
>
> When you encounter unexpected behaviour in a complex piece of code, it's
> best to test some assumptions.
>
> What happens when the file "poem.txt" is not there? (Rename the file to
> a different name.) This will tell you whether the program is even
> attempting to read the file.

It does nothing either way. First time I ran it the file was not there.

> What happens when you import this into the interactive Python prompt,
> then call CountWords on some text? This will tell you whether the
> function is performing as expected.

Nothing happens. :) So I guess what you said below is correct.

> And so on.


> One possible problem that may be a mistake in the way you pasted the
> text into your newsgroup message:
> #!/usr/bin/python
> [...]
> import string
>
> def CountWords(Text):
>     [...]
>     for CharacterIndex in range(0,len(Text)):
>         [...]
>         if(PiecesOfWords.find(CurrentCharacter)!=-1):
>             [...]
>         else:
>             if(CurrentWord!=""):
>                 [...]
>             if (__name__=="__main__"):
>                 [...]

> Indentation defines structural language blocks in Python. The "def",
> "for", "if" structures above will encompass *all* lines below them until
> the next line at their own indentation level or less.
>
> In other words, if the code looks the way you've pasted it here, the
> "def" encompasses everything below it; the "for" encompasses everything
> below it; and the "if(PiecesOfWords...):" encompasses everything below
> it. Including the "if( __name__ == "__main__" ):" line.
>
> Thus, as you've posted it here, the file imports the string module,
> defines a function -- then does nothing with it.
>
> Please be sure to paste the text literally in messages; or, if you've
> pasted the text exactly as it is in the program, learn how Python
> interprets indentation:
>
> <http://www.python.org/doc/current/ref/indentation.html>

Thanks for the link. I'm not really used to this whole indentation deal yet. I
was, however, using WingIDE, which indents for me.

JesterDev
 
Dave K

> Hello,
>
> I'm learning Python from Python Bible, and having some
> problems with this code below. When I run it, I get nothing. It
> should open the file poem.txt (which exists in the current
> directory) and count the number of times any given word appears
> in the text.

When I run it (after re-formatting - you can see below how it appears
in my newsreader), and after fixing the two error messages, it prints
the results just as you describe. Try this:

1) Add the line 'CurrentWord = ""' just before the line
'for CharacterIndex in range(0,len(Text)):'
2) Change the very last line to 'print Word, WordCount[Word]'
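
With both changes applied, and the indentation laid out the way the comments describe, the program looks roughly like this. It is a sketch in Python 3 syntax rather than the Python 2 of the original post: string.ascii_letters replaces string.letters, print is a function, the names are lower-cased, and a sample string (made up here) stands in for poem.txt.

```python
import string

def count_words(text):
    """Count how many times each word appears in text."""
    word_count = {}
    current_word = ""  # change 1: start with an empty word before the loop
    text = text + "."  # sentinel so the last word is always finished
    pieces_of_words = string.ascii_letters + "'-"
    for character in text:
        if character in pieces_of_words:
            current_word += character
        elif current_word:
            word = current_word.lower()
            word_count[word] = word_count.get(word, 0) + 1
            current_word = ""
    return word_count

# At the left margin, so it runs when the file is executed.
sample_text = "The cat saw the hat. The end"
counts = count_words(sample_text)
for word in sorted(counts):
    print(word, counts[word])  # change 2: a comma, not a dot
```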

If that doesn't work for you then I suspect that the indenting in your
program is wrong (rather than just being mangled by posting it), but
I'm just guessing. It would be helpful if you posted the actual error
message (Traceback) that the Python interpreter prints; that makes it
much easier to find the problem.

Dave
#!/usr/bin/python


# WordCount.py - Counts the words in a given text file (poem.txt)

import string

def CountWords(Text):
"Count how many times each word appears in Text"
# A string (above) after a def statement is a -
# "docstring" - a comment intended for documentation.
WordCount={}
# We will build up (and return) a dictionary whose keys
# are the words, and whose values are the corresponding
# number of occurrences.

CountWords=""
# To make the job cleaner, add a period at the end of the
# text; that way, we are guaranteed to be finished with
# the current word when we run out of letters:
Text=Text+"."

# We assume that ' and - don't break words, but any other
# nonalphabetic character does. This assumption isn't
# entirely accurate, but it's close enough for us.
# string.letters is a string of all alphabetic charactors.
PiecesOfWords=string.letters+"'-"

# Iterate over each character in the text. The function
# len () returns the length of a sequence.
for CharacterIndex in range(0,len(Text)):
CurrentCharacter=Text[CharacterIndex]

# The find() method of a string finds the starting
# index of the first occurrence of a substring within
# a string, or returns -1 of it doesn't find a substring.
# The next line of code tests to see wether CurrentCharacter
# is part of a word:
if(PiecesOfWords.find(CurrentCharacter)!=-1):
# Append this letter to the current word.
CurrentWord=CurrentWord+CurrentCharacter
else:
# This character is no a letter.
if(CurrentWord!=""):
# We just finished a word.
# Convert to lowercase, so "The" and
"the"
# fall in the same bucket...

CurrentWord=string.lower(CurrentWord)

# Now increment this word's count.

CurrentCount=WordCount.get(CurrentWord,0)

WordCount[CurrentWord]=CurrentCount+1

# Start a new word.
CurrentWord=""
return(WordCount)
if (__name__=="__main__"):
# Read the text from the file
peom.txt.
TextFile=open("poem.txt","r")
Text=TextFile.read()
TextFile.close()

# Count the words in the text.
WordCount=CountWords(Text)
# Alphabetize the word list, and
print them all out.
SortedWords=WordCount.keys()
SortedWords.sort()
for Word in SortedWords:
print Word.WordCount[Word]
 
