On 11.11.2010 21:33, Paul Watson wrote:
I realize that you may need or want to do this in Python. This would
be trivial in an awk script.
There are several ways to do this.
skip = ('and', 'or', 'but')
words = []
for line in open('some.txt'):
    # keep every whitespace-separated token that is not a skip word
    words.extend(w for w in line.split() if w not in skip)
print(words)
If some.txt contains your original question, it prints this:
["Let's", 'say', 'that', 'I', 'have', 'an', 'article.', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 '"the",', '"and",', '"or",', '"but".', 'What', 'would', 'be', 'the',
 'general', 'strategy', 'for', 'attacking', 'a', 'problem', 'like',
 'this?']
But this is just _one_ way to get there.
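Note that a plain split() leaves punctuation attached, so tokens such as
'"and",' are not recognized as skip words. Here is a rough sketch of one
way around that, written for Python 3. The name filter_words, the sample
text and the skip list (with 'the' added, as in your question) are my own
assumptions, not anything fixed:

```python
import string

# Assumed skip list: the four words named in the question.
SKIP = {'the', 'and', 'or', 'but'}

def filter_words(text, skip=SKIP):
    """Return the tokens of text whose normalized form is not a skip word."""
    kept = []
    for token in text.split():
        # Normalize only for the comparison: drop surrounding
        # punctuation and ignore case. The token itself is kept as-is.
        core = token.strip(string.punctuation).lower()
        if core not in skip:
            kept.append(token)
    return kept

sample = 'skip over the words "the", "and", "or", "but". And then stop.'
print(filter_words(sample))
# prints ['skip', 'over', 'words', 'then', 'stop.']
```

Using a set for the skip words keeps the membership test cheap if the
list grows.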
Faster solutions could be based on a regex:
import re
skip = ('and', 'or', 'but')
# raw string, so that \w reaches the regex engine unchanged
word_re = re.compile(r'\w+')
print([w for w in word_re.findall(open('some.txt').read()) if w not in skip])
This gives the following result (you lose the punctuation, and "Let's" is split into 'Let' and 's'):
['Let', 's', 'say', 'that', 'I', 'have', 'an', 'article', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 'the', 'What', 'would', 'be', 'the', 'general', 'strategy', 'for',
 'attacking', 'a', 'problem', 'like', 'this']
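If splitting "Let's" into two pieces bothers you, the pattern can be
widened a little. This is only a sketch for Python 3; the pattern, the
name regex_words and the case-insensitive comparison are my own choices,
and it will not cover every kind of apostrophe or hyphenated word:

```python
import re

SKIP = {'and', 'or', 'but'}

# A run of word characters, optionally followed by one or more
# "apostrophe + word characters" groups, so "Let's" stays in one piece.
WORD_RE = re.compile(r"\w+(?:'\w+)*")

def regex_words(text, skip=SKIP):
    """Return the words found in text, minus the skip words (case-insensitive)."""
    return [w for w in WORD_RE.findall(text) if w.lower() not in skip]

print(regex_words('Let\'s say "and", or nothing.'))
# prints ["Let's", 'say', 'nothing']
```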
But there are so many ways to do it ...