How do I skip over multiple words in a file?

chad · Nov 11, 2010

Let's say that I have an article. What I want to do is read in this
file and have the program skip over ever instance of the words "the",
"and", "or", and "but". What would be the general strategy for
attacking a problem like this?

Tim Chase · Nov 11, 2010

I'd keep a file of "stop words" and read them into a set
(normalizing case in the process). Then, as you skim over each
word in your target file, check whether the case-normalized version
of the word is in your stop-word set, and skip it if it is. It might
look something like this:

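def normalize_word(s):
    return s.strip().upper()

# Read the stop words once, normalized, into a set for fast lookups.
stop_words = set(
    normalize_word(word)
    for word in open('stop_words.txt')
)
for line in open('data.txt'):
    for word in line.split():
        if normalize_word(word) in stop_words:
            continue
        process(word)  # process() stands in for whatever you do with each kept word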

-tkc

r0g · Nov 11, 2010

If your files are not too big, I'd simply read them into a string and do
a string replace for each word you want to skip. If you want case
insensitivity, use re.sub() with the re.IGNORECASE flag instead of the
plain str.replace() method. Neither is elegant or all that efficient, but
both are very easy. If your use case requires high performance, you'd
best keep looking.
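A minimal sketch of the re.sub() route, written for Python 3. The names SKIP_WORDS and strip_words() are illustrative, not from the thread. It wraps the words in \b word boundaries, which a plain str.replace() lacks: blindly replacing "or" would also turn "for" into "f".

```python
import re

# Illustrative skip list: the four words named in the question.
SKIP_WORDS = ["the", "and", "or", "but"]

# One alternation for all words. \b restricts matches to whole words, so
# "or" inside "for" survives; re.escape() guards against regex
# metacharacters in the list; re.IGNORECASE lets "The" match as well.
SKIP_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in SKIP_WORDS) + r")\b",
    re.IGNORECASE,
)


def strip_words(text):
    """Delete every skip word from text, then squeeze the leftover gaps.

    Note: the final split/join collapses all whitespace, newlines
    included, into single spaces.
    """
    return " ".join(SKIP_RE.sub("", text).split())
```

To apply it to a whole file, read the file into one string first, e.g. strip_words(open('article.txt').read()), where article.txt is a placeholder name. As noted above, this suits files small enough to hold in memory.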

Roger.

Paul Watson · Nov 11, 2010

I realize that you may need or want to do this in Python. This would be
trivial in an awk script.

Paul Rubin · Nov 11, 2010

Something like (untested):

stopwords = set(('the', 'and', 'or', 'but'))

def goodwords(f):
    # f is any iterable of lines, e.g. an open file
    for line in f:
        for w in line.split():
            if w.lower() not in stopwords:
                yield w

Removing punctuation is left as an exercise.
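One way to do the punctuation exercise, sketched in Python 3: trim ASCII punctuation from the ends of each token with str.strip() and string.punctuation before the stop-word test. This is a variation on the goodwords() generator, taking the lines as an argument; unlike the original, it yields the trimmed word rather than the raw token.

```python
import string

# Same four words as the question; compared in lower case.
STOPWORDS = {"the", "and", "or", "but"}


def goodwords(lines):
    """Yield the words from an iterable of lines, minus STOPWORDS.

    Punctuation is only trimmed from the ends of each token, so internal
    apostrophes survive ("Let's" stays "Let's").
    """
    for line in lines:
        for w in line.split():
            # string.punctuation is ASCII only; curly quotes would need adding.
            core = w.strip(string.punctuation)
            # Skip tokens that were nothing but punctuation (e.g. "--").
            if core and core.lower() not in STOPWORDS:
                yield core
```

Passing an open file works directly, since iterating a file yields its lines: list(goodwords(open('some.txt'))).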

Stefan Sonnenberg-Carstens · Nov 11, 2010

On 11.11.2010 21:33, Paul Watson wrote:

I realize that you may need or want to do this in Python. This would
be trivial in an awk script.

There are several ways to do this.

skip = ('and', 'or', 'but')
words = [w for l in open('some.txt') for w in l.split() if w not in skip]
print(words)

If some.txt contains your original question, it prints this:
["Let's", 'say', 'that', 'I', 'have', 'an', 'article.', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 '"the",', '"and",', '"or",', '"but".', 'What', 'would', 'be', 'the',
 'general', 'strategy', 'for', 'attacking', 'a', 'problem', 'like',
 'this?']

But this is just _one_ way to get there.
Faster solutions could be based on a regex:
import re
skip = ('and', 'or', 'but')
word_re = re.compile(r'\w+')
print([w for w in word_re.findall(open('some.txt').read()) if w not in skip])

This gives the following result (you lose some punctuation, etc.):
['Let', 's', 'say', 'that', 'I', 'have', 'an', 'article', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 'the', 'What', 'would', 'be', 'the', 'general', 'strategy', 'for',
 'attacking', 'a', 'problem', 'like', 'this']

But there are so many ways to do it...
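A sketch of one more regex variant, in Python 3, aimed at two things visible in the regex output: "Let's" being split into 'Let' and 's', and "the" surviving because it is missing from the skip tuple and case is not ignored. The names SKIP, WORD_RE and kept_words() are illustrative, and restricting words to ASCII letters is an assumption of this sketch, not of the thread; digits or non-English text would need a different character class.

```python
import re

# The question's four skip words, compared case-insensitively.
SKIP = {"the", "and", "or", "but"}

# ASCII letters, optionally joined by internal apostrophes, so "Let's" is
# one token instead of the 'Let', 's' pair that \w+ produces.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def kept_words(text):
    """Return the words of text in order, leaving out anything in SKIP."""
    return [w for w in WORD_RE.findall(text) if w.lower() not in SKIP]
```

For a file, the call mirrors the regex example: kept_words(open('some.txt').read()).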

