how to remove the same words in the paragraph

kylin · Nov 3, 2009

I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.

Andre Engels · Nov 3, 2009

kylin said:
I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.

Well, it depends a bit on what you call 'the same word' (In the
paragraph "Fly fly, fly!" does the word fly occur 0, 1, 2 or 3
times?), but the split() function seems a logical choice to use
whatever the answer to that question.
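
To make the "Fly fly, fly!" question concrete, here is a minimal Python 3 sketch. The normalize() helper is just an illustration (strip surrounding punctuation, lowercase); whether that is the right definition of "the same word" is exactly the open question.

```python
import string
from collections import Counter

paragraph = "Fly fly, fly!"

# split() with no arguments splits on runs of whitespace only,
# so punctuation stays attached to the tokens.
raw_tokens = paragraph.split()
print(raw_tokens)  # ['Fly', 'fly,', 'fly!']

def normalize(token):
    # Hypothetical normalization: drop leading/trailing punctuation, ignore case.
    return token.strip(string.punctuation).lower()

raw_counts = Counter(raw_tokens)
normalized_counts = Counter(normalize(token) for token in raw_tokens)

print(raw_counts["fly"])         # 0: no raw token is exactly "fly"
print(normalized_counts["fly"])  # 3: all three tokens normalize to "fly"
```

Without normalization the three tokens are all different, so nothing would count as a repeat.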

Peter Otten · Nov 3, 2009

kylin said:
I want to remove all the punctuation and no need words form a string
datasets for experiment.

I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.

>>> [...] twice. could
... some give me some clue or some useful function in the python.
... """ [...] ".:,-"))).split())))
I
appears
clue
could
function
give
if
in
it
me
need
or
paragraph
python
remove
some
the
to
twice
useful
word
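
The beginning of the session above did not survive, so the exact expression is unknown. As a rough sketch only, the following Python 3 code produces the same word list, assuming the idea was to delete punctuation, split on whitespace, and print the sorted set of words. The use of str.translate() and string.punctuation here is my assumption, not necessarily what was posted.

```python
import string

paragraph = """I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.
"""

# Build a translation table that deletes every ASCII punctuation character.
delete_punctuation = str.maketrans("", "", string.punctuation)
cleaned = paragraph.translate(delete_punctuation)

# A set keeps each word once; sorted() puts the uppercase "I" first.
unique_words = sorted(set(cleaned.split()))
print("\n".join(unique_words))
```

Note that a set discards the original word order, so this lists the distinct words rather than removing repeats in place.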

Tim Chase · Nov 3, 2009

kylin said:
I need to remove the word if it appears in the paragraph twice. could
some give me some clue or some useful function in the python.

Sounds like homework. To fail your class, use this one:
one two three four five six seven eight

which is absolutely horrible because it mutates the set within
the list comprehension. The passable solution would use a
for-loop to iterate over each word in the paragraph, emitting it
if it hadn't already been seen. Maintain those words in a set, so
your words know how not to be seen. ("Mr. Nesbitt, would you
please stand up?")
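
A minimal Python 3 sketch of the for-loop approach described above. The sample paragraph is made up for illustration (the snippet that produced the output line earlier in this post is not shown), and the second half is only one possible form of the side-effect trick being warned against.

```python
def remove_repeated_words(paragraph):
    """Keep the first occurrence of each whitespace-separated word."""
    seen = set()
    kept = []
    for word in paragraph.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)

# Hypothetical sample input, chosen for illustration.
sample = "one two three four five six seven three four eight"
print(remove_repeated_words(sample))

# For contrast: a generator expression that mutates the set as a side effect.
# set.add() returns None, so "w in seen or seen.add(w)" is falsy only the
# first time a word shows up. It works, but it is hard to read.
seen = set()
side_effect_version = " ".join(
    w for w in sample.split() if not (w in seen or seen.add(w))
)
print(side_effect_version)
```

Both versions compare words exactly, so "Fly" and "fly," would still count as different words.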

This also assumes your paragraph consists only of words and
whitespace. But since you posted your previous homework-sounding
question on stripping out non-word/whitespace characters, you'll
want to look into using a regexp like "[\w\s]" to clean up the
cruft in the paragraph. Neither solution above preserves
non-whitespace/non-word characters, for which I'd recommend using a
re.sub() with a callback. Such a callback class might look
something like
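>>> class Dedupe:  # header line missing from the archive; the name "Dedupe" is assumed
...     def __init__(self):
...         self.s = set()  # words seen so far
...     def __call__(self, m):
...         w = m.group(0)
...         if w in self.s: return ''  # repeated word: replace it with nothing
...         self.s.add(w)
...         return w
...
>>> r.sub(Dedupe(), paragraph)  # call reconstructed from context; "paragraph" is an assumed name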
where I leave the definition of "r" to the student. Also beware
of case-differences for which you might have to normalize.

You'll also want to use more descriptive variable names than my
one-letter tokens.
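
For readers who want something runnable, here is a hedged Python 3 sketch of the callback idea with longer names. The class name, the r"\w+" pattern, and the lowercase normalization are all assumptions made for illustration; the post deliberately leaves the regexp undefined.

```python
import re

class SeenWordFilter:
    """Callable for re.sub() that blanks out words it has already seen."""

    def __init__(self):
        self.seen_words = set()

    def __call__(self, match):
        word = match.group(0)
        key = word.lower()  # assumed normalization: ignore case differences
        if key in self.seen_words:
            return ""  # repeated word: replace the match with nothing
        self.seen_words.add(key)
        return word

# Assumed definition of a "word": a run of word characters.
word_pattern = re.compile(r"\w+")

text = "The cat saw the other cat, and the dog."
result = word_pattern.sub(SeenWordFilter(), text)
print(result)
```

Only the matched words are replaced, so punctuation survives, and so does the whitespace around each removed word. A fresh SeenWordFilter() is needed for every paragraph, since the instance remembers what it has seen.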

-tkc

Tim Chase · Nov 4, 2009

Can we use inp_paragraph.count(iter_word) to make it simple?

It would work, but the performance will drop off sharply as the
length of the paragraph grows, and you'd still have to keep track
of which words you already printed so you can correctly print the
first one. So you might as well not bother with counting.
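
A small Python 3 sketch of that point. The function name is made up, and it counts over a list of words; if inp_paragraph were a plain string, str.count() would count substrings instead, which is a further trap.

```python
def remove_repeats_using_count(paragraph):
    words = paragraph.split()
    already_emitted = set()
    kept = []
    for word in words:
        # list.count() rescans the whole list for every word,
        # so the total work grows quadratically with the paragraph length.
        is_repeated = words.count(word) > 1
        if is_repeated and word in already_emitted:
            continue
        already_emitted.add(word)
        kept.append(word)
    return " ".join(kept)

print(remove_repeats_using_count("a b a c b"))

# On a string, count() matches substrings, not whole words:
print("the other".count("the"))  # 2
```

The already_emitted set does all the deciding here; dropping the count() test would give the same result with less work.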

-tkc

Tim Chase · Nov 9, 2009

I think a simple regex may come in handy:

import re
p = re.compile(r'(.+) .*\1')  # note the space
s = p.search("python and i love python")
s.groups()
('python',)

But that matches only one doubled word. Someone else could shed some light here
on how to extract all the doubled words. Then they can be removed from the original
paragraph.

This has multiple problems:
('python one',)

and even once you have the list of theoretical duplicates (by
changing the regexp to r'\b(\w+)\b.*?\1' perhaps), you still have
to worry about emitting the first instance but not subsequent
instances.
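
To illustrate the kind of trouble described above, here is a Python 3 sketch. The input strings are my own examples, and the "strict" pattern with word boundaries on both sides of the backreference is a variation I am assuming for illustration, not something proposed in this thread.

```python
import re

loose = re.compile(r'(.+) .*\1')

# The greedy (.+) can swallow more than one word...
print(loose.search("python one two python one").groups())  # ('python one',)

# ...and the backreference can match inside another word ("is" in "this").
print(loose.search("is this").groups())  # ('is',)

# Restricting the group to a single word helps with the first problem,
# but the bare \1 still matches inside "this".
tighter = re.compile(r'\b(\w+)\b.*?\1')
print(tighter.search("is this").groups())  # ('is',)

# Assumed variation: require word boundaries around the backreference too.
strict = re.compile(r'\b(\w+)\b.*?\b\1\b')
print(strict.search("is this"))  # None
print(strict.search("python and i love python").groups())  # ('python',)
```

Even the strict pattern only reports that a word is repeated. Keeping the first instance and dropping the later ones still needs separate bookkeeping, such as a set of words already seen.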

-tkc
