python replace/sub/wildcard/regex issue

tom

hi...

trying to figure out how to solve what should be an easy python/regex/
wildcard/replace issue.

i've tried a number of different approaches.. so i must be missing
something...

my initial sample text is:

Soo Choi</span>LONGEDITBOX">Apryl Berney
Soo Choi</span>LONGEDITBOX">Joel Franks
Joel Franks</span>GEDITBOX">Alexander Yamato

and i'm trying to get

Soo Choi foo Apryl Berney
Soo Choi foo Joel Franks
Joel Franks foo Alexander Yamato

the issue i'm facing is how to match everything from "</" through '">' and
substitute it, including the stuff in between...

i've tried derivations of

name=re.sub("</s[^>]*\">"," foo ",name)

but i'm missing something...

thoughts... thanks

tom
 
Chris Rebert

"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions.' Now they have two problems."

Assuming your sample text is representative of all your text:

new_text = "\n".join(line[:line.index('<')] +
                     line[line.rindex('>')+1:] for line in your_text.split('\n'))

Cheers,
Chris
 
alex23

trying to figure out how to solve what should be an easy python/regex/
wildcard/replace issue.
but i'm missing something...

Well, some would say you've missed the most obvious solution of _not_
using regexps :)

I'd probably do it via string methods wrapped up in a helper function:
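>>> def replace_markup(text):  # hypothetical name; the original def line was lost
...     first, rest = text.split('<', 1)
...     ignore, last = rest.rsplit('>', 1)
...     return '%s foo %s' % (first, last)
...
>>> replace_markup('Joel Franks</span>GEDITBOX">Alexander Yamato')
'Joel Franks foo Alexander Yamato'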
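One caveat with the split/rsplit helper: if a line has no '<' (or no '>' after it), the tuple unpacking gets a single item and raises ValueError. Below is a minimal sketch of the same idea using str.partition and str.rpartition, which never raise; `foo_join` is a hypothetical name, and returning markup-free lines unchanged is an assumption about what you would want.

```python
def foo_join(text):
    """Replace everything from the first '<' to the last '>' with ' foo '.

    Hypothetical helper modelled on the split/rsplit approach above;
    lines without that markup are returned unchanged (an assumption).
    """
    first, sep, rest = text.partition('<')
    if not sep:
        return text  # no '<' in the line at all
    _, sep, last = rest.rpartition('>')
    if not sep:
        return text  # a '<' with no '>' after it
    return '%s foo %s' % (first, last)


samples = [
    'Soo Choi</span>LONGEDITBOX">Apryl Berney',
    'Soo Choi</span>LONGEDITBOX">Joel Franks',
    'Joel Franks</span>GEDITBOX">Alexander Yamato',
]
for s in samples:
    print(foo_join(s))
```

partition/rpartition always return a 3-tuple, so the "was the separator found?" check replaces exception handling.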
 
Chris Rebert

Erm, remembering to intersperse the " foo " this time (it should all be on one line, bloody Gmail):
new_text = "\n".join(line[:line.index('<')] + " foo " +
                     line[line.rindex('>')+1:] for line in your_text.split('\n'))
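A quick runnable check of the corrected one-liner; here `your_text` is assumed to be just the three sample lines from the question joined with newlines.

```python
your_text = (
    'Soo Choi</span>LONGEDITBOX">Apryl Berney\n'
    'Soo Choi</span>LONGEDITBOX">Joel Franks\n'
    'Joel Franks</span>GEDITBOX">Alexander Yamato'
)

# Keep what precedes the first '<' and what follows the last '>' on each
# line, with " foo " in between. str.index/str.rindex raise ValueError if a
# line lacks '<' or '>', hence the "sample text is representative" caveat.
new_text = "\n".join(line[:line.index('<')] + " foo " +
                     line[line.rindex('>') + 1:]
                     for line in your_text.split('\n'))
print(new_text)
```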

Or just use alex23's method, which seems all-round superior. :)

Cheers,
Chris
 
dippim

The problem here is that </s matches as expected, but [^>]* consumes
anything that isn't > and stops at the first character that is >. So
[^>]* consumes "pan" in each case, then the pattern tries to match \">
and fails because the next character is > rather than ", so there is no
match at all. Since [^>] can never match a >, the pattern never makes it
to the second >.
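To make that concrete, here is a small sketch that runs Tom's pattern against the first sample line and then a variant (not from the thread) that starts at "</" and stops at the first '">' using a non-greedy `.*?`. The variable names are illustrative only.

```python
import re

line = 'Soo Choi</span>LONGEDITBOX">Apryl Berney'

# Tom's pattern: [^>]* cannot step past the '>' in '</span>', and the
# character after 'pan' is '>' rather than '"', so nothing matches.
original = r'</s[^>]*">'
print(re.search(original, line))  # None, so re.sub leaves the line unchanged

# Non-greedy variant: start at '</' and take as few characters as possible
# until the first '">' that follows.
fixed = r'</.*?">'
print(re.sub(fixed, ' foo ', line))  # Soo Choi foo Apryl Berney
```

The `?` after `*` makes the wildcard stop at the earliest '">' instead of the last one, which is the "start at X, end at Y" behaviour the question asks for.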

I agree with Chris Rebert: regexes are dangerous because it isn't always
clear how many ways a pattern can match (see the above explanation :).
Also, if you only have a few comparisons to do, they can be less
efficient than plain string methods. However, for your limited set of
examples the following should work:

import re

aList = ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
         'Soo Choi</span>LONGEDITBOX">Joel Franks',
         'Joel Franks</span>GEDITBOX">Alexander Yamato']

# [\w\W]* matches any character, greedily, so each match runs from the
# first '<' to the last '>' in the string.
matcher = re.compile(r"<[\w\W]*>")

newList = []
for x in aList:
    newList.append(matcher.sub(" foo ", x))

print(newList)
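`[\w\W]*` matches any character, newlines included, and it is greedy. That is harmless when the pattern is applied one line at a time as above, but the sketch below assumes someone runs the same pattern over the whole text in a single `re.sub` call, which illustrates the "unclear number of matches" risk. By contrast, `.*` does not match a newline unless `re.DOTALL` is set, so each match stays on its own line.

```python
import re

text = ('Soo Choi</span>LONGEDITBOX">Apryl Berney\n'
        'Soo Choi</span>LONGEDITBOX">Joel Franks\n'
        'Joel Franks</span>GEDITBOX">Alexander Yamato')

# [\w\W]* crosses newlines, so one greedy match runs from the first '<'
# in the whole text to the last '>' and the three lines collapse into one.
across = re.sub(r"<[\w\W]*>", " foo ", text)
print(across)  # Soo Choi foo Alexander Yamato

# '.' excludes '\n' by default, so each match is confined to its own line.
per_line = re.sub(r"<.*>", " foo ", text)
print(per_line)
```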

David
 
