python replace/sub/wildcard/regex issue

tom · Jan 19, 2010

hi...

trying to figure out how to solve what should be an easy python/regex/
wildcard/replace issue.

i've tried a number of different approaches.. so i must be missing
something...

my initial sample text is:

Soo Choi</span>LONGEDITBOX">Apryl Berney
Soo Choi</span>LONGEDITBOX">Joel Franks
Joel Franks</span>GEDITBOX">Alexander Yamato

and i'm trying to get

Soo Choi foo Apryl Berney
Soo Choi foo Joel Franks
Joel Franks foo Alexander Yamato

the issue i'm facing.. is how to start at "</", end at '">', and
substitute the whole span, including everything in between...

i've tried derivations of

name=re.sub("</s[^>]*\">"," foo ",name)

but i'm missing something...

thoughts... thanks

tom

Chris Rebert · Jan 19, 2010

"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions.' Now they have two problems."

Assuming your sample text is representative of all your text:

new_text = "\n".join(line[:line.index('<')] +
line[line.rindex('>')+1:] for line in your_text.split('\n'))

Cheers,
Chris

alex23 · Jan 19, 2010

Well, some would say you've missed the most obvious solution of _not_
using regexps.

I'd probably do it via string methods wrapped up in a helper function:
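def replace_tag(text):  # illustrative name; the original was lost in the archive
    first, rest = text.split('<', 1)      # text before the first '<'
    ignore, last = rest.rsplit('>', 1)    # text after the last '>'
    return '%s foo %s' % (first, last)

replace_tag('Joel Franks</span>GEDITBOX">Alexander Yamato')
# -> 'Joel Franks foo Alexander Yamato'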
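A minimal Python 3 sketch of the same idea applied to every line of the sample text at once. It swaps alex23's split/rsplit for str.partition and str.rpartition, and it assumes the tag lost from the archived sample lines was "</span>"; the name replace_markup is hypothetical.

```python
# A variant of the split/rsplit helper using str.partition and
# str.rpartition; the name replace_markup is hypothetical.
def replace_markup(text, sep=' foo '):
    first, found, rest = text.partition('<')    # text before the first '<'
    if not found:
        return text                             # no markup: leave line as-is
    _, found, last = rest.rpartition('>')       # text after the last '>'
    if not found:
        return text
    return first + sep + last


# Assumption: the archived sample lines originally contained </span>.
sample = '''Soo Choi</span>LONGEDITBOX">Apryl Berney
Soo Choi</span>LONGEDITBOX">Joel Franks
Joel Franks</span>GEDITBOX">Alexander Yamato'''

print('\n'.join(replace_markup(line) for line in sample.splitlines()))
```

partition never raises, so a line that lacks markup passes through unchanged instead of raising ValueError when split() returns a single item and the two-name unpacking fails.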

Chris Rebert · Jan 19, 2010

Erm, this time remembering to intersperse the " foo " (it should all be on one line; bloody Gmail):
new_text = "\n".join(line[:line.index('<')] + " foo " +
line[line.rindex('>')+1:] for line in your_text.split('\n'))
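To show what "assuming your sample text is representative" guards against, here is a small Python 3 sketch that wraps the corrected one-liner in a function (the name chris_one_liner is hypothetical, and the sample line assumes the stripped tag was "</span>"). str.index raises ValueError when a line contains no '<'.

```python
# The corrected one-liner, wrapped in a function so it can be exercised.
# The function name is hypothetical; your_text is the thread's placeholder.
def chris_one_liner(your_text):
    return "\n".join(line[:line.index('<')] + " foo " +
                     line[line.rindex('>')+1:] for line in your_text.split('\n'))


# Assumption: the archived sample line originally contained </span>.
print(chris_one_liner('Soo Choi</span>LONGEDITBOX">Joel Franks'))
# Soo Choi foo Joel Franks

# A line with no '<' breaks the assumption: str.index raises ValueError.
try:
    chris_one_liner('Soo Choi Joel Franks')
except ValueError as exc:
    print('ValueError:', exc)
```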

Or just use alex23's method, which seems all-round superior.

Cheers,
Chris

dippim · Jan 19, 2010

The problem here is that </s matches itself correctly. However, [^>]*
consumes anything that's not > and then stops when it hits something
that is >. So, [^>]* consumes "pan" in each case, then tries to match
\">, but fails since there isn't a ", so the match ends. It never
makes it to the second >.
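To make the failure above concrete, here is a minimal Python 3 sketch. It assumes the tag lost from the archived sample was "</span>" (as the "pan" remark implies) and pairs tom's pattern with an illustrative variant, not one posted in this thread, that consumes the whole closing tag before looking for '">'.

```python
import re

# Assumption: the closing tag lost from the archived sample was </span>.
line = 'Soo Choi</span>LONGEDITBOX">Apryl Berney'

# tom's pattern: [^>]* stops at the first '>', which is not preceded by '"'.
original = r'</s[^>]*">'
print(re.search(original, line))  # None, so re.sub would change nothing

# Illustrative variant: consume "</span>", then non-'>' characters up to '">'.
fixed = r'</[^>]*>[^>]*">'
print(re.sub(fixed, ' foo ', line))  # Soo Choi foo Apryl Berney
```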

I agree with Chris Rebert: regexes are dangerous because the number of
possible cases where you can match isn't always clear (see the above
explanation). Also, they can be inefficient when you only have a small
number of comparisons to do. However, for your limited set of
examples the following should work:

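import re

aList = ['Soo Choi</span>LONGEDITBOX">Apryl Berney',
         'Soo Choi</span>LONGEDITBOX">Joel Franks',
         'Joel Franks</span>GEDITBOX">Alexander Yamato']

# [\w\W] matches any character, and * is greedy, so each match runs
# from the first '<' to the last '>' on the line
matcher = re.compile(r"<[\w\W]*>")

newList = []
for x in aList:
    newList.append(matcher.sub(" foo ", x))

print(newList)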
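The pattern above is greedy: [\w\W]* runs from the first '<' to the last '>' on the line, which is fine for these three samples. A small Python 3 sketch of where that can bite, using a hypothetical line with a trailing <br> (not from the thread), next to the non-greedy form *? that stops at the first '">' instead:

```python
import re

# Hypothetical line with extra markup after the name (not from the thread).
line = 'Soo Choi</span>LONGEDITBOX">Apryl Berney<br>'

greedy = re.compile(r'<[\w\W]*>')    # first '<' through the LAST '>'
lazy = re.compile(r'<[\w\W]*?">')    # first '<' through the FIRST '">'

print(repr(greedy.sub(' foo ', line)))  # 'Soo Choi foo '
print(repr(lazy.sub(' foo ', line)))    # 'Soo Choi foo Apryl Berney<br>'
```

Neither form is right for every input; the point is only that the greedy version silently eats anything up to the last '>'.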

David
