regular expressions, substituting and adding in one step?

John Salerno · May 8, 2006

Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '' + paragraph + '\n\n'

to read:

self.source += '%s\n\n' % paragraph

Now, matching the middle part and replacing it with '%s' is easy, but
how would I add the extra string to the end of the line? Is it done all
at once, or must I make a new regex to match?

Also, I figure I'd use a group to match the word 'paragraph', and use
that group to insert the word at the end, but how will I 'retain' the
state of \1 if I use more than one regex to do this?

I'd like to do this for several lines, so I'm trying not to make it too
specific (i.e., matching the entire line, for example, and then adding
text after it, if that's possible).

So the questions are, how do you use regular expressions to add text to
the end of a line, even if you aren't matching the end of the line in
the first place? Or does that entail using separate regexes that *do* do
this? If the latter, how do I retain the value of the groups taken from
the first re?

Thanks, hope that made some sense.

John Salerno · May 8, 2006

John said:
So the questions are, how do you use regular expressions to add text to
the end of a line, even if you aren't matching the end of the line in
the first place? Or does that entail using separate regexes that *do* do
this? If the latter, how do I retain the value of the groups taken from
the first re?

Here's what I have so far:

-----------

import re

txt_file = open(r'C:\Python24\myscripts\re_test.txt')
new_string = re.sub(r"' \+ ([a-z]+) \+ '", '%s', txt_file.read())
new_string = re.sub(r'$', ' % paragraph', new_string)
txt_file.close()

-----------

re_test.txt contains:

self.source += '' + paragraph + '\n\n'

Both substitutions work, but now I just need to figure out how to
replace the hard-coded ' % paragraph' parameter with something that uses
the group taken from the first regex. I'm guessing if I don't use it at
that time, then it's lost. I suppose I could create a MatchObject and
save group(1) as a variable for later use, but that would be a lot of
extra steps, so I wanted to see if there's a way to do it all at one
time with regular expressions.
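
A minimal sketch of the "save group(1) for later" route described above, for comparison with a one-step substitution. The names `line`, `pattern` and `new_line` are only illustrative, and the sample line is held in a raw string so the backslashes stay literal, as they would when read from the file:

```python
import re

# Raw string: '\n\n' is four literal characters, as in re_test.txt.
line = r"self.source += '' + paragraph + '\n\n'"

pattern = re.compile(r"' \+ ([a-z]+) \+ '")

# Keep the match object so the captured name survives the substitution.
match = pattern.search(line)
name = match.group(1)

# First step replaces the middle part, second step appends the operand.
new_line = pattern.sub('%s', line, count=1) + ' % ' + name
```

This works, but it takes a search plus a substitution plus a concatenation for every line, which is the extra bookkeeping the question is trying to avoid.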

Thanks.

Paul McGuire · May 8, 2006

John Salerno said:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '' + paragraph + '\n\n'

to read:

self.source += '%s\n\n' % paragraph

John -

You've been asking for re-based responses, so I apologize in advance for
this digression. Pyparsing is an add-on Python module that can provide a
number of features beyond just text matching and parsing. Pyparsing allows
you to define callbacks (or "parse actions") that get invoked during the
parsing process, and these callbacks can modify the matched text.

Since your re approach seems to be on a fairly convergent path, I felt I
needed to come up with more demanding examples to justify a pyparsing
solution. So I contrived these additional cases:

self.source += '' + paragraph + '\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

The following code processes these expressions. Admittedly, it is not as
terse as your re-based code samples have been, but it may give you another
data point in your pursuit of a solution. (The pyparsing home wiki is at
http://pyparsing.wikispaces.com.)

The purpose of the intermediate classes is to convert the individual terms
of the string expression into a list of string terms, either variable
references or quoted literals. This conversion is done in the term-specific
parse actions created by makeTermParseAction. Then the overall string
expression gets its own parse action, which processes the list of term
objects, and creates the modified string expression. Two different string
expression conversion functions are shown, one generating string
interpolation expressions, and one generating "".join() expressions.

Hope this helps, or is at least mildly entertaining,
-- Paul

================
from pyparsing import *

testLines = r"""
self.source += '' + paragraph + '\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
"""

# define some classes to use during parsing
class StringExprTerm(object):
    def __init__(self,content):
        self.content = content

class VarRef(StringExprTerm):
    pass

class QuotedLit(StringExprTerm):
    pass

def makeTermParseAction(cls):
    def parseAction(s,l,tokens):
        return cls(tokens[0])
    return parseAction

# define parts we want to recognize as terms in a string expression
varName = Word(alphas+"_", alphanums+"_")
varName.setParseAction( makeTermParseAction( VarRef ) )
quotedString.setParseAction( removeQuotes, makeTermParseAction( QuotedLit ) )
stringTerm = varName | quotedString

# define a string expression in terms of term expressions
PLUS = Suppress("+")
EQUALS = Suppress("=")
stringExpr = EQUALS + stringTerm + ZeroOrMore( PLUS + stringTerm )

# define a parse action, to be invoked every time a string expression is found
def interpolateTerms(originalString,locn,tokens):
    out = []
    refs = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( term.content )
        elif isinstance(term,VarRef):
            out.append( "%s" )
            refs.append( term.content )
        else:
            print "hey! this is impossible!"

    # generate string to be interpolated, and interp operator
    outstr = "'" + "".join(out) + "' % "

    # generate interpolation argument tuple
    if len(refs) > 1:
        outstr += "(" + ",".join(refs) + ")"
    else:
        outstr += ",".join(refs)

    # return generated string (don't forget leading = sign)
    return "= " + outstr

stringExpr.setParseAction( interpolateTerms )

print "Original:",
print testLines
print
print "Modified:",
print stringExpr.transformString( testLines )

# define slightly different parse action, to use list join instead of string interp
def createListJoin(originalString,locn,tokens):
    out = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( "'" + term.content + "'" )
        elif isinstance(term,VarRef):
            out.append( term.content )
        else:
            print "hey! this is impossible!"

    # generate list literal of the terms to be joined
    outstr = "[" + ",".join(out) + "]"

    # return generated string (don't forget leading = sign)
    return "= ''.join(" + outstr + ")"

del stringExpr.parseAction[:]
stringExpr.setParseAction( createListJoin )

print
print "Modified (2):",
print stringExpr.transformString( testLines )

================
Prints out:
Original:
self.source += '' + paragraph + '\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'

Modified:
self.source += '%s\n\n' % paragraph
listItem1 = '<li>%s</li>' % someText
listItem2 = '<li>%s</li>' % someMoreText
self.source += '<ul>%s\n%s\n</ul>\n\n' % (listItem1,listItem2)

Modified (2):
self.source += ''.join(['',paragraph,'\n\n'])
listItem1 = ''.join(['<li>',someText,'</li>'])
listItem2 = ''.join(['<li>',someMoreText,'</li>'])
self.source += ''.join(['<ul>',listItem1,'\n',listItem2,'\n','</ul>\n\n'])
================
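
For comparison, here is a rough re-only sketch of the same conversion to string interpolation, without pyparsing. It is not part of the code above: `TERM` and `interpolate` are made-up names, and it assumes single-quoted literals, a single `= ` before the expression, and no literal `%` inside the strings (those would need doubling to `%%`).

```python
import re

# A term is either a single-quoted literal (group 1 holds its contents)
# or a bare variable name (group 2).
TERM = re.compile(r"'((?:[^'\\]|\\.)*)'|([A-Za-z_]\w*)")

def interpolate(line):
    """Rewrite "x = 'a' + v + 'b'" as "x = 'a%sb' % v"."""
    lhs, sep, rhs = line.partition("= ")
    if not sep:
        return line                      # no assignment on this line
    pieces, names = [], []
    for m in TERM.finditer(rhs):
        if m.group(2):                   # variable reference
            pieces.append("%s")
            names.append(m.group(2))
        else:                            # quoted literal, quotes stripped
            pieces.append(m.group(1))
    if not names:
        return line                      # nothing to interpolate
    args = names[0] if len(names) == 1 else "(" + ",".join(names) + ")"
    return lhs + sep + "'" + "".join(pieces) + "' % " + args

lines = [
    r"self.source += '' + paragraph + '\n\n'",
    r"self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'",
]
converted = [interpolate(line) for line in lines]
```

The loop plays the role of the term parse actions: each match is classified as a literal or a variable reference, and the pieces are reassembled at the end.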

Kent Johnson · May 9, 2006

John said:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:

self.source += '' + paragraph + '\n\n'

to read:

self.source += '%s\n\n' % paragraph

Now, matching the middle part and replacing it with '%s' is easy, but
how would I add the extra string to the end of the line? Is it done all
at once, or must I make a new regex to match?

Also, I figure I'd use a group to match the word 'paragraph', and use
that group to insert the word at the end, but how will I 'retain' the
state of \1 if I use more than one regex to do this?

Do it all in one match / substitution using \1 to insert the value of
the paragraph group at the new location:

In [19]: test = "self.source += '' + paragraph + '\n\n'"

In [20]: re.sub(r"'' \+ (.*?) \+ '\n\n'", r"'%s\n\n' % \1", test)
Out[20]: "self.source += '%s\n\n' % paragraph"
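
The same one-step idea can be sketched for text as it would come from a file, where `\n` is a backslash followed by `n` rather than a real newline, and for more than one line at a time. The sample text, the second group and the variable names here are illustrative assumptions, not part of the session above:

```python
import re

# Raw string: the backslashes are literal characters, as they would be in
# a source file read from disk.
source = r"""self.source += '' + paragraph + '\n\n'
self.source += '' + heading + '\n'
"""

# Group 1 is the variable name, group 2 the contents of the trailing literal.
pattern = re.compile(r"'' \+ (\w+) \+ '([^'\n]*)'")

# \2 puts the literal's contents back after %s; \1 moves the name to the end.
converted = pattern.sub(r"'%s\2' % \1", source)
```

Capturing the trailing literal as a second group keeps the pattern from being tied to `'\n\n'` specifically, so both lines are rewritten by a single call to sub().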

Kent

John Salerno · May 9, 2006

Kent said:
Do it all in one match / substitution using \1 to insert the value of
the paragraph group at the new location:

In [19]: test = "self.source += '' + paragraph + '\n\n'"

In [20]: re.sub(r"'' \+ (.*?) \+ '\n\n'", r"'%s\n\n' % \1", test)
Out[20]: "self.source += '%s\n\n' % paragraph"

Interesting. Thanks! I was just doing some more reading of the re
module, so now I understand sub() better. I'll give this a try too. Call
me crazy, but I'm interested in regular expressions right now.

Kent Johnson · May 10, 2006

John said:
Call
me crazy, but I'm interested in regular expressions right now.

Not crazy at all. REs are a powerful and useful tool that every
programmer should know how to use. They're just not the right tool for
every job!

Kent

John Salerno · May 10, 2006

Kent said:
They're just not the right tool for
every job!

Thank god for that! As easy as they've become to me (after seeming
utterly cryptic and impenetrable), they are still a little unwieldy.
Next step: learn how to write look-ahead and look-behind REs!
