John Salerno said:
Ok, this might look familiar. I'd like to use regular expressions to
change this line:
self.source += '<p>' + paragraph + '</p>\n\n'
to read:
self.source += '<p>%s</p>\n\n' % paragraph
John -
You've been asking for re-based responses, so I apologize in advance for
this digression. Pyparsing is an add-on Python module that can provide a
number of features beyond just text matching and parsing. Pyparsing allows
you to define callbacks (or "parse actions") that get invoked during the
parsing process, and these callbacks can modify the matched text.
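For comparison with the re approach: re.sub also accepts a function in place of a replacement string, which is roughly the same callback idea. Below is a minimal sketch, assuming the expression always has exactly three terms (literal, variable, literal). The pattern and the helper name to_interpolation are illustrative only and are not taken from your code.

```python
import re

# Illustrative pattern: quoted literal + identifier + quoted literal.
# It assumes single-quoted literals with no embedded quotes.
THREE_TERMS = re.compile(r"'([^']*)' \+ (\w+) \+ '([^']*)'")

def to_interpolation(match):
    # Called once per match; the return value replaces the matched text.
    head, name, tail = match.groups()
    return "'%s%%s%s' %% %s" % (head, tail, name)

line = r"self.source += '<p>' + paragraph + '</p>\n\n'"
result = THREE_TERMS.sub(to_interpolation, line)
print(result)
```

A fixed pattern like this only covers the three-term shape. An expression with more terms would be matched only in part, which is the kind of case a parser handles more comfortably.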
Since your re approach seems to be on a fairly convergent path, I felt I
needed to come up with more demanding examples to justify a pyparsing
solution. So I contrived these additional cases:
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
The following code processes these expressions. Admittedly, it is not as
terse as your re-based code samples have been, but it may give you another
data point in your pursuit of a solution. (The pyparsing home wiki is at
http://pyparsing.wikispaces.com.)
The purpose of the intermediate classes is to convert the individual terms
of the string expression into a list of term objects, each either a variable
reference or a quoted literal. This conversion is done in the term-specific
parse actions created by makeTermParseAction. Then the overall string
expression gets its own parse action, which processes the list of term
objects and creates the modified string expression. Two different string
expression conversion functions are shown, one generating string
interpolation expressions, and one generating "".join() expressions.
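The second stage can be tried on its own, without pyparsing, by building the term list by hand. This is a rough sketch under that assumption: the class names mirror the ones used in the post, while the Term base class and the function name to_interpolation are made up for the illustration.

```python
class Term(object):
    """A single term of a string expression."""
    def __init__(self, content):
        self.content = content

class VarRef(Term):
    """A variable reference, e.g. paragraph."""

class QuotedLit(Term):
    """A quoted literal with its quotes already removed."""

def to_interpolation(terms):
    # Literals go straight into the template; each variable
    # becomes a %s placeholder and is collected as an argument.
    template = []
    names = []
    for term in terms:
        if isinstance(term, QuotedLit):
            template.append(term.content)
        else:
            template.append("%s")
            names.append(term.content)
    args = ",".join(names)
    if len(names) > 1:
        # More than one argument needs a tuple.
        args = "(" + args + ")"
    return "'" + "".join(template) + "' % " + args

terms = [QuotedLit("<li>"), VarRef("someText"), QuotedLit("</li>")]
print(to_interpolation(terms))
```

Note that this sketch does not escape a literal % sign inside a quoted literal, so such a literal would need to be rewritten as %% before the result is usable.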
Hope this helps, or is at least mildly entertaining,
-- Paul
================
from pyparsing import *
testLines = r"""
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
"""
# define some classes to use during parsing
class StringExprTerm(object):
    def __init__(self,content):
        self.content = content

class VarRef(StringExprTerm):
    pass

class QuotedLit(StringExprTerm):
    pass

def makeTermParseAction(cls):
    def parseAction(s,l,tokens):
        return cls(tokens[0])
    return parseAction

# define parts we want to recognize as terms in a string expression
varName = Word(alphas+"_", alphanums+"_")
varName.setParseAction( makeTermParseAction( VarRef ) )
quotedString.setParseAction( removeQuotes, makeTermParseAction( QuotedLit ) )
stringTerm = varName | quotedString
# define a string expression in terms of term expressions
PLUS = Suppress("+")
EQUALS = Suppress("=")
stringExpr = EQUALS + stringTerm + ZeroOrMore( PLUS + stringTerm )

# define a parse action, to be invoked every time a string expression is found
def interpolateTerms(originalString,locn,tokens):
    out = []
    refs = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( term.content )
        elif isinstance(term,VarRef):
            out.append( "%s" )
            refs.append( term.content )
        else:
            print("hey! this is impossible!")
    # generate string to be interpolated, and interp operator
    outstr = "'" + "".join(out) + "' % "
    # generate interpolation argument tuple
    if len(refs) > 1:
        outstr += "(" + ",".join(refs) + ")"
    else:
        outstr += ",".join(refs)
    # return generated string (don't forget leading = sign)
    return "= " + outstr

stringExpr.setParseAction( interpolateTerms )

print("Original:")
print(testLines)
print("")
print("Modified:")
print(stringExpr.transformString( testLines ))

# define slightly different parse action, to use list join instead of
# string interp
def createListJoin(originalString,locn,tokens):
    out = []
    terms = tokens
    for term in terms:
        if isinstance(term,QuotedLit):
            out.append( "'" + term.content + "'" )
        elif isinstance(term,VarRef):
            out.append( term.content )
        else:
            print("hey! this is impossible!")
    # generate list literal of the terms to be joined
    outstr = "[" + ",".join(out) + "]"
    # return generated string (don't forget leading = sign)
    return "= ''.join(" + outstr + ")"

del stringExpr.parseAction[:]
stringExpr.setParseAction( createListJoin )

print("")
print("Modified (2):")
print(stringExpr.transformString( testLines ))
================
Prints out:
Original:
self.source += '<p>' + paragraph + '</p>\n\n'
listItem1 = '<li>' + someText + '</li>'
listItem2 = '<li>' + someMoreText + '</li>'
self.source += '<ul>' + listItem1 + '\n' + listItem2 + '\n' + '</ul>\n\n'
Modified:
self.source += '<p>%s</p>\n\n' % paragraph
listItem1 = '<li>%s</li>' % someText
listItem2 = '<li>%s</li>' % someMoreText
self.source += '<ul>%s\n%s\n</ul>\n\n' % (listItem1,listItem2)
Modified (2):
self.source += ''.join(['<p>',paragraph,'</p>\n\n'])
listItem1 = ''.join(['<li>',someText,'</li>'])
listItem2 = ''.join(['<li>',someMoreText,'</li>'])
self.source += ''.join(['<ul>',listItem1,'\n',listItem2,'\n','</ul>\n\n'])
================
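The generated forms should evaluate to the same string as the original concatenation. Here is a quick check of the first test line, using a made-up value for paragraph:

```python
# Made-up sample value; any string would do.
paragraph = "Hello, world"

# Original concatenation and the two generated forms.
concatenated = '<p>' + paragraph + '</p>\n\n'
interpolated = '<p>%s</p>\n\n' % paragraph
joined = ''.join(['<p>', paragraph, '</p>\n\n'])

print(concatenated == interpolated == joined)
```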