Cleaning up a string


James Stroud

Hello all,

I dashed off the following function to clean a string in a little
program I wrote:

def cleanup(astr, changes):
    # Apply each (from, to) pair in turn; str.replace() returns a new string each time
    for f, t in changes:
        astr = astr.replace(f, t)
    return astr

where changes would be a tuple of (from, to) pairs, for example:

changes = (
    ('%', r'\%'),
    ('$', r'\$'),
    ('-', '_'),
)
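Because each pass of the loop runs over the output of the previous pass, the result can depend on the order of the pairs. A minimal Python 3 sketch of that effect, using a hypothetical escaping table that also doubles backslashes (these tables are illustrative assumptions, not part of the post):

```python
def cleanup(astr, changes):
    # Apply each (from, to) pair in turn; each pass sees the previous pass's output
    for f, t in changes:
        astr = astr.replace(f, t)
    return astr

# Hypothetical tables: same pairs, different order
escape_backslash_first = (('\\', '\\\\'), ('%', '\\%'))
escape_percent_first = (('%', '\\%'), ('\\', '\\\\'))

print(cleanup("50% off", escape_backslash_first))  # 50\% off
print(cleanup("50% off", escape_percent_first))    # 50\\% off (the inserted backslash was doubled too)
```

A single-pass approach avoids this, since replacement text is never rescanned.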


If these were all single-character replacements (like the last), string.translate
would be the way to go. As it is, however, the above seems fairly
inefficient, as it potentially creates a new string on each round. Is there
a function or library for this kind of transformation that works more like
string.translate, or is the above the best one can do without writing some
C? I'm guessing that "if s in astr" style optimizations are already done in
the replace() method, so that is not really what I'm after.

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

Peter Otten

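unicode.translate() supports this kind of replacement when the table maps code points to unicode strings:

>>> table = dict((ord(f), unicode(t)) for f, t in changes)
>>> u"a % b $ c-d".translate(table)
u'a \\% b \\$ c_d'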

and re.compile(...).sub() accepts a function:

>>> import re
>>> lookup = dict(changes)
>>> def replace(match):
...     return lookup[match.group()]
...
>>> re.compile("([$%-])").sub(replace, "a % b $ c-d")
'a \\% b \\$ c_d'
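The pattern above is a character class, so it only covers single-character keys. A minimal Python 3 sketch generalizing the same re.sub-with-a-function idea to multi-character keys, assuming no key is empty; the '--' pair is a hypothetical addition for illustration:

```python
import re

def cleanup(astr, changes):
    # Map each search string to its replacement (assumes no empty keys)
    lookup = dict(changes)
    # Longest keys first, so '--' is tried before '-' at the same position
    keys = sorted(lookup, key=len, reverse=True)
    # re.escape() makes characters such as '$' match literally
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A function's return value is inserted as-is, so backslashes in it
    # are not interpreted as group references
    return pattern.sub(lambda m: lookup[m.group()], astr)

# Hypothetical table: James's pairs plus a two-character key
changes = (('%', '\\%'), ('$', '\\$'), ('--', '-'), ('-', '_'))
print(cleanup("a % b $ c-d--e", changes))  # a \% b \$ c_d-e
```

Because the string is scanned once, the '-' produced from '--' is not rewritten again by the ('-', '_') pair.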

Peter

MRAB

A simple way of replacing single characters would be this:

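def cleanup(astr, changes):
    # Map each single character to its replacement, then rebuild the string
    # in one pass, leaving unmapped characters unchanged
    changes = dict(changes)
    return "".join(changes.get(c, c) for c in astr)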
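For single-character keys, the same dictionary can be handed to str.maketrans() in Python 3, so the per-character loop runs inside str.translate() (implemented in C in CPython) rather than in a generator expression. A minimal sketch, assuming every key is exactly one character:

```python
def cleanup(astr, changes):
    # str.maketrans() accepts a dict whose keys are single characters and
    # whose values are replacement strings (or None to delete the character)
    table = str.maketrans(dict(changes))
    # str.translate() walks the string once, consulting the table
    return astr.translate(table)

changes = (('%', '\\%'), ('$', '\\$'), ('-', '_'))
print(cleanup("a % b $ c-d", changes))  # a \% b \$ c_d
```

One difference from the generator version: a multi-character key makes str.maketrans() raise ValueError instead of silently never matching.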