re.sub and named groups


Emanuele D'Arrigo

Hi everybody,

I'm having a ball with the power of regular expressions, but I stumbled
on something I don't quite understand:

import re

theOriginalString = "spam:(?P<first>.*) ham:(?P<second>.*)"
aReplacementPattern = r"\(\?P<first>.*\)"
aReplacementString = "foo"
print(re.sub(aReplacementPattern, aReplacementString, theOriginalString))

results in:

"spam:foo"

Instead, I was expecting:

"spam:foo ham:"

Why is that?

Thanks for your help!

Manu
 

MRAB

The quantifiers, e.g. "*", are greedy by default: they try to match as much as
possible. Therefore ".*" matches:

    spam:(?P<first>.*) ham:(?P<second>.*)
                   ^^^^^^^^^^^^^^^^^^^^^

You could use the lazy form "*?", which tries to match as little as
possible, e.g. "\(\?P<first>.*?\)", where the ".*?" matches:

    spam:(?P<first>.*) ham:(?P<second>.*)
                   ^^

giving "spam:foo ham:(?P<second>.*)".
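A minimal sketch of MRAB's point, assuming Python 3; the names text, greedy and lazy are illustrative and not from the thread. re.search() shows exactly which slice each pattern consumes, and re.sub() shows the resulting replacement:

```python
import re

text = "spam:(?P<first>.*) ham:(?P<second>.*)"

# Greedy: ".*" runs to the end of the string, then backs off only as far
# as the LAST ")" so that the trailing "\)" can still match.
greedy = r"\(\?P<first>.*\)"
# Lazy: ".*?" takes as few characters as possible, so the match ends at
# the FIRST ")" that lets the rest of the pattern succeed.
lazy = r"\(\?P<first>.*?\)"

print(re.search(greedy, text).group())  # (?P<first>.*) ham:(?P<second>.*)
print(re.search(lazy, text).group())    # (?P<first>.*)

print(re.sub(greedy, "foo", text))  # spam:foo
print(re.sub(lazy, "foo", text))    # spam:foo ham:(?P<second>.*)
```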
 

Emanuele D'Arrigo

> You could use the lazy form "*?" which tries to match as little as
> possible, eg "\(\?P<first>.*?\)" where the ".*?" matches:
> spam:(?P<first>.*) ham:(?P<second>.*)
> giving "spam:foo ham:(?P<second>.*)".

A-ha! Of course! That makes perfect sense! Thank you! Problem solved!

Ciao!

Manu
 

Yapo Sébastien

I think the .* in your replacement pattern matches ".*) ham:(?P<second>.*"
in your original string, which is correct behaviour for a regexp.
Perhaps you should try aReplacementPattern = r"\(\?P<first>\.\*\)", or use
str.replace(), since your replacement pattern is then no longer really a regexp.

Sebastien
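Both of Sébastien's suggestions, sketched assuming Python 3 with illustrative variable names. re.escape() is an addition not mentioned in the post; it builds the same kind of escaped literal pattern automatically:

```python
import re

text = "spam:(?P<first>.*) ham:(?P<second>.*)"

# Option 1: escape "." and "*" as well, so the pattern only matches the
# literal text "(?P<first>.*)" instead of "any run of characters".
literal = r"\(\?P<first>\.\*\)"
print(re.sub(literal, "foo", text))  # spam:foo ham:(?P<second>.*)

# re.escape() escapes every regex metacharacter in a literal string.
escaped = re.escape("(?P<first>.*)")
print(re.sub(escaped, "foo", text))  # spam:foo ham:(?P<second>.*)

# Option 2: no regex at all; str.replace() does a plain substring swap.
print(text.replace("(?P<first>.*)", "foo"))  # spam:foo ham:(?P<second>.*)
```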
 

Aahz

> I'm having a ball with the power of regular expression but I stumbled
> on something I don't quite understand:

Book recommendation: _Mastering Regular Expressions_, Jeffrey Friedl
 

Paul McGuire

> Hi everybody,
>
> I'm having a ball with the power of regular expression

Don't forget the ball you can have with the power of ordinary Python
strings, string methods, and string interpolation!

originalString = "spam:%(first)s ham:%(second)s"
print(originalString % {"first": "foo", "second": ""})

prints

spam:foo ham:

with far fewer surprises and false steps. (Note: Py3K still supports this
feature, and also offers a newer interpolation syntax, str.format(), which
I have not learned yet.)
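For comparison, a sketch of the same substitution three ways, assuming Python 3: the %-mapping form above, the named-field str.format() syntax (presumably the different interpolation syntax meant here; it arrived with Py3K and was also added to Python 2.6), and string.Template, which is not mentioned in the thread. The values dict is just the one from the example above:

```python
from string import Template

values = {"first": "foo", "second": ""}

# %-interpolation with a mapping (works in Python 2 and 3).
print("spam:%(first)s ham:%(second)s" % values)

# str.format() with named fields, filled from the same dict.
print("spam:{first} ham:{second}".format(**values))

# string.Template uses $-placeholders and a mapping.
print(Template("spam:$first ham:$second").substitute(values))
```

All three print "spam:foo ham:", the output the original post was after.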

Book recommendation: Text Processing in Python, by David Mertz (free
online - http://gnosis.cx/TPiP/), in which David advises against
dragging out the RE heavy artillery until you've at least thought
about using ordinary string methods.

-- Paul
 

Rhodri James

> Don't forget the ball you can have with the power of ordinary Python
> strings, string methods, and string interpolation!

So the moral of this story is: take a ball of strings with you for
when you get lost in regular expressions.
 
