Christos Georgiou
Say I have some string that begins with an arbitrary sequence of characters
and then alternates repeating the letters 'a' and 'b' any number of times,
e.g.
"xyz123aaabbaabbbbababbbbaaabb"
I'm looking for a regular expression that matches the first, and only the
first, sequence of the letter 'a', and only if the length of the sequence is
exactly 3.
Does such a regular expression exist? If so, any ideas as to what it could
be?
Is this what you mean?
^[^a]*(a{3})(?:[^a].*)?$
Close, but the pattern should allow the "arbitrary sequence of characters"
that precedes the alternating a's and b's to contain the letter 'a'. In other
words, the pattern should accept:
"xayz123aaabbab"
since the 'a' between the 'x' and 'y' is not directly followed by a 'b'.
Your proposed pattern rejects this string.
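
For anyone following along, the objection above can be checked in a few lines. This is only a sketch: the variable name first_try is made up for illustration, and the two test strings are the ones quoted in this thread.

```python
import re

# Pattern from the first reply: [^a]* forbids any 'a' before the run of three.
first_try = re.compile(r"^[^a]*(a{3})(?:[^a].*)?$")

# The original example has no 'a' in its prefix, so it is accepted.
accepted = first_try.match("xyz123aaabbaabbbbababbbbaaabb")
print(accepted.group(1))   # aaa

# A prefix that contains an 'a' makes the whole match fail.
rejected = first_try.match("xayz123aaabbab")
print(rejected)            # None
```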
1.
(a{3})(?:b[ab]*)?$
This finds the first (leftmost) "aaa" either at the end of the string or
followed by 'b' and then arbitrary sequences of 'a' and 'b'.
Note that this will also match inside a longer run such as "aaaa" (starting
from the second 'a').
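
A quick way to see both behaviours, assuming the pattern is used with re.search (it has no leading ^, so re.match would only try position 0). The name tail and the four-'a' test string are illustrative, not from the thread.

```python
import re

# Option 1: three 'a's, then either end of string or 'b' plus only a/b.
tail = re.compile(r"(a{3})(?:b[ab]*)?$")

# The prefix may contain an 'a'; the run of three starts at index 7.
ok = tail.search("xayz123aaabbab")
print(ok.start(1))       # 7

# Caveat: with four 'a's (indices 6..9) the search simply slides one
# position to the right and still reports a match.
sloppy = tail.search("xyz123aaaabbab")
print(sloppy.start(1))   # 7
```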
2.
If you insist on exactly three 'a's and you can add the constraint that:
* let s be the "arbitrary sequence of characters" at the start of your
searched text
* len(s) >= 1 and not s.endswith('a')
then you can use this regex:
(?<=[^a])(a{3})(?:b[ab]*)?$
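
The effect of the lookbehind can be sketched as follows, again assuming re.search. The name guarded and the test strings other than "xayz123aaabbab" are illustrative.

```python
import re

# Option 2: the run of three must be directly preceded by a non-'a' character.
guarded = re.compile(r"(?<=[^a])(a{3})(?:b[ab]*)?$")

# Exactly three 'a's after a non-'a' prefix: match at index 7.
three = guarded.search("xayz123aaabbab")
print(three.start(1))    # 7

# Four 'a's: the match can no longer slide right, because the character
# before the shifted position is itself an 'a'.
four = guarded.search("xyz123aaaabbab")
print(four)              # None

# An empty prefix fails too, since the lookbehind needs one character.
bare = guarded.search("aaabbab")
print(bare)              # None
```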
3.
If you want to allow for a possibly empty "arbitrary sequence of characters"
at the start and you don't mind search speed:
^(?:.*?[^a])?(a{3})(?:b[ab]*)?$
This should cover you:
>>> import re
>>> s="xayzbaaa123aaabbab"
>>> r=re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")
>>> m= r.match(s)
>>> m.group(1)
'aaa'
>>> m.start(1)
11
>>> s[11:]
'aaabbab'
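
To compare the edge cases discussed in this thread side by side, the third pattern can be wrapped in a small helper. This is a sketch only: the names FIRST_TRIPLE and first_triple_start are invented here, and apart from the string used in the session above the test strings are illustrative.

```python
import re

# Option 3: optional prefix ending in a non-'a', then exactly three 'a's,
# then either end of string or 'b' followed by only 'a'/'b'.
FIRST_TRIPLE = re.compile(r"^(?:.*?[^a])?(a{3})(?:b[ab]*)?$")

def first_triple_start(text):
    """Return the index of the matched run of three 'a's, or None."""
    m = FIRST_TRIPLE.match(text)
    return m.start(1) if m else None

print(first_triple_start("xayzbaaa123aaabbab"))  # 11
print(first_triple_start("aaabbab"))             # 0  (empty prefix)
print(first_triple_start("xyz123aaaabbab"))      # None (four 'a's)
print(first_triple_start("xyz123aabbab"))        # None (only two 'a's)
```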