regex newbie

Greg Carlson · Feb 19, 2004

I've looked through a number of books and FAQs and such and haven't been
able to solve my regex conundrum. I need to find the first match before
another match. For example, with the string 'abcdefgabcdefgfooabcdefg', I
need to match 'foo' and the 'a' previous to but nearest 'foo' (not the one
at the beginning of the string). Also, there's an unknown number of
characters between the 'a' and the 'foo'. Any help would be greatly
appreciated.

Greg Carlson

Dave Cardwell · Feb 19, 2004


Normally a regular expression tries to gobble up as much as it can; in this
case it will try to match the 'a' furthest away from 'foo'.

To get round this, you can do:
/a[^a]*foo/
which will match an 'a', any number of anything-but-a, then foo.
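A minimal sketch of that pattern run against the sample string from the question. The sub name `nearest_a_to_foo` is hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# [^a]* refuses to cross another 'a', so an attempt that starts at an
# earlier 'a' fails and the engine retries from the next 'a' along.
sub nearest_a_to_foo {
    my ($str) = @_;
    return $str =~ /(a[^a]*foo)/ ? $1 : undef;
}

print nearest_a_to_foo('abcdefgabcdefgfooabcdefg'), "\n";   # abcdefgfoo
```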

Alternatively you can do:
/a.*?foo/
Here the ? makes the regexp 'not greedy'. That is, it will try to match
across the minimum amount of characters (hence the closest 'a' to 'foo').

Either would work, though I'd wager the second is the better coding
practice.

Regards,

Brian McCauley · Feb 19, 2004

Greg Carlson said:
> Subject: regex newbie

Please put the subject of your post in the Subject of your post. If
in doubt try this simple test. Imagine you could have been bothered
to have done a search before you posted. Next imagine you found a
thread with your subject line. Would you have been able to recognise
it as the same subject?


If 'a' really is a single character then see other response.

Otherwise I'd usually use...

/(.*)(a.*foo)/

Note this actually matches both everything before the desired target
and the desired target. Note also this finds the last 'a' before the
_last_ 'foo'.
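A minimal sketch of that pattern on a string containing two 'foo's, showing both captures. The helper name is hypothetical. Since 'a' in the pattern could be any sub-pattern, the same shape should carry over when 'a' is longer than one character.

```perl
use strict;
use warnings;

# Greedy (.*) swallows the whole string, then gives characters back only
# until (a.*foo) can match, so $2 starts at the last 'a' before the last 'foo'.
sub split_before_last_target {
    my ($str) = @_;
    return unless $str =~ /(.*)(a.*foo)/;
    return ($1, $2);
}

my ($before, $target) = split_before_last_target('abcdefgabcdefgfooabcdefgfoo');
print "before: $before\n";   # before: abcdefgabcdefgfoo
print "target: $target\n";   # target: abcdefgfoo
```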

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Brian McCauley · Feb 19, 2004

> /a[^a]*foo/
> which will match an 'a', any number of anything-but-a, then foo.

That's the normal solution, assuming 'a' really is a single character.
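When 'a' is really a multi-character string, [^a]* no longer fits, because a character class only excludes single characters. One common substitute, not used anywhere in this thread, is a "tempered" token such as (?:(?!BEGIN).)*, which consumes one character at a time but only where the delimiter does not start. A sketch, with BEGIN and END as hypothetical stand-ins for 'a' and 'foo':

```perl
use strict;
use warnings;

# BEGIN and END are hypothetical multi-character stand-ins for 'a' and 'foo'.
# (?:(?!BEGIN).)* plays the role of [^a]*: it eats one character at a time,
# but refuses to step onto a position where 'BEGIN' starts.
sub nearest_begin_to_end {
    my ($str) = @_;
    return $str =~ /(BEGIN(?:(?!BEGIN).)*END)/s ? $1 : undef;
}

print nearest_begin_to_end('BEGIN 1 BEGIN 2 END BEGIN 3'), "\n";   # BEGIN 2 END
```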

> Alternatively you can do:
> /a.*?foo/
> Here the ? makes the regexp 'not greedy'. That is, it will try to match
> across the minimum amount of characters (hence the closest 'a' to 'foo').

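Bzzzt! Non-greedy does not trump first-match. The engine still starts at
the leftmost 'a' from which a match is possible (here, the one at the start
of the string); the ? only makes the match as short as possible from that
starting point.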
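A quick check on the original sample string makes the difference visible: the lazy pattern still begins at the first 'a' in the string, while the negated-class pattern cannot cross an 'a' and so begins at the nearest one.

```perl
use strict;
use warnings;

my $str = 'abcdefgabcdefgfooabcdefg';

# Start positions are tried left to right and the first one that can match
# wins. Laziness only shortens the match from that start; it never moves
# the start to a later 'a'.
my ($lazy)    = $str =~ /(a.*?foo)/;
my ($negated) = $str =~ /(a[^a]*foo)/;

print "lazy:    $lazy\n";      # lazy:    abcdefgabcdefgfoo
print "negated: $negated\n";   # negated: abcdefgfoo
```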


Greg Carlson · Feb 19, 2004

Brian McCauley said:
> Please put the subject of your post in the Subject of your post....

Oops. I see your point.

> If 'a' really is a single character then see other response.
>
> Otherwise I'd usually use...
>
> /(.*)(a.*foo)/
>
> Note this actually matches both everything before the desired target
> and the desired target. Note also this finds the last 'a' before the
> _last_ 'foo'.

That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
My latest attempt is:

$tmp = 'abcdefgabcdefgfooabcdefgfoo';
$tmp =~ m/(foo)/ogcs;
[do stuff with $1] # this part works as I'd hoped
$tmp = substr($tmp, 0, pos($tmp));
$tmp =~ m/.*(a).+?$/os;

But that still got the first 'a'. Also, $tmp can be rather large so the
substr is a bit distasteful. Is there any way to search backward from the
current pos or something similar? Thanks again.

Greg Carlson
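If 'a' and 'foo' are plain strings rather than patterns, Perl's built-in index and rindex can do the "search backward" directly: rindex(STR, SUBSTR, POSITION) returns the last occurrence of SUBSTR starting at or before POSITION, so only the final target is copied. This swaps in index/rindex instead of a regex and is not something proposed in the thread; the helper name and its argument order are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text from the last $start that ends before
# the first $end, through the end of that $end. Plain substrings only.
sub nearest_before_first {
    my ($str, $start, $end) = @_;
    my $end_at = index($str, $end);
    return if $end_at < 0;                          # no $end at all
    my $limit = $end_at - length($start);           # latest start that still ends before $end
    return if $limit < 0;
    my $start_at = rindex($str, $start, $limit);    # search backward from $limit
    return if $start_at < 0;                        # no $start before $end
    return substr($str, $start_at, $end_at + length($end) - $start_at);
}

print nearest_before_first('abcdefgabcdefgfooabcdefgfoo', 'a', 'foo'), "\n";   # abcdefgfoo
```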

Glenn Jackman · Feb 19, 2004

Greg Carlson said:
> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';

my ($stuff) = $tmp =~ /(a[^a]*foo)/;

Glenn Jackman · Feb 19, 2004

Greg Carlson said:
> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';

As Dave Cardwell posted earlier:

my ($stuff) = $tmp =~ /(a[^a]*foo)/;

Brian McCauley · Feb 19, 2004

I shall assume that, since you are still pursuing this approach, in
your real problem 'a' is not a single character.

> That makes sense. So how would I find the last 'a' before the _first_ 'foo'?
> My latest attempt is:
>
> $tmp = 'abcdefgabcdefgfooabcdefgfoo';
> $tmp =~ m/(foo)/ogcs;

Don't put qualifiers on m// that you don't understand. /os have no
effect in the above line, so if you understood them you'd not have used
them.

> [do stuff with $1] # this part works as I'd hoped

Don't ever do stuff with $1 without first checking that the match
succeeded. If you are sure that the match will always succeed, then
append "or die" to it. This serves a dual function. Firstly, it acts as
a comment to anyone who reads your program, meaning "I don't think this
match can ever fail". Secondly, if it turns out you were wrong, Perl
will tell you.
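A small sketch of why the check matters: Perl's capture variables refer to the last successful match, so a failed match leaves a stale $1 behind rather than clearing it. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $word = 'color';
my $tmp  = 'abcdefgabcdefg';

$word =~ /(col)/;    # succeeds, so $1 is now 'col'

# This match fails, and $1 is NOT reset: it still holds 'col' from above.
my $report = $tmp =~ /(foo)/ ? "got $1" : "no match, yet \$1 is still '$1'";
print "$report\n";    # no match, yet $1 is still 'col'

# The "or die" form: $1 is only used after a match that must succeed.
$tmp .= 'foo';
$tmp =~ /(foo)/ or die "no foo in \$tmp";
my $captured = $1;
print "matched '$captured'\n";    # matched 'foo'
```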

> $tmp = substr($tmp, 0, pos($tmp));
> $tmp =~ m/.*(a).+?$/os;
>
> But that still got the first 'a'. Also, $tmp can be rather large so the
> substr is a bit distasteful. Is there any way to search backward from the
> current pos or something similar?

Yes, this is what \G is for - it anchors a regex at the current
pos()ition.

$_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';

# I assume pos()==0 initially
# Set pos() to be the end of first 'foo'
/foo/gc or die "no foo";

# Extract everything from the last 'a' before the current position
# to the current position.
/.*(a.*)\G/ or die "no a before first foo";

print "$1\n";
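The same technique wrapped in a small sub that returns undef instead of dying, which may be handier when there are many strings to scan. It mirrors the code above (still without /s); the sub name is hypothetical.

```perl
use strict;
use warnings;

# /foo/g leaves pos() just after the first 'foo'. The second match has no
# /g, so it does not move pos(), and its \G means "must end exactly at pos()".
# The greedy leading .* then backs off to the last 'a' that can reach \G.
sub last_a_before_first_foo {
    my ($str) = @_;
    $str =~ /foo/g or return;
    $str =~ /.*(a.*)\G/ or return;
    return $1;
}

print last_a_before_first_foo('abcdefgabcde-FIRST-fooabcdefg-SECOND-foo'), "\n";
# abcde-FIRST-foo
```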


Brian McCauley · Feb 19, 2004

That well-known clown Brian McCauley said:
> Don't put qualifiers on m// that you don't understand.

Advice he'd do well to follow himself.

> $_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';
> /foo/gc or die "no foo";
> /.*(a.*)\G/ or die "no a before first foo";
> print "$1\n";

The /c above does nothing.

$_ = 'abcdefgabcde-FIRST-fooabcdefg-SECOND-foo';
/foo/g or die "no foo";
/.*(a.*)\G/ or die "no a before first foo";
print "$1\n";
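For reference, /c only changes what happens when a //g match fails: pos() is kept instead of being reset. In the code above a failed match dies anyway, so /c never gets a chance to matter. A small sketch on a made-up string shows the difference:

```perl
use strict;
use warnings;

my $s = 'xfooy';

# Without /c: a failed //g match resets pos() to undef.
$s =~ /foo/g or die "no foo";          # pos($s) is now 4
$s =~ /bar/g and die "unexpected bar"; # fails
my $without_c = defined pos($s) ? pos($s) : 'undef';

# With /c: the failed match leaves pos() where it was.
$s =~ /foo/g or die "no foo";          # search restarts at 0, pos($s) is 4 again
$s =~ /bar/gc and die "unexpected bar";# fails
my $with_c = defined pos($s) ? pos($s) : 'undef';

print "without /c: $without_c\n";      # without /c: undef
print "with /c:    $with_c\n";         # with /c:    4
```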


Brian McCauley · Feb 20, 2004

Showing a worrying trend towards insanity, Brian McCauley said:

> Advice he'd do well to follow himself.

Yeah, and like don't remove them from other people's code without
thinking either dude!

> /foo/g or die "no foo";
> /.*(a.*)\G/ or die "no a before first foo";

I suspect in the OP's problem the real target can span newlines, so the
OP's use of /s is necessary in the second match.

/.*(a.*)\G/s or die "no a before first foo";
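A sketch of what /s changes, assuming (as suggested above) a target that spans newlines; the input string is made up. Without /s, '.' does not match "\n", so neither .* can reach back across a line break to the 'a'.

```perl
use strict;
use warnings;

# Hypothetical input in which the wanted target spans two newlines.
my $text = "abc\nde-FIRST-\nfoo and later foo";

$text =~ /foo/g or die "no foo";      # pos($text) is now just after the first 'foo'

# Neither match below has /g, so neither moves pos(); both must end at \G.
my ($plain)  = $text =~ /.*(a.*)\G/;   # '.' stops at "\n": no match here
my ($dotall) = $text =~ /.*(a.*)\G/s;  # with /s, '.' also matches "\n"

if (defined $plain) { print "without /s: '$plain'\n" }
else                { print "without /s: no match\n" }
print "with /s: '$dotall'\n";
```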

