How find all overlapping pattern?

Peng Yu · Feb 7, 2011

$string="abcabcabc";
@findall = $string =~ /abcabc/g;
print scalar(@findall), "\n";

The above commands will print 1 rather than 2. Because there are two
overlapping 'abcabc', I'd like to get 2. I'm wondering what is the
correct way to find all overlapping regexes. (Note that I gave
'abcabc' as an example, but it could be any complex regex) Thanks!

jl_post · Feb 7, 2011

Dear Peng Yu,

Here's one way to do it:

while ($string =~ m/(abcabc)/g)
{
    push @findall, $1;
    pos($string) = $-[0] + 1;
}

If you prefer to implement it in one line of code, you can do this:

push(@findall, $1) and pos($string) = $-[0] + 1
while $string =~ m/(abcabc)/g;

Here's the explanation of what is happening: Normally, m//g and
s///g both resume searching at (or after) the end of the previous
match, so you can't directly use them to find overlapping patterns.
However, inside a while ($string =~ m//g) loop you can assign to
pos($string) to force m//g to begin looking wherever you want -- or
in your case, one character after the start of the last match. (You
have to start at least one character after it, because if you started
at (or before) the start of the last match, the loop would never end.)

As for the $-[0] variable, that's the first element of the @-
array, which you can look up with "perldoc -v @-". $-[0] is basically
the start of the last successful match, so ($-[0] + 1) would be the
earliest where you would want to continue your search for overlapping
patterns.
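
The loop described above can be wrapped up as a minimal sketch. The helper name overlapping_matches and the [offset, text] return shape are illustrative assumptions, not something from the thread; the guard against a zero-length match at the end of the string is also an addition.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [offset, text] pairs
# for every match of $re in $string, including overlapping ones.
sub overlapping_matches {
    my ($string, $re) = @_;
    my @found;
    while ($string =~ /($re)/g) {
        push @found, [ $-[0], $1 ];
        # Added guard: stop if an empty match occurred at the very end.
        last if $-[0] >= length $string;
        # Restart the next search one character after this match began.
        pos($string) = $-[0] + 1;
    }
    return @found;
}

printf "%d: %s\n", @$_ for overlapping_matches("abcabcabc", qr/abcabc/);
# prints "0: abcabc" and then "3: abcabc"
```

Because the pattern is passed in as a qr// object, the same loop should work for more complex regexes than the literal 'abcabc'.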

I hope this helps, Peng Yu.

Cheers,

-- Jean-Luc

ccc31807 · Feb 7, 2011

You don't have to use a regular expression in a case like this. You
can use index($string, $substring, $position) in a loop, ending the
loop when $position is less than zero. This is how you might do it in
a language like C.
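
A minimal sketch of that index() loop, assuming a fixed, non-empty substring. The helper name overlapping_offsets is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of every (possibly overlapping) occurrence
# of a fixed substring, found with index() rather than a regex.
sub overlapping_offsets {
    my ($string, $substring) = @_;
    # An empty substring matches everywhere and would never terminate.
    return () unless length $substring;
    my @offsets;
    my $position = index($string, $substring, 0);
    while ($position >= 0) {
        push @offsets, $position;
        # Resume one character after the start of the previous hit.
        $position = index($string, $substring, $position + 1);
    }
    return @offsets;
}

my @offsets = overlapping_offsets("abcabcabc", "abcabc");
print scalar(@offsets), "\n";    # prints 2 (offsets 0 and 3)
```

This only handles literal substrings, so it does not cover the "any complex regex" case from the original question.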

Sometimes, the simpler way is better.

CC.

Ilya Zakharevich · Feb 7, 2011

Do not use RExes which "move the match point too far" (i.e., match
more than one character). In some situations 0-length match may cause
a problem (non-intuitive semantic), but if the REx is ALWAYS matching
0-length substring, the match rules are intuitive again.

So use /(?=(abcabc))/g.
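
Applied to the string from the original post, the suggestion might look like the sketch below. The lookahead consumes no characters, so after each match the regex engine advances by one position and tries again.

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# The lookahead itself matches a zero-length substring; the inner
# parentheses capture the text that would have matched at that position.
my @findall = $string =~ /(?=(abcabc))/g;

print scalar(@findall), "\n";   # prints 2
print "$_\n" for @findall;      # prints "abcabc" twice
```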

Hope this helps,
Ilya

C.DeRykus · Feb 8, 2011

You don't have to use a regular expression in a case like this. You
can use index($string, $substring, $position) in a loop, ending the
loop when $position is less than zero. This is how you might do it in
a language like C.

Sometimes, the simpler way is better.

True in some cases but, IMO, a regex
is shorter and arguably much easier
here:

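$_ = "abcabcabc";
($count, $pos) = (0, 0);

# regex
$count++ while /(?=abcabc)/g and ++$pos;

vs.

# index (reset $count and $pos to 0 before running this)
while ($pos != -1) {
    $pos = index($_, 'abcabc', $pos);
    $count++, $pos++ unless $pos == -1;
}

# and a trap lurks with this alternative: $count++ returns the old
# value, which is 0 (false) on the first hit, so "and" skips $pos++
# and the match at offset 0 gets counted twice
while ($pos != -1) {
    $pos = index($_, 'abcabc', $pos);
    $count++ and $pos++ unless $pos == -1;
}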

Peter J. Holzer · Feb 8, 2011

You don't have to use a regular expression in a case like this. You
can use index($string, $substring, $position) in a loop,

You did read what the OP wrote, didn't you?

hp

ccc31807 · Feb 8, 2011

You did read what the OP wrote, didn't you?

I did, and I thought about it. Several times in the past few weeks,
I've had problems with REs behaving poorly, and used other means to do
what I needed to do, primarily index() and substr().

My point was not that an RE can always be replaced by built-in
functions, but that an RE can sometimes be replaced by built-in
functions.

CC.

jl_post · Feb 8, 2011

Here's one way to do it:

while ($string =~ m/(abcabc)/g)
{
    push @findall, $1;
    pos($string) = $-[0] + 1;
}

Hmmm... after reading the other replies, I think that:

@findall = $string =~ /(?=(abcabc))/g;

(which uses a positive lookahead) is probably the cleaner solution.
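
A note on the capture group inside the lookahead: in list context, m//g without any parentheses returns the text of each whole match, and a lookahead on its own matches an empty string. The sketch below compares the two forms. The variable names @without and @with are illustrative only.

```perl
use strict;
use warnings;

my $string = "abcabcabc";

my @without = $string =~ /(?=abcabc)/g;     # no capture group
my @with    = $string =~ /(?=(abcabc))/g;   # inner capture group

# Both lists have two elements, but only @with holds the matched text.
print scalar(@without), " '", join("','", @without), "'\n";   # prints 2 '',''
print scalar(@with),    " '", join("','", @with),    "'\n";   # prints 2 'abcabc','abcabc'
```

So the count is correct either way, but the inner parentheses are needed if the matched substrings themselves are wanted.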

Just my opinion.

-- Jean-Luc

sln · Feb 8, 2011

Do not use RExes which "move the match point too far" (i.e., match
more than one character). In some situations 0-length match may cause
a problem (non-intuitive semantic), but if the REx is ALWAYS matching
0-length substring, the match rules are intuitive again.

So use /(?=(abcabc))/g.

s/ALWAYS/ONLY/

Nice, and the behavior should be the same if quantifiers and/or
assertions are added.
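
As a rough illustration with a quantifier, here is a sketch that wraps an arbitrary pattern in a capturing lookahead. The helper name lookahead_matches is hypothetical, and it assumes the pattern passed in has no capture groups of its own (those would add extra elements to the returned list).

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a pattern in a capturing lookahead so that
# every starting position in the string is tried once.
sub lookahead_matches {
    my ($string, $re) = @_;
    my @matches = $string =~ /(?=($re))/g;
    return @matches;
}

my @runs = lookahead_matches("aaaa", qr/a{2,3}/);
print join(",", @runs), "\n";   # prints "aaa,aaa,aa"
```

Note that this yields at most one match per starting position (here the greedy one), so alternative matches of different lengths that begin at the same offset are not listed.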

-sln
