How to find all overlapping pattern matches?

Peng Yu

$string="abcabcabc";
@findall = $string =~ /abcabc/g;
print scalar(@findall), "\n";

The above commands print 1 rather than 2. Because there are two
overlapping 'abcabc' substrings, I'd like to get 2. I'm wondering what
the correct way is to find all overlapping matches of a regex. (Note
that I gave 'abcabc' as an example, but it could be any complex regex.)
Thanks!
 
jl_post

Dear Peng Yu,

Here's one way to do it:

while ($string =~ m/(abcabc)/g)
{
    push @findall, $1;
    pos($string) = $-[0] + 1;
}

If you prefer to implement it in one line of code, you can do this:

push(@findall, $1) and pos($string) = $-[0] + 1
    while $string =~ m/(abcabc)/g;

Here's the explanation of what is happening: Normally, m//g and
s///g both make additional matches AFTER (or right at) the end of the
previous match, meaning that you can't directly use them to find
overlapping patterns. However, inside a while($string =~ m//g) loop
you can manipulate the pos($string) variable to force m//g to begin
looking wherever you want -- or in your case, one character after the
start of the last match. (You have to start one (or more) characters
after, because if you started at (or before) the start of the last
match the loop would be infinite.)
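Since the question asks about "any complex regex", the same pos() technique can be wrapped in a small subroutine that takes a compiled qr// pattern. This is a sketch only: all_overlapping_matches is a hypothetical name, and it returns the whole match text (read via @- and @+) rather than any particular capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every (possibly overlapping) match of $re in
# $string, restarting the search one character after each match's start.
sub all_overlapping_matches {
    my ( $string, $re ) = @_;
    my @found;
    while ( $string =~ /$re/g ) {
        # $-[0] and $+[0] are the start and end offsets of the whole match
        push @found, substr( $string, $-[0], $+[0] - $-[0] );
        # a match starting at the very end leaves nothing further to scan
        last if $-[0] >= length $string;
        # resume one character after where this match started
        pos($string) = $-[0] + 1;
    }
    return @found;
}

my @findall = all_overlapping_matches( "abcabcabc", qr/abcabc/ );
print scalar(@findall), "\n";    # prints 2

my @pairs = all_overlapping_matches( "12345", qr/\d\d/ );
print "@pairs\n";                # prints "12 23 34 45"
```

Because the restart point is always strictly after the previous match's start, the loop makes progress even for patterns that can match an empty string.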

As for the $-[0] variable, that's the first element of the @-
array, which you can look up with "perldoc -v @-". $-[0] holds the
offset at which the last successful match started, so ($-[0] + 1) is
the earliest position from which you would want to continue searching
for overlapping patterns.
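To see what $-[0] holds at each step, the loop can log the start offset ($-[0]), the end offset ($+[0]) and where pos() is reset to. A small sketch using the string from the question:

```perl
use strict;
use warnings;

my $string = "abcabcabc";
my @starts;
while ( $string =~ m/(abcabc)/g ) {
    # $-[0] = offset where the match began, $+[0] = offset just past its end
    printf "matched %s at %d..%d\n", $1, $-[0], $+[0];
    push @starts, $-[0];
    # without this line the next search would begin at $+[0] (offset 6)
    pos($string) = $-[0] + 1;
}
print "start offsets: @starts\n";
```

This prints "matched abcabc at 0..6", then "matched abcabc at 3..9", then "start offsets: 0 3". The second match is only found because the search restarts at offset 1 instead of 6.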

I hope this helps, Peng Yu.

Cheers,

-- Jean-Luc
 
ccc31807

You don't have to use a regular expression in a case like this. You
can use index($string, $substring, $position) in a loop, ending the
loop when $position is less than zero. This is how you might do it in
a language like C.
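A sketch of that index() loop, assuming the thing being searched for is a fixed string rather than a regex (the variable names are illustrative). index() returns -1 once there are no further occurrences, which ends the loop; starting the next search at $pos + 1, rather than after the whole substring, is what allows overlaps.

```perl
use strict;
use warnings;

my $string    = "abcabcabc";
my $substring = "abcabc";

my @positions;
my $pos = 0;
# index() returns -1 when $substring does not occur at or after $pos
while ( ( $pos = index( $string, $substring, $pos ) ) >= 0 ) {
    push @positions, $pos;
    $pos++;    # step one character past the start to allow overlaps
}
print scalar(@positions), " matches at offsets @positions\n";
```

This prints "2 matches at offsets 0 3". It only works for literal substrings; it cannot express the "any complex regex" case from the question.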

Sometimes, the simpler way is better.

CC.
 
Ilya Zakharevich

Do not use RExes which "move the match point too far" (i.e., match
more than one character). In some situations a 0-length match may cause
problems (non-intuitive semantics), but if the REx ALWAYS matches a
0-length substring, the match rules are intuitive again.

So use /(?=(abcabc))/g.
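Why this works: the look-ahead consumes no characters, so every match is zero-length. After a zero-length match, Perl does not allow the next //g match to be zero-length at the same position (see "Repeated Patterns Matching a Zero-length Substring" in perlre), so the engine moves on one character and tries again. A short sketch with the question's string, showing both the list-context form and a loop that reports where each match starts:

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# List context: one element per capture, i.e. per offset where the look-ahead succeeded
my @findall = $string =~ /(?=(abcabc))/g;
print scalar(@findall), "\n";    # prints 2

# Scalar-context loop: every match is zero-length, so $-[0] == $+[0]
my @offsets;
while ( $string =~ /(?=(abcabc))/g ) {
    push @offsets, $-[0];
    printf "found %s starting at offset %d\n", $1, $-[0];
}
```

No manual pos() adjustment is needed here, unlike the while-loop approach that matches the substring directly.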

Hope this helps,
Ilya
 
C.DeRykus

> You don't have to use a regular expression in a case like this. You
> can use index($string, $substring, $position) in a loop, ending the
> loop when $position is less than zero. This is how you might do it in
> a language like C.
>
> Sometimes, the simpler way is better.

True in some cases but, IMO, a regex
is shorter and arguably much easier
here:


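$_ = "abcabcabc";

# regex: zero-length look-ahead, so each starting offset is tried
$count = 0;
$count++ while /(?=abcabc)/g;

vs.

# index
( $count, $pos ) = ( 0, 0 );
while ( $pos != -1 ) {
    $pos = index( $_, 'abcabc', $pos );
    $count++, $pos++ unless $pos == -1;
}

# and a trap lurks with this alternative: $count++ returns the OLD
# count (0) on the first hit, so "and $pos++" is skipped and the first
# match is counted twice ($count ends up 3, not 2)
( $count, $pos ) = ( 0, 0 );
while ( $pos != -1 ) {
    $pos = index( $_, 'abcabc', $pos );
    $count++ and $pos++ unless $pos == -1;
}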
 
Peter J. Holzer

> You don't have to use a regular expression in a case like this. You
> can use index($string, $substring, $position) in a loop,

You did read what the OP wrote, did you?

hp
 
ccc31807

> You did read what the OP wrote, did you?

I did, and I thought about it. Several times in the past few weeks,
I've had problems with REs acting poorly, and used other means to do
what I needed to do, primarily index() and substr().

My point was not that an RE can always be replaced by built-in
functions, but that an RE can sometimes be replaced by built-in
functions.

CC.
 
jl_post

Hmmm... after reading the other replies, I think that:

@findall = $string =~ /(?=abcabc)/g;

(which uses a positive look-ahead) is probably the cleaner solution.

Just my opinion.
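One detail worth checking with the look-ahead form: in list context, m//g returns the capture groups if the pattern has any, and otherwise the whole match text, which for a pure look-ahead is an empty string. So /(?=abcabc)/g gives the right count but a list of empty strings, while putting parentheses inside the look-ahead returns the matched text. A quick sketch comparing the two:

```perl
use strict;
use warnings;

my $string = "abcabcabc";

# No capture group: each element is the (empty) text of a zero-length match
my @no_capture = $string =~ /(?=abcabc)/g;

# Capture group inside the look-ahead: each element is the text it saw
my @with_capture = $string =~ /(?=(abcabc))/g;

print scalar(@no_capture),   " elements: (", join( "|", @no_capture ),   ")\n";
print scalar(@with_capture), " elements: (", join( "|", @with_capture ), ")\n";
```

The first line prints "2 elements: (|)" and the second "2 elements: (abcabc|abcabc)", so either form works for counting, but only the capturing one keeps the substrings.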

-- Jean-Luc
 
sln

> Do not use RExes which "move the match point too far" (i.e., match
> more than one character). In some situations a 0-length match may cause
> problems (non-intuitive semantics), but if the REx ALWAYS matches a
> 0-length substring, the match rules are intuitive again.
>
> So use /(?=(abcabc))/g.

s/ALWAYS/ONLY/

Nice, and the behavior should be the same if quantifiers and/or
assertions are added.
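That point can be checked with a quantifier inside the look-ahead. A minimal sketch, with a string and patterns made up for illustration: each starting offset still contributes at most one match, and which one depends on the quantifier (greedy takes the longest, lazy the shortest), so the look-ahead trick does not list every possible overlapping substring.

```perl
use strict;
use warnings;

my $digits = "12345";

# Greedy {2,3}: at each start offset the engine prefers 3 digits, else 2
my @greedy = $digits =~ /(?=(\d{2,3}))/g;
print "@greedy\n";    # prints "123 234 345 45"

# Lazy {2,3}?: same start offsets, but the engine now prefers 2 digits
my @lazy = $digits =~ /(?=(\d{2,3}?))/g;
print "@lazy\n";      # prints "12 23 34 45"
```

Offset 4 yields nothing in either case, since only one digit remains there.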

-sln
 
