inefficient regex - please help!

Mothra · Apr 23, 2004

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

The above works and prints "yep" if 4 or more occurrences are found; however,
it's quite slow and if I want to match more occurrences (e.g. 9) it takes
forever: almost as if the length of time the script takes to run is being
raised to the power of the number of occurrences I'm trying to match!

How could I rewrite this regex more efficiently?

Toni Erdmann · Apr 23, 2004

Mothra said:
Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

The above works and prints "yep" if 4 or more occurrences are found; however,
it's quite slow and if I want to match more occurrences (e.g. 9) it takes
forever: almost as if the length of time the script takes to run is being
raised to the power of the number of occurrences I'm trying to match!

How could I rewrite this regex more efficiently?

$count = 0;
while ($lines=~ /Lost connection/g) { $count++; }

You can exit the while loop early: 'last if ( $count >= 4 );'
or any number you like

Toni

Gunnar Hjalmarsson · Apr 23, 2004

Mothra said:
Am trying to match n occurrences of a phrase ("Lost connection") in
a text file.

Here's the code I've got so far (I'm reading the whole file into
the scalar $lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

Hmm.. Yes, the regex seems unnecessarily complicated. Another approach:

my $cnt = 0;
while ($lines =~ /Lost connection/g) {
if (++$cnt == 4) {
print "yep\n";
last;
}
}
print "nope\n" unless $cnt == 4;

Maybe not very elegant, but I'd guess it's faster.

The index() function might be an alternative, since the pattern is a
plain string.
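
A minimal sketch of the index() idea, assuming the whole file is already in
$lines as in the original post. The helper name has_at_least and the sample
data are made up for illustration; the loop stops as soon as the n-th hit is
found, so it never scans more of the text than it has to.

```perl
use strict;
use warnings;

# Hypothetical helper: true once $needle has been seen $n times in $text.
sub has_at_least {
    my ($text, $needle, $n) = @_;
    return 1 if $n <= 0;
    my ($pos, $seen) = (0, 0);
    # index() returns -1 when there is no further occurrence.
    while (($pos = index($text, $needle, $pos)) != -1) {
        return 1 if ++$seen >= $n;
        $pos += length $needle;    # continue after this hit (non-overlapping)
    }
    return 0;
}

# Made-up sample: four lines containing the phrase, separated by other lines.
my $lines = join "\n", ("ok", "Lost connection to host") x 4;

if (has_at_least($lines, "Lost connection", 4)) {
    print "yep\n";
} else {
    print "nope\n";
}
```

This only works because the phrase is a fixed string; anything that needs
real pattern matching still calls for a regex.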

Mothra · Apr 23, 2004

Toni Erdmann said:
$count = 0;
while ($lines=~ /Lost connection/g) { $count++; }

You can exit the while loop early: 'last if ( $count >= 4 );'
or any number you like

Sorry, but I specifically meant the "how can I write the regex more
efficiently" part, i.e. counting the occurrences of the phrase by applying a
regex to the scalar. The regex is actually needed for a separate program;
the script I posted was just an example. Sorry, I should have been clearer
about that.

Mothra · Apr 23, 2004

sorry - see response above, but I need to do the whole thing in regex.

Tore Aursand · Apr 23, 2004

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

How about this?

my $count = 0;
$count++ while ( $lines =~ /Lost connection/g );

James Willmore · Apr 23, 2004

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

[ ... ]

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {

^^^ ???
Are you sure you have the right regular expression here? It looks like
you're missing a paren. Plus, you're trying to *extract* the newlines
(think "(\n)" ). Maybe you forgot to escape the metacharacters?

[ ... ]

How could I rewrite this regex more efficiently?

If you want to just *count* "Lost connection", consider using 'index'. If
you're trying to *extract* information from the line that contains "Lost
Connection", read the file one line at a time, *then* use a regular
expression to extract the information from the line. From what you
posted, it looks like you're trying to *extract* information from the "Lost
Connection" occurrence - but you state you want to "count" the occurrences.
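
A rough sketch of the line-at-a-time approach described above. The log
format, the "to <host>" suffix and the in-memory filehandle are assumptions
made up so the example runs on its own; with a real file you would open the
path instead.

```perl
use strict;
use warnings;

# Stand-in for a real log file: an in-memory filehandle over made-up lines.
my $log = "10:01 Lost connection to db1\n"
        . "10:02 ok\n"
        . "10:05 Lost connection to db2\n";
open my $fh, '<', \$log or die "cannot open: $!";

my @hosts;
while (my $line = <$fh>) {
    # Capture the word after "Lost connection to" (format assumed).
    push @hosts, $1 if $line =~ /Lost connection to (\S+)/;
}
close $fh;

print scalar(@hosts), " hits: @hosts\n";
```

Because each match runs against a single line, the regex stays simple and
the count falls out of the size of the array.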

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Succumb to natural tendencies. Be hateful and boring.

Gary E. Ansok · Apr 23, 2004

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

Well, if you absolutely must do it in one regex, how about

if ($lines =~ /(?:Lost connection.*?){4}/s) {

I'm not sure whether there's any benefit to using .*? instead of .*;
you might want to try it both ways and see.
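
For trying different values of n, the same pattern can be built from a
variable. This is only a sketch: the helper name at_least_n and the sample
text are invented here, and quotemeta is used so the phrase is matched
literally.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern that needs $n occurrences of $phrase.
sub at_least_n {
    my ($phrase, $n) = @_;
    my $p = quotemeta $phrase;       # treat the phrase as a plain string
    return qr/(?:$p.*?){$n}/s;       # /s lets . cross newlines
}

# Made-up sample text with exactly four occurrences.
my $lines = join "\n", ("noise", "Lost connection") x 4;

my $re4 = at_least_n("Lost connection", 4);
my $re5 = at_least_n("Lost connection", 5);

for my $re ($re4, $re5) {
    if ($lines =~ $re) {
        print "yep\n";
    } else {
        print "nope\n";
    }
}
```

With four occurrences in the sample, the first test succeeds and the second
fails. Swapping .*? for .* in the helper gives the same yes/no answer, so the
two can be timed against each other on real data.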

Gary Ansok

Joe Smith · Apr 23, 2004

Mothra said:
/(.*Lost connection.*(\n.*)*{4}/

The above works and prints "yep" if 4 or more occurrences are found; however,
it's quite slow and if I want to match more occurrences (e.g. 9) it takes
forever: almost as if the length of time the script takes to run is being
raised to the power of the number of occurrences I'm trying to match!

Don't use consecutive asterisks. The ")*{4}" part can't be right.

/(Lost connection.*){4}/s

-Joe
