inefficient regex - please help!

Mothra

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
print "yep\n";
} else {
print "nope\n";
}

The above works and prints "yep" if 4 or more occurrences are found; however,
it's quite slow, and if I want to match more occurrences (e.g. 9) it takes
forever: almost as if the running time of the script is being raised to the
power of the number of occurrences I'm trying to match!

How could I rewrite this regex more efficiently?
 
Toni Erdmann

$count = 0;
while ($lines =~ /Lost connection/g) { $count++; }

You can leave the while loop early by putting 'last if ( $count >= 4 );'
inside it (use any number you like).

Toni
 
Gunnar Hjalmarsson

Hmm.. Yes, the regex seems unnecessarily complicated. Another approach:

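my $cnt = 0;    # start at 0 so the final comparison never sees undef
while ($lines =~ /Lost connection/g) {
    if (++$cnt == 4) {
        print "yep\n";
        last;    # stop scanning once the 4th occurrence is found
    }
}
print "nope\n" unless $cnt == 4;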

Maybe not very elegant, but I'd guess it's faster.

The index() function might be an alternative, since the pattern is a
plain string.
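
A minimal sketch of the index() idea, assuming the file contents are already in a scalar as in the original post. The helper name count_at_least and the sample text are made up for illustration; this is one possible way to do it, not a tuned implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $phrase occurs at least $n times in $text.
# index() jumps from one hit to the next, and the loop stops at the $n-th hit.
sub count_at_least {
    my ($text, $phrase, $n) = @_;
    return 1 if $n <= 0;
    my ($pos, $found) = (0, 0);
    while (($pos = index($text, $phrase, $pos)) != -1) {
        return 1 if ++$found >= $n;
        $pos += length $phrase;    # continue after this hit (no overlaps)
    }
    return 0;
}

# Sample text (an assumption): four lines containing the phrase.
my $lines = "a\nLost connection 1\nb\nLost connection 2\n" x 2;
print count_at_least($lines, 'Lost connection', 4) ? "yep\n" : "nope\n";
```

Because no regex engine is involved, there is no backtracking at all; the cost should grow with the length of the text rather than with n.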
 
Mothra

Sorry, but I specifically meant how to write the regex part more
efficiently, i.e. counting the occurrences of the phrase by using a regex
against the scalar. The regex is actually needed for a separate program -
the script I posted was just an example. Sorry, I should have been clearer
about that.
 
Tore Aursand

How about this?

my $count = 0;
$count++ while ( $lines =~ /Lost connection/g );
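
For comparison, the same count can also be taken in a single statement by evaluating the global match in list context. This is only a sketch with made-up sample text; note that it always scans the whole string, so it cannot stop early once n occurrences have been seen.

```perl
use strict;
use warnings;

# Sample text (an assumption): three lines containing the phrase.
my $lines = "Lost connection\nok\nLost connection\nLost connection\n";

# Assigning the list of matches to an empty list, and that assignment to a
# scalar, yields the number of matches.
my $count = () = $lines =~ /Lost connection/g;

print $count >= 3 ? "yep\n" : "nope\n";
```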
 
James Willmore

Am trying to match n occurrences of a phrase ("Lost connection") in a text
file.

[ ... ]
Here's the code I've got so far (I'm reading the whole file into the scalar
$lines)...

if ($lines =~ /(.*Lost connection.*(\n.*)*{4}/) {
^^^ ???
Are you sure you have the right regular expression here? It looks like
you're missing a paren. Plus, you're trying to *extract* the newlines
(think "(\n)" ). Maybe you forgot to escape the metacharacters?

[ ... ]
How could I rewrite this regex more efficiently?

If you want to just *count* "Lost connection", consider using 'index'. If
you're trying to *extract* information from the line that contains "Lost
connection", read the file one line at a time, *then* use a regular
expression to extract the information from the line. From what you
posted, it looks like you're trying to *extract* information from the "Lost
connection" occurrence - but you state you want to "count" the occurrences.
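
A rough sketch of the line-at-a-time approach. The sample data, the "to host" wording, and the captured field are all assumptions made up for illustration; a real script would open the actual file and use whatever pattern fits its lines.

```perl
use strict;
use warnings;

# Sample data standing in for the log file (an assumption for illustration).
my $data = <<'END';
10:01 Lost connection to host alpha
10:02 heartbeat ok
10:05 Lost connection to host beta
END

# Read from an in-memory filehandle; a real script would open the file instead.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @hosts;
while (my $line = <$fh>) {
    # Extract the host name only from lines that contain the phrase.
    push @hosts, $1 if $line =~ /Lost connection to host (\S+)/;
}
close $fh;

printf "%d occurrences: %s\n", scalar @hosts, join(', ', @hosts);
```

Since each match is confined to one line, the regex never has to span newlines, which avoids the multi-line backtracking of the original pattern.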

HTH

--
Jim

 
Gary E. Ansok

Well, if you absolutely must do it in one regex, how about

if ($lines =~ /(?:Lost connection.*?){4}/s) {

I'm not sure whether there's any benefit to using .*? instead of .*,
you might want to try it both ways and see.
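
One way to try it both ways is the core Benchmark module. The sketch below uses synthetic input (2000 lines, four of them containing the phrase), which is purely an assumption; timings on a real file may look quite different, so no result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic input (an assumption): 2000 lines, every 500th one is a hit.
my $lines = join '', map {
    $_ % 500 == 0 ? "Lost connection $_\n" : "all quiet $_\n"
} 1 .. 2000;

my $lazy   = qr/(?:Lost connection.*?){4}/s;
my $greedy = qr/(?:Lost connection.*){4}/s;

# Both variants should agree on whether the text matches.
my $lazy_hit   = $lines =~ $lazy   ? 1 : 0;
my $greedy_hit = $lines =~ $greedy ? 1 : 0;
print "lazy: $lazy_hit, greedy: $greedy_hit\n";

# Run each variant 200 times and report the timings.
timethese(200, {
    lazy   => sub { my $hit = $lines =~ $lazy },
    greedy => sub { my $hit = $lines =~ $greedy },
});
```

It is probably worth repeating the run with input that contains fewer than four occurrences, since a failing match forces the engine to exhaust its alternatives.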

Gary Ansok
 
Joe Smith

Mothra said:
/(.*Lost connection.*(\n.*)*{4}/

The above works and prints "yep" if 4 or more occurrences are found; however,
it's quite slow and if I want to match more occurrences (e.g. 9) it takes
forever: almost as if the length of time the script takes to run is being
raised to the power of the number of occurrences I'm trying to match!

Don't use consecutive quantifiers. The ")*{4}" part can't be right.

/(Lost connection.*){4}/s

-Joe
 
