Regexp optimization


sravi

I have the following piece of code,

#-----------------------
my $sd = '(\d{8})\s+(\d\d:\d\d:\d\d)';
my $ed = '(\d{8}|-1)\s+(\d\d:\d\d:\d\d|-1)';

# $startDateRegex is not defined in this excerpt; presumably set elsewhere
my $up_com = '^#?(\S+)\s*(\S+)\s*' . "$startDateRegex\$";

# In a double-quoted string every regex backslash must be doubled:
# a bare "\s" collapses to a plain "s".
my $regex = "([+#-]?)\\s*$sd\\s+$ed\\s+(\\S+)\\s+(\\S+)\\s+(.+)";

while (<>) {
    if (/$regex/o) {
        my @line = ($1, $2, $3, $4, $5, $6, $7, $8);
        # Process the data and print
    }
    elsif (/$up_com/o) {
        my ($a, $b, $c) = ($2, $3, $4);
        # Process the data and print
    }
    else {
        print;
    }
}
#-----------------------

I have two regexes, either of which can match a given line. Is it
possible to combine the first and second regexes into a single regex
and process the data more efficiently?
 

Mark Clements

sravi said:
I have two regexes, either of which can match a given line. Is it
possible to combine the first and second regexes into a single regex
and process the data more efficiently?

Yes, but you risk losing legibility. My suggestions are:

* Look at Regexp::Assemble (on CPAN), which assembles several patterns
into a single regex.

* If the processing is much the same whether $regex (you may want to
give this variable a more descriptive name) or $up_com matches, you
could do the processing in a subroutine.

* When you have matched $up_com, you pull out the matched items with

my ($a,$b,$c) = ($2,$3,$4);

You could use non-capturing parentheses to avoid capturing the first
term (note the "?:"), so the fields you want start at $1. For example:
my $up_com = '^#?(?:\S+)\s*(\S+)\s*'."$startDateRegex\$";


See man perlre and look for "non-capturing".

* You *could* (I'm not sure this helps legibility) combine the test and
the assignment on the same line, e.g.

if ( my @line = ( /$regex/ ) ) {
    # @line holds the captured fields; this block runs only on a match
}

(working example)
bob 538 $ perl -le '$f="abcd";print "@a" if @a=($f=~/(a)(b)(.)(d)/);'
a b c d
bob 539 $ perl -le '$f="abcd";print "@a" if @a=($f=~/(a)(b)(x)(d)/);'
bob 540 $
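Taken together, the suggestions above might look like the following: a minimal sketch, assuming Perl 5.10 or later for named captures. Each branch of a single alternation names its own fields (the first field of the $up_com line is left uncaptured, as in the non-capturing suggestion), and `classify` stands in for the shared subroutine. The sample lines, `$date`, `$time` and `classify` are hypothetical, and because `$startDateRegex` is not shown in the post, the second branch assumes it has the same date/time shape as `$sd`.

```perl
use strict;
use warnings;
use 5.010;    # named captures (?<name>...) and %+ need Perl 5.10+

# Hypothetical: $startDateRegex is not shown in the post, so the
# second branch assumes the same date/time shape as $sd.
my $date = qr/\d{8}/;
my $time = qr/\d\d:\d\d:\d\d/;

my $combined = qr{
    (?:
        # Branch 1: the full record, unanchored like the original
        (?<flag>[+#-]?) \s*
        (?<sdate>$date) \s+ (?<stime>$time) \s+
        (?<edate>$date|-1) \s+ (?<etime>$time|-1) \s+
        (?<f1>\S+) \s+ (?<f2>\S+) \s+ (?<rest>.+)
      |
        # Branch 2: the update line; the first field is not captured
        ^ \#? \S+ \s* (?<name>\S+) \s*
        (?<udate>$date) \s+ (?<utime>$time) $
    )
}x;

# Hypothetical shared entry point: one match, then hand the named
# fields to whatever processing both cases have in common.
sub classify {
    my ($line) = @_;
    return ('other') unless $line =~ $combined;
    # %+ lists only the named groups that took part in the match,
    # so its keys show which branch matched.
    my %f = %+;
    return (exists($f{sdate}) ? 'record' : 'update', \%f);
}

for my $line ('+ 20060101 10:00:00 -1 -1 host1 job1 some text',
              '#alpha beta 20060102 11:30:00',
              'anything else') {
    my ($kind, $f) = classify($line);
    if ($kind eq 'other') { print "$line\n"; next; }
    print join(' ', $kind, map { "$_=$f->{$_}" } sort keys %$f), "\n";
}
```

Because an unanchored alternation tries both branches at each starting position in turn, a line that could satisfy both patterns may be classified differently from the original if/elsif, so it is worth checking against real data.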

You may want to use something like Benchmark::Timer so that you can
track whether your optimizations are, er, optimizing or not. See
perldoc re for details on debugging regexes.
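Benchmark::Timer comes from CPAN; as a stand-in, here is a sketch using the core Benchmark module's `cmpthese` to compare the two-pattern if/elsif with a single alternation of the same two patterns. The sample lines and the sub names are hypothetical, and `$up_com` assumes `$startDateRegex` has the same shape as `$sd`. Treat any numbers it prints as specific to your perl and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: one line of each kind and one matching neither.
my @lines = (
    '+ 20060101 10:00:00 -1 -1 host1 job1 some text',
    '#alpha beta 20060102 11:30:00',
    'anything else',
) x 100;

# The patterns from the original post, compiled once with qr//
# (which also removes the need for /o).
my $sd     = qr/(\d{8})\s+(\d\d:\d\d:\d\d)/;
my $ed     = qr/(\d{8}|-1)\s+(\d\d:\d\d:\d\d|-1)/;
my $regex  = qr/([+#-]?)\s*$sd\s+$ed\s+(\S+)\s+(\S+)\s+(.+)/;
my $up_com = qr/^#?(?:\S+)\s*(\S+)\s*$sd$/;   # assumes $startDateRegex is $sd

# Single pattern: captures 1-8 come from $regex, 9-11 from $up_com.
my $combined = qr/$regex|$up_com/;

# Original approach: try $regex, then fall back to $up_com.
sub two_patterns {
    my $matched = 0;
    for (@lines) {
        if    (/$regex/)  { $matched++ }
        elsif (/$up_com/) { $matched++ }
    }
    return $matched;
}

# Combined approach: one match attempt per line.
sub one_pattern {
    my $matched = 0;
    for (@lines) { $matched++ if /$combined/ }
    return $matched;
}

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, { two => \&two_patterns, one => \&one_pattern });
```

For the regex-level view that perldoc re describes, adding `use re 'debug';` prints each compiled pattern and the match attempts to STDERR.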

regards,

Mark
 
