Regexp optimization

sravi · Apr 18, 2005

I have the following piece of code,

#-----------------------
my $sd = '(\d{8})\s+(\d\d:\d\d:\d\d)';
my $ed = '(\d{8}|-1)\s+(\d\d:\d\d:\d\d|-1)';

# $startDateRegex is not defined anywhere in this snippet; $sd is assumed.
my $up_com = '^#?(\S+)\s*(\S+)\s*' . "$sd\$";

# Backslashes must be doubled inside a double-quoted string.
my $regex = "([+#-]?)\\s*$sd\\s+$ed\\s+(\\S+)\\s+(\\S+)\\s+(.+)";

while (<>) {
    if (/$regex/o) {
        my @line = ($1, $2, $3, $4, $5, $6, $7, $8);
        # Process the data and print
    }
    elsif (/$up_com/o) {
        my ($a, $b, $c) = ($2, $3, $4);
        # Process the data and print
    }
    else {
        print;
    }
}
#-----------------------

I have two regexes, either of which can match a given line. Is it
possible to combine the first and second into a single regex and
process the data optimally?

Mark Clements · Apr 18, 2005

Yes, but you run the risk of losing legibility. My suggestions are:

* look at using Regexp::Assemble.

* if the processing is pretty much identical when $regex (you may want
to use a more descriptive name for this variable) and $up_com match, you
could do the processing in a subroutine.

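* when you have matched $up_com, you pull out the matched items with

my ($a,$b,$c) = ($2,$3,$4);

You could use non-capturing parentheses to avoid capturing the
first term (note the use of "?:"), e.g.

my $up_com = '^#?(?:\S+)\s*(\S+)\s*'."$sd\$";

(with $sd standing in for $startDateRegex, which the posted snippet
never defines). The captures are then numbered from $1, so the
assignment becomes my ($a,$b,$c) = ($1,$2,$3).

See man perlre and look for "non-capturing".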

* you *could* (not sure this helps legibility) combine the test and
the assignment in the same line, e.g.

if( my @line = ( /$regex/ ) ) {

}

(working example)
bob 538 $ perl -le '$f="abcd";print "@a" if @a=($f=~/(a)(b)(.)(d)/);'
a b c d
bob 539 $ perl -le '$f="abcd";print "@a" if @a=($f=~/(a)(b)(x)(d)/);'
bob 540 $
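
The suggestions above can be pulled together into one small sketch. It is only an illustration under assumptions: parse_line, the "full"/"update" labels and the sample lines are made-up names and data, not part of the original code, and $startDateRegex is assumed to be $sd. The patterns are compiled once with qr// rather than relying on /o.

```perl
use strict;
use warnings;

# Date/time fragments from the original post.
my $sd = '(\d{8})\s+(\d\d:\d\d:\d\d)';
my $ed = '(\d{8}|-1)\s+(\d\d:\d\d:\d\d|-1)';

# Compile each pattern once; qr// takes the place of the /o modifier.
my $full_re = qr/([+#-]?)\s*${sd}\s+${ed}\s+(\S+)\s+(\S+)\s+(.+)/;

# The first field is matched but not captured, so captures are $1..$3.
my $up_com_re = qr/^#?(?:\S+)\s*(\S+)\s*${sd}$/;

# Hypothetical helper: classify one line and return its captured fields.
sub parse_line {
    my ($line) = @_;

    # In list context a match returns its captures, or an empty list
    # on failure, so the test and the assignment share one expression.
    if ( my @f = $line =~ $full_re )   { return ( full   => @f ) }
    if ( my @f = $line =~ $up_com_re ) { return ( update => @f ) }
    return;
}

# Made-up sample lines, for illustration only.
my @samples = (
    "+ 20050418 12:00:00 20050419 13:30:00 alice job1 nightly build",
    "#update job1 20050418 12:00:00",
    "anything else",
);

for my $line (@samples) {
    my ( $kind, @fields ) = parse_line($line);
    print defined $kind ? "$kind: @fields\n" : "$line\n";
}
```

With these samples the loop should print a "full:" line with eight fields, an "update:" line with three fields, and the third line unchanged. In the real script the for loop would be replaced by while (<>), and the shared processing would go where the print is.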

You may like to consider using something like Benchmark::Timer so that
you can check whether your optimizations are actually optimizing.
See perldoc re (in particular use re 'debug') for details on
debugging regexes.
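
As a rough sketch of how such a measurement might look, the following uses the core Benchmark module (swapped in here for Benchmark::Timer, which is a separate CPAN install) to compare the capturing and non-capturing forms of $up_com. The sample line is made up and $startDateRegex is assumed to be $sd; timings on real data may look quite different.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sd = '(\d{8})\s+(\d\d:\d\d:\d\d)';

# The two variants under test, compiled once each.
my $capturing     = qr/^#?(\S+)\s*(\S+)\s*${sd}$/;
my $non_capturing = qr/^#?(?:\S+)\s*(\S+)\s*${sd}$/;

# Made-up sample line, for illustration only.
my $line = "#update job1 20050418 12:00:00";

# Run each variant for at least one CPU second and print a
# rate comparison table.
cmpthese( -1, {
    capturing     => sub { my @f = $line =~ $capturing },
    non_capturing => sub { my @f = $line =~ $non_capturing },
} );
```

A difference of a few percent between the rows is probably within noise; it is worth repeating the run on a realistic input file before drawing conclusions.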

regards,

Mark
