Best way to search for multiple strings in a line?

Scott Stark

I hope this isn't too dumb of a question, but what's the best way to
search for an array of items in the lines of a file? The following
seems inefficient, especially with a large number of search strings:

while($line=<FILE>){
foreach $s (@searchStrings){
push(@found,$line) if($line=~/$s/);
}
}

thanks,
Scott
 
Matija Papec

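Quote each string, join them into one alternation, and compile it once:

my $s = join '|', map { quotemeta($_) } @searchStrings;
$s = qr/$s/; # compile the regex once, outside the loop

while($line = <FILE>){
push(@found,$line) if $line =~ $s;
}

quotemeta does the same job as \Q...\E. Note that \Q and \E only take
effect inside an interpolating string, so the single-quoted form
'\Q'.$_.'\E' will not escape anything. If you prefer the \Q form, write
the first line as:

my $s = join '|', map { qq/\Q$_\E/ } @searchStrings;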
 
Bob Walton

You might benefit from the use of the "study" function (perldoc -f
study) for your task. Check out the examples given there. Note that
study is a no-op in Perl 5.16 and later, so it can only help on older
perls.

Also, if you are truly searching for strings rather than patterns,
consider using the "index" function instead of regexps. It should be
faster most of the time.
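
A minimal sketch of the index approach, assuming the search strings are plain literals. The sample data and the helper name contains_any are made up for illustration; in real use the lines would come from the file handle.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @searchStrings and the file.
my @searchStrings = ('foo', 'a.b', 'baz');
my @lines = ("xx foo yy\n", "aXb only\n", "has a.b here\n", "nothing\n");

# Hypothetical helper: true if $line contains any of the literal strings.
sub contains_any {
    my ($line, @strings) = @_;
    for my $s (@strings) {
        # index returns -1 when $s does not occur in $line
        return 1 if index($line, $s) >= 0;
    }
    return 0;
}

my @found = grep { contains_any($_, @searchStrings) } @lines;
print @found;
```

Because index compares literally, 'a.b' does not match "aXb" here, which is the behavior you want when the strings are not meant as patterns.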

I assume from the way you coded it above that you know for certain your
strings do not contain any regexp metacharacters. If they do, you will
need to quote them using \Q and \E.
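
To see why the quoting matters, here is a small sketch with a made-up search string that contains metacharacters. The variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical search string containing regexp metacharacters.
my $s = 'a.b(c)';

# Unquoted: "." matches any character and "(c)" is a capture group.
my $unquoted_vs_other   = ('aXbc'   =~ /$s/)     ? 1 : 0;
my $unquoted_vs_literal = ('a.b(c)' =~ /$s/)     ? 1 : 0;

# Quoted: every character in $s is matched literally.
my $quoted_vs_other     = ('aXbc'   =~ /\Q$s\E/) ? 1 : 0;
my $quoted_vs_literal   = ('a.b(c)' =~ /\Q$s\E/) ? 1 : 0;

# quotemeta produces the same escaping as \Q...\E.
my $escaped = quotemeta($s);

print "unquoted: $unquoted_vs_other $unquoted_vs_literal\n";
print "quoted:   $quoted_vs_other $quoted_vs_literal\n";
print "escaped:  $escaped\n";
```

Without quoting, the pattern matches a string it should not and misses the literal text itself; with \Q...\E it matches only the literal text.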

Don't build a new regexp every time through the file read loop -- that
requires each regexp to be compiled for each line. You could use the
alternation metacharacter | to build a single regexp that will match any
of your search strings, as was suggested by a previous poster. Or you
could build an array of regexps and later apply them one at a time.
Maybe like:

for(@searchStrings){push @regexps,qr/\Q$_\E/}

Then later:

while($line=<FILE>){
study $line;
for(@regexps){push @found,$line if $line=~$_}
}

Finally, consider that a line may match more than one of your strings.
If that happens, your code will put the line in @found more than once.
Is that desired behavior? If not, you might want to terminate the
foreach loop when you have a successful match. That would also speed
things up, as additional matches would not be tested for once a match is
found.
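
One way to stop at the first match, sketched with precompiled regexps and "last". The sample strings and lines are invented for the example; substitute the file read loop for the outer loop.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @searchStrings and the file.
my @searchStrings = ('foo', 'bar');
my @lines = ("foo and bar\n", "only bar\n", "neither\n");

# Compile each quoted string once, before the loop over lines.
my @regexps = map { qr/\Q$_\E/ } @searchStrings;

my @found;
for my $line (@lines) {
    for my $re (@regexps) {
        if ($line =~ $re) {
            push @found, $line;
            last;    # skip the remaining regexps for this line
        }
    }
}
print @found;
```

The first line matches both strings but is pushed onto @found only once.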

HTH.