Al Belden
Hi all,
I've been working on a problem that I thought might be of interest: I'm
trying to replace some Korn shell scripts that search source code files with
Perl scripts, in order to gain features such as:
- the more powerful regular expressions available in Perl;
- the ability to print lines before and after matches (GNU grep supports
  this, but it is not available on our Digital Unix and AIX platforms);
- case-insensitive searches by default (yes, I know grep can do this, but
  the shell scripts that call grep don't).
We're talking about roughly 5000 files spread over 15 directories. So far it
has proven quite difficult (for me) to match the performance of the Korn
shell scripts with Perl and still obtain the line number and context
information I need. The crux of the problem is that I have seen the best
performance from Perl when I match with the /g modifier against a string
holding the current slurped file:
    local $/;                               # undef $/ => slurp the whole file
    my $curStr = <FH>;                      # FH is already open on the current source file
    while ($curStr =~ /$compiledRegex/g)
    {
        # write matches to file for eventual paging
    }
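For what it's worth, inside that loop pos($curStr) gives the character
offset just past the current match, and $-[0] gives the offset where it
begins; the offset-based options (3 and 4) below key off of exactly that:

    while ($curStr =~ /$compiledRegex/g)
    {
        my $matchStart = $-[0];             # offset where this match begins
        my $matchEnd   = pos($curStr);      # offset just past this match
        # ... still need a cheap way to turn an offset into a line number
    }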
This works well, except that for each match I also need the line number it
was found on. As far as I can tell from reading and research, there is no
variable that holds this information, since I am not reading from the file
at that point. I can get the information in other ways, such as:
1. Reading each file a line at a time, testing for a match, and keeping a
   line counter or using $NR (i.e. $., with the English module); the sketch
   just after this list shows this version.
2. Reading the file into an array and processing it a line at a time.
3. Creating index files for the source files that store line offsets, and
   using them with the slurp method shown above.
4. Creating an in-memory index for each file that contains a match, and
   using it for subsequent matches in that file (sketched at the end of
   this message).
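For reference, the straightforward shape of option 1 is something like this
(untested sketch; @files, $compiledRegex, and the output handling are
placeholders for however they are actually set up):

    for my $file (@files)
    {
        open(my $fh, '<', $file) or do { warn "can't open $file: $!"; next };
        while (my $line = <$fh>)
        {
            next unless $line =~ /$compiledRegex/;
            print "$file:$.:$line";    # or write to the results file for paging
        }
        close $fh;
    }

That gives the line number for free via $., but it is one of the variants
that has not been fast enough.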
Options 1, 2, and 4 all suffer a performance penalty relative to Unix grep.
Option 3 performs well and is the method I am currently using, but it
requires creating and maintaining the index files. I was wondering whether I
could tie a scalar to a file and still use the slurping loop above; then
perhaps $NR and $. would hold the current line number, since the file would
be read as the loop is traversed. Any other ideas would be welcome.
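In case it helps anyone see where the time goes, here is roughly what I mean
by option 4, as an untested, simplified sketch: slurp the file as above, and
on the first match build an in-memory array of line-start offsets, then map
each match's offset to a line number ($file and $compiledRegex are
placeholders):

    local $/;                                # slurp mode
    open(my $fh, '<', $file) or die "can't open $file: $!";
    my $curStr = <$fh>;
    close $fh;

    my @lineStart;                           # offset at which each line begins
    my $lineIdx = 0;                         # current position in @lineStart
    while ($curStr =~ /$compiledRegex/g)
    {
        my $matchStart = $-[0];              # offset where this match begins

        if (!@lineStart)                     # build the index on the first match only
        {
            @lineStart = (0);
            my $nl = -1;
            push @lineStart, $nl + 1
                while ($nl = index($curStr, "\n", $nl + 1)) >= 0;
        }

        # Matches come back in increasing offset order, so just walk forward
        # until the next line would start past this match.
        $lineIdx++
            while $lineIdx < $#lineStart && $lineStart[$lineIdx + 1] <= $matchStart;
        my $lineNum = $lineIdx + 1;

        # write "$file:$lineNum: ..." to the results file for eventual paging
    }

Even done like this, #4 has not kept up with the grep-based scripts for me.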
Al