Al Belden
Hi all,
I've been working on a problem that I thought might be of interest: I'm
trying to replace some Korn shell scripts that search source code files with
Perl scripts, in order to gain features such as:
- the more powerful regular expressions available in Perl;
- the ability to print lines before and after matches (GNU grep supports
  this, but it is not available on our Digital Unix and AIX platforms);
- case-insensitive searches by default (yes, I know grep can do this, but
  the shell scripts that call grep don't).
We're talking about roughly 5000 files spread over 15 directories. So far it
has proven quite difficult (for me) to match the performance of the Korn
shell scripts with Perl and still obtain the line number and context
information I need. The crux of the problem is that I have seen the best
performance from Perl when I match with the /g modifier against a string
holding the current slurped file:
    local $/;                               # undef $/ => slurp the whole file
    my $curStr = <FH>;                      # FH is already open on the current source file
    while ($curStr =~ /$compiledRegex/g)
    {
        # write matches to file for eventual paging
    }
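For what it's worth, inside that loop pos($curStr) gives the character
offset just past the current match, and $-[0] gives the offset where it
begins; the offset-based options (3 and 4) below key off of exactly that:

    while ($curStr =~ /$compiledRegex/g)
    {
        my $matchStart = $-[0];             # offset where this match begins
        my $matchEnd   = pos($curStr);      # offset just past this match
        # ... still need a cheap way to turn an offset into a line number
    }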
This works well, except that for each match I also need the line number it
was found on. As far as I can tell from reading and research, there is no
variable that holds this information, since I am not reading from the file
at that point. I can get the information in other ways, such as:
1. Reading each file a line at a time, testing for a match, and keeping a
   line counter or using $NR (i.e. $., with the English module); the sketch
   just after this list shows this version.
2. Reading the file into an array and processing it a line at a time.
3. Creating index files for the source files that store line offsets, and
   using them with the slurp method shown above.
4. Creating an in-memory index for each file that contains a match, and
   using it for subsequent matches in that file (sketched at the end of
   this message).
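For reference, the straightforward shape of option 1 is something like this
(untested sketch; @files, $compiledRegex, and the output handling are
placeholders for however they are actually set up):

    for my $file (@files)
    {
        open(my $fh, '<', $file) or do { warn "can't open $file: $!"; next };
        while (my $line = <$fh>)
        {
            next unless $line =~ /$compiledRegex/;
            print "$file:$.:$line";    # or write to the results file for paging
        }
        close $fh;
    }

That gives the line number for free via $., but it is one of the variants
that has not been fast enough.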
Options 1, 2, and 4 all suffer a performance penalty relative to Unix grep.
Option 3 performs well and is the method I am currently using, but it
requires creating and maintaining the index files. I was wondering whether I
could tie a scalar to a file and still use the slurping loop above; then
perhaps $NR and $. would hold the current line number, since the file would
be read as the loop is traversed. Any other ideas would be welcome.
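In case it helps anyone see where the time goes, here is roughly what I mean
by option 4, as an untested, simplified sketch: slurp the file as above, and
on the first match build an in-memory array of line-start offsets, then map
each match's offset to a line number ($file and $compiledRegex are
placeholders):

    local $/;                                # slurp mode
    open(my $fh, '<', $file) or die "can't open $file: $!";
    my $curStr = <$fh>;
    close $fh;

    my @lineStart;                           # offset at which each line begins
    my $lineIdx = 0;                         # current position in @lineStart
    while ($curStr =~ /$compiledRegex/g)
    {
        my $matchStart = $-[0];              # offset where this match begins

        if (!@lineStart)                     # build the index on the first match only
        {
            @lineStart = (0);
            my $nl = -1;
            push @lineStart, $nl + 1
                while ($nl = index($curStr, "\n", $nl + 1)) >= 0;
        }

        # Matches come back in increasing offset order, so just walk forward
        # until the next line would start past this match.
        $lineIdx++
            while $lineIdx < $#lineStart && $lineStart[$lineIdx + 1] <= $matchStart;
        my $lineNum = $lineIdx + 1;

        # write "$file:$lineNum: ..." to the results file for eventual paging
    }

Even done like this, #4 has not kept up with the grep-based scripts for me.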
Al