A hash or array of regexp's?

Tim Shoppa

I often find myself with a list of things that I'm searching for. And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example. Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute. Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine. Very quick, very fast, very
idiomatic.
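
A minimal sketch of the prefix-keyed dispatch table described above. The record prefixes ('HDR ', 'DATA') and the handler bodies are invented for illustration and are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical four-character prefixes mapped to handler subroutines.
my %dispatch = (
    'HDR ' => sub { return "header: $_[0]" },
    'DATA' => sub { return "data: $_[0]" },
);

# Look up the handler by the first four characters of the line.
# Returns undef when no handler is registered for that prefix.
sub handle_line {
    my ($line) = @_;
    my $handler = $dispatch{ substr($line, 0, 4) };
    return defined $handler ? $handler->($line) : undef;
}

for my $line ('HDR version 2', 'DATA 1 2 3', 'JUNK ignored') {
    my $result = handle_line($line);
    print defined $result ? "$result\n" : "no handler: $line\n";
}
```

Each lookup is a single hash access, which is why this form stays fast no matter how many prefixes are registered.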

But other times the patterns are not so easily handled. Often they are
true regexp's, matching variable repeats/patterns. This of course can
be handled with if matches and blocks to do the actions, but this
screams out to me as something that I ought to be able to handle using
a data structure which is something like a hash, using regexp's as
keys.

Pages 193/194 of the Camel book reveal how to loop over a bunch of
precompiled regexp's, using qr// to precompile the regexp's, and this
isn't bad. But it's not quite the same as a hash lookup. And it seems
to me that there ought to be an idiom, maybe a CPAN module, that makes
the whole operation look more like a hash lookup, because that's how I
think of it in my head, even though I know that regexp's aren't really
as quick or efficient as simple keys.
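
For comparison, a loop over qr// objects might look roughly like this. It is a sketch in the spirit of the approach mentioned above, not the book's own listing, and the three patterns are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative patterns only; qr// compiles each one once, up front.
my @patterns = (
    qr/^\s*#/,             # 0: comment line
    qr/^\d+(?:\.\d+)?$/,   # 1: plain number
    qr/^\w+\s*=/,          # 2: assignment
);

# Return the indices of every pattern that matches the line.
sub matching_patterns {
    my ($line) = @_;
    return grep { $line =~ $patterns[$_] } 0 .. $#patterns;
}

for my $line ('# note', '42', 'x = 1', 'other') {
    my @hits = matching_patterns($line);
    print $line, ' => ', (@hits ? "patterns @hits" : 'no match'), "\n";
}
```

Unlike a hash lookup, every pattern is tried against every line, so the cost grows with the number of patterns.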

So, is there a common perl idiom for dealing with this situation?
Maybe a CPAN module?

Tim.
 
xhoster

Tim Shoppa said:
I often find myself with a list of things that I'm searching for. And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example. Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute. Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine. Very quick, very fast, very
idiomatic.

But other times the patterns are not so easily handled. Often they are
true regexp's, matching variable repeats/patterns. This of course can
be handled with if matches and blocks to do the actions, but this
screams out to me as something that I ought to be able to handle using
a data structure which is something like a hash, using regexp's as
keys.

Pages 193/194 of the Camel book reveal how to loop over a bunch of
precompiled regexp's, using qr// to precompile the regexp's, and this
isn't bad. But it's not quite the same as a hash lookup. And it seems
to me that there ought to be an idiom, maybe a CPAN module, that makes
the whole operation look more like a hash lookup, because that's how I
think of it in my head, even though I know that regexp's aren't really
as quick or efficient as simple keys.

Also, any given string can match many different regexes, while there is
exactly one hash key it can match. Trying to munge such a situation into a
hash-like idiom seems very misleading and just asking for trouble.

I'd just use an array of arrays, with each inner array being of length 2,
a regex/action pair.
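
A minimal sketch of such an array of regex/action pairs. The rules and the apply_rules helper are hypothetical names chosen for the example; the input line is picked so that it matches more than one regex, which is the situation a hash lookup cannot express.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each inner array is a [regex, action] pair (illustrative rules only).
my @rules = (
    [ qr/^ERROR\b/ => sub { "error seen" } ],
    [ qr/disk/i    => sub { "disk mentioned" } ],
    [ qr/(\d+)%/   => sub { "percentage $_[0]" } ],
);

# Run the action of every rule whose regex matches the line.
# A list-context match returns the captures, or (1) if the regex
# has no capture groups; they are passed on to the action.
sub apply_rules {
    my ($line) = @_;
    my @results;
    for my $rule (@rules) {
        my ($re, $action) = @$rule;
        if (my @captures = $line =~ $re) {
            push @results, $action->(@captures);
        }
    }
    return @results;
}

print "$_\n" for apply_rules('ERROR: disk 93% full');
```

All three rules fire for that one line. Adding a `last` after the `push` would turn this into first-match-wins, with the order of @rules deciding the winner.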

Xho
 
Fabian Pilkowski

* Tim Shoppa said:
I often find myself with a list of things that I'm searching for. And
for each of the things I'm searching for, there's an action I want to
do.

Sometimes the "search for" pattern is just the first four characters in
the line, for example. Here things are easy: I build a hash with the
key being the four-character pattern, and the value being the
subroutine to execute. Works very nicely: get each line, use a
substr() to extract the first four characters, look them up in the
hash, and execute the correct subroutine. Very quick, very fast, very
idiomatic.

But other times the patterns are not so easily handled. Often they are
true regexp's, matching variable repeats/patterns. This of course can
be handled with if matches and blocks to do the actions, but this
screams out to me as something that I ought to be able to handle using
a data structure which is something like a hash, using regexp's as
keys.

So, is there a common perl idiom for dealing with this situation?

I would do this with a flat array in which every even-indexed element
is a regex and the element right after it is the callback, then iterate
over the array while skipping the callback elements.

#!/usr/bin/perl -w
use strict;

# Flat list of pairs: regex at even indices, its callback right after.
my @array = (
    qr/(line\s(\d)\2)/ => sub { print "match: $1\n" },
    # ...
);

while ( <DATA> ) {
    for my $i ( 0 .. @array-1 ) {
        next if $i % 2;                         # skip if odd
        my( $re, $sub ) = @array[ $i, $i+1 ];
        $sub->() if $_ =~ $re;                  # callback
    }
}
__DATA__
line 10
line 11
line 12

Maybe a CPAN module?

The module Tie::HashRef works around the problem of stringified hash
keys. Perhaps it accepts a reference to a regex as a key; the
documentation doesn't say, and I haven't checked it myself yet.

regards,
fabian
 
Tim Shoppa

Fabian said:
The module Tie::HashRef works around the problem

Thanks for the tip. It's not only a tied hash but also a useful
object-oriented approach to looking for matches. It takes "qr//" forms
directly as the key, so there is no need to stringify/destringify. And
to answer the other reply, the approach taken ("first match") works
fine for my purposes.

I know it's not really a hash (with all the efficiencies that would be
implied if it was) but I like to think in terms of a hash, and
Tie::HashRef works wonderfully for this.
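
To illustrate the "first match" behaviour, here is a hand-rolled tied hash with regex keys. It is only a sketch: the FirstMatchHash package is hypothetical, written for this example, and does not reproduce the interface or implementation of the module named above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package for this sketch only; not a CPAN module.
package FirstMatchHash;

sub TIEHASH {
    my ($class) = @_;
    return bless { rules => [] }, $class;
}

# Keep insertion order, because lookups return the first match.
sub STORE {
    my ($self, $re, $value) = @_;
    push @{ $self->{rules} }, [ $re, $value ];
}

# Scan the stored regexes in order; return the first matching value.
sub FETCH {
    my ($self, $string) = @_;
    for my $rule (@{ $self->{rules} }) {
        return $rule->[1] if $string =~ $rule->[0];
    }
    return undef;
}

package main;

tie my %action, 'FirstMatchHash';
$action{ qr/^\d+$/ } = sub { "number" };
$action{ qr/^\w+$/ } = sub { "word" };

for my $input ('42', 'hello', '???') {
    my $handler = $action{$input};
    print $input, ': ', ($handler ? $handler->() : 'no match'), "\n";
}
```

The string '42' matches both patterns, and the rule stored first wins. The lookup reads like a hash access, but each fetch is still a linear scan over the stored regexes.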

Tim.
 
