/\bf(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?\b/

Mike

I want a regexp that will match (as a "whole word") any of the
(non-empty) prefixes of the word "foobar", i.e. "f", "fo", "foo",
"foob", etc. One way to do it is this:

/\bf(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?\b/

Yuk. Is there a better way?

Mike
 
Peter Wyzl

:
: I want a regexp that will match (as a "whole word") any of the
: (non-empty) prefixes of the word "foobar", i.e. "f", "fo", "foo",
: "foob", etc. One way to do it is this:
:
: /\bf(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?\b/
:
: Yuk. Is there a better way?

It took me a while to understand exactly what you want here, so apologies if
I am wrong (not yet awake and haven't had my second coffee.. :))

Does this do what you want?

m/\b(f|fo|foo|foob|fooba|foobar)\b/

It would be fairly easy to build up the alternation string

$string = 'f|fo|foo|foob|fooba|foobar';

from any word this way and incorporate that into the regex.

so then

m/\b($string)\b/

would do the same job.

$string = substr $word,0,1;
$string = join '|', $string, substr $word,0,$_ for (2 .. length $word);

$match = $1 if $test =~ m/\b($string)\b/;

Seems to do what you want.

On a side note, I can't seem to replace the '$string' in the second
expression with 'substr $word,0,1' as the first element of the list. When
I do, I only get the last element of the second part of the list, 'foobar',
and none of the others. The same happens if I discard the first element and
write the list generator as 'substr $word,0,$_ for (1 .. length $word);'. I
expect this to give me a list which can be joined, but I only get the last
element.

The fact that it works in the case above convinces me that it is returning a
list like I expect; I just don't see why 'join' is only getting the last
element of that list.

Maybe I should go have that coffee now
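
A likely explanation for the side note above, offered as a sketch rather than a definitive answer: the trailing 'for' is a statement modifier, not a list generator. The whole assignment runs once per value of $_, so each pass overwrites the variable with a one-element join and only the last prefix survives. Building the list with 'map' first gives 'join' all the prefixes at once. The variable names $last_only and $string below are only for illustration.

```perl
use strict;
use warnings;

my $word = 'foobar';

# Statement-modifier form: the assignment is executed once per value of $_,
# so every pass replaces $last_only with a join over a single element.
my $last_only;
$last_only = join '|', substr($word, 0, $_) for 1 .. length $word;

# map builds the complete list of prefixes first; join then sees all of them.
my $string = join '|', map { substr($word, 0, $_) } 1 .. length $word;

print "$last_only\n";   # foobar
print "$string\n";      # f|fo|foo|foob|fooba|foobar
```

The accumulating version in the post works because $string appears on both sides of the assignment, so each pass appends one more prefix instead of starting over.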
 
Anno Siegel

Mike said:
I want a regexp that will match (as a "whole word") any of the
(non-empty) prefixes of the word "foobar", i.e. "f", "fo", "foo",
"foob", etc. One way to do it is this:

/\bf(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?\b/

Yuk. Is there a better way?

Yuk is in the eye of the beholder.

The regex isn't particularly readable, but it *is* an economical way
to match all possible prefixes, as opposed to the readable but
repetitive

/\b(?:foobar|fooba|foob|foo|fo|f)\b/

If the pattern is generated by a well-named routine, readability of
the result doesn't matter much:

sub prefix_pattern {
    my $pat = '';
    # build innermost pattern first
    $pat = "(?:$_$pat)?" for reverse split //, shift;
    $pat;
}

my $re = prefix_pattern( $word);
print join( ", ", 'X f YY foob ZZZ foobar' =~ /\b$re\b/g), "\n";

This prints a lot of comma-separated empty strings because it also matches
the empty prefix: the first character is made optional like the rest of
them. To amend this, treat the first character separately. Using the same
prefix_pattern():

my ( $first, $rest) = $word =~ /(.)(.*)/ or die "Word can't be empty";
my $re = $first . prefix_pattern( $rest);
print join( ", ", 'X f YY foob ZZZ foobar' =~ /\b$re\b/g), "\n";
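
For reference, the pieces above can be put together into one runnable script. This is a minimal sketch, not part of the original post: it assumes $word is 'foobar', and it adds quotemeta() so that a word containing regex metacharacters would be matched literally. The name prefix_pattern_quoted is hypothetical.

```perl
use strict;
use warnings;

# Same construction as prefix_pattern(), innermost group first,
# but with each character escaped (an addition, not in the original).
sub prefix_pattern_quoted {
    my $pat = '';
    $pat = '(?:' . quotemeta($_) . $pat . ')?' for reverse split //, shift;
    return $pat;
}

my $word = 'foobar';    # assumed value for this example

# Keep the first character mandatory so the empty prefix cannot match.
my ($first, $rest) = $word =~ /(.)(.*)/s or die "Word can't be empty";
my $re = quotemeta($first) . prefix_pattern_quoted($rest);

my @found = 'X f YY foob ZZZ foobar' =~ /\b$re\b/g;
print "$re\n";                    # f(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?
print join(', ', @found), "\n";   # f, foob, foobar
```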

Anno
 
Mike

Anno Siegel said:
Yuk is in the eye of the beholder.

The regex isn't particularly readable, but it *is* an economical way
to match all possible prefixes, as opposed to the readable but
repetitive

/\b(?:foobar|fooba|foob|foo|fo|f)\b/

If the pattern is generated by a well-named routine, readability of
the result doesn't matter much:

sub prefix_pattern {
    my $pat = '';
    # build innermost pattern first
    $pat = "(?:$_$pat)?" for reverse split //, shift;
    $pat;
}

my $re = prefix_pattern( $word);
print join( ", ", 'X f YY foob ZZZ foobar' =~ /\b$re\b/g), "\n";

This prints a lot of comma-separated empty strings because it also matches
the empty prefix: the first character is made optional like the rest of
them. To amend this, treat the first character separately. Using the same
prefix_pattern():

my ( $first, $rest) = $word =~ /(.)(.*)/ or die "Word can't be empty";
my $re = $first . prefix_pattern( $rest);
print join( ", ", 'X f YY foob ZZZ foobar' =~ /\b$re\b/g), "\n";


That's cool. But I'm curious, why did you design prefix_pattern
to make also the first letter optional? My naive implementation
of prefix_pattern would have treated the first letter specially:

sub my_prefix_pattern {
    my ($first, $rest) = split //, shift, 2;
    $first . ($rest ? '(?:' . my_prefix_pattern($rest) . ')?' : '');
}

After all, in the rare instances in which one wanted to treat even
the first letter as optional, one could still easily define $re as

my $re = '(?:' . my_prefix_pattern($word) . ')?';
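
One detail of the recursive version is worth a sketch: the truth test on $rest treats a remainder of '0' as false, so a word such as 'f0' would lose its last character. The variant below is an illustrative adjustment, not the code as posted: it tests defined-ness and length instead, and takes its argument via @_ explicitly.

```perl
use strict;
use warnings;

sub my_prefix_pattern {
    my ($word) = @_;
    my ($first, $rest) = split //, $word, 2;
    # Test length rather than truth, so a remainder of '0' still counts.
    return $first . (defined $rest && length $rest
        ? '(?:' . my_prefix_pattern($rest) . ')?'
        : '');
}

print my_prefix_pattern('foobar'), "\n";   # f(?:o(?:o(?:b(?:a(?:r)?)?)?)?)?
print my_prefix_pattern('f0'), "\n";       # f(?:0)?
```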

Mike
 
Anno Siegel

Mike said:
That's cool. But I'm curious, why did you design prefix_pattern
to make also the first letter optional? My naive implementation
of prefix_pattern would have treated the first letter specially:

sub my_prefix_pattern {
    my ($first, $rest) = split //, shift, 2;
    $first . ($rest ? '(?:' . my_prefix_pattern($rest) . ')?' : '');
}

That's a possibility, though I tend to avoid recursion when it doesn't
have a clear advantage. Maybe it has, I haven't tried to rewrite it.

However, there's a general design principle that says a subroutine
should do one thing right, no more, no less. As a corollary, in a
well-written program every subroutine does almost nothing. :) This
makes a sub easy to describe and to remember what it does, and it
makes it well suited as a building block, even if the expected usage
changes later.

When writing a sub to do X, there is often a temptation to say, I'll
never want X without Y in this program, so let's put Y in there, too.
It looks economical. However, I have often (and too often painfully)
reversed that kind of decision, so I tend to keep dealing with special
cases out of subroutines.

Taken a step further, I might even pull split( //, ...) out of the sub
and relegate it to the caller. Let the user provide a list of arbitrary
strings (which will be single characters in the actual calls).

sub prefix_pattern {
    my $pat = '';
    $pat = "(?:$_$pat)?" for reverse @_;
    $pat;
}

(The sub name needs some thought now.)
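
As a usage sketch of the list-taking variant: the caller now decides how the word is cut up, so the same sub handles single characters and larger units alike. The path components in the second call are made-up example data, not something from the thread.

```perl
use strict;
use warnings;

# List-taking variant: wraps each element around the pattern built so far,
# starting from the last element (the innermost group).
sub prefix_pattern {
    my $pat = '';
    $pat = "(?:$_$pat)?" for reverse @_;
    return $pat;
}

# The caller does the splitting: single characters ...
my $by_char = prefix_pattern(split //, 'foo');

# ... or arbitrary strings (hypothetical example data).
my $by_part = prefix_pattern('usr', '/local', '/bin');

print "$by_char\n";   # (?:f(?:o(?:o)?)?)?
print "$by_part\n";   # (?:usr(?:/local(?:/bin)?)?)?
```

Note that both results also match the empty string, which is the general case discussed in this post; making the first element mandatory is left to the caller.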

Mike said:
After all, in the rare instances in which one wanted to treat even
the first letter as optional, one could still easily define $re as

How do you know they will always be rare? The empty string *is* a
legitimate prefix, after all. I often find, when I refine an ad-hoc
solution to a more systematic approach, the limiting and special cases
are dealt with quite differently than anticipated. Then it's good when
they aren't hard-soldered into the basic routines.

Mike said:
my $re = '(?:' . my_prefix_pattern($word) . ')?';

Yes, but you are fixing a result to fit the general case because it is
too special. It is better to fix the general result to fit the special
case.

All that said, in the concrete example the issue is minor. If you
*know* you'll never want the empty prefix to match, go ahead and
write a sub to that specification. I just took occasion to apply
the more general principle.

Anno
 
Mike

Anno Siegel said:
That's a possibility, though I tend to avoid recursion when it doesn't
have a clear advantage. Maybe it has, I haven't tried to rewrite it.

I admit it, I'm a bit of a recursion junkie...

Anno Siegel said:
However, there's a general design principle that says a subroutine
should do one thing right, no more, no less. As a corollary, in a
well-written program every subroutine does almost nothing. :) This
makes a sub easy to describe and to remember what it does, and it
makes it well suited as a building block, even if the expected usage
changes later.

Thanks, this is very helpful. Design issues like this one are the
aspect of programming I have the hardest time with.

Mike
 
Anno Siegel

Mike said:
Anno Siegel writes:
[lots]
However, there's a general design principle that says a subroutine
should do one thing right, no more, no less. As a corollary, in a
well-written program every subroutine does almost nothing. :) This
makes a sub easy to describe and to remember what it does, and it
makes it well suited as a building block, even if the expected usage
changes later.

Thanks, this is very helpful. Design issues like this one are the
aspect of programming I have the hardest time with.

Me too.

Anno
 
