Hi Duncan
This of course gives priority to colours and only looks for garments or
footwear if it hasn't matched on a prior pattern. If you actually
wanted to match the first occurrence of any of these (or if the condition
was re.match instead of re.search) then named groups can be a nice way of
simplifying the code:
A good point, and a good example of when to use
named capture group references. This is easily extended
to 'spit out' all other occurring categories
(see below).
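To make the difference between "priority" and "first occurrence" concrete, here is a minimal sketch. The original priority-based code is not shown in this thread, so by_priority() is only my assumption of what it roughly did; the names by_priority, first_occurrence and COMBINED are made up for illustration.

```python
import re

# Assumed reconstruction: one pattern per category, tried in priority order.
COLOUR = re.compile(r'blue|white|red')
GARMENT = re.compile(r'socks|tights')
FOOTWEAR = re.compile(r'boot|shoe|trainer')


def by_priority(text):
    # A colour wins whenever one occurs anywhere in the text,
    # even if a garment or footwear word comes earlier.
    for title, pattern in (('Colour', COLOUR),
                           ('Garment', GARMENT),
                           ('Footwear', FOOTWEAR)):
        m = pattern.search(text)
        if m:
            return title, m.group()
    return None


# One combined pattern: whichever alternative occurs first in the text wins.
COMBINED = re.compile(
    r'(?P<c>blue|white|red)|(?P<g>socks|tights)|(?P<f>boot|shoe|trainer)')
TITLES = {'c': 'Colour', 'g': 'Garment', 'f': 'Footwear'}


def first_occurrence(text):
    m = COMBINED.search(text)
    if m:
        # lastgroup is the name of the alternative that matched
        return TITLES[m.lastgroup], m.group()
    return None


print(by_priority('socks in blue'))
print(first_occurrence('socks in blue'))
```

With 'socks in blue' the first function reports the colour and the second reports the garment, which is exactly the distinction made above.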
PATTERN = '''
(?P<c>blue|white|red)
...
This is one nice thing in Python's regex syntax;
you have to emulate the ?P-thing in other
regex systems more or less 'awk'-wardly ;-)
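To give an idea of what such an emulation looks like, here is a rough sketch that stays in Python but pretends named groups don't exist. It uses plain numbered groups plus match.lastindex and a positional table; the helper name categorise and the list layout of TITLES are my own choices for illustration, not anything from the original code.

```python
import re

# Plain numbered groups only, no (?P<name>...) anywhere.
PATTERN = re.compile(r'''
      (blue  | white  | red)        # group 1
    | (socks | tights)              # group 2
    | (boot  | shoe   | trainer)    # group 3
''', re.VERBOSE)

# Positional lookup table; index 0 is unused because groups count from 1.
TITLES = [None, 'Colour', 'Garment', 'Footwear']


def categorise(text):
    # lastindex is the number of the group that matched in each hit
    return [(TITLES[m.lastindex], m.group(m.lastindex))
            for m in PATTERN.finditer(text)]


print(categorise('blue socks and red shoes'))
```

It works, but the table has to be kept in step with the group order by hand, which is the awkward part that named groups take away.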
For something this simple the titles and group names could be the
same, but I'm assuming real code might need a bit more.
No no, this is quite good because it involves
some math-generated table-code lookup.
I managed somehow to extend your example in order
to spit out all matches and their corresponding
category:
import re

PATTERN = r'''
    (?P<c>blue |white |red    )
  | (?P<g>socks|tights        )
  | (?P<f>boot |shoe  |trainer)
'''
# re.VERBOSE ignores the whitespace used to line up the alternatives
PATTERN = re.compile(PATTERN, re.VERBOSE)
TITLES = { 'c': 'Colour', 'g': 'Garment', 'f': 'Footwear' }

t = 'blue socks and red shoes'
for match in PATTERN.finditer(t):
    # lastgroup is the name of the alternative that matched
    grp = match.lastgroup
    print("%s: %s" % (TITLES[grp], match.group(grp)))
which writes out the expected:
Colour: blue
Garment: socks
Colour: red
Footwear: shoe
The corresponding Perl-program would look like this:
$PATTERN = qr/
(blue |white |red )(?{'c'})
| (socks|tights )(?{'g'})
| (boot |shoe |trainer)(?{'f'})
/x;
%TITLES = (c =>'Colour', g =>'Garment', f =>'Footwear');
$t = 'blue socks and red shoes';
print "$TITLES{$^R}: $^N\n" while( $t=~/$PATTERN/g );
and prints the same:
Colour: blue
Garment: socks
Colour: red
Footwear: shoe
You don't have nice named match references (?P<..>)
in Perl 5 (before 5.10), so you have to emulate this
by an ordinary code assertion (?{..}) and set some
value ($^R) on the fly - which is not that bad in
the end (imho).
(?{..}) means "zero-width code assertion";
it sets the Perl-predefined $^R to the value the
code in {...} evaluates to, and $^N holds the text
of the most recently closed group.
As you can see, the pattern matching related part
reduces from 4 lines to one line.
If you didn't need the dictionary lookup and
could get away with putting the category names
straight into the pattern, all you'd have to do
would be this:
$PATTERN = qr/
(blue |white |red )(?{'Colour'})
| (socks|tights )(?{'Garment'})
| (boot |shoe |trainer)(?{'Footwear'})
/x;
$t = 'blue socks and red shoes';
print "$^R: $^N\n" while( $t=~/$PATTERN/g );
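For comparison, the same shortcut seems to work on the Python side too: if the group names themselves serve as titles, the TITLES dictionary goes away. This is only a sketch under that assumption; the helper name categorise is made up, and it only works as long as every title is a valid group name (an identifier, so no spaces).

```python
import re

# Group names double as the category titles, so no lookup table is needed.
PATTERN = re.compile(r'''
      (?P<Colour>   blue  | white  | red)
    | (?P<Garment>  socks | tights)
    | (?P<Footwear> boot  | shoe   | trainer)
''', re.VERBOSE)


def categorise(text):
    # lastgroup is already the title; group() is the matched word
    return [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]


for title, word in categorise('blue socks and red shoes'):
    print("%s: %s" % (title, word))
```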
What's the point of all that? IMHO, Python's
Regex support is quite good and useful, but
won't give you an edge over Perl's in the end.
Thanks & Regards
Mirco