R
Rainer Weikusat
NB: This text is partially motivated by the fairly usesless (in this
particular respect) coroutine, closure and generator articles in
Wikipedia ("Perl supports this *if* you download the following 10E24
modules ...").
In case someone's not familiar with the concept, a 'generator' is
something which, when invoked, returns the 'next value' from some
ordered set of values. An extremly simple-minded example would be
(using the 'state' feature)
sub powers_of_2
{
state $next = 1;
my $cur;
$cur = $next;
$next *= 2;
return $cur;
}
which will return all members of the set of 'powers of 2' from first
to last provided it is called an infinite number of times (or any
finite subset of this set for a finite number of calls). This is
obviously similar to a lazily evaluated sequence (a term I've read
here and there).
This is, of course, not very useful, as contrived examples are wont to
be (and it is also less efficient than calculating the values in a
loop).
The most interesting feature of generators is that they can be
used to arrange independent subroutines in a 'processing pipeline' in
order to perform a multi-step transformation of some set of 'input
data items' to a set of 'output data items' while isolating the code
for each step in its own subroutine. A real-word example I had to deal
with yesterday was in some code supposed to preprocess a set of 'port
numbers' (as in 'TCP' or 'UDP') including port ranges such that the
output set is composed of a set of distinct ports or port ranges which
is equivalent (contains the same ports) to the input set. Further,
while the input set (produced by some other, intermediate processing
step) represented individual ports as 'ranges' with an identical start
and end value, the output set should contain bare numbers for these
cases and 'ranges' (represented as anonymous arrays with two elements)
only when the corresponding input element was actually a range
encompassing more than one port number.
The first implementation of that looked like this:
sub dedupe_ports
{
my ($cur, @in, @out);
@in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
$cur = shift(@in);
for (@in) {
if ($_->[0] <= $cur->[1] + 1) {
$cur->[1] = $_->[1] if $_->[1] > $cur->[1];
next;
}
push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);
$cur = $_;
}
push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);
return \@out;
}
But I really didn't like the way the two separate processing steps of
'merging overlapping ranges' and 'collapsing single-port ranges' where
intermingled here, especially as I originally forgot the collapsing step
for the duplicated final push.
One way to deal with that is to use a generator to create a sequence
of merged ranges and apply the 'collapsing' step to the set of output
values produced by that. The subroutine included below creates a
closure which is such a generator:
sub range_merger
{
my (@in, $cur);
@in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
$cur = shift(@in);
return sub {
my ($next, $res);
$next->[1] > $cur->[1] and $cur->[1] = $next->[1]
while ($next = shift(@in)) && $next->[0] <= $cur->[1] + 1;
$res = $cur;
$cur = $next;
return $res;
};
}
Whenever the returned closure is invoked, it will return the next
isolated, distinct range (my English feels somewhat lacking here)
aggregated from the elements of the anonymous array passed as input to
the ranger_merger subroutine, until all elements of that have been
processed. Afterwards, it will return an undefined value. This
generator can now be used by the collapsing subroutine,
sub dedupe_ports
{
my (@out, $merge_ranges, $range);
$merge_ranges = &range_merger;
push(@out, $range->[1] > $range->[0] ? $range : $range->[0])
while $range = $merge_ranges->();
return \@out;
}
in order to calculate the desired set of output values without code
duplication/ special-casing the final element case.
particular respect) coroutine, closure and generator articles in
Wikipedia ("Perl supports this *if* you download the following 10E24
modules ...").
In case someone's not familiar with the concept, a 'generator' is
something which, when invoked, returns the 'next value' from some
ordered set of values. An extremly simple-minded example would be
(using the 'state' feature)
sub powers_of_2
{
state $next = 1;
my $cur;
$cur = $next;
$next *= 2;
return $cur;
}
which will return all members of the set of 'powers of 2' from first
to last provided it is called an infinite number of times (or any
finite subset of this set for a finite number of calls). This is
obviously similar to a lazily evaluated sequence (a term I've read
here and there).
This is, of course, not very useful, as contrived examples are wont to
be (and it is also less efficient than calculating the values in a
loop).
The most interesting feature of generators is that they can be
used to arrange independent subroutines in a 'processing pipeline' in
order to perform a multi-step transformation of some set of 'input
data items' to a set of 'output data items' while isolating the code
for each step in its own subroutine. A real-word example I had to deal
with yesterday was in some code supposed to preprocess a set of 'port
numbers' (as in 'TCP' or 'UDP') including port ranges such that the
output set is composed of a set of distinct ports or port ranges which
is equivalent (contains the same ports) to the input set. Further,
while the input set (produced by some other, intermediate processing
step) represented individual ports as 'ranges' with an identical start
and end value, the output set should contain bare numbers for these
cases and 'ranges' (represented as anonymous arrays with two elements)
only when the corresponding input element was actually a range
encompassing more than one port number.
The first implementation of that looked like this:
sub dedupe_ports
{
my ($cur, @in, @out);
@in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
$cur = shift(@in);
for (@in) {
if ($_->[0] <= $cur->[1] + 1) {
$cur->[1] = $_->[1] if $_->[1] > $cur->[1];
next;
}
push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);
$cur = $_;
}
push(@out, $cur->[1] > $cur->[0] ? $cur : $cur->[1]);
return \@out;
}
But I really didn't like the way the two separate processing steps of
'merging overlapping ranges' and 'collapsing single-port ranges' where
intermingled here, especially as I originally forgot the collapsing step
for the duplicated final push.
One way to deal with that is to use a generator to create a sequence
of merged ranges and apply the 'collapsing' step to the set of output
values produced by that. The subroutine included below creates a
closure which is such a generator:
sub range_merger
{
my (@in, $cur);
@in = sort { $a->[0] <=> $b->[0] } @{$_[0]};
$cur = shift(@in);
return sub {
my ($next, $res);
$next->[1] > $cur->[1] and $cur->[1] = $next->[1]
while ($next = shift(@in)) && $next->[0] <= $cur->[1] + 1;
$res = $cur;
$cur = $next;
return $res;
};
}
Whenever the returned closure is invoked, it will return the next
isolated, distinct range (my English feels somewhat lacking here)
aggregated from the elements of the anonymous array passed as input to
the ranger_merger subroutine, until all elements of that have been
processed. Afterwards, it will return an undefined value. This
generator can now be used by the collapsing subroutine,
sub dedupe_ports
{
my (@out, $merge_ranges, $range);
$merge_ranges = &range_merger;
push(@out, $range->[1] > $range->[0] ? $range : $range->[0])
while $range = $merge_ranges->();
return \@out;
}
in order to calculate the desired set of output values without code
duplication/ special-casing the final element case.