Greg Bacon
I was writing code to scan an assembly-language definition of
operational data and produce a report, and I ended up with code
that gave me the "there has to be a better way" feeling.
Single parameters are easy to spot, e.g.,
    label1 .word 1234ABCDh
    label2 .float 3.14159
Most arrays are trivial too:
    label3 .word 1, 2, 3
Array specifications can span multiple lines, however. For example:
    label4 .float 0.0, 0.5, 1.0
           .float 1.5, 2.0, 2.5
At first, I used a regular expression to feed individual values into
a sub that kept track of the last label grabbed and determined whether
the current value was a new parameter or a continuation of an array.
The code -- and the approach, really -- was unsatisfying, so I
considered a two-pass scan: grab and decompose the chunks and then
coalesce the arrays in a second pass. I made a start in that
direction but didn't like the way it was playing out.
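For comparison, the two-pass idea can be sketched roughly as follows. This is a minimal illustration, not the code from the abandoned attempt: the sub names `decompose` and `coalesce`, the hash layout, and the choice to ignore an unlabeled chunk with no preceding parameter are all assumptions.

```perl
use strict;
use warnings;

# Pass 1 (hypothetical): turn each directive line into a chunk.
# The label is undef for a continuation line.
sub decompose {
    my @chunks;
    for my $line (@_) {
        next unless $line =~ /^(\w+)?\s*\.(word|float)\s+(.+?),?\s*$/;
        push @chunks, {
            label  => $1,
            type   => $2,
            values => [ split /\s*,\s*/, $3 ],
        };
    }
    return @chunks;
}

# Pass 2 (hypothetical): fold unlabeled chunks into the parameter before them.
sub coalesce {
    my @params;
    for my $chunk (@_) {
        if (defined $chunk->{label}) {
            push @params, { %$chunk, values => [ @{ $chunk->{values} } ] };
        }
        elsif (@params) {
            push @{ $params[-1]{values} }, @{ $chunk->{values} };
        }
        # assumption: an unlabeled chunk with nothing before it is ignored
    }
    return @params;
}

my @lines = (
    "label3 .word 1, 2, 3\n",
    "label4 .float 0.0, 0.5, 1.0\n",
    "       .float 1.5, 2.0, 2.5\n",
);
my @params = coalesce(decompose(@lines));
print "$_->{label} ($_->{type}): @{ $_->{values} }\n" for @params;
```

The cost of this shape is an intermediate list of chunks that exists only to be merged again, which may be part of why it felt unsatisfying.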
I saw that scanning an entire array would be straightforward too.
I could safely look ahead. Lines without labels continued the
current array, and I could pretend the values were on one line by
appending to the end of what I had already recognized.
If the lookahead line had a label, I could process what I had and then
C<redo> to process the lookahead line that's already in $_.
Here's a sketch of the code:
    while (<$fh>) {
        next unless /^(\w+)\s+\.(word|float)\s+(.+?),?\s*$/;
        my($label,$type,$data) = ($1,$2,$3);
        # look for continued spec
        my $needredo = 0;
        while (<$fh>) {
            if (/^\s*\.(word|float)\s+(.+?),?\s*$/) {
                $data .= ", $2";
            }
            else {
                # lookahead line is not a continuation; stop and reprocess it
                $needredo = 1;
                last;
            }
        }
        # now $label, $type, and $data comprise an
        # entire parameter
        ...;
        redo if $needredo;
    }
That's already kind of klunky, but I also saw that the inner while loop
can exhaust the input, which the outer loop's condition implicitly tests too. I
tested for C<eof $fh> at each iteration of the inner loop and reset
$needredo if I needed to C<last> out of the inner loop.
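One possible reading of that C<eof> check is sketched below. It is an assumption about how the test was arranged, not the actual code: the `scan_params` wrapper and the list of `[label, type, data]` triples are hypothetical, and `until (eof $fh)` stands in for the inner `while` so that the lookahead never reads past the end of input. In this arrangement `$needredo` is set only when a real lookahead line was read, so it never needs resetting.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a list of [label, type, data] triples.
sub scan_params {
    my ($fh) = @_;
    local $_;    # keep the caller's $_ intact
    my @params;
    while (<$fh>) {
        next unless /^(\w+)\s+\.(word|float)\s+(.+?),?\s*$/;
        my ($label, $type, $data) = ($1, $2, $3);

        # look ahead for continuation lines, stopping at end of input
        my $needredo = 0;
        until (eof $fh) {
            $_ = <$fh>;
            if (/^\s*\.(word|float)\s+(.+?),?\s*$/) {
                $data .= ", $2";
            }
            else {
                $needredo = 1;    # $_ holds a line the outer loop must see
                last;
            }
        }

        # $label, $type, and $data now describe an entire parameter
        push @params, [ $label, $type, $data ];
        redo if $needredo;
    }
    return @params;
}

my $text = <<'END';
label1 .word 1234ABCDh
label4 .float 0.0, 0.5, 1.0
       .float 1.5, 2.0, 2.5
label3 .word 1, 2, 3
END

open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my @params = scan_params($fh);
print join(' | ', @$_), "\n" for @params;
```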
The code now feels very klunky. Is there a more elegant way to code
this scan?
Greg