eof and nested while (<$fh>) {...}

Greg Bacon

I was writing code to scan an assembly-language definition of
operational data and produce a report and ended up writing code
that gave me the "there has to be a better way" feeling.

Single parameters are easy to spot, e.g.,

label1 .word 1234ABCDh
label2 .float 3.14159

Most arrays are trivial too:

label3 .word 1, 2, 3

Array specifications can span multiple lines, however. For example:

label4 .float 0.0, 0.5, 1.0
       .float 1.5, 2.0, 2.5

At first, I used a regular expression to feed individual values into
a sub that kept track of the last label grabbed and determined whether
the current value was a new parameter or a continuation of an array.

The code -- and the approach, really -- was unsatisfying, so I
considered a two-pass scan: grab and decompose the chunks and then
coalesce the arrays in a second pass. I made a start in that
direction but didn't like the way it was playing out.

I saw that scanning an entire array would be straightforward too.
I could safely look ahead. Lines without labels continue the
current array, and I could pretend the values were on one line by
appending them to the end of what I had already recognized.

If the lookahead line had a label, I could process what I had and then
C<redo> to process the lookahead line that's already in $_.

Here's a sketch of the code:

while (<$fh>) {
    next unless /^(\w+)\s+\.(word|float)\s+(.+?),?\s*$/;
    my($label,$type,$data) = ($1,$2,$3);

    # look for continued spec
    my $needredo = 0;
    while (<$fh>) {
        if (/^\s*\.(word|float)\s+(.+?),?\s*$/) {
            $data .= ", $2";
        }
        else {
            # not a continuation: stop looking ahead
            $needredo = 1;
            last;
        }
    }

    # now $label, $type, and $data comprise an
    # entire parameter
    ...;

    redo if $needredo;
}

That's already kind of klunky, but I also saw that the inner while loop
will exhaust the input, which the outer loop implicitly tests for too. I
tested for C<eof $fh> at each iteration of the inner loop and reset
$needredo if I needed to C<last> out of the inner loop.
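
For readers following along, the eof test described above might look roughly like the following. This is only a sketch under assumptions: the scan_params wrapper, the list of [label, type, data] records, and the in-memory sample input are illustrative and not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns a list of [label, type, data] records.
sub scan_params {
    my ($fh) = @_;
    my @params;
    local $_;

    while (<$fh>) {
        next unless /^(\w+)\s+\.(word|float)\s+(.+?),?\s*$/;
        my ($label, $type, $data) = ($1, $2, $3);

        # Look ahead for continuation lines, but never read past EOF,
        # so the outer loop is the only place that sees end of input.
        my $needredo = 0;
        until (eof $fh) {
            $_ = <$fh>;
            if (/^\s*\.(word|float)\s+(.+?),?\s*$/) {
                $data .= ", $2";
            }
            else {
                $needredo = 1;    # $_ holds a line we still must process
                last;
            }
        }

        push @params, [$label, $type, $data];
        redo if $needredo;        # re-enter the body without reading again
    }
    return @params;
}

# Demo on an in-memory filehandle (assumed sample input).
my $text = <<'END';
label3 .word 1, 2, 3
label4 .float 0.0, 0.5, 1.0
       .float 1.5, 2.0, 2.5
END
open my $fh, '<', \$text or die "open: $!";
print join(' | ', @$_), "\n" for scan_params($fh);
```

Because redo restarts the loop body without re-evaluating the condition, the lookahead line left in $_ is matched against the label pattern on the next pass.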

The code now feels very klunky. Is there a more elegant way to code
this scan?

Greg
 
Steven Kuo

I don't think nested loops are needed. How about:

#!/usr/local/bin/perl
use strict;
use warnings;

my ($label, $type, $data);

while (<DATA>) {
    if (/^\s*\.(word|float)\s+(.+?),?\s*$/) {
        # continuation line: append to the current parameter
        $data .= ", $2";
    } elsif (/^(\w+)\s+\.(word|float)\s+(.+?),?\s*$/) {
        # new label: flush the previously found parameter, if any
        do_stuff($label, $type, $data) if defined $label;
        ($label, $type, $data) = ($1, $2, $3);
    }
}

# flush the last parameter
do_stuff($label, $type, $data) if defined $label;

sub do_stuff {
    my ($label, $type, $data) = @_;
    print "$label, $type, $data\n";
}

__DATA__

label1 .word 1234ABCDh
label2 .float 3.14159

label3 .word 1, 2, 3

label4 .float 0.0, 0.5, 1.0
       .float 1.5, 2.0, 2.5
 
Anno Siegel

Ah, ye olde continuation line problem. Subtype 2, where you know if a
line *is* a continuation but not if a line *has* a continuation.

Here is one way of doing that. A continuation line is one that starts
with 10 blanks.

my $coll = '';
while ( <DATA> ) {
    chomp;
    if ( substr( $_, 0, 10) =~ /\S/ ) {
        print "$coll\n" if length $coll;
        $coll = $_;
    } else {
        $coll .= $_;
    }
}
print "$coll\n" if length $coll;

This only collects continued lines into one. It would be simple
to add further processing to the loop so that it spits out ready-
to-use records.
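
As a rough illustration of that further processing, the collected line could be split into a record before it is emitted. The sketch below is an assumption-laden example, not part of the code above: parse_record and collect_records are hypothetical names, the record layout (label, type, values) is invented for illustration, and the sample input assumes labels padded to 10 columns so that continuation lines start with 10 blanks.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one collected line into a record.
sub parse_record {
    my ($line) = @_;
    my ($label, $type, $rest) = $line =~ /^(\w+)\s+\.(word|float)\s+(.*)$/
        or return;
    # Replace each repeated directive from a continuation with a comma.
    $rest =~ s/,?\s+\.(?:word|float)\s+/, /g;
    $rest =~ s/\s+$//;
    my @values = split /\s*,\s*/, $rest;
    return { label => $label, type => $type, values => \@values };
}

# Same collecting loop as above, but pushing records instead of printing.
sub collect_records {
    my ($fh) = @_;
    my @records;
    my $coll = '';
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( substr( $line, 0, 10 ) =~ /\S/ ) {
            push @records, parse_record($coll) if length $coll;
            $coll = $line;
        } else {
            $coll .= $line;
        }
    }
    push @records, parse_record($coll) if length $coll;
    return @records;
}

# Demo on an in-memory filehandle (assumed sample input).
my $text = <<'END';
label3    .word  1, 2, 3
label4    .float 0.0, 0.5, 1.0
          .float 1.5, 2.0, 2.5
END
open my $fh, '<', \$text or die "open: $!";
for my $r ( collect_records($fh) ) {
    print "$r->{label} ($r->{type}): @{ $r->{values} }\n";
}
```

Keeping the parsing in its own sub leaves the collecting loop unchanged, so the continuation test can be swapped out without touching the record format.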

Anno
 
