Complex data structures and variable scope

Henry Law

I wonder if someone can help me with working out where to define my
variables so as to give the desired result here...

Personnel data is to be read from a file. Attributes for each
employee are presented as name/value pairs; some pairs occur once
only, but each person can have more than one qualification. The data
is to be accumulated in a hash, keyed by the person's name; each
person's hash contains their attributes, plus a reference to an array
containing their qualifications. This enables me (using other
programming not relevant here) to look up any employee by name and
present their attributes and qualifications.

Here's a runnable program which shows where I've got to. It gives the
wrong results and I'm trying to work out how to fix it.

--------------- <start of example> -----------------
#! C:\Perl\bin\perl.exe

use strict;
use warnings;
use Data::Dumper;

my %people;
my %attrs;
my @quals;

while (my $rec = <DATA>) {
    #my %attrs;
    #my @quals;
    chomp $rec;
    if ($rec) {
        my ($name,$value) = split /:/, $rec;
        if ($name eq "qual") {
            push @quals,$value;
            next;
        }
        $attrs{$name} = $value;
    } else { # End of person
        $attrs{quals} = \@quals;
        $people{$attrs{name}} = \%attrs;
        #undef %attrs;
        #undef @quals;
    }
}
print Dumper(%people);

__END__
name:alice
occupation:analyst
born:aberdeen
qual:a1
qual:a2

name:bob
occupation:baker
status:broken
qual:b1
qual:b2
---------------- <end of example> ------------------

This results in
$VAR1 = 'alice';
$VAR2 = {
          'quals' => [
                       'a1',
                       'a2',
                       'b1',
                       'b2'
                     ],
          'status' => 'broken',
          'name' => 'bob',
          'born' => 'aberdeen',
          'occupation' => 'baker'
        };

I can see what's wrong: %attrs and @quals have scope outside the
file-read loop, so when the program gets to Bob's records they either
add to or replace Alice's. But I can't define them inside the read
loop because they then disappear and reappear with each record (you
can see where I tried this in the comments). I've also tried
undef-ing the arrays at the end of each person (also as shown),
without getting the desired result.

Can someone make some suggestions? I'm prepared to structure the
whole program differently but the data format is a given.
 
Anno Siegel

Henry Law said:
I wonder if someone can help me with working out where to define my
variables so as to give the desired result here...

Personnel data is to be read from a file. Attributes for each
employee are presented as name/value pairs; some pairs occur once
only, but each person can have more than one qualification. The data
is to be accumulated in a hash, keyed by the person's name; each
person's hash contains their attributes, plus a reference to an array
containing their qualifications. This enables me (using other
programming not relevant here) to look up any employee by name and
present their attributes and qualifications.

Here's a runnable program which shows where I've got to. It gives the
wrong results and I'm trying to work out how to fix it.

I haven't completely analyzed your code. The two remarks I added are
inessential to your problem.
--------------- <start of example> -----------------
#! C:\Perl\bin\perl.exe

use strict;
use warnings;
use Data::Dumper;

my %people;
my %attrs;
my @quals;

while (my $rec = <DATA>) {
    #my %attrs;
    #my @quals;
    chomp $rec;
    if ($rec) {
        my ($name,$value) = split /:/, $rec;
        if ($name eq "qual") {
            push @quals,$value;
            next;
        }
        $attrs{$name} = $value;
    } else { # End of person
        $attrs{quals} = \@quals;
        $people{$attrs{name}} = \%attrs;
        #undef %attrs;
        #undef @quals;

The normal way to clear an aggregate is by setting it to (), so the
commented code should be

%attrs = ();
@quals = ();

Undef'ing does empty the aggregate as expected, but it also frees the
storage allocated for it. That is normally not what you want.
    }
}
print Dumper(%people);

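That dump is a bit misleading. Dumper(%people) flattens the hash into
a list of keys and values and dumps each one as a separate $VARn.
Data::Dumper is meant to be given references, so that should be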

print Dumper \ %people;
__END__
name:alice
occupation:analyst
born:aberdeen
qual:a1
qual:a2

name:bob
occupation:baker
status:broken
qual:b1
qual:b2
---------------- <end of example> ------------------

This results in
$VAR1 = 'alice';
$VAR2 = {
          'quals' => [
                       'a1',
                       'a2',
                       'b1',
                       'b2'
                     ],
          'status' => 'broken',
          'name' => 'bob',
          'born' => 'aberdeen',
          'occupation' => 'baker'
        };

With the corrected dump the result looks a little more plausible:

$VAR1 = {
          'alice' => {
                       'status' => 'broken',
                       'born' => 'aberdeen',
                       'name' => 'bob',
                       'quals' => [
                                    'a1',
                                    'a2',
                                    'b1',
                                    'b2'
                                  ],
                       'occupation' => 'baker'
                     }
        };

though that's still not what you want.
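
For what it's worth, clearing the aggregates with () would not be enough on its own either, because every entry in %people would still hold a reference to the same single hash. Here is a minimal sketch of that effect, using made-up sample values and a hypothetical %fixed hash for comparison; storing a shallow copy with { %attrs } is one possible workaround, not necessarily the one you want:

```perl
use strict;
use warnings;

my (%people, %attrs);

# Store a reference to the one shared hash, then clear it in place.
%attrs = (name => 'alice', born => 'aberdeen');
$people{alice} = \%attrs;
%attrs = ();                      # empties the hash that $people{alice} points to

%attrs = (name => 'bob', status => 'broken');
$people{bob} = \%attrs;

# Both entries refer to the same hash, so both now show Bob's data.
print "same hash\n" if $people{alice} == $people{bob};
print "alice's name is now: $people{alice}{name}\n";

# Storing a shallow copy ({ %attrs }) gives each person an independent hash.
my %fixed;
%attrs = (name => 'alice', born => 'aberdeen');
$fixed{alice} = { %attrs };
%attrs = (name => 'bob', status => 'broken');
$fixed{bob} = { %attrs };
print "fixed: $fixed{alice}{name}, $fixed{bob}{name}\n";
```

Note that a shallow copy still shares any nested references, so an array such as @quals would need the same treatment ([ @quals ]).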
I can see what's wrong: %attrs and @quals have scope outside the
file-read loop, so when the program gets to Bob's records they either
add to or replace Alice's. But I can't define them inside the read
loop because they then disappear and reappear with each record (you
can see where I tried this in the comments). I've also tried
undef-ing the arrays at the end of each person (also as shown),
without getting the desired result.

One difficulty is that you are reading the input line-wise, but
processing it by paragraphs. This is always a bit tricky because
some passes through the loop are different from the others. Reading
the file paragraph-wise simplifies things because each loop processes
one person. In particular, you can now declare a lexical hash %person
in the loop body. It will be re-used each time through the loop (which
wouldn't work if each person required multiple passes).

my %people;
$/ = ''; # paragraph mode
while ( <DATA> ) {
    my %person;
    for ( split /\n/ ) { # process all lines for one person
        my ( $key, $val) = split /:/;
        if ( $key eq 'qual' ) {
            push @{ $person{ quals}}, $val;
        } else {
            $person{ $key} = $val;
        }
    }
    $people{ $person{ name}} = \ %person;
}

print Dumper \ %people;

__DATA__
(etc)

Anno
 
Henry Law

Reading
the file paragraph-wise simplifies things because each loop processes
one person.

That's the key suggestion; I hadn't even considered trying paragraphs.
In particular, you can now declare a lexical hash %person
in the loop body. It will be re-used each time through the loop (which
wouldn't work if each person required multiple passes).

.... which is precisely what I need.
my %people;
.... followed by code that does exactly what I want.
__DATA__
(etc)

Thank you Anno. How good it is that you're in a European time zone,
so I don't have to wait for Sinan, Paul etc to wake up :)
 
A. Sinan Unur

--------------- <start of example> -----------------
#! C:\Perl\bin\perl.exe

I have nothing useful to add to Anno's response, but I am just going to
point out that you do not need the shebang line above on Windows.

It is convenient to use

#!/usr/bin/perl

in case you later want to move the script to a *nix system. In setups I
have worked with, that tends to be symlinked to the system default perl.

Note that you can still specify options on the shebang line above on
Windows. perl will respect them.


Hope this helps.

Sinan
 
Anno Siegel

Henry Law said:
That's the key suggestion; I hadn't even considered trying paragraphs.

The key is really to have the loop operate one person at a time. Then
"my %person" recreates %person in sync with how it is consumed into
%people. Paragraph mode is a convenient way to do that, given the
data format.
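
To illustrate the point about "my %person", here is a small sketch (the loop over two literal names and the @refs array are only for demonstration) showing that a lexical declared in the loop body yields a distinct hash on each pass, as long as a reference to the previous one is kept somewhere:

```perl
use strict;
use warnings;

my @refs;
for my $name (qw(alice bob)) {
    my %person = (name => $name);   # a new hash on every pass
    push @refs, \%person;           # the stored reference keeps this pass's hash alive
}

print "distinct hashes\n" if $refs[0] != $refs[1];
print join(', ', map { $_->{name} } @refs), "\n";
```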

An alternative would be to enclose an inner loop that builds a person
(well a %person, somewhat less of a challenge). The outer loop would
only ever read the first line of each person, like this:

my %people;
while ( <DATA> ) {
    my %person;
    while ( defined ) {
        chomp;
        last unless length;
        my ( $key, $val) = split /:/;
        if ( $key eq 'qual' ) {
            push @{ $person{ quals}}, $val;
        } else {
            $person{ $key} = $val;
        }
        $_ = <DATA>;
    }
    $people{ $person{ name}} = \ %person;
}
__DATA__
...

It could be streamlined into doing all input in the inner loop.
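
One way that streamlining might look is sketched below. Several details are assumptions rather than part of the code above: the data is read from an in-memory filehandle so the example is self-contained, the outer loop tests eof instead of reading a line, split is given a limit of 2, and a person is only stored if a name was seen.

```perl
use strict;
use warnings;

# Sample records in the thread's format, read through an in-memory filehandle.
my $data = join "\n",
    'name:alice', 'occupation:analyst', 'qual:a1', 'qual:a2',
    '',
    'name:bob', 'occupation:baker', 'qual:b1',
    '';
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my %people;
until ( eof $fh ) {
    my %person;                                  # fresh hash for each person
    while ( defined( my $line = <$fh> ) ) {      # the only place input is read
        chomp $line;
        last unless length $line;                # blank line ends this person
        my ( $key, $val ) = split /:/, $line, 2;
        if ( $key eq 'qual' ) {
            push @{ $person{quals} }, $val;
        } else {
            $person{$key} = $val;
        }
    }
    # Skip empty records, e.g. those produced by consecutive blank lines.
    $people{ $person{name} } = \%person if exists $person{name};
}
close $fh;

print join(', ', sort keys %people), "\n";
print "alice: @{ $people{alice}{quals} }\n";
```

With the real program, the same loops should work with DATA in place of $fh.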

Anno
 
