Hi,
I am reading in an comma separator file,
line 1: Field 1, field 2, field 3.., field n
line 2: Field 1, field 2, field 3.., field n
I would like to create a data structure to representing the whole
file.
What is an efficient and easy way to store the above such that, for a
given value for field i, I can extract an array of values for field j,
where 1<=i,j<=n? It's a bit like asking what the best way to store a
spreadsheet is.
Hash of anonymous arrays? How will the syntax be?
First off, have you tried DBD-CSV?
http://search.cpan.org/~jzucker/DBD-CSV-0.22/lib/DBD/CSV.pm
This has the advantage that your code can keep on working when the data
are moved to a real database later on.
* How large is the data matrix?
* How large is the typical file?
* How large is the typical result set?
If the file is smallish (dunno, maybe fewer than 10,000 lines with say
10 fields each), the contents static and simple, the queries really as
simple as you make them out to be the obvious solution would be the way
to go.
Obvious solution follows. Generalize as required.
#!/usr/bin/perl
use strict;
use warnings;
use Data:
umper;
my @data;
while ( my $obs = <DATA> ) {
$obs =~ s/^\s+//;
$obs =~ s/\s+$//;
next unless length $obs;
push @data, [ split /\s+/, $obs ];
}
print Dumper vlookup( \@data, 2, 50, 3 );
print Dumper vlookup( \@data, 0, 100, 0);
print Dumper vlookup( \@data, 3, 0, 3);
print Dumper vlookup( \@data, 3, 0, 1);
sub vlookup {
my ( $data, $key_col, $key_val, $lookup_col ) = @_;
my @retlist;
for my $obs ( @$data ) {
if ( $obs->[ $key_col ] == $key_val ) {
push @retlist, $obs->[ $lookup_col ];
}
}
return \@retlist;
}
__DATA__
70 140 45 85
190 150 40 30
80 40 45 30
110 20 30 5
50 140 70 95
50 80 60 65
170 150 45 40
150 0 30 0
10 30 50 60
30 130 70 100
110 160 75 75
60 120 80 80
170 130 33 30
90 140 75 75
120 90 35 35
130 40 5 5
50 170 100 100
120 30 5 5
90 170 90 90
190 150 30 30
60 20 30 30
130 150 70 60
0 20 60 60
40 120 70 90
70 160 100 95
180 150 20 35
110 70 40 30
140 140 50 50
170 30 0 0
70 130 60 80
80 20 20 20
100 130 65 65
190 20 0 0
110 30 0 10
180 70 0 0
180 50 0 0
40 120 45 90
60 70 55 55
140 160 60 60
60 140 90 90
150 80 20 15
40 10 30 35
160 100 20 20
80 170 90 95
150 90 30 20
90 70 40 40
40 0 30 30
170 50 0 0
110 180 80 85