dynamically naming arrays

C

ccc31807

I'm processing a SQL file, and want to stuff the data into arrays that
will substitute for the database tables. I want each array element
element to be a reference to a hash. The index of the array will be my
'key', and the hash will represent the columns with the associated
values. In the real database, the key is a serial value which is
exactly the behavior of the array.

Since the SQL file will vary, I don't want to hard code the table
names, but dynamically create the arrays from the file. I've copied a
sample of the file below (from 'Beginning Databases with PostgreSQL'
by Matthew and Stoned), and the subroutine I'm using to process the
file. I create the $table, $keys, and $values for each row. I can use
a hash slice to create the hashes, but how do I create the appropriate
array (named after the table name)?

__CODE__
sub process_infile
{
open INFILE, '<', 'pop-all-tables.sql' or die "Cannot process
INFILE, $!";
while (<INFILE>)
{
next unless /^insert/ ;
chomp;
$_ =~ /^insert\s+into\s+ # match and skip 'insert into '
([^(]+)\( # match and save any word
character before the first open parens as $1
([^)]+)\) # match and save any word
character between the first set of parens as $2
\ values\( # match and skip ' values'
([^)]+)\) # match and save any word
character between the second set of parens as $3
/x;
my $table = $1;
my $keys = $2;
my $values = $3;
my @keys = parse_line(',', 0, $keys);
my @values = parse_line(',', 0, $values);
#print " TABLE $table: KEYS: [@keys] => VALUES: [@values]
\n";
my %hash;
@hash{@keys} = @values;
#foreach my $key (keys %hash) { print "$key => $hash{$key}
\n"; }
my $test = eval($table);
print "test is $test\n";
#push @{eval($table)}, \%hash;
}
close INFILE;
}

__SQL__
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Miss','Jenny','Stones','27 Rowan
Avenue','Hightown','NT2 1AQ','023 9876');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Andrew','Stones','52 The Willows','Lowtown','LT5
7RA','876 3527');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Miss','Alex','Matthew','4 The Street','Nicetown','NT2
2TX','010 4567');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Adrian','Matthew','The Barn','Yuleville','YV67
2WR','487 3871');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Simon','Cozens','7 Shady Lane','Oakenham','OA3
6QW','514 5926');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Neil','Matthew','5 Pasture Lane','Nicetown','NT3
7RT','267 1232');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Richard','Stones','34 Holly Way','Bingham','BG4
2WE','342 5982');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mrs','Ann','Stones','34 Holly Way','Bingham','BG4
2WE','342 5982');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mrs','Christine','Hickman','36 Queen
Street','Histon','HT3 5EM','342 5432');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Mike','Howard','86 Dysart Street','Tibsville','TB3
7FG','505 5482');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Dave','Jones','54 Vale Rise','Bingham','BG3
8GD','342 8264');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Richard','Neill','42 Thatched
Way','Winnersby','WB3 6GQ','505 6482');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mrs','Laura','Hardy','73 Margarita Way','Oxbridge','OX2
3HX','821 2335');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','Bill','O\'Neill','2 Beamer Street','Welltown','WT3
8GM','435 1234');
insert into customer(title, fname, lname, addressline, town, zipcode,
phone) values('Mr','David','Hudson','4 The Square','Milltown','MT2
6RT','961 4526');
 
C

ccc31807

Never mind. I bit the bullet, did an Indiana Jones, and used a global
hash instead.

Each hash element is a reference to an array, and each array is a
reference to a hash. The syntax is the typical Perl 'write once, read
never' mess, but it works.

CC.
 
U

Uri Guttman

c> Never mind. I bit the bullet, did an Indiana Jones, and used a global
c> hash instead.

global? no need for a global. just pass the reference around.

also XML::Simple will parse the xml to a perl data tree in one call.

c> Each hash element is a reference to an array, and each array is a
c> reference to a hash. The syntax is the typical Perl 'write once,
c> read never' mess, but it works.

arrays can't be references to hashes. they can CONTAIN hash refs. that
is called a perl data tree and is very common.

and that is YOUR typical mess. good perl code doesn't make a mess of
data.

uri
 
C

ccc31807

and that is YOUR typical mess. good perl code doesn't make a mess of
data.

Okay, Uir, I'm teachable. Here is my code and the output. Give me some
pointers on how to clean up the mess.

Thanks, CC.

__CODE__
sub process_infile
{
open INFILE, '<', 'pop-all-tables.sql' or die "Cannot process
INFILE, $!";
while (<INFILE>)
{
next unless /^insert/ ;
chomp;
$_ =~ /^insert\s+into\s+ # match and skip 'insert into '
([^(]+)\( # match and save any word
character before the first open parens as $1
([^)]+)\) # match and save any word
character between the first set of parens as $2
\ values\( # match and skip ' values'
([^)]+)\) # match and save any word
character between the second set of parens as $3
/x;
my $table = $1;
my $keys = $2;
my $values = $3;
$table =~ s/ //g;
$keys =~ s/ //g;
my @keys = parse_line(',', 0, $keys);
my @values = parse_line(',', 0, $values);
#print " TABLE $table: KEYS: [@keys] => VALUES: [@values]
\n";
my %hash;
@hash{@keys} = @values;
#foreach my $key (keys %hash) { print "$key => $hash{$key}
\n"; }
push @{$database{$table}}, \%hash;
}
close INFILE;
}

sub test_hash
{
foreach my $k1 (keys %database)
{
print "$k1 => $database{$k1}\n";
foreach my $ele (@{$database{$k1}})
{
print " $ele =>\n";
foreach my $k2 (keys %{$ele})
{
print " $k2 => $ele->{$k2}\n";
}
}
}
}

__OUTPUT__
customer => ARRAY(0x184796c)
HASH(0x184768c) =>
addressline => 27 Rowan Avenue
lname => Stones
fname => Jenny
town => Hightown
title => Miss
phone => 023 9876
zipcode => NT2 1AQ
HASH(0x184797c) =>
addressline => 52 The Willows
lname => Stones
fname => Andrew
town => Lowtown
title => Mr
phone => 876 3527
zipcode => LT5 7RA
HASH(0x1847a3c) =>
addressline => 4 The Street
lname => Matthew
fname => Alex
town => Nicetown
title => Miss
phone => 010 4567
zipcode => NT2 2TX
HASH(0x1847aec) =>
addressline => The Barn
lname => Matthew
fname => Adrian
town => Yuleville
title => Mr
phone => 487 3871
zipcode => YV67 2WR
HASH(0x1847b7c) =>
addressline => 7 Shady Lane
lname => Cozens
fname => Simon
town => Oakenham
title => Mr
phone => 514 5926
zipcode => OA3 6QW
HASH(0x1847c0c) =>
addressline => 5 Pasture Lane
lname => Matthew
fname => Neil
town => Nicetown
title => Mr
phone => 267 1232
zipcode => NT3 7RT
HASH(0x1847c9c) =>
addressline => 34 Holly Way
lname => Stones
fname => Richard
town => Bingham
title => Mr
phone => 342 5982
zipcode => BG4 2WE
HASH(0x1847d2c) =>
addressline => 34 Holly Way
lname => Stones
fname => Ann
town => Bingham
title => Mrs
phone => 342 5982
zipcode => BG4 2WE
HASH(0x1847dbc) =>
addressline => 36 Queen Street
lname => Hickman
fname => Christine
town => Histon
title => Mrs
phone => 342 5432
zipcode => HT3 5EM
HASH(0x1847e4c) =>
addressline => 86 Dysart Street
lname => Howard
fname => Mike
town => Tibsville
title => Mr
phone => 505 5482
zipcode => TB3 7FG
HASH(0x1847edc) =>
addressline => 54 Vale Rise
lname => Jones
fname => Dave
town => Bingham
title => Mr
phone => 342 8264
zipcode => BG3 8GD
HASH(0x1847f6c) =>
addressline => 42 Thatched Way
lname => Neill
fname => Richard
town => Winnersby
title => Mr
phone => 505 6482
zipcode => WB3 6GQ
HASH(0x1847ffc) =>
addressline => 73 Margarita Way
lname => Hardy
fname => Laura
town => Oxbridge
title => Mrs
phone => 821 2335
zipcode => OX2 3HX
HASH(0x184808c) =>
addressline => 2 Beamer Street
lname => O\'Neill
fname => Bill
town => Welltown
title => Mr
phone => 435 1234
zipcode => WT3 8GM
HASH(0x184811c) =>
addressline => 4 The Square
lname => Hudson
fname => David
town => Milltown
title => Mr
phone => 961 4526
zipcode => MT2 6RT
orderinfo => ARRAY(0x187145c)
HASH(0x18713ec) =>
date_placed => 03-13-2004
customer_id => 3
date_shipped => 03-17-2004
shipping => 2.99
HASH(0x187146c) =>
date_placed => 06-23-2004
customer_id => 8
date_shipped => 06-24-2004
shipping => 0.00
HASH(0x18714cc) =>
date_placed => 09-02-2004
customer_id => 15
date_shipped => 09-12-2004
shipping => 3.99
HASH(0x187154c) =>
date_placed => 09-03-2004
customer_id => 13
date_shipped => 09-10-2004
shipping => 2.99
HASH(0x18715ac) =>
date_placed => 07-21-2004
customer_id => 8
date_shipped => 07-24-2004
shipping => 0.00
item => ARRAY(0x184817c)
HASH(0x18481ac) =>
sell_price => 21.95
description => Wood Puzzle
cost_price => 15.23
HASH(0x184821c) =>
sell_price => 11.49
description => Rubik Cube
cost_price => 7.45
HASH(0x184825c) =>
sell_price => 2.49
description => Linux CD
cost_price => 1.99
HASH(0x18481ec) =>
sell_price => 3.99
description => Tissues
cost_price => 2.11
HASH(0x18482cc) =>
sell_price => 9.95
description => Picture Frame
cost_price => 7.54
HASH(0x184831c) =>
sell_price => 15.75
description => Fan Small
cost_price => 9.23
HASH(0x184836c) =>
sell_price => 19.95
description => Fan Large
cost_price => 13.36
HASH(0x18483bc) =>
sell_price => 1.45
description => Toothbrush
cost_price => 0.75
HASH(0x184840c) =>
sell_price => 2.45
description => Roman Coin
cost_price => 2.34
HASH(0x184845c) =>
sell_price => 0.0
description => Carrier Bag
cost_price => 0.01
HASH(0x18484ac) =>
sell_price => 25.32
description => Speakers
cost_price => 19.73
barcode => ARRAY(0x184854c)
HASH(0x18484fc) =>
barcode_ean => 6241527836173
item_id => 1
HASH(0x1870f0c) =>
barcode_ean => 6241574635234
item_id => 2
HASH(0x184851c) =>
barcode_ean => 6264537836173
item_id => 3
HASH(0x1870f4c) =>
barcode_ean => 6241527746363
item_id => 3
HASH(0x1870f8c) =>
barcode_ean => 7465743843764
item_id => 4
HASH(0x1870fcc) =>
barcode_ean => 3453458677628
item_id => 5
HASH(0x187100c) =>
barcode_ean => 6434564564544
item_id => 6
HASH(0x187104c) =>
barcode_ean => 8476736836876
item_id => 7
HASH(0x187108c) =>
barcode_ean => 6241234586487
item_id => 8
HASH(0x18710cc) =>
barcode_ean => 9473625532534
item_id => 8
HASH(0x187110c) =>
barcode_ean => 9473627464543
item_id => 8
HASH(0x187114c) =>
barcode_ean => 4587263646878
item_id => 9
HASH(0x187118c) =>
barcode_ean => 9879879837489
item_id => 11
HASH(0x18711cc) =>
barcode_ean => 2239872376872
item_id => 11
orderline => ARRAY(0x187166c)
HASH(0x187160c) =>
quantity => 1
orderinfo_id => 1
item_id => 4
HASH(0x187167c) =>
quantity => 1
orderinfo_id => 1
item_id => 7
HASH(0x187163c) =>
quantity => 1
orderinfo_id => 1
item_id => 9
HASH(0x187170c) =>
quantity => 1
orderinfo_id => 2
item_id => 1
HASH(0x187175c) =>
quantity => 1
orderinfo_id => 2
item_id => 10
HASH(0x18717ac) =>
quantity => 2
orderinfo_id => 2
item_id => 7
HASH(0x18717fc) =>
quantity => 2
orderinfo_id => 2
item_id => 4
HASH(0x187184c) =>
quantity => 1
orderinfo_id => 3
item_id => 2
HASH(0x187189c) =>
quantity => 1
orderinfo_id => 3
item_id => 1
HASH(0x18718ec) =>
quantity => 2
orderinfo_id => 4
item_id => 5
HASH(0x187193c) =>
quantity => 1
orderinfo_id => 5
item_id => 1
HASH(0x187198c) =>
quantity => 1
orderinfo_id => 5
item_id => 3
stock => ARRAY(0x187125c)
HASH(0x187120c) =>
quantity => 12
item_id => 1
HASH(0x18712ac) =>
quantity => 2
item_id => 2
HASH(0x187128c) =>
quantity => 8
item_id => 4
HASH(0x18712ec) =>
quantity => 3
item_id => 5
HASH(0x187132c) =>
quantity => 8
item_id => 7
HASH(0x187136c) =>
quantity => 18
item_id => 8
HASH(0x18713ac) =>
quantity => 1
item_id => 10
 
U

Uri Guttman

c> sub process_infile
c> {
c> open INFILE, '<', 'pop-all-tables.sql' or die "Cannot process
c> INFILE, $!";

don't use bareword file handles. use lexicals.

open my $infile, '<', 'pop-all-tables.sql' or
die "Cannot open infile $!" ;


c> while (<INFILE>)
c> {

avoid the use of $_ when you can use named variables. much better for
the reader of the code.

while( my $line = <$infile> )

c> next unless /^insert/ ;
c> chomp;
c> $_ =~ /^insert\s+into\s+ # match and skip 'insert into '

why skip here when you look for insert just before?

since you are using /x mode it is usually better to use a pair delim
like {} for this.


c> ([^(]+)\( # match and save any word
c> character before the first open parens as $1

if it is a word char, use \w. also you say singular word char in the
comment but show + in the regex. which is it? be accurate when
commenting on regexes.


c> ([^)]+)\) # match and save any word
c> character between the first set of parens as $2

ditto


c> ([^)]+)\) # match and save any word
c> character between the second set of parens as $3
c> /x;

you don't check the regex for success. what happens if it fails?

c> my $table = $1;
c> my $keys = $2;
c> my $values = $3;

you could just assign those from the regex and save the use of $1, etc.

my( $table, $keys, $values ) = $line =~ m{ .... }x

c> my @keys = parse_line(',', 0, $keys);

what is parse_line()?

c> my @values = parse_line(',', 0, $values);
c> #print " TABLE $table: KEYS: [@keys] => VALUES: [@values]
c> \n";
c> my %hash;
c> @hash{@keys} = @values;
c> #foreach my $key (keys %hash) { print "$key => $hash{$key}
c> \n"; }
c> push @{$database{$table}}, \%hash;
c> }
c> close INFILE;
c> }

nothing too grotty there. just building up some hashes in a hash. common
stuff.

c> sub test_hash
c> {
c> foreach my $k1 (keys %database)
c> {

for this code learn to use the each func. much easier when looping over
a hash:

while( my( $key, $val ) = each %database ) {

now $val will be a hash ref and you do the same with %{$val} for the
next level. choosing a better name for $val would be good too but i
don't know the context of what is in the hash. or if this is for
debugging, just use Data::Dumper.

uri
 
C

ccc31807

Thank you. Please see comments below.
don't use bareword file handles. use lexicals.
Okay

avoid the use of $_ when you can use named variables. much better for
the reader of the code.
Okay.

why skip here when you look for insert just before?

It doesn't cause any harm, does it? I built up the RE piece by piece
and left it the way it was.
if it is a word char, use \w. also you say singular word char in the
comment but show + in the regex. which is it? be accurate when
commenting on regexes.

I was actually looking for any character (not necessarily a word
character) that isn't a literal parens.
you don't check the regex for success. what happens if it fails?

Good idea! Why didn't I think of that?
you could just assign those from the regex and save the use of $1, etc.

See your comment above about readability and using named variables. I
wanted the $1 to be explicit in the code, lest I look at it tomorrow
and wonder what I had been smoking the day before.
what is parse_line()?

It's an exported function in Text::parseWords.
for this code learn to use the each func. much easier when looping over
a hash:

        while( my( $key, $val ) = each %database ) {

now $val will be a hash ref and you do the same with %{$val} for the
next level. choosing a better name for $val would be good too but i
don't know the context of what is in the hash. or if this is for
debugging, just use Data::Dumper.

Dumper doesn't name the hash elements, using $VAR1, etc. I use Dumper
for quick tests, but when I really want to look at what's going on, I
prefer to use a custom function.

CC.
 
U

Uri Guttman

c> I was actually looking for any character (not necessarily a word
c> character) that isn't a literal parens.

then say that in your comment. comments which don't accurately reflect
the code are nasty.

c> Good idea! Why didn't I think of that?

that should be a standard idiom for you.

c> See your comment above about readability and using named variables. I
c> wanted the $1 to be explicit in the code, lest I look at it tomorrow
c> and wonder what I had been smoking the day before.

but you don't even NEED $1 if you assign grabbed stuff to a list. and
your comments should say what is being grabbed and assigned. that is a
useful regex comment.

c> Dumper doesn't name the hash elements, using $VAR1, etc. I use Dumper
c> for quick tests, but when I really want to look at what's going on, I
c> prefer to use a custom function.

no, Dumper can use whatever you tell it too for the top level. and you
can take the dumper text and simply do an s/// on $VAR1 to whatever you
want. writing code just to dump a hash tree is silly.

uri
 
C

ccc31807

Thanks for pointing this out to me, CC.

Regexes are good for processing strings of characters,
tranliteration is good for processing characters.

You are processing characters.

    $table =~ tr/ //d;
    $keys  =~ tr/ //d;

has the same effect and will run faster too...

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top