alt.testing
Hi all,
I am writing a script to parse files and insert the data into MySQL.
The task is simple enough with files containing "standard" fields.
However, there are many files for which this is not the case.
Some of the files even vary in the number of fields per line.
Example (fields are email, name, postcode, phone):
(e-mail address removed), Firstname Lastname
(e-mail address removed), Firstname Lastname, 2004, 0412 321 512
(e-mail address removed), Firstname Lastname, 0412 321 512
Now, other than the obvious and easy solution of breaking up the files
into chunks that are "known" and internally consistent in terms of
data fields, I want to build a mechanism that can:
1. Autodetect the number of fields line by line and build the data
structure as it goes.
2. Verify (or guess) the "type" of each field using regex.
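A minimal sketch of the two steps above, using only core Perl. The patterns, the type names, and the helpers guess_type and parse_line are all made up for illustration; in particular the phone and postcode patterns assume 10-digit numbers starting with 0 and 4-digit postcodes, as in the example lines, and would need adjusting for real data.

```perl
use strict;
use warnings;

# Hypothetical type patterns, tried in order; the first match wins.
# Assumptions: 10-digit phone numbers starting with 0, 4-digit postcodes.
my @type_patterns = (
    [ email    => qr/^[\w.+-]+\@[\w-]+(?:\.[\w-]+)+$/ ],
    [ phone    => qr/^0[0-9](?: ?[0-9]){8}$/ ],
    [ postcode => qr/^[0-9]{4}$/ ],
    [ name     => qr/^[A-Za-z][A-Za-z' -]*$/ ],
);

# Guess the type of a single, already trimmed field value.
sub guess_type {
    my ($value) = @_;
    for my $pair (@type_patterns) {
        my ( $type, $pattern ) = @$pair;
        return $type if $value =~ $pattern;
    }
    return 'unknown';
}

# Split one line on commas and build a record keyed by guessed type,
# however many fields the line happens to contain.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my %record;
    for my $field ( split /,/, $line ) {
        $field =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $field eq '';
        my $type = guess_type($field);
        # keep the first value seen for each type
        $record{$type} = $field unless exists $record{$type};
    }
    return \%record;
}
```

Because each record is a hash keyed by the guessed type, a line with two fields and a line with four fields go through the same code path; missing fields are simply absent keys.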
I don't mind using modules, but would prefer to use ones shipped as
standard. Otherwise I will build my own, as I really want to start
doing a bit of "OO", and this could be a good place to begin.
I have a feeling that creating a class and building some methods
that create objects (one for each different set of fields) that
reference/manipulate the actual data structures (or something similar)
might be a good approach. That way, operations could be built on
the fly? Mind you, I have not yet created a module, so this is my first
time. Is this the best approach, or is there something better?
Could anyone suggest some things that I might try?
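For what it is worth, a class of that kind could start out as small as the sketch below. This is only an illustration using core Perl's bless; the package name RecordSet and its methods new, add, count and column are invented here, with one object per distinct field layout.

```perl
use strict;
use warnings;

# Hypothetical class: one object per distinct field layout in the input.
package RecordSet;

sub new {
    my ( $class, @field_names ) = @_;
    my $self = {
        fields  => [@field_names],    # column order for this layout
        records => [],                # rows stored as hash references
    };
    return bless $self, $class;
}

# Store one row; values are matched to field names by position.
sub add {
    my ( $self, @values ) = @_;
    my %row;
    @row{ @{ $self->{fields} } } = @values;
    push @{ $self->{records} }, \%row;
    return $self;
}

# Number of rows stored so far.
sub count {
    my ($self) = @_;
    return scalar @{ $self->{records} };
}

# Return every stored value for one field name.
sub column {
    my ( $self, $name ) = @_;
    return map { $_->{$name} } @{ $self->{records} };
}

package main;

my $set = RecordSet->new(qw(email name));
$set->add( 'user@example.com', 'Firstname Lastname' );
print $set->count, "\n";
```

Once this works in a single file, the package can be moved into its own RecordSet.pm ending in a true value (1;) and loaded with use.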
tia
Full Context (some rough ideas as a starting point)
===============================================================================
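#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # for the MySQL inserts later; not used yet

# Usage: script.pl input_file field_name [field_name ...]
# The field names give the column order, e.g.: email name location mobile
my $input_file = $ARGV[0];
die "Usage: $0 input_file field_name ...\n" unless defined $input_file;

my $email_index;
my $name_index;
my $location_index;
my $mobile_index;

# Patterns for guessing field types (defined here but not applied yet)
my $email_regex    = qr/^ *[a-zA-Z0-9_.-]+\@[a-zA-Z0-9_.-]+\.[a-zA-Z0-9_.-]+/;
my $mobile_regex   = qr/^ *[04][0-9 ]{8,12}/;
my $name_regex     = qr/^ *[a-zA-Z -]+/;
my $location_regex = qr/^ *[a-zA-Z0-9 ]+/;

set_indexes();

open( my $in_fh, '<', $input_file ) or die "Cannot open $input_file: $!";
while ( my $line = <$in_fh> ) {
    next unless $line =~ /@/;    # only lines containing an email address
    chomp $line;
    my @working_data_array = split /,/, $line;

    my $email    = field_at( \@working_data_array, $email_index );
    my $name     = field_at( \@working_data_array, $name_index );
    my $location = field_at( \@working_data_array, $location_index );
    my $mobile   = field_at( \@working_data_array, $mobile_index );

    print join( ',', $email, $name, $location, $mobile ), "\n";
}
close $in_fh;
exit;

# Return the field at $index, or '' if that column was not named on the
# command line or the line has fewer fields.
sub field_at {
    my ( $fields, $index ) = @_;
    return '' unless defined $index && defined $fields->[$index];
    return $fields->[$index];
}

# Map each field name given after the input file to a 0-based column index.
# Starts at 1 so the input file name itself is never treated as a field name.
sub set_indexes {
    for my $counter ( 1 .. $#ARGV ) {
        $email_index    = $counter - 1 if $ARGV[$counter] =~ /email/;
        $name_index     = $counter - 1 if $ARGV[$counter] =~ /name/;
        $location_index = $counter - 1 if $ARGV[$counter] =~ /location/;
        $mobile_index   = $counter - 1 if $ARGV[$counter] =~ /mobile/;
    }
}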