Hi there,
I have a script that inserts and updates a MySQL table using data from
a text file. I check for duplicate records by doing a SELECT first. If
the record exists, I 1) increase a counter by one and 2) concatenate a
string onto a text field. If the record doesn't exist, I INSERT the
data normally.
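For reference, the sequence5 table is roughly this shape (I'm
paraphrasing the schema from memory, so the types and the frequency
default are only my best guess):

    CREATE TABLE sequence5 (
        sequence   VARCHAR(255),   -- the value I check for duplicates
        wherefound TEXT,           -- grows into a comma-separated list of sources
        frequency  INT DEFAULT 1   -- presumably defaults to 1, since my INSERT never sets it
    );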
It's slow. I am performing this update with about 50,000 records at a
time, and processing time seems to grow exponentially as the table gets
larger. I have about 40 of these files to process initially, which will
create a table of about 2 million records.
My questions are:
1. Is Perl the most efficient approach for this?
2. Would a native SQL query run through the database GUI be faster?
3. I am really poor at choosing when and where to use 'die' statements.
Any pointers on where I should put them would be great (my best guess
is sketched after the code below).
Thanks,
Cheez
code below:
print "\nINSERTER_SEQUENCE: inserting data into table\n";
use DBI;
my $dbh = DBI->connect("dbi:mysql:mydb", "", "");
open(FILE, "seqparsed.txt");
while (my $data = <FILE>) {
($wherefound,$sequence) = split /\t/, $data;
dupcheck (); #this sub does all the work
}
close (FILE);
sub dupcheck {
$sth = $dbh->prepare("SELECT sequence, wherefound FROM
sequence5 WHERE sequence=?");
$sth->execute($sequence);
#selecting any records that match $sequence
if (@row = $sth->fetchrow_array) {
#a match results in frequency++
#and update of where it was found
my ($seq, $wheref) = @row;
$wherefound = "$wherefound, $wheref";
$sth = $dbh->prepare("update sequence5 SET
frequency=frequency+1, wherefound=? WHERE sequence=?");
$sth->execute($wherefound,$sequence);
}
else {
# if no records match $sequence then INSERT new data
$sth = $dbh->prepare("INSERT INTO sequence5 (wherefound,
sequence) VALUES (?, ?)");
$sth->execute( $wherefound, $sequence);
}
}
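Regarding question 3, my best guess at where the checks belong is
sketched below (just the skeleton of the script above with 'or die'
added at connect, open, prepare and execute; I'm not sure this is the
recommended placement, and I know passing { RaiseError => 1 } to
connect would make DBI die on errors automatically):

    use strict;
    use warnings;
    use DBI;

    # die right away if the connection itself fails
    my $dbh = DBI->connect("dbi:mysql:mydb", "", "")
        or die "Cannot connect to mydb: $DBI::errstr\n";

    # die if the input file is missing or unreadable
    open(FILE, "seqparsed.txt")
        or die "Cannot open seqparsed.txt: $!\n";

    while (my $data = <FILE>) {
        my ($wherefound, $sequence) = split /\t/, $data;

        # die if the SQL itself is malformed
        my $sth = $dbh->prepare("SELECT sequence, wherefound FROM sequence5 WHERE sequence=?")
            or die "prepare failed: " . $dbh->errstr;

        # die if the query cannot be run against the table
        $sth->execute($sequence)
            or die "execute failed: " . $sth->errstr;
    }
    close(FILE);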