I'm new to database programming and have only recently learned to use loops
to look up and enrich information, using the code below. However, when the
tables are large, this process is very slow. Somebody told me I can build a
database from one of the files on the fly, so there is no need to read the
file from beginning to end again and again. However, Perl DBI has a lot of
sophisticated functions, and my tables are large but nothing special: they
are simply linked by an ID. Is there a simple way to achieve the same
purpose? I just want the ID to be indexed so that every time I access a
record it comes from memory and not through I/O...
#!/usr/bin/perl
use strict;
use warnings;

my ($listfile, $format, $accfile, $infofile) = @ARGV;
# Echo the arguments and wait for Enter before starting.
print "($listfile, $accfile, $infofile)\n"; <STDIN>;
print "Working on $listfile...\n";
my $outname = $listfile . "_" . $infofile . ".xls";
open(my $ofp, '>', $outname) or die "Cannot write $outname: $!";
open(my $fp, '<', $listfile) or die "Cannot read $listfile: $!";
my $line = <$fp>;
chomp $line;
my $acci;
if ($format ne "") {
    my @fields = split(/\t/, $line);
    for (my $i = 0; $i < @fields; $i++) {
        # Find the column whose header mentions "accession".
        if ($fields[$i] =~ /accession/) {
            $acci = $i;
        }
    }
    die "No accession column in $listfile\n" unless defined $acci;
}
print $ofp "$line\tgene info\n";
while ($line = <$fp>) {
    if ($line eq "\n") {
        print $ofp $line;
        next;
    }
    chomp $line;
    my $tag;
    if ($format eq "") {
        $tag = (split(/:/, $line))[0];
    } else {
        $tag = (split(/\t/, $line))[$acci];
    }
    # Slow part: the whole accession file is re-read for every list line.
    my ($found, $des) = (0, "");
    open(my $afp, '<', $accfile) or die "Cannot read $accfile: $!";
    while (<$afp>) {
        my @cells = split(/\t/, $_);
        if (defined $cells[5] && $cells[5] =~ /\Q$tag\E/) {
            $des = $cells[1];
            $found = 1;
            last;
        }
    }
    close $afp;
    if ($found) {
        print $ofp "$line\t$des\n";
    } else {
        print $ofp "$line\tNo gene info available\n";
    }
}
close $fp;
close $ofp;
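
For what it's worth, the "index the ID in memory" idea might look something like the sketch below: read the accession file once into a Perl hash keyed by ID, then do one hash lookup per list line instead of one pass over the file. It assumes the same layout as the script above (tab-separated, ID in column 6, description in column 2) and that IDs match exactly rather than as substrings. build_index and lookup are hypothetical helper names made up for this illustration, and the sample rows are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the accession data once and index descriptions by ID.
# Assumption: column 6 holds the ID and column 2 the description.
sub build_index {
    my ($fh) = @_;
    my %index;
    while (my $row = <$fh>) {
        chomp $row;
        my @cells = split(/\t/, $row);
        next unless defined $cells[5] && $cells[5] ne "";
        # Keep the first description seen for an ID, like "last" in the loop version.
        $index{ $cells[5] } = $cells[1] unless exists $index{ $cells[5] };
    }
    return \%index;
}

# One hash lookup per ID instead of one scan of the file.
sub lookup {
    my ($index, $tag) = @_;
    return exists $index->{$tag} ? $index->{$tag} : "No gene info available";
}

# Tiny demonstration: an in-memory filehandle stands in for $accfile.
my $sample = join("\t", "x", "kinase A", "", "", "", "NM_0001") . "\n"
           . join("\t", "y", "transporter B", "", "", "", "NM_0002") . "\n";
open(my $afp, '<', \$sample) or die "Cannot open sample data: $!";
my $index = build_index($afp);
close $afp;

print lookup($index, "NM_0001"), "\n";   # prints "kinase A"
print lookup($index, "NM_9999"), "\n";   # prints "No gene info available"
```

In the real script, the filehandle would be opened on $accfile once, before the main loop, and the inner while loop replaced by a call to lookup. The hash needs memory roughly proportional to the number of IDs; if that turns out to be too much, a hash tied to an on-disk DBM file (for example with the core SDBM_File module) is one option to look at before going all the way to DBI.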