That depends on how you do it.
That depends on what you are comparing it to. Compared to an in-memory
hash, DB_File makes things slower, not faster. Except in the sense that
something which runs out of memory and dies before completing the job is
infinitely slow, so preventing that is, in a sense, faster. One exception
I know of would be if one of the files is constant, so it only needs to be
turned into a DB_File once, and if only a small fraction of the keys are
ever probed by the process driven by the other file. Then it could be faster.
Also, DB_File doesn't take nested structures, so you would have to flatten
your HoA. Once you flatten it, it might fit in memory anyway.
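Flattening for DB_File might look like this sketch, where the %hoa data, the
tab separator, and the "flat.db" file name are all made up for illustration
(it assumes a separator that never occurs in the values):

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

# Hypothetical HoA: IP address => list of timestamps.
my %hoa = (
    '10.0.0.1' => [ 100, 200 ],
    '10.0.0.2' => [ 300 ],
);

# DB_File stores flat string => string pairs, so join each
# array into one string with a separator assumed not to
# appear in the data (a tab here).
tie my %db, 'DB_File', 'flat.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie flat.db: $!";

while ( my ($ip, $aref) = each %hoa ) {
    $db{$ip} = join "\t", @$aref;
}

# To get a value back, split it again:
my @times = split /\t/, $db{'10.0.0.1'};

untie %db;
unlink 'flat.db';    # remove the demo file
```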
Here is what I am trying to do.
I have two large files. I will read one file and check whether each
entry is also present in the second file. I also need to count how many
times it appears in both files, and do other processing accordingly.
If you *only* need to count, then you don't need the HoA in the first
place.
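For counting alone, two flat hashes of counts (one per file) stay far smaller
than a HoA. A sketch, with in-memory lists standing in for the two files:

```perl
use strict;
use warnings;

# Count key occurrences per file without storing any values.
# These arrays stand in for lines read from the real files.
my @file1 = ( '10.0.0.1', '10.0.0.2', '10.0.0.1' );
my @file2 = ( '10.0.0.1', '10.0.0.3' );

my (%count1, %count2);
$count1{$_}++ for @file1;
$count2{$_}++ for @file2;

# Report keys present in both files, with their counts:
for my $key ( sort keys %count1 ) {
    next unless exists $count2{$key};
    print "$key: $count1{$key} in file1, $count2{$key} in file2\n";
}
```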
So if I process both files line by line, then it will be like (e.g.
file1 has 10 lines and file2 has 10 lines; for each line of file1 it
will loop 10 times, so 100 loops total). I am dealing with millions of
lines, so this approach will be very slow.
I don't think anyone was recommending that you do a Cartesian join on the
files. You could break the data up into files by hashing on IP address and
making a separate file for each hash value. For each hash bucket you would
have two files, one from each starting file, and they could be processed
together with your existing script. Or you could reformat the two files
and then sort them jointly, which would group all the like keys together
for you for later processing.
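The bucketing approach might be sketched like this; the bucket count, the
byte-checksum hash, and the assumption that the key is the first
whitespace-separated field are all illustrative choices, not requirements:

```perl
use strict;
use warnings;

# Split one big file into N smaller ones so each bucket fits in
# memory on its own. Lines with the same key always land in the
# same bucket, so bucket N of file1 only ever needs to be compared
# against bucket N of file2.
my $nbuckets = 16;

sub bucket_split {
    my ($infile, $prefix) = @_;
    my @fh;
    for my $i ( 0 .. $nbuckets - 1 ) {
        open $fh[$i], '>', "$prefix.$i" or die "open $prefix.$i: $!";
    }
    open my $in, '<', $infile or die "open $infile: $!";
    while ( my $line = <$in> ) {
        my ($ip) = split /\s+/, $line;          # assume key is first field
        my $b = unpack( '%32C*', $ip ) % $nbuckets;  # cheap byte checksum
        print { $fh[$b] } $line;
    }
    close $_ for $in, @fh;
}

# Demo: bucket a tiny file.
open my $out, '>', 'demo.txt' or die "open demo.txt: $!";
print $out "10.0.0.1 foo\n10.0.0.2 bar\n10.0.0.1 baz\n";
close $out;
bucket_split( 'demo.txt', 'demo.bucket' );
```

The same `bucket_split` run on both original files then yields 16 pairs of
much smaller files, each pair processable with the existing script.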
@pri_ip_id_table_ = keys(%pri_ip_id_table);
With very large hashes, when you have memory issues, you should iterate
over them with "each" rather than building a list of keys.
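That is, instead of copying keys() into an array, walk the hash one pair at a
time (the %time_table contents here are stand-in data):

```perl
use strict;
use warnings;

my %time_table = ( '10.0.0.1' => 3, '10.0.0.2' => 1 );  # stand-in data

# each() returns one key/value pair per call, so no list of
# every key is ever materialized in memory.
my $total = 0;
while ( my ($ip, $count) = each %time_table ) {
    $total += $count;
}
print "total: $total\n";
```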
for($i = 0; $i < @pri_ip_id_table_; $i++) #file 2
{
if($time_table{"$pri_ip_dns_table_[$i]"})
{
#do some processing.
Could you "do some processing" incrementally, as each line from file 2 is
encountered, rather than having to load all keys of file2 into memory
at once?
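In other words, keep only file 1's keys in a hash and stream file 2 past it.
A sketch, where the hash contents and the file 2 lines are stand-ins and the
key extraction is an assumption:

```perl
use strict;
use warnings;

# Lookup built once from file 1; file 2 is then read a line at a
# time, so nothing from file 2 is ever held in memory all at once.
my %seen_in_file1 = ( '10.0.0.1' => 1, '10.0.0.3' => 1 );

my @file2_lines = ( "10.0.0.1 x", "10.0.0.2 y", "10.0.0.3 z" );  # stand-in for <FILE2>
my $matches = 0;
for my $line (@file2_lines) {
    my ($ip) = split /\s+/, $line;      # assume key is first field
    next unless exists $seen_in_file1{$ip};
    $matches++;                         # "do some processing" goes here
}
print "matches: $matches\n";
```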
Xho