"Out of memory!" ???

Jie

Hi, I have a script that does some processing for each file in an array of files.
The script is posted at http://www.humanbee.com/CHE/MERGE.txt

As you can see, for each file in
("1","2","3","4","5","6","7","8","9"....
some processing is carried out. Even though my Mac has 6 GB of memory,
the program gives the following message after processing file
"4". My question is: if there is enough memory to process each file
individually, why is there not enough memory to process the array in a
loop using "foreach"? I assume that within this foreach loop, when one
file is finished processing and the output file is written, the memory
will be released before processing of the next one begins. Isn't that true??

Also, in another program, when I use "foreach <IN-FILE>", it runs out
of memory, but when I change to "while <IN-FILE>", it works fine...
Strange....


=======the error message========
perl(14232) malloc: *** vm_allocate(size=1069056) failed (error code=3)
perl(14232) malloc: *** error: can't allocate region
perl(14232) malloc: *** set a breakpoint in szone_error to debug
Out of memory!
 
X

xhoster

Jie said:
I assume within this foreach loop, when one
file is finished processing and the output file is written, the memory
will be released, to begin processing the next one. Isn't that true??

No. You make heavy use of symbolic references. They aren't automatically
cleaned up, as Perl has no way of knowing when they won't be used again.
This is one of the reasons (probably a lesser one) why symbolic
references are usually not a good thing.
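To make that concrete, here is a minimal sketch that writes through a symbolic reference inside a loop, the way `${$sample_ID}{$marker}` does in the posted script. The sample IDs and the marker name rs0001 are made up. Each name becomes a package hash in the global symbol table and is still there after the loop ends, so nothing is reclaimed between iterations.

```perl
use strict;
use warnings;

my @sample_ids = qw(S001 S002 S003);    # hypothetical sample IDs

for my $sample_ID (@sample_ids) {
    no strict 'refs';                    # symbolic references are refused under strict
    ${$sample_ID}{rs0001} = 'AG';        # creates the package hashes %main::S001, %main::S002, ...
}

# The loop is over, but every hash it created is still in the symbol table (%main::).
my @leftover = grep { exists $main::{$_} } @sample_ids;
print "still in the symbol table: @leftover\n";
```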

Xho
 
M

Mirco Wahab

Jie said:
Hi, Joe:

Then is there a way to loop over an array by using "while" and
eliminate the use of "foreach"?

"foreach" is probably not the problem here. Xhoster
already identified the problem: you create (probably huge)
arrays and hashes(!) whose names are strings just read by
the program from files.

If your SNP files are large (more than one million lines)
and you are adding stuff like:

($sample_ID, @cols) = split;
...
${$sample_ID}{$marker} = $Allele[$A];
...
push @allSamples, $sample_ID;

in the loop, it will fill up memory at some point;
and, what makes the fun hotter:

while( ... ) {
if ( ... {regexp} ... ) {
push @$1, $2; ## for each chromosome, create an array of its SNPs
}
}

...

@files = ("1","2","3", ... "X","XY","Y");
...
foreach $f (@files) {
...
foreach $marker (@$f) { ## here @$f means the array of desired SNP list in the map file
...
}
...
}

renders this code completely unmaintainable, unpredictable,
error-prone; so it has to be rewritten if it's really used
for something.
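A minimal sketch of that rewrite for the chromosome lists. It assumes a map file with one "chromosome<TAB>SNP ID" pair per line; the real file format and regex are elided in the thread, so that layout, the in-memory sample data, and the shortened @files list are all hypothetical. A single lexical hash of arrays replaces the package arrays @1, @2, ... @X that `push @$1, $2` creates.

```perl
use strict;
use warnings;

# Hypothetical map-file content; the real file and regex are not shown in the thread.
my $map_text = "1\trs0001\n1\trs0002\nX\trs0100\n";
open my $map_fh, '<', \$map_text or die "Cannot open in-memory map: $!";

my %snps_by_chr;    # chromosome => [ SNP IDs ], instead of @1, @2, ... @X
while (my $line = <$map_fh>) {
    if ($line =~ /^(\w+)\t(\w+)$/) {
        push @{ $snps_by_chr{$1} }, $2;    # autovivifies the array for a new chromosome
    }
}
close $map_fh;

my @files = ('1', 'X');    # shortened, hypothetical list of chromosomes
foreach my $f (@files) {
    foreach my $marker (@{ $snps_by_chr{$f} }) {    # replaces: foreach $marker (@$f)
        print "chromosome $f: $marker\n";
    }
}
```

Because the data hang off one named lexical, it is also easy to drop them explicitly (for example `delete $snps_by_chr{$f}`) once a chromosome is done.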

my €0,02

Regards

M.
 
J

Jie

Hi,

My naive question is: if there is enough memory to handle the
biggest file, which is "1", why could it not do the rest in a
loop? ..... I realize there is too much computation for each file;
that is why I write the output for each file into a temp file under
the "OUTPUT" folder.

Yes, this code is definitely real. Actually, it is for quite an
important project.
I put the complete code and all the input files here: http://www.humanbee.com/perl/

If you could offer 2 more cents, take a peek, and give me some hints
on how to rewrite it, I would deeply, deeply appreciate it!!!

Jie

Jürgen Exner

Jie said:
My naive question is: if there is enough memory to handle the
biggest file, which is "1", why could it not do the rest in a
loop? ..... I realize there is too much computation for each file;
that is why I write the output for each file into a temp file under
the "OUTPUT" folder.

Did you read what xhoster wrote?
Symbolic references create an entry in the global symbol table, and perl has
no way of knowing when they go out of scope; therefore the memory used by
them is never recycled. And your code is infested with symbolic
references.

Solution: rewrite your code in a clean way that does not use symbolic
references (which are evil to begin with).
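One way to follow that advice, sketched under assumptions: move the per-file work into a subroutine whose data live in `my` variables (here a hypothetical hash of hashes, %genotype, in place of the symbolic `${$sample_ID}{$marker}`). When the subroutine returns, those lexicals are released, so perl can reuse the memory for the next file instead of piling up entries in the symbol table. perl usually keeps freed memory for its own reuse rather than handing it back to the operating system, but reuse is what matters here. The chromosome list, sample IDs and the stand-in loop body are invented.

```perl
use strict;
use warnings;

my @files = qw(1 2 X);    # shortened, hypothetical chromosome list

# Processes one chromosome; everything it builds is lexical to this call.
sub process_chromosome {
    my ($chr) = @_;
    my %genotype;         # sample ID => { marker => allele }
    my @all_samples;

    # Stand-in for reading the real input files of this chromosome.
    for my $sample_ID (qw(S001 S002)) {
        $genotype{$sample_ID}{"rs_$chr"} = 'AG';
        push @all_samples, $sample_ID;
    }

    # ... write this chromosome's results to its output file here ...
    return scalar @all_samples;
}   # %genotype and @all_samples are released when the call returns

my %samples_seen;
foreach my $f (@files) {
    $samples_seen{$f} = process_chromosome($f);
}
print "$_: $samples_seen{$_} samples\n" for @files;
```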

jue
 
J

Jie

I do not know if symbolic reference means Global reference, and
therefore whether I should try to define local variables. This might be a very
naive question. Do I just add "my" to define a local variable, as
below:

foreach my $f (@files) {
my @marker_array =();
my $map ....
my ....
}
 
C

Clenna Lumina

No. You make heavy use of symbolic references. They aren't
automatically cleaned up, as Perl has no way of knowing when they
won't be used again. This is one of the reasons (probably a lesser
one) why symbolic references are usually not a good thing.

I seem to recall something like the following can be one solution:

foreach $f (@files) {
local(*IN);
local(*OUT);
...
open OUT, "> OUTPUT-cc/$f" . ".ped";
...
open IN, "< cc555k/$map";
...
...
}

This would localize the symbolic references (filehandles in this case) for
each iteration of the outermost foreach loop, IIRC.

Of course it may just be better to declare lexical variables and use
those as handles:

foreach $f (@files) {
my ($in, $out);
...
open $out, "> OUTPUT-cc/$f" . ".ped";
...
open $in, "< cc555k/$map";
...
...
}

I think you could even use a hash or array instead of individual scalars,
if you wanted.
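A self-contained sketch of the lexical-handle version, using three-argument open with error checks. A temporary directory stands in for OUTPUT-cc/ and the chromosome list is shortened; both are assumptions made so the example runs anywhere. Each `$out` is a fresh handle per iteration and would be closed automatically when it goes out of scope, but an explicit close lets you catch write errors.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $outdir = tempdir(CLEANUP => 1);    # stands in for OUTPUT-cc/
my @files  = qw(1 2 X);                # shortened, hypothetical list

foreach my $f (@files) {
    my $path = File::Spec->catfile($outdir, "$f.ped");
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "results for chromosome $f\n";
    close $out or die "Cannot close $path: $!";
}   # $out would also be closed here when it goes out of scope

opendir my $dh, $outdir or die "Cannot read $outdir: $!";
my @written = sort grep { /\.ped\z/ } readdir $dh;
closedir $dh;
print "wrote: @written\n";
```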
 
M

Michele Dondi

Also, in another program, when I use "foreach <IN-FILE>", it runs out
of memory, but when I change to "while <IN-FILE>", it works fine...
Strange....

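Not that strange. A foreach loop evaluates <IN-FILE> in list context, so
the whole file is read into memory before the first iteration, whereas
while reads one line at a time. In fact we recommend using while all the
time. Perl 6 will be different in this regard, but then the iterator over
a filehandle won't be given by angle brackets either.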
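A small self-contained sketch of the two loop forms, using a made-up five-line temporary file in place of IN-FILE. Both loops see the same five lines; the comments mark where the context differs.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical five-line input file.
my ($tmp, $file) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 5;
close $tmp or die "Cannot close $file: $!";

# while: <$fh> is in scalar context, so one line is read per iteration.
open my $fh, '<', $file or die "Cannot open $file: $!";
my $while_count = 0;
while (my $line = <$fh>) {
    $while_count++;
}
close $fh;

# foreach: <$fh> is in list context, so ALL lines are read into a
# temporary list before the first iteration; fine here, fatal for huge files.
open $fh, '<', $file or die "Cannot open $file: $!";
my $foreach_count = 0;
foreach my $line (<$fh>) {
    $foreach_count++;
}
close $fh;

print "while: $while_count lines, foreach: $foreach_count lines\n";
```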


Michele
 
M

Michele Dondi

loop? ..... I realize there is too much computation for each file;
that is why I write the output for each file into a temp file under
the "OUTPUT" folder.

Not too much computation, but too much memory usage. Now, is it all
*necessary*? If so, then you could trade execution speed for disk
space, perhaps thanks to some Tie::* module; otherwise, just don't
waste it!
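No specific module is named above; as one possibility, this sketch ties a hash to a file with SDBM_File, a DBM module that ships with perl, so the data sit on disk instead of in RAM. SDBM stores only strings and limits each key/value pair to roughly 1 KB, so nested structures have to be flattened into string keys (here "sample<TAB>marker", a made-up layout). The file name and data are hypothetical. Other DBM modules such as DB_File lift the size limit but need Berkeley DB installed.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

my $dir    = tempdir(CLEANUP => 1);
my $dbfile = File::Spec->catfile($dir, 'genotypes');    # hypothetical file name

# The hash contents live in $dbfile.pag / $dbfile.dir on disk.
tie my %genotype, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $dbfile: $!";

# Flat string keys, because SDBM cannot store nested references.
$genotype{"S001\trs0001"} = 'AG';
$genotype{"S002\trs0001"} = 'GG';

my $allele = $genotype{"S002\trs0001"};
my $count  = scalar keys %genotype;
untie %genotype;
print "$count entries, S002/rs0001 = $allele\n";
```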


Michele
 
M

Michele Dondi

foreach $f (@files) {
local(*IN);
local(*OUT);

No need for these with modern enough perls. One can just use lexical
filehandles.
Would localize the symbolic references (filehandles in this case) for
each iteration of the outermost foreach loop, IIRC.

The symrefs that probably eat up memory for the OP are things like
${$sample_ID} in his code.
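The usual replacement for `${$sample_ID}{$marker}` is a single lexical hash keyed by sample ID. In this sketch the input layout (a sample ID followed by one genotype per marker), the marker names and the data are all invented, since the thread only shows fragments; the point is the `$genotype{$sample_ID}{$marker}` form.

```perl
use strict;
use warnings;

my @markers = qw(rs0001 rs0002);                   # hypothetical marker list
my @lines   = ("S001 AG GG\n", "S002 AA AG\n");    # hypothetical input lines

my %genotype;       # sample ID => { marker => genotype }, replaces %{$sample_ID}
my @allSamples;
for (@lines) {
    my ($sample_ID, @cols) = split;                # splits $_ on whitespace
    for my $i (0 .. $#markers) {
        $genotype{$sample_ID}{ $markers[$i] } = $cols[$i];
    }
    push @allSamples, $sample_ID;
}

print "$_ $markers[1]: $genotype{$_}{$markers[1]}\n" for @allSamples;
```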


Michele
 
J

Jürgen Exner

[Please do not full-quote but shorten the quoted text to the relevant
portion]
[Please do not top-post, trying to correct]
Jie said:
I do not know if symbolic reference means Global reference.

No, they are different things. "Symbolic" describes _how_ they are created,
while "global" describes _where_ they are created.
Symbolic references can only refer to global (package) variables, but global
references (or variables) are only symbolic if you use poor coding style.
and therefore whether I should try to define local variables. This might
be a very naive question. Do I just add "my" to define a local variable,
as below:

Well, using local variables instead of global variables is certainly
helpful, too, but it has nothing to do with symbolic references. Even if you
store the name in a local ("my") variable and use it to create a
symbolic reference, the target data structure of that reference is
still a global object.

You may want to read up on
perldoc -q "variable name"
"How can I use a variable as a variable name?"

jue