Segmentation fault: problem with perl threads

kath

Hi,

Background:
I am working on a replication project in which I have to replicate
the dependencies of projects. I use the rsync command to do this.
Since a project's dependency list varies and is usually huge, I want
to replicate it in parallel by giving each rsync command a batch of
file names through a file (using
<code>--files-from=<filename></code>).
I am doing this with Perl's threads module.

Problem:
When I have more than one project (a common case), I create 10 threads
to run the rsync commands, so that all dependencies are replicated in
parallel. The main thread waits for all child threads to finish using
the join() method of the threads module. The script ran fine for some
days, but recently it has started terminating with 'Segmentation
fault'; no line number is printed.

I have tried to debug the script (using
<code>perl -d <script-name></code>), but the debugger stops responding
when it reaches the statement that creates a thread
(<code>threads->create(...)</code>). I waited a long time, but it did
not come back, so I had to kill the process.

I cannot post the whole code because it is big and uses some private
packages, but here is how I create 10 threads for each project.

The script terminates by printing 'Segmentation fault' after executing
the last print statement in the _replicate function. I am not able to
make out where and why the error occurs. Or is there another way to
achieve parallel processing without using threads? Could someone help
me here, please?
I am using perl, v5.8.3 built for x86_64-linux-thread-multi
<CODE>
sub _replicate{
    my $ref = shift;
    my $logger = get_logger();
    print "Starting replication of dependency files", $/;

    foreach my $sc (@{$ref}){
        next unless (defined $sc);

        mkdir($LOG_FOLDER."/".$sc->{sc_name});
        print "\tScenario: ".$sc->{sc_name}, $/;
        print "\tLatest Dependencies: ".$sc->{total_dep}." of size "._get_readable_size($sc->{total_size}), $/;
        my @thr_arr = ();
        print "Creating parallel threads", $/;

        foreach my $robj (@{$sc->{rsync}}){
            my $th = threads->create(\&worker, $robj); # I create threads this way
            push @thr_arr, $th;
        }
        #$logger->info("\twaiting for threads to finish its job...");
        print "\twaiting for threads to finish its job...", $/;
        foreach my $t (@thr_arr){
            if (defined $t){
                my $k = $t->join(); # this is how I wait for all threads to finish
            }
        }
        #map {my $k = $_->join} threads->list;
        # map{
        #     my $th = $_;
        #     my $k = $th->join if($th); # just a blind belief that this might cause 'Segmentation fault', hence the check.
        # }@thr_arr;
        #$logger->info("\tFinished replicating dependencies of ".$sc->{sc_name});
        print "\tFinished replicating dependencies of ".$sc->{sc_name}, $/;
    }
}

sub worker{
    my $robj = shift;
    my ($rsync, $server, $from, $to) = @{$robj->{elements}};
    my $alt_server = $RSYNC_CONN_STR_2;

    my $rsync_cmd = $rsync.$server.$from.$to;
    print "Thread-",threads->self->tid," executing ", $rsync_cmd;
}
</CODE>

Thanks in advance,
katharnakh.
 
zentara


I can't run the code, so I can only guess.

1. rsync may not be cleaning up, and you may be running out of memory.
Watch your memory usage as it runs. Try detach. Also remember that
for a thread to be joined, it must return. Are you sure your thread
is completing the rsync command and returning (or reaching the end
of its code block)?

2. You don't seem to be sharing any data between threads in real time, so
why use threads? Run this with Parallel::ForkManager and your
problem will probably go away.
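
As a rough sketch of the Parallel::ForkManager approach: the job list, the limit of 10 processes, and the placeholder command below are assumptions for illustration, not the real rsync setup. Parallel::ForkManager is a CPAN module and has to be installed separately.

```perl
use strict;
use warnings;
use Parallel::ForkManager;    # CPAN module, not part of core Perl

my $MAX_PROCS = 10;           # assumed limit, mirroring the 10 threads above

# Placeholder jobs: each runs a trivial perl one-liner instead of a real rsync.
my @jobs = map { +{ name => "chunk-$_", cmd => [ $^X, '-e', 'exit 0' ] } } 1 .. 5;

my %exit_code;                # filled in by the parent as children are reaped
my $pm = Parallel::ForkManager->new($MAX_PROCS);

# This callback runs in the parent each time a child process has exited.
$pm->run_on_finish(sub {
    my ($pid, $code, $ident) = @_;
    $exit_code{$ident} = $code;
});

for my $job (@jobs) {
    $pm->start($job->{name}) and next;    # parent: move on to the next job
    # Only the child process reaches this point.
    my $status = system(@{ $job->{cmd} });
    $pm->finish($status == 0 ? 0 : 1);    # child exits with this code
}
$pm->wait_all_children;

print "$_ exited with $exit_code{$_}\n" for sort keys %exit_code;
```

Each job runs in its own process, so a crash in one child cannot take down the parent, and the parent only sees an exit code per job.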


zentara
 
kath

zentara said:
> I can't run the code, so I can only guess.
>
> 1. rsync may not be cleaning up, and you may be running out of memory.
>    Watch your memory usage as it runs. Try detach. Also remember that
>    for a thread to be joined, it must return. Are you sure your thread
>    is completing the rsync command and returning (or reaching the end
>    of its code block)?

Yes, it is returning to the main thread. The main thread even prints
the last print statement after every thread has returned.

> 2. You don't seem to be sharing any data between threads in real time, so
>    why use threads? Run this with Parallel::ForkManager and your
>    problem will probably go away.

Could you please give some more detail, or a link, so that I can look
into your idea?

Thank you,
katharnakh.
 
xhoster

kath said:
> Problem:
> When I have more than one project (a common case), I create 10 threads
> to run rsync commands. This way I want to achieve replication of all
> dependencies in parallel. I wait in the main thread for all child
> threads to finish using the join() function of the threads module. The
> script was running fine for some days, but recently the script
> terminates, saying 'Segmentation fault'; no line number is printed.

Does it do this every time?

> I have tried to debug (using <code>perl -d <script-name></code>) the
> script, but the debugger does not come up or does not respond when it
> lands on the statement that creates a thread
> (<code>threads->create(...)</code>). I waited a long time but it did
> not come up, so I had to kill the process.

I'm not sure what this means. Anyway, I've never found the debugger very
useful, much less so on threaded code. I'd probably just run the command
under strace -f and see what the last thing attempted before the segfault
was.

> I cannot post the whole code, because it is big and uses some
> private packages, but here is how I create 10 threads for each
> project.

Could you strip down the code to the smallest that replicates the problem?

> The script terminates by printing Segmentation fault, after executing
> the last print statement in the _replicate function.

The last print statement (lexically) is in a loop. On which time through
the loop does it segfault? Is it the very last time? You should add a
print statement after the loop as well (and $| is true, right?) so you can
see if it is after the loop.
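
A minimal sketch of that check, assuming a made-up list of scenario names in place of the real data structure. With $| set, each print reaches the terminal immediately, so the last line shown before 'Segmentation fault' tells you which pass was running, or whether the loop had already ended.

```perl
use strict;
use warnings;

$| = 1;    # autoflush STDOUT so nothing is left in a buffer if perl crashes

my @scenarios = qw(A B C);    # hypothetical stand-ins for the real scenario list
my $pass = 0;

for my $sc (@scenarios) {
    $pass++;
    print "pass $pass: starting $sc\n";
    # ... create and join the threads for this scenario here ...
    print "pass $pass: finished $sc\n";
}

# If this line appears, the crash happens after the loop, not inside it.
print "after the loop: all $pass passes completed\n";
```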

> I am not able to make out
> where and why that error is occurring. Or is there another way to
> achieve parallel processing without using threads?

Yes. In my opinion, forking is much better, at least on Linux.

> Could someone help me here please?
> I am using perl, v5.8.3 built for x86_64-linux-thread-multi

You might want to test this under 5.8.8 and see if the problem goes away.

> foreach my $robj(@{$sc->{rsync}}){
>     my $th = threads->create(\&worker, $robj);
>     push @thr_arr, $th;
> }

foreach my $robj (@{$sc->{rsync}}){
    my $pid = fork;
    die $! unless defined $pid;
    if ($pid) {
        push @thr_arr, $pid;
    } else {
        worker($robj);
        exit;
    }
};

....

> foreach my $t(@thr_arr){
>     if (defined $t){
>         my $k = $t->join();
>     }
> }

Why check for defined? If the thread creation failed, you probably should
have died at the time of the creation, not delay until later.


foreach my $t (@thr_arr){
    $t == waitpid $t, 0 or die "No $t!";
    warn "$t ended with $?" if $?;
}
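
Put together as something runnable, the fork-and-wait pattern might look like the sketch below. The worker here is a placeholder that just returns a made-up exit code (0, 1 or 2) instead of running rsync, and %job_for and %code_for are names introduced only for this illustration. Note that $? holds the raw wait status; shifting it right by 8 bits gives the child's exit code.

```perl
use strict;
use warnings;

# Placeholder worker: the real one would run rsync. It returns a made-up
# exit code so that the parent has something to report.
sub worker {
    my ($n) = @_;
    return $n;
}

my @pids;
my %job_for;    # pid => job number, kept by the parent
for my $n (0 .. 2) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        # Parent: remember the child and carry on.
        push @pids, $pid;
        $job_for{$pid} = $n;
    }
    else {
        # Child: do the work and exit with its result as the exit code.
        exit worker($n);
    }
}

my %code_for;   # job number => exit code
for my $pid (@pids) {
    waitpid($pid, 0) == $pid or die "No child $pid!";
    $code_for{ $job_for{$pid} } = $? >> 8;    # high byte is the exit code
}

print "job $_ exited with code $code_for{$_}\n" for sort keys %code_for;
```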


Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
kath

Hi,
> Does it do this every time?

Yes.

> Could you strip down the code to the smallest that replicates the problem?

I believe I have done this, because the two subroutines I posted are
the ones that give the problem. This is where I use the threads
functionality; the rest of the code has not given any problem so far.

> The last print statement (lexically) is in a loop. On which time through
> the loop does it segfault? Is it the very last time? You should add a
> print statement after the loop as well (and $| is true, right?) so you can
> see if it is after the loop.

Thanks, I will get back to you on this.

> You might want to test this under 5.8.8 and see if the problem goes away.

So far I have tried it without actually executing the rsync command in
the threads: just a small test of whether the threads return to main
and whether the main thread is able to continue with the next project.
That test went OK. I will now try it with the actual processing.

> foreach my $robj (@{$sc->{rsync}}){
>     my $pid = fork;
>     die $! unless defined $pid;
>     if ($pid) {
>         push @thr_arr, $pid;
>     } else {
>         worker($robj);
>         exit;
>     }
> };
>
> ...
>
> Why check for defined? If the thread creation failed, you probably should
> have died at the time of the creation, not delay until later.
>
> foreach my $t (@thr_arr){
>     $t == waitpid $t, 0 or die "No $t!";
>     warn "$t ended with $?" if $?;
> }

Thanks for the descriptive reply with code; it certainly helped me.
I will try it and get back.

thank you,
katharnakh.
 
kath

Hello Xho,
I would like to get help.
> They can't give the problem, not by themselves. The sub has to be called
> by something. And _replicate has to be passed some complex structure that
> I don't know how to create. I could guess about its internal structure
> based on how your code uses it, but I'm reluctant to do that. It is better
> if you post a *complete* program that we can run and see the problem,
> without having to reverse engineer the driving code and the data structures
> (and the "use" statement).

_replicate() is passed a data structure that looks like this:
$VAR1 = [
    {
        'sc_name' => 'XYZ_Scenario',
        'rsync' => [
            {
                'elements' => [
                    'rsync --archive --relative --stats --verbose --links --copy-links --copy-unsafe-links --safe-links --times --files-from=\'./Sep_17_2008(22h.50m.52s)/gen/file-from/XYZ_Scenario/rsync_input-1.txt\' ',
                    'rsync://xxx.yyy.zzz.corp:1873/',
                    'contexts',
                    ' /var/workshare/contexts2>&1'
                ],
                'statistics' => {
                    'total_files' => 863,
                    'size' => 232563375
                },
                'status' => undef # here I write something meaningful for the DB update later
            },
            ...
            ...
        ], # this array will have 10 or fewer such objects, with different 'rsync_input-*.txt' files to replicate
        'total_size' => 2184209735,
        'total_dep' => 13725
    },
    ...
    ...
] # varies, based on scenarios
I checked this data structure carefully; it looks like what I intended,
so there is no problem up to here.

> I don't have a version of 5.8.3 handy, so I couldn't test your code under
> it anyway. All the more reason to make complete, runnable scripts. The
> easier you make it on us, the more people will be inclined to help, maybe
> one with a copy of 5.8.3 hanging around.

Firstly, thanks so much.
Below is the same set of functions I posted earlier, because I
think the problem lies here. I tried running my script on perl 5.8.8
and I still get 'Segmentation fault', whether I actually execute the
rsync command in the thread or just print the rsync command in the
thread and return.
I strongly suspect that I am calling the join() method on a thread
object whose thread has already died after finishing its job, and so I
am trying to dereference a reference that has been deallocated (maybe?).

This would happen because, when 10 threads are running in parallel and
I am waiting for, say, the 2nd thread to join, the 3rd or 4th or 8th
(anything up to the 10th) might meanwhile have finished running. Once
the 2nd joins, the main thread tries to call join() on the next thread
object in the array (either the one returned by threads->list or one I
keep myself), which may no longer exist, or which I cannot tell is
joinable.

I tried to check whether each thread is still running, as you can see
in _replicate() below:
sub _replicate{
    my $ref = shift;
    my $logger = get_logger();
    print "Starting replication of dependency files", $/;
    $logger->info("Starting replication of dependency files");
    foreach my $sc (@{$ref}){
        next unless (defined $sc);
        mkdir($LOG_FOLDER."/".$sc->{sc_name});
        $logger->info("\tScenario: ".$sc->{sc_name});
        $logger->info("\tLatest Dependencies: ".$sc->{total_dep}." of size "._get_readable_size($sc->{total_size}));
        my @thr_arr = ();
        foreach my $robj (@{$sc->{rsync}}){
            # I will add a key to this data structure, to check whether the thread is joinable or still running
            $robj->{thr} => 'running';

            my $th = threads->create(\&worker, $robj);
            $logger->info("\tThread-".$th->tid.", Total files: ".$robj->{statistics}->{total_files}.", Size: "._get_readable_size($robj->{statistics}->{size})."[".$robj->{statistics}->{size}."B]");
            $logger->info("\tcmd: ".join("", @{$robj->{elements}}));
            push @thr_arr, $th->tid;
        }
        $logger->info("\twaiting for threads to finish its job...");

        # 3rd try
        foreach my $k (0..$#thr_arr){
            # let's check the tid and then access the thread object!
            print $k," ",$sc->{rsync}->[$k]->{thr}, $/;
            if ($sc->{rsync}->[$k]->{thr} eq 'running'){ # if not, the thread might have died and we would access memory deallocated after the thread's death
                my $t = $thr_arr[$k];
                my $th = threads->object($t);
                $th->join() if ($th);
            }
        }

        # 2nd try
        # map{
        #     my $th = $_;
        #     just a blind belief that this might cause 'Segmentation fault', hence the check. But here the thread object I am referring to might have been deallocated due to the death of the thread, hence I get 'Segmentation fault' .... ?
        #     my $k = $th->join if($th);
        # }@thr_arr;

        # 1st try
        # Maybe the thread objects returned by threads->list are unjoined, but are they joinable? no clue...!
        #map {my $k = $_->join} threads->list;

        $logger->info("\tFinished replicating dependencies of ".$sc->{sc_name});
    }
}

sub worker{
    my $robj = shift;
    my ($rsync, $server, $from, $to) = @{$robj->{elements}};
    my $alt_server = $RSYNC_CONN_STR_2;

    print "Thread-".threads->self->tid." running";
    my $i = 0;
    while(++$i <= $MAX_REPL_ATTEMPT){
        #$logger->info("\t\t[Attempt-".$i."]Thread-".threads->self->tid." executing [".$rsync_cmd."]");
        #$logger->info("\t\t\tTotal files: ".$robj->{statistics}->{total_files}.", Size: "._get_readable_size($robj->{statistics}->{size})."[".$robj->{statistics}->{size}."B]");
        my $rsync_cmd = $rsync.$server.$from.$to;
        `$rsync_cmd`;
        if ($?){ # because of connection refusal from server, command fails
            $robj->{status} = "Completed with error!";
            $rsync_cmd = $rsync.$server.$from.$to;
            $server = ($i%2) ? $RSYNC_CONN_STR_1 : $RSYNC_CONN_STR_2; # just a small trick to use the other port on the same server for the connection
            #$logger->error("ERROR: Thread-".threads->self->tid." says, replication Attempt-".$i." failed, trying again after 2 mins.");
            sleep(120);
        }else{
            $robj->{status} = "Completed";
            last;
        }
    }
    $robj->{thr} = 'done';
    my $etime = time;

    my $spent_time = $etime - $stime;
    my $logger = get_logger();
    $logger->info("\t\t[Attempt-".$i."]Thread-".threads->self->tid." took "._format_spent_time($spent_time)." time");
}

So I would ask: is there any way to make sure that all threads have
finished, or to call join() only on those threads that are joinable? Or
do I have to go with the other solution you sent earlier, fork()ing
processes instead of using threads?


Thanks in advance,
katharnakh.
 
xhoster

Hi Kath,

The data structure you posted only has one sc_name section, and that sc_name
only has one elements section. Is that enough to reliably replicate the
segfault?

I've stripped most parts of your code that seem to be irrelevant
to Perl's threading and are relevant only to your specific use case. It
is just not feasible to work with unsimplified code. The extra stuff is
too distracting, and the line-wrapping as we post back and forth makes
it far too confusing.

use strict;
use warnings;
use threads;
my $VAR1 = [1..25];
_replicate($VAR1);

sub _replicate {
    my $ref = shift;
    my @thr_arr = ();
    print "Creating parallel threads", $/;

    foreach my $robj (@$ref){
        my $th = threads->create(\&worker, $robj);
        push @thr_arr, $th;
    }
    print "\twaiting for threads to finish its job...", $/;
    foreach my $t (@thr_arr){
        if (defined $t){
            my $k = $t->join();
        }
    }
    print "\tFinished replicating dependencies", $/;
}

sub worker{
    my $robj = shift;

    print "Thread-",threads->self->tid," executing ", $robj, "\n";
    my $x=int rand(10);
    system "sleep $x";
}
__END__

It does not segfault for me on 5.8.8. Does it segfault for you?

> Firstly, thanks so much.
> Below is the same set of functions I posted earlier, because I
> think the problem lies here. I tried running my script on perl 5.8.8
> and I still get 'Segmentation fault', whether I actually execute the
> rsync command in the thread or just print the rsync command in the
> thread and return.
> I strongly suspect that I am calling the join() method on a thread
> object whose thread has already died after finishing its job, and so I
> am trying to dereference a reference that has been deallocated (maybe?).
>
> This would happen because, when 10 threads are running in parallel and
> I am waiting for, say, the 2nd thread to join, the 3rd or 4th or 8th
> (anything up to the 10th) might meanwhile have finished running.

When they finish running, they should wait patiently to be joined.
Ensuring this happens is Perl's job. Adding your own code to try to ensure
it will just complicate the issue. For us to help, you need to make your
code simpler, not more complicated.
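
A small sketch of this point, assuming a perl built with ithreads; the job hashes and status strings are made up for illustration. The workers finish long before the parent joins them, yet join() still works and hands back each worker's return value. It also shows that a hash passed to threads->create is copied into the new thread, so a flag set inside the worker is not visible to the parent.

```perl
use strict;
use warnings;
use threads;    # needs a perl built with ithreads support

# Placeholder worker: returns a status string instead of running rsync.
sub worker {
    my ($job) = @_;
    $job->{status} = 'changed in thread';    # only changes the thread's private copy
    return "job $job->{id} completed";
}

my @jobs = map { +{ id => $_, status => 'unchanged' } } 1 .. 3;
my @thr  = map { threads->create(\&worker, $_) } @jobs;

sleep 1;    # give the workers ample time to finish before anyone joins them

# Joining an already finished thread is fine; join returns the worker's result.
my @results = map { $_->join } @thr;

print "$_\n" for @results;
print "parent sees status: $jobs[0]{status}\n";
```

Returning the status from the worker and collecting it with join() avoids any bookkeeping about which threads are still running.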


Xho

 
