Return code of 4294967295 (UINT_MAX)

Vineet

If anybody has any insight into this problem I'm running into I would
really appreciate if you could write to me...

I'm running a simple C++ program on Solaris 8 that forks and execs a
bunch of processes. It's been running fine for years, but now that
I've moved to faster hardware, I'm running into a problem that
surfaces more frequently as the hardware I'm using gets better/faster
-- it seems like some sort of race condition issue.

Basically, at random, a handful of the processes immediately fail
(before actually doing anything) and return an exit code of 4294967295
(UINT_MAX). I imagine that this is just an umbrella status code for
all unexpected/unexplained errors, so I'm not sure if it means
anything?

One thing to note is that if I just trap this error and execute the
process again, it runs fine. It just seems like at the time the
fork/exec takes place, something in the system temporarily screws up
but I don't know what. Of course I do have a workaround (just re-run
the process) but I'd like to know what's going on.

If anybody has encountered anything like this, please let me know, and I
can provide you with more information if need be...thanks.
 
Dan Mercer

: If anybody has any insight into this problem I'm running into I would
: really appreciate if you could write to me...
:
: I'm running a simple C++ program on Solaris 8 that forks and execs a
: bunch of processes. It's been running fine for years, but now that
: I've moved to faster hardware, I'm running into a problem that
: surfaces more frequently as the hardware I'm using gets better/faster
: -- it seems like some sort of race condition issue.

Sounds like fork is failing. If the program is not running as root,
you are probably exceeding maxuproc, the per-user process limit (see
your kernel parameters documentation; Solaris names it maxuprc). If it
is running as root, you're exceeding nprocs, the system-wide limit
(max_nprocs on Solaris). Nprocs determines how many process-table
structures are created. Maxuproc determines how many processes a
single user can have; it should always be less than half of nprocs and
is usually quite a bit smaller. Default parameters for most machines
are not set up properly to handle heavy usage by a small number of
programs or users.

One other problem: it is likely that you are not handling wait situations
properly and may have a large number of zombie processes. These
will consume limited process resources. Even if your processes
are being reaped by init, it's possible to get into race conditions where
zombies are created faster than init can reap them. Without seeing any
actual code (like your fork-exec code) it's impossible to say.
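
For reference, a minimal sketch of non-blocking reaping in the parent, assuming a POSIX system. The helper name reap_finished_children is hypothetical and not taken from the original program; it only illustrates the waitpid() pattern that keeps exited children from piling up as zombies.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: reap every child that has already exited,
// without blocking. Returns the number of children reaped.
int reap_finished_children()
{
    int reaped = 0;
    for (;;) {
        int status = 0;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {          // one zombie collected, look for more
            ++reaped;
            continue;
        }
        if (pid == -1 && errno == EINTR) {
            continue;           // interrupted by a signal, retry
        }
        break;  // 0: children still running; -1 with ECHILD: none left
    }
    return reaped;
}
```

Calling this from the parent's main loop after each batch of forks (or from a SIGCHLD-driven path) is one way to make sure process-table slots are released promptly.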

Post some code if you want more help.

Good luck,

Dan

:
: Basically, at random, a handful of the processes immediately fail
: (before actually doing anything) and return an exit code of 4294967295
: (UINT_MAX). I imagine that this is just an umbrella status code for
: all unexpected/unexplained errors, so I'm not sure if it means
: anything?

It's a -1.
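
To spell that out: when an int holding -1 ends up in an unsigned int, the value wraps modulo 2^N, so with a 32-bit unsigned int it is displayed as 4294967295. A small sketch, where shown_as_unsigned is a made-up name used only for illustration:

```cpp
#include <climits>
#include <string>

// Hypothetical helper: render an int return value the way it appears
// when it is mistakenly stored in (or printed as) an unsigned int.
std::string shown_as_unsigned(int value)
{
    // Conversion to unsigned is defined as reduction modulo 2^N,
    // so -1 becomes UINT_MAX.
    unsigned int u = static_cast<unsigned int>(value);
    return std::to_string(u);
}
```

On a platform where unsigned int is 32 bits wide, shown_as_unsigned(-1) yields "4294967295", which matches the number reported in the question.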

:
: One thing to note is that if I just trap this error and execute the
: process again, it runs fine. It just seems like at the time the
: fork/exec takes place, something in the system temporarily screws up
: but I don't know what. Of course I do have a workaround (just re-run
: the process) but I'd like to know what's going on.
:
: If anybody has encountered anything like this, please let me know, and I
: can provide you with more information if need be...thanks.
 
joe

Vineet said:
If anybody has any insight into this problem I'm running into I
would really appreciate if you could write to me...

I'm running a simple C++ program on Solaris 8 that forks and execs a
bunch of processes. It's been running fine for years, but now that
I've moved to faster hardware, I'm running into a problem that
surfaces more frequently as the hardware I'm using gets better/faster
-- it seems like some sort of race condition issue.

Basically, at random, a handful of the processes immediately fail
(before actually doing anything) and return an exit code of
4294967295 (UINT_MAX). I imagine that this is just an umbrella
status code for all unexpected/unexplained errors, so I'm not sure
if it means anything?

That's -1, which the man page documents fork() to return in case of an
error. In that case you should check to see what the value of errno
is. You can get a textual version of the error by calling either
perror() or strerror(errno). That should shed some light on things.
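
A rough sketch of such a check, assuming the original code calls fork() followed by one of the exec functions. The wrapper name spawn and its error parameter are hypothetical, chosen only for this example.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical wrapper: fork, then exec argv[0] in the child.
// Returns the child's pid, or -1 with a description in `error`
// if fork() itself failed (for example with EAGAIN or ENOMEM).
pid_t spawn(char *const argv[], std::string &error)
{
    pid_t pid = fork();
    if (pid == -1) {
        // Capture errno right away, before any other call can change it.
        error = std::string("fork: ") + std::strerror(errno);
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);   // only reached if the exec failed
    }
    return pid;
}
```

If fork() is what fails, the text in error (typically something like a resource-limit message for EAGAIN) says which limit to look at; if the child exits with 127 instead, the exec is the step that failed.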

Joe
 
Howard

Dan Mercer said:
Post some code if you want more help.

Really, guys, this discussion belongs off-line or in a forum that is more
appropriate. This newsgroup doesn't discuss platform-specific stuff like
processes/threads, etc. This is a language newsgroup.

-Howard
 
Keith Thompson

Howard said:
Really, guys, this discussion belongs off-line or in a forum that is more
appropriate. This newsgroup doesn't discuss platform-specific stuff like
processes/threads, etc. This is a language newsgroup.

This thread is cross-posted to comp.lang.c++, comp.unix.programmer,
comp.lang.c, and comp.unix.solaris. It's probably appropriate in
comp.unix.programmer and/or comp.unix.solaris, which do discuss
platform-specific stuff. Please trim the newsgroups line on any
followups.
 
Casper H.S. Dik

Vineet said:
Basically, at random, a handful of the processes immediately fail
(before actually doing anything) and return an exit code of 4294967295
(UINT_MAX). I imagine that this is just an umbrella status code for
all unexpected/unexplained errors, so I'm not sure if it means
anything?

A process cannot fail with an exit code of 4294967295 (UINT_MAX);
the only valid exit codes are between 0 and 255 (inclusive).
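
One way to see this on a POSIX system is to have a child exit with an out-of-range value and look at what the parent gets back from waitpid(). The helper name observed_exit_status is made up for this sketch.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that exits with `code` and return
// the exit status the parent observes. Returns -1 if fork() or
// waitpid() fails, or if the child did not exit normally.
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        _exit(code);            // child: exit immediately
    }
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited == -1 && errno == EINTR);
    if (waited != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status); // only the low 8 bits survive
}
```

A child that calls _exit(-1) is seen by the parent as exiting with 255, and _exit(256) is seen as 0, so a value like 4294967295 cannot be a real exit status; it has to come from some other return value.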

So the first question is: what is returning -1? (Whatever returns
a number with all bits set is more likely to return -1 than UINT_MAX.)

Vineet said:
One thing to note is that if I just trap this error and execute the
process again, it runs fine. It just seems like at the time the
fork/exec takes place, something in the system temporarily screws up
but I don't know what. Of course I do have a workaround (just re-run
the process) but I'd like to know what's going on.

Have you tried "truss -f"?

Casper
 
Prateek R Karandikar

Vineet said:
If anybody has any insight into this problem I'm running into I would
really appreciate if you could write to me...

I'm running a simple C++ program on Solaris 8 that forks and execs a
bunch of processes. It's been running fine for years, but now that
I've moved to faster hardware, I'm running into a problem that
surfaces more frequently as the hardware I'm using gets better/faster
-- it seems like some sort of race condition issue.

Basically, at random, a handful of the processes immediately fail
(before actually doing anything) and return an exit code of 4294967295
(UINT_MAX). I imagine that this is just an umbrella status code for
all unexpected/unexplained errors, so I'm not sure if it means
anything?

One thing to note is that if I just trap this error and execute the
process again, it runs fine. It just seems like at the time the
fork/exec takes place, something in the system temporarily screws up
but I don't know what. Of course I do have a workaround (just re-run
the process) but I'd like to know what's going on.

If anybody has encountered anything like this, please let me know, and I
can provide you with more information if need be...thanks.

And your question about Standard C++ is?

-- --
Abstraction is selective ignorance.
-Andrew Koenig
-- --
 
Vineet

Thank you very much for your replies.

I apologize if the post was not appropriate for this newsgroup -- it's
my first time posting to one of these, and now that I know what the
appropriate groups are, I will only post any followups to this thread
on those newsgroups (comp.unix.programmer and comp.unix.solaris). For
those of you that offered suggestions, please see those groups if you
wish to follow this thread through.

As for the post from Prateek, I now understand that this wasn't the
right newsgroup, but I don't think it's necessary to be a jerk about
it. There were 2 previous posts that already pointed out my mistake,
so for you to post an extraneous message just to flame me is just as
bad if not worse than posting a message to the wrong newsgroup.
 
LibraryUser

Howard said:
.... snip ...

Really, guys, this discussion belongs off-line or in a forum that
is more appropriate. This newsgroup doesn't discuss
platform-specific stuff like processes/threads, etc. This is a
language newsgroup.

So, instead of simply muttering, set followups.
 
all mail refused

Casper H.S. Dik said:
Have you tried "truss -f"?


This might help spot the difference between 2 truss outputs
(which you name on the command line). Random interleaving
of parent and child contributions is left for you to sort
out in this version. I might have done more if I'd thought
I'd need it regularly.

#!/usr/bin/perl -w

# Print the most recent lines read from each truss output, left file first.
sub display
{
    for ($ln=$lineno-5; $ln<=$lineno; $ln++) {
        next if ($ln<1);
        printf("%s(%d) %s\n", $ARGV[0], $ln, $left{$ln});
    }
    print "\n";
    for ($ln=$lineno-5; $ln<=$lineno; $ln++) {
        next if ($ln<1);
        printf("%s(%d) %s\n", $ARGV[1], $ln, $right{$ln});
    }
}

# Parse one truss line into the globals $syscall and $result.
sub lineparse
{
    $_=shift;

    TEST: while (1) {
        $syscall="NA";
        $result="NA";
        # The result is the last whitespace-separated field.
        if ($_ =~ /\s(\S+)$/) {
            $result=$1;
        }
        # "pid: syscall(args...)"
        if ($_ =~ /^\d+:\s+([^\(]+)\(/) {
            $syscall=$1;
            last TEST;
        }
        # "pid: *** ... ***" notices are echoed and otherwise ignored.
        if ($_ =~ /^\d+:\s+\*\*\*.*\*\*\*$/) {
            print "$_\n";
            last TEST;
        }
        die("RE not matched: $_\n");
    }
}

#######################################

open(LFH, "<$ARGV[0]") or die("open $ARGV[0]");
open(RFH, "<$ARGV[1]") or die("open $ARGV[1]");

for($lineno=1;;$lineno++) {

    $left=<LFH>;
    $right=<RFH>;
    # Both files ended together: no difference found.
    last if ( (!defined($left)) && (!defined($right)) );
    if ( (!defined($left)) && (defined($right)) ) {
        die("end of $ARGV[0]");
    }
    if ( (defined($left)) && (!defined($right)) ) {
        die("end of $ARGV[1]");
    }
    chomp($left);
    chomp($right);

    #print "DEBUG $left\n";

    lineparse($left);
    $left_syscall=$syscall;
    $left_result=$result;
    $left{$lineno}=$left;
    lineparse($right);
    $right_syscall=$syscall;
    $right_result=$result;
    $right{$lineno}=$right;

    # Stop at the first line where the two traces disagree.
    if ($right_syscall ne $left_syscall) {
        print "syscall difference\n\n";
        display();
        exit(1);
    }
    if ( ($right_result =~ /^\d+$/) && ($left_result !~ /^\d+$/)) {
        print "Non-Numerical Result (left)\n\n";
        display();
        exit(1);
    }
    if ( ($right_result !~ /^\d+$/) && ($left_result =~ /^\d+$/)) {
        print "Non-Numerical Result (right)\n\n";
        display();
        exit(1);
    }
    # One side returned 0 and the other a non-zero number.
    if ( ( ("0" eq $right_result) && ("0" ne $left_result) ) ||
         ( ("0" ne $right_result) && ("0" eq $left_result) ) ){
        print "Numerical Results\n\n";
        display();
        exit(1);
    }

}

exit(0);
 
