help analyzing cause of return code

 
 
axeman
Guest
Posts: n/a
 
      02-22-2006
Synopsis:

A variant of a typical host availability / pinger script has performed
well for many years. Multiple daemons process various lists at various
intervals with various timeouts. The tool was recently modified to
support attempting sequences of tests (i.e. ping and TCP port test, ...
vs. just one test). The daemons will run fine for days, but then some
will suddenly receive non-zero return codes for every command/test they
perform. Specifically, return code 16777215 (-1 before shift >> 8).
Searches have suggested problems with CHLD signals, though they have
never been a problem before. Appreciate any insight.

Code versions:

AIX 4.3.3
Perl 5.005_03

Basic daemon model:

....

sub timed_out { # ALRM signal handler for command time-out
die "timed out";
}

....

$SIG{'HUP'} = 'IGNORE'; # don't die on these signals
$SIG{'PIPE'} = 'IGNORE';
$SIG{'TERM'} = 'IGNORE';
$SIG{'ALRM'} = \&timed_out;
$SIG{'USR1'} = \&quiesce;
use POSIX ":sys_wait_h";

....

foreach $test ( split(/;/,$TESTS) ) {

    # std wrapper for timed operation, return code in $rc, output in @out
    ($rc,@out) = eval {
        alarm($timeout);
        $test =~ s/HOST/$check/g;
        $test[$testCount] = $test;
        @eout = `$test 2>&1`;
        $erc = ($? >> 8);
        alarm(0);
        return ($erc,@eout);
    };
    if( $@ =~ /^timed out/ ) {
        $rc = 1;
        $timeouts++;
        $test_timeout[$testCount] = 1;
    }
    $test_rc[$testCount] = $rc;
    $test_console[$testCount] = join('',@out);
    $testCount++;
    $spawned++;

    last if( $rc == 0 ); # successful test
}

....

# clean up any hung children for every 10 or more spawned processes

if( $spawned > 10 ) {
    reap; # NOTE - also new code - this recursively traverses the
          # process tree and kill KILL's any children
    $spawned = 0;
}

# clean up zombies - not done w/signal handler due to unreliable signals

while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0 ) {}

....
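
A note on the number in the synopsis: Perl leaves $? at -1 when the command could not be started or the underlying wait failed (the reason is in $!), and on a perl with 32-bit integers shifting that value right by 8 gives 16777215. The sketch below reproduces the arithmetic with an explicit 32-bit mask, so it behaves the same on a 64-bit build, and shows a hypothetical helper, decode_status, that checks for -1 before shifting. The helper name and its message strings are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# -1 viewed as an unsigned 32-bit integer is 0xFFFFFFFF; the mask emulates
# a 32-bit perl so the result is the same on a 64-bit build.
my $shifted = (-1 & 0xFFFFFFFF) >> 8;
print "$shifted\n";    # prints 16777215

# Hypothetical helper: interpret a raw $? value instead of shifting blindly.
sub decode_status {
    my ($raw, $errno) = @_;
    return "wait/launch failed: $errno" if $raw == -1;          # no exit status at all
    return "killed by signal " . ($raw & 127) if $raw & 127;    # low 7 bits: signal number
    return "exit code " . ($raw >> 8);                          # high byte: exit code
}

print decode_status(-1, "No child processes"), "\n";
print decode_status(768, ""), "\n";    # 768 == 3 << 8
```

Testing for -1 before the shift keeps "the wait itself failed" distinct from "the command exited non-zero", which is the distinction this thread turns on.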

 
 
 
 
 
usenet@DavidFilmer.com
Guest
Posts: n/a
 
      02-22-2006
axeman wrote:
> vs. just one test). The daemons will run fine for days, but then some
> will suddenly receive non-zero return codes for every command/test they
> perform.


Is your process reaper reaping? For some odd reason, AIX has an
insanely-low default max-pid-per-user limitation (I think default is
256 - I usually run it at 1024). Check "smitty chgsys" and check your
process table.

You would have messages in /var/spool/mail if you were pid-starved.
And, of course, if the process is running as root, I don't think it
would matter, since (I believe) root is not limited.

FWIW, whatever is happening here probably (almost surely) has nothing
to do with Perl.

--
http://DavidFilmer.com

 
 
 
 
 
axeman
Guest
Posts: n/a
 
      02-22-2006
Thanks David.

Unfortunately, it is running as root (even though the limit is low -
128 - and no related mail). The reaper is misnamed (not my code), it
just kills hung test procs, but does not reap their exit status, that's
what the asynchronous 'while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0
) {}' line does. CHLD signals are not mapped (i.e. left to DEFAULT).
Curiously, if I do map them to a handler or IGNORE, the bad return code
occurs always.

 
 
xhoster@gmail.com
Guest
Posts: n/a
 
      02-23-2006
"axeman" <(E-Mail Removed)> wrote:
> Synopsis:
>
> A variant of a typical host availability / pinger script has performed
> well for many years. Multiple daemons process various lists at various
> intervals with various timeouts.


How often are the timeouts actually activated?

> The tool was recently modified to
> support attempting sequences of tests (i.e. ping and TCP port test, ...
> vs. just one test).


Did these changes change how often timeouts were actually activated?

>
> AIX 4.3.3
> Perl 5.005_03
> ...
> sub timed_out { # ALRM signal handler for command time-out
> die "timed out";
> }


Does the handler need to re-install itself after being activated
on your system?


> ($rc,@out) = eval {
> alarm($timeout);
> $test =~ s/HOST/$check/g;
> $test[$testCount] = $test;
> @eout = `$test 2>&1`;
> $erc = ($? >> 8);
> alarm(0);
> return ($erc,@eout);
> };
> if( $@ =~ /^timed out/ ) {
> $rc = 1;
> $timeouts++;
> $test_timeout[$testCount] = 1;
> }


If $@ is set but is not a timeout, shouldn't you do something about it?
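
One way to handle that case is sketched below, assuming a Unix-like system with sh on the PATH. The name run_with_timeout and its return convention are hypothetical, not taken from the original script. It keeps the same eval/alarm pattern, but reports any exception that is not the timeout instead of dropping it, and checks $? for -1 before shifting. Note that a timed-out child is not killed here; it is left for the caller to clean up.

```perl
use strict;
use warnings;

# Hypothetical wrapper. Returns a list whose first element is one of
# 'ok', 'timeout', 'failed' or 'error'.
sub run_with_timeout {
    my ($cmd, $timeout) = @_;
    my (@out, $raw, $errno);
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out\n" };
        alarm($timeout);
        @out = `$cmd 2>&1`;
        ($raw, $errno) = ($?, "$!");   # capture both before anything resets them
        alarm(0);
        1;
    };
    alarm(0);                          # no pending alarm survives a die
    if (!$ok) {
        my $err = $@ || 'unknown error';
        return ('timeout') if $err =~ /^timed out/;
        return ('error', $err);        # any other exception is surfaced
    }
    return ('failed', $errno) if $raw == -1;   # launch or wait failed
    return ('ok', $raw >> 8, @out);
}

my ($state, @rest) = run_with_timeout(q{sh -c 'echo hi; exit 3'}, 5);
print "$state @rest";
```

Returning a distinct 'failed' state for $? == -1 means a broken wait is never mistaken for an ordinary non-zero exit code.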

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
 
xhoster@gmail.com
Guest
Posts: n/a
 
      02-23-2006
"axeman" <(E-Mail Removed)> wrote:
> Thanks David.
>
> Unfortunately, it is running as root (even though the limit is low -
> 128 - and no related mail). The reaper is misnamed (not my code), it
> just kills hung test procs, but does not reap their exit status, that's
> what the asynchronous 'while( ($waitedPid = waitpid(-1, &WNOHANG)) > 0
> ) {}' line does. CHLD signals are not mapped (i.e. left to DEFAULT).
> Curiously, if I do map them to a handler or IGNORE, the bad return code
> occurs always.


qx{} automatically waits for the job it spawns--that is how it sets $?.
If you set $SIG{CHLD}, it will interfere with qx{}'s wait.
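
A minimal sketch of that interaction, assuming a Unix-like system where ignoring SIGCHLD makes the kernel reap children automatically (behaviour can differ by platform and Perl version). With the default disposition the exit status comes through in $?; with the signal ignored, the wait that backticks perform can fail with "No child processes", which typically leaves $? at -1. The helper name run_and_status is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with backticks and return the raw $?
# together with the text of $! at that moment.
sub run_and_status {
    my ($cmd) = @_;
    my @out = `$cmd`;
    return ($?, "$!");
}

{
    local $SIG{CHLD} = 'DEFAULT';
    my ($raw) = run_and_status(q{sh -c 'exit 2'});
    print "default: raw=$raw exit=", $raw >> 8, "\n";
}

{
    local $SIG{CHLD} = 'IGNORE';
    my ($raw, $errno) = run_and_status(q{sh -c 'exit 2'});
    # On many systems raw is -1 here and $errno reads "No child processes".
    print "ignored: raw=$raw errno=$errno\n";
}
```

Using local keeps the changed disposition confined to each block, so the rest of a program is not affected by the experiment.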

Xho

 
axeman
Guest
Posts: n/a
 
      02-23-2006
>> Multiple daemons process various lists at various
>> intervals with various timeouts.

> How often are the timeouts actually activated?


Rarely, i.e. only when a test fails / system is down, and most are
usually up.

> Did these changes change how often timeouts were actually activated?


No.

> Does the handler need to re-install itself after being activated
> on your system?


As mentioned, there is no handler, exit statuses are gathered
asynchronously.

> If $@ is set but is not a timeout, shouldn't you do something about it?


Yes, clearly. That code was left out (the ellipses ...) because it was
not relevant to the problem.

> qx{} automatically waits for the job it spawns--that is how it sets $?.
> If you set $SIG{CHLD}, it will interfere with qx{}'s wait.


Thanks, that makes sense.

 
 
xhoster@gmail.com
Guest
Posts: n/a
 
      02-23-2006
"axeman" <(E-Mail Removed)> wrote:


Note: snipped material restored with "] ]".

] ] > sub timed_out { # ALRM signal handler for command time-out
] ] > die "timed out";
] ] > }

> > Does the handler need to re-install itself after being activated
> > on your system?

>
> As mentioned, there is no handler, exit statuses are gathered
> asynchronously.


If the thing whose comment says "ALRM signal handler" is not a handler,
then what the heck is it? And why is it commented thusly?

Xho

 
axeman
Guest
Posts: n/a
 
      02-23-2006
Lol. Thought you meant a handler for CHLD. No, the ALRM handler does
not need to be reinstalled.

 