The non-terminating script, revisited

 
 
Debo, 06-23-2005

I have a script that is severely misbehaving. Its purpose is simple: It is
supposed to retrieve some records from a local database, compare these
records to others fetched from the NCBI genbank database via BioPerl, and
then insert the results of these comparisons into the local database.

It does all of these things admirably. However, it seems that it is
implicitly forking or something, because after it terminates, execution
still continues. I'll elaborate on this in a bit.

I don't like posting code, but I'm sort of stumped here. Here's a
trimmed down version of what I'm doing:


#!/usr/bin/perl -w

use strict;
use lib "D:/cygwin/site/www";
use Site;
use Bio::Seq;
use Bio::DB::GenPept;
<SNIP: Many more Bio:: includes>
use DBI;
use Error qw(:try);
use File::Copy;
use Log::Log4perl qw(get_logger :levels);

######################### Main routine ##############################
# Initialize the logger we'll be using to record output
Log::Log4perl->init("mlogger.conf");
my $mlogger = get_logger('MITOSEEK');

$mlogger->info("Job started.");

# Define a single handle to use everywhere here.
my $dbh = Site::createDBHandle()
    or &logexit("Could not open handle to database.");

$mlogger->info("Examining genbank N. crassa homologues.");
&examineHomologues();

eval
{
    $dbh->commit();
    $dbh->disconnect();
};

$mlogger->error("Commit/disconnect failed; $@") if ($@);

# donesville
exit(0);

# Sanity check 1 -- should never get here
die("DIDN'T EXIT");

# Sanity check 2... pull out big guns, aim at perl -- shouldn't happen
system('D:/cygwin/usr/local/bin/killall -s 9 perl');
#----------------------- End main routine ---------------------------



######################### Helper methods ############################

sub examineHomologues()
{
    my $sth = Site::createStmt($dbh,
        "SELECT <UNINTERESTING DATABASE SCHTUFF HERE>", 0)
        or &logexit("Couldn't fetch homologue data");

    # Create a cached query to do the insertion
    my $insert_handle = Site::createStmt($dbh,
        "INSERT INTO <MORE DATABASE STUFF>", 1);

    Site::executeStmt($sth)
        or &logexit("Couldn't execute stmt to retrieve homologue data");

    # Connection to NCBI genbank database
    my $fetcher = new Bio::DB::GenBank;

    eval
    {
        my $count = 0;

        while (my $dp = $sth->fetchrow_hashref)
        {
            my $cur_accession = $dp->{'accession'};
            my $refseq;
            eval
            {
                $refseq = $fetcher->get_Seq_by_id($cur_accession);
            };

            # We can afford data loss
            next if ($@ || !defined($refseq));

            <LOTS OF BORING REGEXP STRINGY STUFF HERE, FOLLOWED
             BY SOME DATABASE INSERTIONS>
        }
    };

    $mlogger->error("Homologue analysis died early. $@") if ($@);
}

sub logexit($)
{
    my $logerr_string = shift(@_);

    $mlogger->fatal($logerr_string);

    eval { $dbh->rollback(); };

    if ($@)
    {
        $mlogger->fatal("Also, an attempted rollback failed to "
            . "execute with error: '$@'.");
    }

    exit(1);
}
#----------------------- End helper methods -------------------------

The Site::createHandle/Stmt etc. commands are just calls to the
corresponding DBI commands (connect, prepare, execute, etc.) wrapped
in evals. I'm using PrintError=0, RaiseError=1.


The abbreviated output (in the log file) looks like this (note that
there are about 24650 records to be processed):

2005/06/22 09:44:53 INFO> Job started.
2005/06/22 09:44:53 INFO> Examining genbank N. crassa homologues.
2005/06/22 09:47:32 INFO> 50 processed.
2005/06/22 09:49:54 INFO> 100 processed.
<..SNIP...>
2005/06/23 05:40:03 INFO> 24550 processed.
2005/06/23 05:41:27 INFO> 24600 processed.
2005/06/23 05:42:06 ERROR> Commit/disconnect failed; DBD::Pg::db
commit failed: commit failed at
../cgi/mitoseek line 44.
2005/06/23 05:42:25 INFO> 24550 processed.
2005/06/23 05:43:49 INFO> 24600 processed.
2005/06/23 05:44:28 ERROR> Commit/disconnect failed; DBD::Pg::db
commit failed: commit failed at
../cgi/mitoseek line 44.
2005/06/23 05:45:07 INFO> 23500 processed.
2005/06/23 05:46:41 INFO> 23550 processed.
2005/06/23 05:47:49 INFO> 23600 processed.
<...SNIP...>
2005/06/23 06:16:45 INFO> 24550 processed.
2005/06/23 06:18:09 INFO> 24600 processed.
2005/06/23 06:18:48 ERROR> Commit/disconnect failed; DBD::Pg::db
commit failed: commit failed at
../cgi/mitoseek line 44.

...and so forth until I arrive in the morning. When I check 'ps',
there are about 20-30 perl processes happily chugging away. I haven't
checked if the 'original' process that started it all is still alive,
which I should really do.

Some things I've noticed:
- The first commit actually does succeed, since I find new records
  in the database. I assume subsequent commits fail because of the
  initial disconnect call on the handle.
- My 'sanity checks' (the die() and system('killall...') calls)
  never actually get executed, so there is some exiting going on
  somewhere.
- exit(0) behaves 'normally' if it occurs before the execution of
  the eval{}-wrapped while loop in examineHomologues()


I apologize for the glut of information, but this is sort of the bare
minimum that I thought would be required to diagnose the problem (although
it might even be less than that).

Any suggestions for what I might be able to do to pick apart this problem
would be greatly appreciated. I've never seen this sort of 'implicit
forking' behaviour from perl before.

-Debo
 
 
 
 
 
Mark, 06-23-2005

Debo wrote:
> I have a script that is severely misbehaving. Its purpose is simple: It is
> supposed to retrieve some records from a local database, compare these
> records to others fetched from the NCBI genbank database via BioPerl, and
> then insert the results of these comparisons into the local database.
>
> It does all of these things admirably. However, it seems that it is
> implicitly forking or something, because after it terminates, execution
> still continues. I'll elaborate on this in a bit.
>
> I don't like posting code, but I'm sort of stumped here. Here's a
> trimmed down version of what I'm doing:
>
>


<snip a lot of code>

> The Site::createHandle/Stmt etc. commands are just calls to the
> corresponding DBI commands (connect, prepare, execute, etc.) wrapped
> in evals. I'm using PrintError=0, RaiseError=1.


PrintError => 1

may help here with diagnosis.
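For instance, the attributes can go straight into the connect call. A
minimal sketch (the DSN and credentials are placeholders, the Pg driver
is only a guess from the DBD::Pg errors in your log, and AutoCommit => 0
matches your explicit commit/rollback):

    use DBI;

    my ($user, $password) = ('me', 'secret');   # placeholders

    # PrintError => 1 makes DBI warn() about each failure as it
    # happens, in addition to the die() that RaiseError triggers.
    my $dbh = DBI->connect(
        'dbi:Pg:dbname=mitoseek', $user, $password,
        { PrintError => 1, RaiseError => 1, AutoCommit => 0 },
    );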

>
>
> The abbreviated output (in the log file) looks like this (note that
> there are about 24650 records to be processed):
>
> 2005/06/22 09:44:53 INFO> Job started.
> 2005/06/22 09:44:53 INFO> Examining genbank N. crassa homologues.
> 2005/06/22 09:47:32 INFO> 50 processed.
> 2005/06/22 09:49:54 INFO> 100 processed.
> <..SNIP...>
> 2005/06/23 05:40:03 INFO> 24550 processed.
> 2005/06/23 05:41:27 INFO> 24600 processed.
> 2005/06/23 05:42:06 ERROR> Commit/disconnect failed; DBD::Pg::db
> commit failed: commit failed at
> ../cgi/mitoseek line 44.
> 2005/06/23 05:42:25 INFO> 24550 processed.
> 2005/06/23 05:43:49 INFO> 24600 processed.
> 2005/06/23 05:44:28 ERROR> Commit/disconnect failed; DBD::Pg::db
> commit failed: commit failed at
> ../cgi/mitoseek line 44.
> 2005/06/23 05:45:07 INFO> 23500 processed.
> 2005/06/23 05:46:41 INFO> 23550 processed.
> 2005/06/23 05:47:49 INFO> 23600 processed.
> <...SNIP...>
> 2005/06/23 06:16:45 INFO> 24550 processed.
> 2005/06/23 06:18:09 INFO> 24600 processed.
> 2005/06/23 06:18:48 ERROR> Commit/disconnect failed; DBD::Pg::db
> commit failed: commit failed at
> ../cgi/mitoseek line 44.
>
> ...and so forth until I arrive in the morning. When I check 'ps',
> there are about 20-30 perl processes happily chugging away. I haven't
> checked if the 'original' process that started it all is still alive,
> which I should really do.

Are you using Linux? Multiple threads appear as separate processes with
a ps/top listing under Linux. Does Bioperl create separate threads, or,
for that matter, is it multiprocessed?

>
> Some things I've noticed:
> - the first commit actually does succeed, since I find new records
> in the database. I assume subsequent commits fail because of
> the inital disconnect call on the handle
> - My 'sanity checks' (the die() and system('killall...') calls)
> never actually get executed, so there is some exiting going on
> somewhere
> - exit(0) behaves 'normally' if it occurs before the execution of
> the eval{}-wrapped while loop in examineHomologues()
>


> I apologize for the glut of information, but this is sort of the bare
> minimum that I thought would be required to diagnose the problem (although
> it might even be less than that).
>
> Any suggestions for what I might be able to do to pick apart this problem
> would be greatly appreciated. I've never seen this sort of 'implicit
> forking' behaviour from perl before.


Well, you have RaiseError set, but then you are wrapping the calls in
evals. You also have PrintError set to false, so you can't see what is
going on at the database.

I would probably write a small test script that verifies that your
wrapped DBI calls are doing what you think they're doing, leaving out
Bioperl. Next write a small script that tests that Bioperl is behaving
as expected. Following success on both fronts, you can then try both
aspects together.

Mark
 
 
 
 
 
A. Sinan Unur, 06-23-2005

Debo <(E-Mail Removed)> wrote:

> I don't like posting code,


You are not going to be able to receive the maximum amount of help
possible, then.

Posting code requires you to think and work very hard at arriving at the
smallest possible program that still exhibits the problem. Working
toward that aim generally results in the discovery of the cause and
sometimes even a solution to the problem.

It looks like you are going to have to do some more work on that.

Your code is very, very hard for me to follow, especially since you use
the &subcall style of calling subroutines, and the subroutines seem to
operate on global data, etc. See

perldoc perlsub

for why you should not use & in subroutine calls unless you know what
you are doing. The general impression I get from looking over your code
is that you indeed do not know what you are doing, but keep putting
various things in until it works, for some values of "works". This kind
of programming is very prevalent in academia, and it is very
unproductive. Instead, build things step by step.
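
To illustrate the & surprise mentioned above (this is documented
behavior, nothing exotic): calling a sub as &foo; with no argument
list silently passes the caller's own @_ along.

    use strict;
    use warnings;

    sub show_args { print "got: @_\n" }

    sub outer {
        &show_args;        # reuses outer's @_ -- prints "got: a b c"
        show_args('x');    # normal call       -- prints "got: x"
    }

    outer(qw(a b c));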

So, in summary, you are going to have to put some effort into posting
code if you are going to post code. Posting code that we have no hope
of being able to run is not going to do anyone much good.

> #!/usr/bin/perl -w


BTW:

use warnings;

is better because it allows you to turn specific classes of warnings on
and off in lexical scopes.

> use strict;
> use lib "D:/cygwin/site/www";
> use Site;
> use Bio::Seq;
> use Bio::DB::GenPept;
> <SNIP: Many more Bio:: includes>
> use DBI;
> use Error qw(:try);
> use File::Copy;
> use Log::Log4perl qw(get_logger :levels);


So, let's try to reduce the sample code you posted to its essentials.

#!/usr/bin/perl

package Site;

use strict;
use warnings;

sub createDBHandle {
    warn "createDBHandle called\n";
}

sub createStmt {
    warn "createStmt: ", "@_\n";
    "returned from: createStmt: @_";
}

sub executeStmt {
    warn "executeStmt: ", "@_\n";
    "returned from: executeStmt: @_";
}

package main;

use strict;
use warnings;

my $dbh = Site::createDBHandle()
    or die "Could not open handle to database\n";

examineHomologues($dbh);

sub examineHomologues {
    my ($dbh) = @_;

    my $sth = Site::createStmt($dbh,
        "SELECT <UNINTERESTING DATABASE SCHTUFF HERE>", 0
    ) or die "Couldn't fetch homologue data";

    my $insert_handle = Site::createStmt($dbh,
        "INSERT INTO <MORE DATABASE STUFF>", 1
    );

    Site::executeStmt($sth)
        or die "Couldn't execute stmt to retrieve homologue data\n";

    eval {
        1
    };

    die "Homologue analysis died early. $@\n" if $@;
}

__END__

> Any suggestions for what I might be able to do to pick apart this
> problem would be greatly appreciated. I've never seen this sort of
> 'implicit forking' behaviour from perl before.


It seems clear to me that whatever behavior you are observing must be
created by code you have not shown us.

Sinan

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
Debo, 06-23-2005

M> Are you using Linux? Multiple threads appear as separate processes with
M> a ps/top listing under Linux.

Cygwin -- and that not by choice.

M> Does Bioperl create separate threads, or,
M> for that matter, is it multiprocessed?

I investigated this, and the answer seems to be no. The library I'm using
is essentially a parser that uses HTTP::Request::Common to push data
around.

M> Next write a small script that tests that Bioperl is behaving
M> as expected.

I did this before putting everything together the first time, and it
appeared that Bioperl was doing what it was supposed to. Now, though, I've
narrowed the problem down to a specific Bioperl module, and for the first
time I'm able to consistently reproduce the bug.

So, at this point, this discussion is no longer really topical -- nor
recommended for oral delivery, for that matter.

Thanks for your suggestions.

-Debo


 
 
Debo, 06-23-2005


ASU> See
ASU>
ASU> perldoc perlsub
ASU>
ASU> for why you should not use & in subroutines call unless you know what
ASU> you are doing.

That's bizarre -- for some reason, the & does the exact opposite of what I
thought it did. I'm not sure exactly where I picked up that habit, but
thank you for pointing out my mistake.

ASU> > #!/usr/bin/perl -w
ASU> BTW:
ASU>
ASU> use warnings;
ASU>
ASU> is better because it allows you to turn specific classes of warnings
ASU> on and off in lexical scopes.

Will do. Again, thanks for this.

ASU> It seems clear to me that whatever behavior you are observing must be
ASU> created by code you have not shown us.

And that was sort of the case... see my reply to the previous poster, if
you are (for some reason) curious about the outcome.

Thank you for your helpful suggestions.

-Debo
 
 
xhoster@gmail.com, 06-23-2005

Debo <(E-Mail Removed)> wrote:
> I have a script that is severely misbehaving. Its purpose is simple: It
> is supposed to retrieve some records from a local database, compare these
> records to others fetched from the NCBI genbank database via BioPerl, and
> then insert the results of these comparisons into the local database.
>
> It does all of these things admirably. However, it seems that it is
> implicitly forking or something, because after it terminates, execution
> still continues. I'll elaborate on this in a bit.


Put near the top of your program:

my $oldpid=$$;

Then, after every other statement of your program, put:

die "$oldpid is not $$" unless $oldpid==$$;

> use lib "D:/cygwin/site/www";
> use Site;
> use Bio::Seq;
> use Bio::DB::GenPept;
> <SNIP: Many more Bio:: includes>
> use DBI;
> use Error qw(:try);
> use File::Copy;
> use Log::Log4perl qw(get_logger :levels);


grep through all those module sources for fork or thread.

>
> The Site::createHandle/Stmt etc. commands are just calls to the
> corresponding DBI commands (connect, prepare, execute, etc.) wrapped
> in evals. I'm using PrintError=0, RaiseError=1.


Why oh why do people do this? Over and over I've had to clean up code
where people used their own no-value-added, buggy, half-assed wrappers
around DBI. If you are going to use DBI, why not use DBI?

Also, what is the point of setting RaiseError=1 if you are just going
to put evals around every single call?

>
> The abbreviated output (in the log file) looks like this (note that
> there are about 24650 records to be processed):
>
> 2005/06/22 09:44:53 INFO> Job started.
> 2005/06/22 09:44:53 INFO> Examining genbank N. crassa homologues.
> 2005/06/22 09:47:32 INFO> 50 processed.


Can you configure your logger to log the pid along with the time?
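
For what it's worth, Log::Log4perl's stock PatternLayout has a %P
placeholder for the pid of the current process. A guess at what the
relevant mlogger.conf lines might look like (the appender name and
filename are made up):

    log4perl.logger.MITOSEEK           = INFO, Logfile
    log4perl.appender.Logfile          = Log::Log4perl::Appender::File
    log4perl.appender.Logfile.filename = mitoseek.log
    log4perl.appender.Logfile.layout   = Log::Log4perl::Layout::PatternLayout
    log4perl.appender.Logfile.layout.ConversionPattern = %d [pid %P] %p> %m%n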

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
 
Debo, 06-23-2005


> > The Site::createHandle/Stmt etc. commands are just calls to the
> > corresponding DBI commands (connect, prepare, execute, etc.) wrapped
> > in evals. I'm using PrintError=0, RaiseError=1.

>
> Why oh why do people do this? Over and over I've had to clean up code
> where people used their own no-value-added, buggy, half-assed wrappers
> around DBI. If you are going to use DBI, why not use DBI?


Well, I don't know why those people did it, but I can tell you why I did
it.

I did a quick read through Programming the Perl DBI, and it suggested
putting evals around your calls to DBI methods so that you could catch a
raised error when it occurred. However, I got sick of seeing evals all
over my code, and so I pushed them all into one general interface. When
an error is raised, it is logged by a different logger located in the
Site module, so I can still see the output; it doesn't just quash the
error.
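
In rough outline, the wrapper looks something like this (a from-memory
sketch, not the actual Site code; the logger name is invented):

    use Log::Log4perl qw(get_logger);

    sub createStmt {
        my ($dbh, $sql, $cached) = @_;
        my $sth = eval {
            $cached ? $dbh->prepare_cached($sql) : $dbh->prepare($sql);
        };
        if ($@) {
            get_logger('SITE')->error("prepare failed: $@");
            return;    # caller sees false and can try to recover
        }
        return $sth;
    }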


> Also, what is the point of setting RaiseError=1 if you are just going
> to put evals around every single call?


Am I totally missing the point here? It is possible that I am. I set
RaiseError to 1, place an eval around the call, and then check $@ to see
if anything has gone awry. Most of the time, I don't *want* my script to
die if the call fails -- I would like to attempt some sort of recovery.

Is eval-wrapping a bad thing to do? Again, it was
something I picked up in Programming the Perl DBI. (Admittedly, setting
PrintError to 0 was an asinine thing to do. I've since fixed that.)

Perhaps I should move these questions to a more DBI-centric forum.

> Can you configure your logger to log the pid along with the time?
>
> Xho


I believe I can. If this becomes an issue again, that's what I'll do.

Thanks.

-Debo

 
 
Paul Lalli, 06-23-2005

Debo wrote:
> > >


{please don't snip the attributions... it's nice to know who said what
you're quoting}

> > Also, what is the point of setting RaiseError=1 if you are just going
> > to put evals around every single call?

>
> Am I totally missing the point here? It is possible that I am. I set
> RaiseError to 1, place an eval around the call, and then check $@ to see
> if anything has gone awry. Most of the time, I don't *want* my script to
> die if the call fails -- I would like to attempt some sort of recovery.
>
> Is eval-wrapping a bad thing to do?


It just doesn't make a whole lot of sense. It's an extra level of
indirection that serves no purpose. Under normal circumstances, if
there is a database error, a variable is set ($DBI::err, I think?).
With RaiseError set, whenever there is a database error, the program
dies. But then you catch that die with an eval. And then you have to
go and check to see whether or not there was a failure in the eval. To
do this, you have to check the $@ variable. So why not just go back
two levels and check the $DBI::err variable in the first place?
Why add the extra processing of dying and catching the die?
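
The two styles side by side (a sketch; assume $dbh is a connected
handle and $sql holds some statement):

    # RaiseError => 1: catch the die, then inspect $@
    my $sth = eval { $dbh->prepare($sql) };
    if ($@) { warn "prepare failed: $@" }

    # RaiseError => 0: just test the return value directly
    my $sth2 = $dbh->prepare($sql)
        or warn "prepare failed: ", $dbh->errstr;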

Paul Lalli

 
 
Debo, 06-23-2005



On Thu, 23 Jun 2005, Paul Lalli wrote:
PL> {please don't snip the attributions... it's nice to know who said what
PL> you're quoting}

Sorry -- bad habit. I'm trying to quit.

PL> > > Also, what is the point of setting RaiseError=1 if you are just going
PL> > > to put evals around every single call?
PL> >
PL> > Am I totally missing the point here? It is possible that I am. I set
PL> > RaiseError to 1, place an eval around the call, and then check $@ to see
PL> > if anything has gone awry. Most of the time, I don't *want* my script to
PL> > die if the call fails -- I would like to attempt some sort of recovery.
PL> >
PL> > Is eval-wrapping a bad thing to do?
PL>
PL> It just doesn't make a whole lot of sense. It's an extra level of
PL> indirection that serves no purpose. Under normal circumstances, if
PL> there is a database error, a variable is set ($DBI::err, I think?).
PL> With RaiseError set, whenever there is a database error, the program
PL> dies. But then you catch that die with an eval. And then you have to
PL> go and check to see whether or not there was a failure in the eval. To
PL> do this, you have to check the $@ variable. So why not just go back
PL> two levels and check the $DBI::err variable in the first place?
PL> Why add the extra processing of dying and catching the die?

A few reasons:

1) It sort of 'feels' like exception handling, which is something I'm more
familiar with -- although ultimately, as you've observed, I still have to
poll a variable to figure out what went wrong. So be it.

2) Sometimes I am forgetful. If I have RaiseError on, and I have forgotten
to do an error check that was necessary, my program will let me know in no
uncertain terms that a check was required there. Relying on the error-set
variables in the DBI namespace is a more passive process, and I'm not so
meticulous that I can rely on myself to always do this properly.

3) This may be a cop-out, but I ultimately switched to this method of
handling because I assumed Programming the Perl DBI was an authoritative
source for DBI-ish things, and that book seemed to be stating that
eval-wrapping was the 'better' way of handling errors. This may have been
a mistaken inference on my part; I am not trying to put words in the mouth
of the author.

Thank you for your reply.

-Debo


 
 
gimme_this_gimme_that@yahoo.com, 06-23-2005

Offhand, I'd guess there is a problem with the data.

Perhaps there's a name like O'Reilly (note the quote), or someone
entered a backslash, or a hacker entered a name like SELECT.

Something is messing up an insert and wreaking havoc on the program.

When the program aborts, the commit doesn't get called, and the exit()
doesn't get called.

The actual behavior may vary -- an error may occur and the commit could
get executed by default, depending upon the database. Or, when there
are no offending names, it all works OK.

I'd recommend better error handling, as described by others here, but
also checking for an error after every insert and printing the data
when something bad happens.
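
Placeholders sidestep the quoting problem entirely, since the data is
never interpolated into the SQL. A sketch (the table and column names
are made up):

    my $insert = $dbh->prepare(
        'INSERT INTO homologues (accession, score) VALUES (?, ?)'
    );
    $insert->execute($accession, $score)
        or warn 'insert failed: ', $dbh->errstr;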

Also, if the Perl program is an interface that gets called every 30
minutes whenever there is data in a queue, you may be seeing all those
processes because of that and because of those failing commits. That
would explain the apparent forks.

 