More Forks!

 
 
it_says_BALLS_on_your forehead
      11-02-2005
I am getting erratic behavior with this script...

#!/apps/webstats/bin/perl

use File::Copy;
use Parallel::ForkManager;


my $pm = Parallel::ForkManager->new(10);

$pm->run_on_start(
sub { my ($pid,$ident)=@_;
# print "** $ident started, pid: $pid\n";
}
);


# NOTE: this MUST be assigned BEFORE the $pm->run_on_finish
my @tokens = "log1" .. "log5";
# test with a 'late' file.
my @data = "file01" .. "file10";

$pm->run_on_finish(
sub {
my ($pid, $exit_code, $ident) = @_;
my ($outlog, $missingfile) = split /\|/, $_[2];
push( @tokens, $outlog );
# only push if the file is missing...HOW TO KNOW IF IT'S MISSING?? or do a next if (-e $missing
# ...but the next messes up the ForkManager b/c child process does not go to $pm->finish and so
# tries to spawn an extra process...breaks the limit
# could put $pm->finish in continue block, search in ...perl.misc Google Groups: fork ftp
unless (-e "data/$missingfile") {
push( @data, $missingfile );
}
# print "ID: $ident (pid: $pid) had exit code: $exit_code.\n";
}
);

my $counter = 0;
for (@data) {
$counter++;
if ($counter > 20 ) {
print "\n*******counter was above 20: $counter**********************\n\n";
last; # maybe this doesn't ->finish...
}
my $outfile = shift(@tokens);
$pm->start("$outfile|$_") and next;
print "$counter: ";
print "reading data/$_: writing to log -$outfile-\n";
my $func_ref = hello($_);
$func_ref->("Simon");
$pm->finish;
}

$pm->wait_all_children;
#--- subs ---

sub hello {
my ($type) = @_;

if ($type eq "file01") {
print "type: $type\n";
return \&func1;
}
else {
print "type: $type\n";
return \&func2;
}
return 0;
}

sub func1 {
my ($noun) = @_;
print "* using func1. you stink, $noun\n";
}

sub func2 {
my ($noun) = @_;
print "* * using func2. yay!!! it's $noun!!!\n";
}

#-----
RESULT 1:
[mymachine] ~/simon/1-perl > tryFork.pl
1: reading data/file01: writing to log -log1-
type: file01
* using func1. you stink, Simon
2: reading data/file02: writing to log -log2-
type: file02
* * using func2. yay!!! it's Simon!!!
3: reading data/file03: writing to log -log3-
type: file03
* * using func2. yay!!! it's Simon!!!
4: reading data/file04: writing to log -log4-
type: file04
* * using func2. yay!!! it's Simon!!!
5: reading data/file05: writing to log -log5-
type: file05
* * using func2. yay!!! it's Simon!!!
6: reading data/file06: writing to log -log1-
type: file06
* * using func2. yay!!! it's Simon!!!
7: reading data/file07: writing to log -log2-
type: file07
* * using func2. yay!!! it's Simon!!!
8: reading data/file08: writing to log -log3-
type: file08
* * using func2. yay!!! it's Simon!!!
9: reading data/file09: writing to log -log4-
type: file09
* * using func2. yay!!! it's Simon!!!
10: reading data/file10: writing to log -log5-
type: file10
* * using func2. yay!!! it's Simon!!!
11: reading data/file07: writing to log -log1-
type: file07
* * using func2. yay!!! it's Simon!!!
12: reading data/file08: writing to log -log2-
type: file08
* * using func2. yay!!! it's Simon!!!
13: reading data/file09: writing to log -log3-
type: file09
* * using func2. yay!!! it's Simon!!!
14: reading data/file10: writing to log -log4-
type: file10
* * using func2. yay!!! it's Simon!!!
15: reading data/file07: writing to log -log5-
type: file07
* * using func2. yay!!! it's Simon!!!
16: reading data/file08: writing to log -log1-
type: file08
* * using func2. yay!!! it's Simon!!!
17: reading data/file09: writing to log -log2-
type: file09
* * using func2. yay!!! it's Simon!!!
18: reading data/file10: writing to log -log3-
type: file10
* * using func2. yay!!! it's Simon!!!
19: reading data/file07: writing to log -log4-
type: file07
* * using func2. yay!!! it's Simon!!!

*******counter was above 20: 21**********************

20: reading data/file08: writing to log -log5-
type: file08
* * using func2. yay!!! it's Simon!!!

RESULT2:
[mymachine] ~/simon/1-perl > tryFork.pl
1: reading data/file01: writing to log -log1-
type: file01
* using func1. you stink, Simon
2: reading data/file02: writing to log -log2-
type: file02
* * using func2. yay!!! it's Simon!!!
3: reading data/file03: writing to log -log3-
type: file03
* * using func2. yay!!! it's Simon!!!
4: reading data/file04: writing to log -log4-
type: file04
* * using func2. yay!!! it's Simon!!!
5: reading data/file05: writing to log -log5-
type: file05
* * using func2. yay!!! it's Simon!!!
6: reading data/file06: writing to log -log1-
type: file06
* * using func2. yay!!! it's Simon!!!
7: reading data/file07: writing to log -log2-
type: file07
* * using func2. yay!!! it's Simon!!!
8: reading data/file08: writing to log -log3-
type: file08
* * using func2. yay!!! it's Simon!!!
9: reading data/file09: writing to log -log4-
type: file09
* * using func2. yay!!! it's Simon!!!
10: reading data/file10: writing to log -log5-
type: file10
* * using func2. yay!!! it's Simon!!!
11: reading data/file07: writing to log -log1-
type: file07
* * using func2. yay!!! it's Simon!!!
12: reading data/file08: writing to log -log2-
type: file08
* * using func2. yay!!! it's Simon!!!
13: reading data/file09: writing to log -log3-
type: file09
* * using func2. yay!!! it's Simon!!!
14: reading data/file10: writing to log -log4-
type: file10
* * using func2. yay!!! it's Simon!!!
15: reading data/file07: writing to log -log5-
type: file07
* * using func2. yay!!! it's Simon!!!
16: reading data/file08: writing to log -log1-
type: file08
* * using func2. yay!!! it's Simon!!!
17: reading data/file09: writing to log -log2-
type: file09
* * using func2. yay!!! it's Simon!!!

...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
all the way to 20. does anyone understand this behavior? do i need to
stick a continue block around the $pm->finish? i understand that if i
do this, the parent will also call finish, but this will be a silent
no-op when called by the parent.

 
xhoster@gmail.com
      11-02-2005
"it_says_BALLS_on_your forehead" <(E-Mail Removed)> wrote:
> I am getting erratic behavior with this script...
>
> my $pm = Parallel::ForkManager->new(10);

...
> my @tokens = "log1" .. "log5";

...
> for (@data) {

...
> my $outfile = shift(@tokens);


You can have up to 10 parallel processes at a time, but you try to
make them share 5 tokens. What happens if a sixth job is started before
one of the previous 5 ends, therefore @tokens is empty and $outfile gets
set to be undefined? Could that cause the problem? The number of tokens
should be at least one more than the max number of children.
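
To make the token shortage visible rather than silent, something along these lines might help. It is only a sketch in core Perl: take_token, give_back and $max_children are names made up for the illustration, and the pool size follows the "one more than the max number of children" rule of thumb above.

```perl
use strict;
use warnings;

my $max_children = 10;    # same limit as passed to Parallel::ForkManager->new

# One token per possible child, plus one spare.
my @tokens = map { "log$_" } 1 .. $max_children + 1;

# Hypothetical helper: hand out a token, or die loudly instead of
# returning undef when the pool has run dry.
sub take_token {
    my $token = shift @tokens;
    die "token pool is empty\n" unless defined $token;
    return $token;
}

# Hypothetical helper: return a token to the pool (what the
# run_on_finish callback does with push in the original script).
sub give_back {
    my ($token) = @_;
    push @tokens, $token;
}
```

With a guard like this, an exhausted pool stops the script at the point of failure instead of passing an undefined $outfile to a child.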


Your "for (@data) {" loop has a lot of stuff going on inside of it,
including sort-of-asynchronous calls. Who knows if $_ is getting stomped
on by something? Any non-trivial foreach loop should declare a "my"
variable, rather than defaulting to $_.


> $pm->run_on_finish(
> sub {
> my ($pid, $exit_code, $ident) = @_;

...
> unless (-e "data/$missingfile") {
> push( @data, $missingfile );
> }
> }
> );


This routine modifies @data while it is being iterated over with
a for statement. That could cause problems.

perldoc perlsyn:
If any part of LIST is an array, "foreach" will get very confused if
you add or remove elements within the loop body, for example with
"splice". So don't do that.

So don't use a foreach:

##for (@data) {
while (@data) { $_=shift @data;

(although really you should use a lexical variable, rather than $_)
while (@data) { my $new_var=shift @data;
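
For what it's worth, here is a small self-contained sketch of that queue style in core Perl, with no forking involved. The retry rule (file02 gets pushed back once) and the %tries and @order variables are invented for the illustration; the point is only that pushing onto @data inside a while/shift loop is well defined.

```perl
use strict;
use warnings;

my @data = qw(file01 file02 file03);
my %tries;    # how often each file has been pushed back
my @order;    # order in which items were actually processed

while (@data) {
    my $file = shift @data;    # lexical variable, not $_
    push @order, $file;

    # Hypothetical retry rule: file02 goes back on the queue once.
    push @data, $file if $file eq 'file02' && !$tries{$file}++;
}

print "@order\n";    # prints: file01 file02 file03 file02
```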

This still has the problem of what happens if @data is empty, the while
statement sees that it is empty, falls through, and only then does some
job shove something back into @data, too late to be noticed? In fact, I
think this may be the root of your problem. You need to wait for all the
stragglers to have come in and shoved whatever they have back onto the
queue, then give it another go.

do {
while (@data) { $_=shift @data;
...<the rest of what used to be your for loop but is now a while loop>
};
$pm->wait_all_children;
} while @data;
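
A runnable sketch of that outer loop follows. To keep it to core Perl, plain fork and waitpid stand in for Parallel::ForkManager (so there is no limit on concurrent children here), and the failure rule, @done, %attempts, %running and reap_one are all assumptions made up for the example.

```perl
use strict;
use warnings;

my @queue = qw(file01 file02 file03);
my %attempts;    # job => number of times it has been started
my %running;     # pid => job currently handled by that child
my @done;        # jobs that finished successfully

# Reap one child; requeue its job if it exited non-zero.
# This plays the role of the run_on_finish callback.
sub reap_one {
    my $pid = waitpid( -1, 0 );
    die "no child to reap\n" if $pid <= 0;
    my $job = delete $running{$pid};
    if ( ( $? >> 8 ) == 0 ) { push @done,  $job }
    else                    { push @queue, $job }
}

do {
    while (@queue) {
        my $job = shift @queue;
        $attempts{$job}++;
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: pretend file02 fails on its first attempt only.
            exit( $job eq 'file02' && $attempts{$job} == 1 ? 1 : 0 );
        }
        $running{$pid} = $job;
    }

    # Wait for all stragglers; they may push work back onto @queue.
    reap_one() while %running;
} while (@queue);

print join( ' ', sort @done ), "\n";    # prints: file01 file02 file03
```

The inner loop can fall through with @queue empty while children are still running; the outer do/while catches whatever they push back after being reaped.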



>
> ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
> all the way to 20. does anyone understand this behavior?


I think so.

> do i need to
> stick a continue block around the $pm->finish? i understand that if i
> do this, the parent will also call finish, but this will be a silent
> no-op when called by the parent.


I don't think that that is the problem, but why not do it anyway?

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 
it_says_BALLS_on_your forehead
      11-02-2005

(E-Mail Removed) wrote:
> "it_says_BALLS_on_your forehead" <(E-Mail Removed)> wrote:
> > I am getting erratic behavior with this script...
> >
> > my $pm = Parallel::ForkManager->new(10);

> ...
> > my @tokens = "log1" .. "log5";

> ...
> > for (@data) {

> ...
> > my $outfile = shift(@tokens);

>
> You can have up to 10 parallel processes at a time, but you try to
> make them share 5 tokens. What happens if a sixth job is started before
> one of the previous 5 ends, therefore @tokens is empty and $outfile gets
> set to be undefined? Could that cause the problem? The number of tokens
> should be at least one more than the max number of children.
>


ahh! thank you for pointing that out.

>
> Your "for (@data) {" loop has a lot of stuff going on inside of it,
> including sort-of-asynchronous calls. Who knows if $_ is getting stomped
> on by something? Any non-trivial foreach loop should declare a "my"
> variable, rather than defaulting to $_.
>


hmm, i thought (perhaps this is a misapprehension on my part--i will
investigate) that:

for (@data)

...automatically 'lexified' $_.

>
> > $pm->run_on_finish(
> > sub {
> > my ($pid, $exit_code, $ident) = @_;

> ...
> > unless (-e "data/$missingfile") {
> > push( @data, $missingfile );
> > }
> > }
> > );

>
> This routine modifies @data while it is being iterated over with
> a for statement. That could cause problems.
>


yeah, i was debating whether or not this was a good idea. i'm thinking
now that it's not.

> perldoc perlsyn:
> If any part of LIST is an array, "foreach" will get very confused if
> you add or remove elements within the loop body, for example with
> "splice". So don't do that.
>
> So don't use a foreach:
>
> ##for (@data) {
> while (@data) { $_=shift @data;
>
> (although really you should use a lexical variable, rather than $_)
> while (@data) { my $new_var=shift @data;
>
> This still has the problem of what happens if @data is empty, the while
> statement sees that it is empty, falls through, and only then does some
> job shove something back into @data, too late to be noticed? In fact, I
> think this may be the root of your problem. You need to wait for all the
> stragglers to have come in and shoved whatever they have back onto the
> queue, then give it another go.
>
> do {
> while (@data) { $_=shift @data;
> ...<the rest of what used to be your for loop but is now a while loop>
> };
> $pm->wait_all_children;
> } while @data;
>


this sounds sensible. i need to wrap my brain around it

>
>
> >
> > ...sometimes it stops at 10, sometimes at 15...MOST of the time it gets
> > all the way to 20. does anyone understand this behavior?

>
> I think so.
>
> > do i need to
> > stick a continue block around the $pm->finish? i understand that if i
> > do this, the parent will also call finish, but this will be a silent
> > no-op when called by the parent.

>
> I don't think that that is the problem, but why not do it anyway?
>


i'm not too comfortable with continues. i've read that they aren't used
much in real-life code. is the use of continue blocks indicative of a
wrong way of thinking? should i re-design my code so that it is not
necessary?

 
xhoster@gmail.com
      11-02-2005
"it_says_BALLS_on_your forehead" <(E-Mail Removed)> wrote:
>
> hmm, i thought (perhaps this is a misapprehension on my part--i will
> investigate) that:
>
> for (@data)
>
> ...automatically 'lexified' $_.


It automatically localizes it, which is quite different. If it lexified
it, then only things within the foreach block's lexical scope could stomp
on $_. But because it localizes it, anything within the foreach's dynamic
scope can stomp on it. Because of your use of modules and callbacks, there
is a lot of invisible stuff within the dynamic scope.

This shows the dynamic nature:

$ perl -le 'sub foo {$_="bar"}; foreach (1..10) {foo(); print $_}'
bar
bar
bar
...
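
For contrast, a short sketch of the same loop written both ways. With a "my" loop variable, the assignment inside foo() still hits the global $_ but no longer touches the loop variable. The array names @with_global and @with_lexical are just for the illustration.

```perl
use strict;
use warnings;

sub foo { $_ = "bar" }    # assigns to the global $_

# Default loop variable: foo() stomps on the localized $_.
my @with_global;
foreach ( 1 .. 3 ) {
    foo();
    push @with_global, $_;
}

# Lexical loop variable: foo() cannot see $n at all.
my @with_lexical;
foreach my $n ( 1 .. 3 ) {
    foo();
    push @with_lexical, $n;
}

print "@with_global\n";     # prints: bar bar bar
print "@with_lexical\n";    # prints: 1 2 3
```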


> > > do i need to
> > > stick a continue block around the $pm->finish? i understand that if i
> > > do this, the parent will also call finish, but this will be a silent
> > > no-op when called by the parent.

> >
> > I don't think that that is the problem, but why not do it anyway?
> >

>
> i'm not too comfortable with continues. i've read that they aren't used
> much in real-life code. is the use of continue blocks indicative of a
> wrong way of thinking?


I very rarely use continue blocks myself, but when I see someone else's code
with a well-placed continue I always think to myself that that saved a lot
of messy code, and that I should remember to use them more often.

The one thing, other than failure to think of it, that does stop me from
using them more often is their lexical isolation from the main loop block.

foreach my $foo (@foo) {
##...
my $bar=something();
##...
next if something_else();
##..
} continue {
##D'oh
##Can't access $bar here
}
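
One way around that isolation, sketched below, is to declare the variable before the loop so that both blocks can see it. The loop variable itself is visible in the continue block either way. The @log array and the odd/even rule are made up for the example.

```perl
use strict;
use warnings;

my @log;
my $bar;    # declared outside the loop so the continue block can see it

foreach my $n ( 1 .. 4 ) {
    $bar = $n * 10;
    next if $n % 2;    # odd numbers skip the rest of the body
    push @log, "body $n";
} continue {
    # Runs after every iteration, including those ended by "next".
    push @log, "continue $n bar=$bar";
}

print "$_\n" for @log;
```

The cost is that $bar now outlives the loop, which is the usual trade-off with this workaround.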


Anyway, they probably aren't all that rare.

#sorry about the wrap

~/perl_misc]$ find /usr/lib/perl5/ -name "*.pm" -exec perl -lne 'print
"$ARGV\t$_" if /continue\s*{/' {} \;


/usr/lib/perl5/5.8.0/File/Find.pm continue {
/usr/lib/perl5/5.8.0/File/Find.pm continue {
/usr/lib/perl5/5.8.0/ExtUtils/MakeMaker.pm } continue {
/usr/lib/perl5/5.8.0/ExtUtils/MM_MacOS.pm } continue {
/usr/lib/perl5/5.8.0/ExtUtils/MM_Unix.pm } continue {
...


> should i re-design my code so that it is not
> necessary?


Um, it already isn't necessary. As far as I can tell, the only reason
one *might* erroneously think it is necessary is if you tried to
conditionally "next" out of the run_on_finish code, which you thankfully
aren't trying to do. That would be a bad thing to do in general, even if
not using ForkManager. You are supposed to "return" out of subroutines,
or just let them finish naturally, not "next" out of them. If you use
warnings, you will get warnings when you do such uncouth things.

But since you are using ForkManager, the "next" in the subroutine would not
only be bad manners, it would also be highly broken. The run_on_finish
coderef is called from within the parent, not the child (otherwise, pushing
the token back on the stack would have no effect; the useful stack lives in
the parent, not the child). If the run_on_finish code invokes "next",
it is the parent that will suffer this invocation. And who knows what
havoc that will wreak.
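
A minimal sketch of the well-mannered version, in core Perl only: $on_finish is a stand-in for the coderef you would hand to run_on_finish, the argument order mirrors the ($pid, $exit_code, $ident) used in the original script, and the job list is invented for the illustration.

```perl
use strict;
use warnings;

my @requeue;

# Stand-in for a run_on_finish callback.
my $on_finish = sub {
    my ( $pid, $exit_code, $ident ) = @_;
    return if $exit_code == 0;    # leave the callback with return, not next
    push @requeue, $ident;
};

# Simulate the parent invoking the callback for two finished children.
for my $job ( [ 101, 0, "file01" ], [ 102, 1, "file02" ] ) {
    $on_finish->(@$job);
}

print "@requeue\n";    # prints: file02
```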

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
 