nolo contendere <> wrote:
> On Apr 24, 2:35 pm, xhos...@gmail.com wrote:
> >
> > I still don't understand. How is the timing significant? If you
> > want each file to start being processed as soon as it shows up, then
> > what difference does it make whether they tend to show up in clumps of
> > three? As soon as they show up is as soon as they show up, regardless
> > of when that is. Is there something significant that the clumps of
> > three have in common *other* than merely their timing?
>
> The difference lies in my implementation of the solution, not
> necessarily the problem.
....
> So to answer your earlier questions of the difference it makes, and
> the significance: it changed my thought process (hopefully for the
> better) around how to handle this incarnation of staggered-yet-
> concurrent job processing.
OK, so let me try to change your thought process yet again, then.
The master process does all the waiting. That way, it fights with no
one but itself. First it waits for a file to exist (if necessary) then
it waits for ForkManager to let it start a new process (if necessary).
It does a rename in between. There is no one else trying to do the rename,
so no worry about race conditions (unless you unwisely start two master
processes!).
use Parallel::ForkManager;

## Set a reasonable upper limit of 10. May never be reached!
my $pm = Parallel::ForkManager->new(10);
while ( is_before($stop_checking_time) && !$done ) {
    my @files = glob "${class}_*";
    sleep 1 unless @files;
    foreach my $file (@files) {
        my $new_name = "foo_$file";
        ## it is important that the renamed file won't match the glob on
        ## the next time through the loop!
        rename $file, $new_name or die "rename $file: $!";
        $pm->start() and next;   ## parent moves on to the next file
        process($new_name);      ## only the child gets here
        $pm->finish();
    }
}
## don't exit the master until every child has finished
$pm->wait_all_children;
If the main process remembers what files were already started, then
it could remember to skip those ones the next time through and wouldn't
need to bother with the renaming.
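A minimal sketch of that bookkeeping, assuming the file names are stable between passes. The helper name unstarted_files and the %started hash are made up for illustration; they are not part of the code above.

```perl
use strict;
use warnings;

# Files the master has already handed to a child (hypothetical name).
my %started;

# Hypothetical helper: return only the files that have not been seen
# before, and record them as started so the next pass skips them.
sub unstarted_files {
    my @fresh = grep { !$started{$_}++ } @_;
    return @fresh;
}
```

In the loop above, "foreach my $file (unstarted_files(@files))" would then take the place of the rename. The cost is that the memory lives only in the master, so a restarted master would process everything again.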
Of course, you could always change the sleep loop into some kind of FS
change notifier, as was discussed elsewhere in the thread. But doing a
glob once a second is probably not going to be a problem. At least, I
wouldn't worry about it until it proves itself to be a problem.
> >
> > If you do that, the "threads" will be fighting over the files. You
> > will have to code that very, very carefully. But depending on your
> > answer to my first question, it might be moot.
>
> Yes, the "very, very carefully" is why I posted to begin with, hoping
> for an elegant and efficient solution.
It's best to avoid needing to do it at all, like above. But if I needed
to do this, this is how I would try. If you are using threads and they
have the same PID, then you will have to use something other than $$.
my $file    = "some_file_we_are_fighting_over";
my $newname = "$$.$file";
if (-e $newname) {
    die "Should never happen. A previous job must have had the same PID, "
      . "accomplished the rename, then failed to clean up after itself";
}
if (rename $file, $newname) {
    ## we won. Do whatever needs doing;
    ## But since we are paranoid...
    -e $newname or die "$newname: how could this be?";
    process($newname);
} elsif ($!{ENOENT}) {
    ## The file didn't exist, most likely someone else beat us to it.
    ## Do nothing, fall through to next iteration
} else {
    ## something *else* went wrong. What could it be?
    die "Rename $file, $newname failed in an unexpected way: $!";
}
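If several workers each run this inside a loop, it may be tidier as a function. The following is only a sketch: the name claim_file and the optional $tag argument (for threads that share a PID) are my own inventions, and it leaves out the paranoid -e checks.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: try to claim $file by renaming it to "<tag>.<name>"
# in the same directory. $tag defaults to the PID. Returns the new name
# if we won, nothing if the file was already gone (someone else most
# likely beat us to it), and dies on any other error.
sub claim_file {
    my ($file, $tag) = @_;
    $tag = $$ unless defined $tag;
    my ($base, $dir) = fileparse($file);
    my $newname = File::Spec->catfile($dir, "$tag.$base");
    return $newname if rename $file, $newname;
    return if $!{ENOENT};
    die "Rename $file, $newname failed in an unexpected way: $!";
}
```

A worker would then write "my $mine = claim_file($file) or next;" and call process($mine). The rename is what settles the race: only one caller can succeed in moving a given file.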
> > $num_threads determines the *maximum* number of processes that will be
> > live at any one time. This should be determined based on the number of
> > CPUs or the amount of main memory or the IO bandwidth that your server
> > has. It should not be determined by the count of the number of tasks
> > to be done, as you seem to be doing here.
>
> Yeah, I know, it's dangerous. There *shouldn't* be more than 40 files
> at a time (I know, I know, stupid to believe this will actually be
> true),
But there is no reason to take this risk. Hard code 40 as the max number
of processes (I'd probably go lower myself, but if you think 40 is the
number below which you don't need to worry...). If there are ever more
than forty, then some will have to wait in line rather than crashing your
machine. If there are never more than forty, then hardcoding the value of
40 instead of passing around $num_threads doesn't change the behavior at
all (and makes the code cleaner to boot).
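To show what "wait in line" means, here is a rough core-Perl stand-in for the cap, using plain fork and wait instead of ForkManager. The names run_capped and $MAX_PROCS are hypothetical; in real code I would just pass 40 to Parallel::ForkManager->new.

```perl
use strict;
use warnings;

my $MAX_PROCS = 40;   # hard-coded ceiling, not derived from the task count

# Hypothetical helper: run $work->($task) in a child process for each
# task, with never more than $max children alive at once. Returns the
# number of children that exited with status 0.
sub run_capped {
    my ($max, $work, @tasks) = @_;
    my ($live, $ok) = (0, 0);
    for my $task (@tasks) {
        if ($live >= $max) {
            ## at the ceiling: this task waits until some child exits
            wait();
            $live--;
            $ok++ if $? == 0;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {          ## child: do the work and exit
            $work->($task);
            exit 0;
        }
        $live++;                  ## parent: one more child running
    }
    ## reap whatever is still running
    while ($live > 0) {
        wait();
        $live--;
        $ok++ if $? == 0;
    }
    return $ok;
}
```

With 40 or fewer tasks the wait inside the loop never triggers, so the behavior is the same as having no cap. With more, the extras simply start later.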
Xho