examples of realistic multiprocessing usage?

 
 
TomF
Guest
Posts: n/a
 
      01-16-2011
I'm trying to multiprocess my python code to take advantage of multiple
cores. I've read the module docs for threading and multiprocessing,
and I've done some web searches. All the examples I've found are too
simple: the processes take simple inputs and compute a simple value.
My problem involves lots of processes, complex data structures, and
potentially lots of results. It doesn't map cleanly into a Queue,
Pool, Manager or Listener/Client example from the python docs.

Instead of explaining my problem and asking for design suggestions,
I'll ask: is there a compendium of realistic Python multiprocessing
examples somewhere? Or an open source project to look at?

Thanks,
-Tom

 
 
 
 
 
Adam Tauno Williams
Guest
Posts: n/a
 
      01-16-2011
> Instead of explaining my problem and asking for design suggestions,
> I'll ask: is there a compendium of realistic Python multiprocessing
> examples somewhere?


Not that I've ever seen.

> Or an open source project to look at?


OpenGroupware Coils uses multiprocessing [in conjunction with AMQ].
<http://sourceforge.net/projects/coils/>

 
 
 
 
 
Philip Semanchuk
Guest
Posts: n/a
 
      01-16-2011

On Jan 16, 2011, at 2:05 PM, TomF wrote:

> I'm trying to multiprocess my python code to take advantage of multiple cores. I've read the module docs for threading and multiprocessing, and I've done some web searches. All the examples I've found are too simple: the processes take simple inputs and compute a simple value. My problem involves lots of processes, complex data structures, and potentially lots of results. It doesn't map cleanly into a Queue, Pool, Manager or Listener/Client example from the python docs.
>
> Instead of explaining my problem and asking for design suggestions, I'll ask: is there a compendium of realistic Python multiprocessing examples somewhere? Or an open source project to look at?



A colleague pointed me to this project the other day.

http://gluino.com/


I grepped through the code to see that it's using multiprocessing.Listener. I didn't go any further than that because our project is BSD licensed and the license for Gluino is unclear. Until I find out whether or not it's under an equally permissive license, I can't borrow ideas and/or code from it.

Hope it's of some help to you, though.

Cheers
Philip
 
 
Adam Skutt
Guest
Posts: n/a
 
      01-17-2011
On Jan 16, 2:05 pm, TomF <(E-Mail Removed)> wrote:
> Instead of explaining my problem and asking for design suggestions,
> I'll ask: is there a compendium of realistic Python multiprocessing
> examples somewhere? Or an open source project to look at?


There are tons, but without even a knowledge domain, it's difficult to
recommend much of anything. Multiprocessing for I/O (e.g., web
serving) tends to look different and be structured differently from
multiprocessing for CPU-intensive tasks (e.g., digital signal
processing), and both look different from things with specific
requirements w.r.t. latency (e.g., video game server, hard real-time
applications) or other requirements.

Even the level at which you parallelize can change things
dramatically. Consider the simple case of matrix multiplication. I
can make the multiplication itself parallel; or, assuming I have
multiple sets of matrices I want to multiply (common), I can
execute each multiplication in parallel. Both solutions look
different, and a solution that uses both levels of parallelism
frequently will look different still.
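
To make the coarser of those two levels concrete, here is a minimal sketch that runs each multiplication of a batch as its own pool task. The names `matmul` and `multiply_all` and the pure-Python list-of-rows representation are assumptions made for illustration, and the sketch assumes Python 3 (for `Pool` as a context manager). Making a single multiplication parallel would need a different decomposition, for example splitting the rows of one matrix across workers.

```python
from multiprocessing import Pool


def matmul(pair):
    """Multiply two matrices given as lists of rows (pure Python, no NumPy)."""
    a, b = pair
    cols_of_b = list(zip(*b))  # transpose b so columns can be iterated directly
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_of_b]
            for row in a]


def multiply_all(pairs, processes=2):
    """Coarse-grained parallelism: one pool task per (a, b) pair."""
    with Pool(processes=processes) as pool:
        # map() blocks until every product is ready and keeps input order.
        return pool.map(matmul, pairs)


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    m = [[1, 2], [3, 4]]
    print(multiply_all([(m, identity), (m, m)]))
```

Each task here is independent, so nothing is shared between workers; the cost is that every matrix is pickled on the way in and out.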

Adam
 
 
Dan Stromberg
Guest
Posts: n/a
 
      01-17-2011
On Sun, Jan 16, 2011 at 11:05 AM, TomF <(E-Mail Removed)> wrote:
> I'm trying to multiprocess my python code to take advantage of multiple
> cores. I've read the module docs for threading and multiprocessing, and
> I've done some web searches. All the examples I've found are too simple:
> the processes take simple inputs and compute a simple value. My problem
> involves lots of processes, complex data structures, and potentially lots of
> results. It doesn't map cleanly into a Queue, Pool, Manager or
> Listener/Client example from the python docs.
>
> Instead of explaining my problem and asking for design suggestions, I'll
> ask: is there a compendium of realistic Python multiprocessing examples
> somewhere? Or an open source project to look at?


I'm unaware of a big archive of projects that use multiprocessing, but
maybe one of the free code search engines could help with that.

It sounds like you're planning to use mutable shared state, which is
generally best avoided if at all possible, in concurrent programming -
because mutable shared state tends to slow down things quite a bit.

But if you must have mutable shared state that's more complex than a
basic scalar or homogeneous array, I believe the multiprocessing
module would have you use a "server process manager".
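
For reference, a minimal sketch of what the server process manager looks like in use, assuming Python 3. The function `record` and the `"squares"` key are hypothetical names for illustration; `Manager().dict()` and `Manager().Lock()` are the standard proxy objects.

```python
from multiprocessing import Manager, Process


def record(shared, lock, key, value):
    """Append value to the list stored under key in a shared mapping."""
    with lock:
        # Reassign rather than mutate in place: the manager only sees changes
        # made through the proxy, not changes to a nested plain list.
        shared[key] = shared.get(key, []) + [value]


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()   # proxy to a dict living in the manager process
        lock = manager.Lock()     # proxy to a lock living in the manager process
        workers = [Process(target=record, args=(shared, lock, "squares", n * n))
                   for n in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(sorted(shared["squares"]))
```

The main trade-off to keep in mind is that every operation on a proxy is a round trip to the manager process, so fine-grained access to managed objects is much slower than access to local data.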
 
 
TomF
Guest
Posts: n/a
 
      01-17-2011
On 2011-01-16 19:16:15 -0800, Dan Stromberg said:

> On Sun, Jan 16, 2011 at 11:05 AM, TomF <(E-Mail Removed)> wrote:
>> I'm trying to multiprocess my python code to take advantage of multiple
>> cores. I've read the module docs for threading and multiprocessing, and
>> I've done some web searches. All the examples I've found are too simple:
>> the processes take simple inputs and compute a simple value. My problem
>> involves lots of processes, complex data structures, and potentially lots of
>> results. It doesn't map cleanly into a Queue, Pool, Manager or
>> Listener/Client example from the python docs.
>>
>> Instead of explaining my problem and asking for design suggestions, I'll
>> ask: is there a compendium of realistic Python multiprocessing examples
>> somewhere? Or an open source project to look at?

>
> I'm unaware of a big archive of projects that use multiprocessing, but
> maybe one of the free code search engines could help with that.
>
> It sounds like you're planning to use mutable shared state, which is
> generally best avoided if at all possible, in concurrent programming -
> because mutable shared state tends to slow down things quite a bit.
>

I'm trying to avoid mutable shared state since I've read the cautions
against it. I think it's possible for each worker to compute changes
and return them back to the parent (and have the parent coordinate all
changes) without too much overhead. So far it looks like
multiprocessing.Pool.apply_async is the best match to what I want.
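
A minimal sketch of that arrangement, assuming Python 3: workers only compute and return a list of changes, and the parent is the single place where state is modified. The functions `compute_changes`, `apply_changes` and `run`, and the `(key, delta)` shape of a change, are hypothetical stand-ins for the real workload.

```python
from multiprocessing import Pool


def compute_changes(item):
    """Worker: read-only computation that returns a list of (key, delta) changes."""
    return [(item % 3, item)]


def apply_changes(state, changes):
    """Parent: the only place where state is mutated."""
    for key, delta in changes:
        state[key] = state.get(key, 0) + delta


def run(items, processes=2):
    state = {}
    with Pool(processes=processes) as pool:
        # Submit everything up front; each call returns an AsyncResult at once.
        pending = [pool.apply_async(compute_changes, (item,)) for item in items]
        for result in pending:
            # get() blocks until that particular job has finished.
            apply_changes(state, result.get())
    return state


if __name__ == "__main__":
    print(run(range(6)))
```

Because a job returns a list, one job can yield zero, one or many changes, which sidesteps the lack of a one-to-one match between jobs and results.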

One difficulty is that there is a queue of work to be done and a queue
of results to be incorporated back into the parent; there is no
one-to-one correspondence between the two. It's not obvious to me how
to coordinate the queues in a natural way to avoid deadlock or
starvation.

>
> But if you must have mutable shared state that's more complex than a
> basic scalar or homogeneous array, I believe the multiprocessing
> module would have you use a "server process manager".


I've looked into Manager but I don't really understand the trade-offs.
-Tom

 
 
Adam Skutt
Guest
Posts: n/a
 
      01-17-2011
On Jan 16, 11:39 pm, TomF <(E-Mail Removed)> wrote:
> One difficulty is that there is a queue of work to be done and a queue
> of results to be incorporated back into the parent; there is no
> one-to-one correspondence between the two. It's not obvious to me how
> to coordinate the queues in a natural way to avoid deadlock or
> starvation.
>


Depends on what you are doing. If you can enqueue all the jobs before
waiting for your results, then two queues are adequate. The first
queue is jobs to be accomplished, the second queue is the results.
The items you put on the result queue have both the result and some
sort of id so the results can be ordered after the fact. Your parent
thread of execution (thread hereafter) then:

1. Adds jobs to the queue
2. Blocks until all the results are returned. Given that you
suggested that there isn't a 1:1 correspondence between jobs and
results, have the queue support a message saying, 'Job X is done'.
You're finished when all jobs send such a message.
3. Sorts the results into the desired order.
4. Acts on them.
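
The four steps above can be sketched roughly as follows. This is only an illustration under assumptions: the job (listing the divisors of a number), the message tuples `("result", job_id, value)` and `("done", job_id, None)`, and the `STOP` sentinel are all invented for the example.

```python
from multiprocessing import Process, Queue

STOP = None  # sentinel telling a worker to exit (an assumption of this sketch)


def worker(jobs, results):
    """Take (job_id, n) jobs; emit zero or more results, then a 'done' message."""
    for job_id, n in iter(jobs.get, STOP):
        for divisor in range(1, n + 1):
            if n % divisor == 0:
                results.put(("result", job_id, divisor))
        results.put(("done", job_id, None))


def collect(results, n_jobs):
    """Block on the result queue until every job has reported 'done'."""
    found, finished = [], 0
    while finished < n_jobs:
        kind, job_id, value = results.get()
        if kind == "done":
            finished += 1
        else:
            found.append((job_id, value))
    return sorted(found)  # order by job id after the fact


if __name__ == "__main__":
    jobs, results = Queue(), Queue()
    numbers = [6, 7, 12]
    workers = [Process(target=worker, args=(jobs, results)) for _ in range(2)]
    for w in workers:
        w.start()
    for job in enumerate(numbers):   # step 1: enqueue (job_id, n) pairs
        jobs.put(job)
    for _ in workers:                # one STOP per worker
        jobs.put(STOP)
    print(collect(results, len(numbers)))  # steps 2 and 3
    for w in workers:                # join only after the queue is drained
        w.join()
```

Draining the result queue before joining the workers matters: a process that still has items buffered for a queue may not exit until they are consumed.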

If you cannot enqueue all the jobs before waiting for the results, I
suggest turning the problem into a pipeline, such that the thread
submitting the jobs and the thread acting on the results are
different: submitter -> job processor -> results processor.
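
A rough sketch of such a pipeline, with the submitter running as a thread in the parent, the job processors as separate processes, and the results processor in the parent's main thread. The stage names `submit`, `process` and `consume`, the squaring workload and the `STOP` marker are assumptions for illustration only.

```python
import threading
from multiprocessing import Process, Queue

STOP = None  # end-of-stream marker (an assumption of this sketch)


def submit(items, jobs, n_workers):
    """Stage 1: feed jobs, then one STOP per worker."""
    for item in items:
        jobs.put(item)
    for _ in range(n_workers):
        jobs.put(STOP)


def process(jobs, results):
    """Stage 2: transform each job; forward STOP when the input is exhausted."""
    for item in iter(jobs.get, STOP):
        results.put(item * item)
    results.put(STOP)


def consume(results, n_workers):
    """Stage 3: act on results as they arrive until every worker has stopped."""
    total, stopped = 0, 0
    while stopped < n_workers:
        value = results.get()
        if value is STOP:
            stopped += 1
        else:
            total += value
    return total


if __name__ == "__main__":
    n_workers = 2
    jobs, results = Queue(), Queue()
    workers = [Process(target=process, args=(jobs, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    submitter = threading.Thread(target=submit, args=(range(10), jobs, n_workers))
    submitter.start()
    print(consume(results, n_workers))
    submitter.join()
    for w in workers:
        w.join()
```

Each stage blocks on exactly one queue, so no stage ever has to wait on two things at once.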

Again though, the devil is in the details and without more details,
it's hard to suggest an explicit approach. The simplest way to avoid
contention between two queues is to just remove it entirely (by
converting the processing to a single pipeline like I suggested). If
that is not possible, then I suggest moving to pipes (or some other
form of I/O based IPC) and asynchronous I/O. But I'd only do that if
I really couldn't write a pipeline.

Adam
 
 
TomF
Guest
Posts: n/a
 
      01-17-2011
On 2011-01-16 20:57:41 -0800, Adam Skutt said:

> On Jan 16, 11:39 pm, TomF <(E-Mail Removed)> wrote:
>> One difficulty is that there is a queue of work to be done and a queue
>> of results to be incorporated back into the parent; there is no
>> one-to-one correspondence between the two. It's not obvious to me how
>> to coordinate the queues in a natural way to avoid deadlock or
>> starvation.
>>

>
> Depends on what you are doing. If you can enqueue all the jobs before
> waiting for your results, then two queues are adequate. The first
> queue is jobs to be accomplished, the second queue is the results.
> The items you put on the result queue have both the result and some
> sort of id so the results can be ordered after the fact. Your parent
> thread of execution (thread hereafter) then:
>
> 1. Adds jobs to the queue
> 2. Blocks until all the results are returned. Given that you
> suggested that there isn't a 1:1 correspondence between jobs and
> results, have the queue support a message saying, 'Job X is done'.
> You're finished when all jobs send such a message.
> 3. Sorts the results into the desired order.
> 4. Acts on them.
>
> If you cannot enqueue all the jobs before waiting for the results, I
> suggest turning the problem into a pipeline, such that the thread
> submitting the jobs and the thread acting on the results are
> different: submitter -> job processor -> results processor.
> Adam


Thanks for your reply. I can enqueue all the jobs before waiting for
the results, it's just that I want the parent to process the results as
they come back. I don't want the parent to block until all results are
returned. I was hoping the Pool module had a test for whether all
processes were done, but I guess it isn't hard to keep track of that
myself.
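
One way this might look with the standard Pool, assuming Python 3, is `imap_unordered`, which hands results to the parent as each job finishes and simply ends the iteration when all jobs are done. The functions `work` and `run` are hypothetical names for illustration.

```python
from multiprocessing import Pool


def work(n):
    """Hypothetical job: returns the input alongside its result."""
    return n, n * n


def run(pool, items):
    """Handle each result in the parent as soon as any worker finishes one."""
    seen = {}
    for n, square in pool.imap_unordered(work, items):
        seen[n] = square  # the parent incorporates results here, one at a time
    # The loop ending is the "all jobs done" signal; no separate test is needed.
    return seen


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(sorted(run(pool, range(5)).items()))
```

Results arrive in completion order rather than submission order, which is why each one carries its input along for identification.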

-Tom

 
 
Adam Skutt
Guest
Posts: n/a
 
      01-17-2011
On Jan 17, 12:44 am, TomF <(E-Mail Removed)> wrote:
> Thanks for your reply. I can enqueue all the jobs before waiting for
> the results, it's just that I want the parent to process the results as
> they come back. I don't want the parent to block until all results are
> returned. I was hoping the Pool module had a test for whether all
> processes were done, but I guess it isn't hard to keep track of that
> myself.
>


Regardless of whether it does or doesn't, you don't really want to be
blocking in two places anyway, so the "FINISHED" event in the queue is
the superior solution.

It's certainly possible to build a work pool w/ a queue such that you
block both for entries added to the queue and for job completion, but
I'm pretty sure it's something you'd have to write yourself.

Adam
 
 
Adam Tauno Williams
Guest
Posts: n/a
 
      01-17-2011
On Mon, 2011-01-17 at 13:55 +0000, Albert van der Horst wrote:
> In article <(E-Mail Removed)>,
> Philip Semanchuk <(E-Mail Removed)> wrote:
> <SNIP>
> >I grepped through the code to see that it's using
> >multiprocessing.Listener. I didn't go any further than that because our
> >project is BSD licensed and the license for Gluino is unclear. Until I
> >find out whether or not it's under an equally permissive license, I can't
> >borrow ideas and/or code from it.

> You have been brain washed by the Intellectual Properties congsy.
> Of course you can read through code to borrow ideas from it.


I wouldn't; and there is no brain-washing.

It is very unwise to look at GPL'd code if you are working on a non-GPL
project; the GPL is specifically and intentionally viral. The
distinction between reading-through-code-and-borrowing-ideas and
copying-code is thin and best left to lawyers.

Aside: comments to the contrary often stand on their heads to make such
cases. For example:

"You do have a choice under the GPL license: you can stop using the
stolen code and write your own, or you can decide you'd rather release
under the GPL. But the choice is yours. If you say, I choose neither,
then the court can impose an injunction to stop you from further
distribution, but it won't order your code released under the GPL. ...
Of course, you could avoid all such troubles in the first place by not
stealing GPL code to begin with"
<http://www.groklaw.net/article.php?story=20031214210634851>

Seriously? What that basically means is you can't use GPL'd code in a
non-GPL'd product/project. Saying that doing so is OK, but that you'll
be required to replace the code or change your license, is standing on
one's head. Risking a forced reimplementation of a core component of an
existing application is 'just nuts'.

 