Re: problem in implementing multiprocessing

 
 
James Mills
      01-19-2009
On Mon, Jan 19, 2009 at 3:50 PM, gopal mishra <> wrote:
> I know this is not an I/O-bound problem. I am creating heavy objects in the
> process, adding them to a queue, and getting them back in my main
> program from the queue.
> You can test this sample code:
> import time
> from multiprocessing import Process, Queue
>
> class Data(object):
>     def __init__(self):
>         self.y = range(1, 1000000)
>
> def getdata(queue):
>     data = Data()
>     queue.put(data)
>
> if __name__=='__main__':
>     t1 = time.time()
>     d1 = Data()
>     d2 = Data()
>     t2 = time.time()
>     print "without multiProcessing total time:", t2-t1
>     #multiProcessing
>     queue = Queue()
>     Process(target= getdata, args=(queue, )).start()
>     Process(target= getdata, args=(queue, )).start()
>     s1 = queue.get()
>     s2 = queue.get()
>     t2 = time.time()
>     print "multiProcessing total time::", t2-t1


The reason your code above doesn't work as you expect, and the
multiprocessing part takes longer, is that each of your Data objects
builds a rather large list of ints. Use xrange instead of range.

Here's what I get (using xrange):

$ python test.py
without multiProcessing total time: 1.50203704834e-05
multiProcessing total time:: 0.116630077362

cheers
James
 
 
 
 
 
Carl Banks
 
      01-19-2009
On Jan 18, 10:00 pm, "James Mills" <prolo...@shortcircuit.net.au> wrote:
> On Mon, Jan 19, 2009 at 3:50 PM, gopal mishra <gop...@infotechsw.com> wrote:
> > I know this is not an I/O-bound problem. I am creating heavy objects in the
> > process, adding them to a queue, and getting them back in my main
> > program from the queue.
> > You can test this sample code:
> > import time
> > from multiprocessing import Process, Queue
> >
> > class Data(object):
> >     def __init__(self):
> >         self.y = range(1, 1000000)
> >
> > def getdata(queue):
> >     data = Data()
> >     queue.put(data)
> >
> > if __name__=='__main__':
> >     t1 = time.time()
> >     d1 = Data()
> >     d2 = Data()
> >     t2 = time.time()
> >     print "without multiProcessing total time:", t2-t1
> >     #multiProcessing
> >     queue = Queue()
> >     Process(target= getdata, args=(queue, )).start()
> >     Process(target= getdata, args=(queue, )).start()
> >     s1 = queue.get()
> >     s2 = queue.get()
> >     t2 = time.time()
> >     print "multiProcessing total time::", t2-t1
>
> The reason your code above doesn't work as you expect, and the
> multiprocessing part takes longer, is that each of your Data objects
> builds a rather large list of ints.


I'm pretty sure gopal is creating a deliberately large object to use as
a test case, so switching to xrange isn't going to help here.

Since multiprocessing serializes and deserializes the data while passing
it from process to process, passing very large objects incurs very high
latency and overhead. IOW, gopal's diagnosis is correct. It's just not
practical to share very large objects among separate processes.
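
To get a feel for that cost, here is a minimal sketch that times a pickle round trip on a list the same size as Data.y. It is written for Python 3 (unlike the Python 2 code quoted above), and measure_pickle_roundtrip is a hypothetical helper name, not part of multiprocessing. multiprocessing.Queue pickles each object on put() and unpickles it on get(), so this roughly approximates the serialization part of the transfer only; it ignores pipe I/O and process startup.

```python
import pickle
import time


def measure_pickle_roundtrip(obj):
    """Return (payload size in bytes, seconds for dumps + loads, restored copy).

    Hypothetical helper: it only imitates the pickling step that a
    multiprocessing.Queue performs, not the actual inter-process transfer.
    """
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed, restored


if __name__ == "__main__":
    big = list(range(1, 1000000))  # same size as Data.y in the thread
    size, elapsed, _ = measure_pickle_roundtrip(big)
    print("pickled size (bytes):", size)
    print("dumps + loads (seconds):", elapsed)
```

The numbers depend on the machine and Python version, so treat them as a relative indication of how much work is added per object sent through the queue.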

For simple data like large arrays of floating point numbers, the data
can be shared with an mmapped file or some other memory-sharing scheme,
but actual Python objects can't be shared this way. If you have complex
data (networks and hierarchies and such) it's a lot harder to share this
information among processes.
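
As an illustration of the "simple data" case, the sketch below uses multiprocessing.Array, which is one memory-sharing scheme from the standard library (swapped in here instead of an mmapped file). It assumes Python 3; fill_squares and main are hypothetical names chosen for the example. Each worker writes into its own slice of a shared array of doubles, so no large object has to be pickled and sent back.

```python
from multiprocessing import Array, Process


def fill_squares(shared, start, stop):
    """Worker: store float(i * i) at index i for start <= i < stop."""
    for i in range(start, stop):
        shared[i] = float(i * i)


def main(n=10):
    # 'd' is the typecode for C doubles; the buffer lives in shared memory
    # and starts out zero-filled.
    shared = Array("d", n)
    half = n // 2
    workers = [
        Process(target=fill_squares, args=(shared, 0, half)),
        Process(target=fill_squares, args=(shared, half, n)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # Copy out to an ordinary list only for display.
    return list(shared)


if __name__ == "__main__":
    print(main())
```

This only works because the elements are fixed-size C values. It does not extend to arbitrary Python objects, which is the limitation described above.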


Carl Banks
 
 
 
 
 
Aaron Brady
 
      01-19-2009
On Jan 19, 3:09 am, Carl Banks <pavlovevide...@gmail.com> wrote:
snip
> Since multiprocessing serializes and deserializes the data while passing
> it from process to process, passing very large objects would have a very
> high latency and overhead. IOW, gopal's diagnosis is correct. It's
> just not practical to share very large objects among separate
> processes.


You could pass composite objects back and forth by passing pieces back
and forth. You'd have to construct it so as not to need access to the
entire data structure in any one piece; that is, only need access to
other small pieces.
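
One possible reading of this, as a rough sketch only: the worker sends the big sequence in small chunks and the main process handles each chunk as it arrives, so neither side needs the whole structure as a single pickled object. It assumes Python 3; chunk_bounds, produce, consume and the chunk size are all hypothetical choices for illustration, and a None sentinel marks the end of the stream.

```python
from multiprocessing import Process, Queue


def chunk_bounds(start, stop, chunk_size):
    """Yield (lo, hi) half-open ranges that together cover [start, stop)."""
    for lo in range(start, stop, chunk_size):
        yield lo, min(lo + chunk_size, stop)


def produce(queue, start, stop, chunk_size):
    """Worker: put one small list per chunk, then a None sentinel."""
    for lo, hi in chunk_bounds(start, stop, chunk_size):
        queue.put(list(range(lo, hi)))
    queue.put(None)


def consume(queue):
    """Yield chunks from the queue until the sentinel arrives."""
    while True:
        chunk = queue.get()
        if chunk is None:
            return
        yield chunk


def main():
    queue = Queue()
    worker = Process(target=produce, args=(queue, 1, 1000000, 50000))
    worker.start()
    total = 0
    # Drain the queue before joining, so the worker is never blocked
    # on a full pipe while the parent waits for it to exit.
    for chunk in consume(queue):
        total += sum(chunk)
    worker.join()
    return total


if __name__ == "__main__":
    print("sum of all pieces:", main())
```

The total amount of data pickled is about the same as before. What changes is that the consumer can start working on early pieces while later ones are still being produced.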

> For simple data like large arrays of floating point numbers, the data
> can be shared with an mmapped file or some other memory-sharing scheme,
> but actual Python objects can't be shared this way. If you have complex
> data (networks and hierarchies and such) it's a lot harder to share this
> information among processes.


It wouldn't hurt to have a minimal set of Python objects that are
'persistent live', that is, stored out of memory in their native
form. The only problem is, they can't contain references to volatile
objects. (I don't believe POSH addresses this.)
 
 
 
 