Process or Thread?

 
 
Pito Salas
      08-22-2009
I have a parent application (which I think of as a test harness) that
wants to invoke a fairly intensive image processing application against
a directory full of image files. Each image is processed independently.

So, to get performance, I wanted to get the work happening on each of
those images in parallel. So I could divide the files in the directory
into two sets, and submit one set for processing in one process/thread
and the other set in another process/thread. Note that the
sub-process/threads are almost totally separate from the parent app, so
relatively little information needs to go back and forth.

Here is what I've learned so far from reading two books and lots of
googling:

One point is that there's no process support on Windows, which isn't a
deal killer for me.

Another point is the operation on multi-core CPUs: processes will, and
threads will not use the multiple cores. This too is fairly "don't care"
for me.

I am interested in ease of implementation and debugging. And I am also
very interested in getting the cpu and disk active at the same time as
there is a fairly large amount of data to be read from the disk.

What are your recommendations?
--
Posted via http://www.ruby-forum.com/.

 
 
 
 
 
7stud --
      08-22-2009
Pito Salas wrote:
> I have a parent application (which I think of as a test harness) that
> wants to invoke a fairly intensive image processing application against
> a directory full of image files. Each image is processed independently.
>


It doesn't sound like your situation will result in improved performance
with threads. Things don't actually get done at the same time with
threads--that's an illusion. What happens is that there is very fast
switching between different tasks. However, if your tasks do not have
dead time during the processing, then using threads won't improve
performance. For instance, suppose you have two tasks that each take 3
minutes to complete. The processing might happen in this order with
threads:

task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
task1: 1 minute
task2: 1 minute
--------------
total = 6 minutes

But if you just ran each task sequentially without using threads, the
total time would also be 6 minutes. Using threads will only speed up
processing time if your tasks have idle time when they are doing
nothing. During that down time, if you switch to another task in
another thread, then total processing time will be lower.
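To make the idle-time point concrete, here is a minimal Ruby sketch. The names `idle_task`, `run_sequentially` and `run_threaded` are hypothetical, and `sleep` is only a stand-in for time a task spends waiting (on disk or network, say) rather than computing. It is an illustration under those assumptions, not a benchmark of real image processing.

```ruby
require 'benchmark'

# Hypothetical stand-in for a task that spends its time waiting
# rather than using the CPU.
def idle_task(seconds)
  sleep(seconds)
end

# Run the tasks one after another and return the elapsed wall-clock time.
def run_sequentially(count, seconds)
  Benchmark.realtime { count.times { idle_task(seconds) } }
end

# Run each task in its own thread, wait for all of them, and return
# the elapsed wall-clock time.
def run_threaded(count, seconds)
  Benchmark.realtime do
    threads = Array.new(count) { Thread.new { idle_task(seconds) } }
    threads.each(&:join)
  end
end
```

With two tasks that each wait for the same amount of time, the threaded run should finish in roughly the time of one task, because the waits overlap. If `idle_task` were pure computation instead, a green-threaded interpreter would not be expected to show that gain.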





 
 
 
 
 
Mario Camou
      08-22-2009
On Sat, Aug 22, 2009 at 16:47, 7stud -- wrote:

> Pito Salas wrote:
>
> It doesn't sound like your situation will result in improved performance
> with threads. Things don't actually get done at the same time with
> threads--that's an illusion. What happens is that there is very fast
> switching between different tasks. However, if your tasks do not have
> dead time during the processing, then using threads won't improve
> performance. For instance, suppose you have two tasks that each take 3
> minutes to complete. The processing might happen in this order with
> threads:
>
> task1: 1 minute
> task2: 1 minute
> task1: 1 minute
> task2: 1 minute
> task1: 1 minute
> task2: 1 minute
> --------------
> total = 6 minutes
>


That's true if you're running MRI, since it uses "green" threads (i.e., you
really have a single OS-level thread that gets task-switched by Ruby
itself). However, if you run on JRuby, the Ruby Thread support gets mapped
onto the Java Thread support, which *does* map to OS-level threads and
therefore will take advantage of multiple cores if you have them. In that
case you *would* get faster processing.

Hope this helps,
-Mario.

 
 
Gary Wright
      08-22-2009

On Aug 22, 2009, at 1:43 PM, Mario Camou wrote:
>
> That's true if you're running MRI, since it uses "green" threads
> (i.e., you really have a single OS-level thread that gets task-switched
> by Ruby itself). However, if you run on JRuby, the Ruby Thread support
> gets mapped onto the Java Thread support, which *does* map to OS-level
> threads and therefore will take advantage of multiple cores if you have
> them. In that case you *would* get faster processing.


For a CPU intensive task (image processing), I doubt that two OS
threads running on two cores are going to be any more efficient
than two processes running on two cores. Multi-threading introduces
complications that are neatly avoided by using multiple processes.
I'd much rather deal with a multi-process architecture than a
multi-threaded architecture.

Gary Wright




 
 
Pito Salas
      08-22-2009
Gary Wright wrote:
> For a CPU intensive task (image processing), I doubt that two OS
> threads running on two cores are going to be any more efficient
> than two processes running on two cores. Multi-threading introduces
> complications that are neatly avoided by using multiple processes.
> I'd much rather deal with a multi-process architecture than a
> multi-threaded architecture.
>
> Gary Wright


Thanks all for your responses.

A note: the files being processed are quite large and numerous. So
there's also plenty of file IO that has to happen. In the vanilla 'green
thread' case, would you expect performance improvements, because while
one thread was blocked for IO the other one could run?

Thanks again,

Pito




 
 
Kent Friis
      08-22-2009
On Sat, 22 Aug 2009 09:30:36 -0500, Pito Salas wrote:
> I have a parent application (which I think of as a test harness) that
> wants to invoke a fairly intensive image processing application against
> a directory full of image files. Each image is processed independently.
>
> So, to get performance, I wanted to get the work happening on each of
> those images in parallel. So I could divide the files in the directory
> into two sets, and submit one set for processing in one process/thread
> and the other set in another process/thread. Note that the
> sub-process/threads are almost totally separate from the parent app, so
> relatively little information needs to go back and forth.
>
> Here is what I've learned so far from reading two books and lots of
> googling:
>
> One point is that there's no process support on Windows, which isn't a
> deal killer for me.


Not quite. Look in Task Manager, there is a list of processes running.
What Windows possibly lacks is fork(), the unix way of creating
processes. It does however have CreateProcess (I think that's what
it's called), which behaves like fork+exec.

If you split the "controller" process and the "worker" process into
two different programs, it won't be a problem. If you insist on
having them as one program, you'll need to do a bit more work
(add a command line argument telling the new process that it's a
worker process).
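A minimal Ruby sketch of the controller/worker split, assuming the worker is a separate program started with `Process.spawn` (which does not depend on `fork`). `WORKER_SOURCE`, `split_into_sets` and `run_workers` are hypothetical names; the worker here only prints the file names it receives, where a real one would do the image processing, and it is passed inline with `-e` purely to keep the example in one piece.

```ruby
require 'rbconfig'

# Hypothetical worker program. In a real setup this would live in its own
# script file; it receives the files to process as command line arguments.
WORKER_SOURCE = 'ARGV.each { |path| puts "processed #{path}" }'

# Deal the files out into `worker_count` roughly equal sets.
def split_into_sets(files, worker_count)
  sets = Array.new(worker_count) { [] }
  files.each_with_index { |file, i| sets[i % worker_count] << file }
  sets
end

# Start one worker process per non-empty set, wait for all of them,
# and return true only if every worker exited successfully.
def run_workers(files, worker_count = 2)
  pids = split_into_sets(files, worker_count).reject(&:empty?).map do |set|
    # RbConfig.ruby is the path of the currently running interpreter.
    Process.spawn(RbConfig.ruby, '-e', WORKER_SOURCE, *set)
  end
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

Calling `run_workers(Dir.glob('images/*'))` (the directory name is an assumption) would start two workers that run at the same time, each on half of the files. The only information flowing back to the controller in this sketch is each worker's exit status.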

> Another point is the operation on multi-core CPUs: processes will, and
> threads will not use the multiple cores. This too is fairly "don't care"
> for me.


Native threads will, Ruby green threads won't.

> I am interested in ease of implementation and debugging.


Debugging is lots easier with processes, as one process cannot
accidentally overwrite data of another (shared memory is possible,
but needs to be allocated explicitly).

That may not be as big a problem with Ruby green threads, as the
runtime knows what each thread is up to.

> And I am also
> very interested in getting the cpu and disk active at the same time as
> there is a fairly large amount of data to be read from the disk.
>
> What are your recommendations?


I would go for processes. But that's coming from C, where there is no
runtime keeping track of what each thread is doing. With processes,
the OS will prevent one process from overwriting the data of another.

/Kent
--
"The Brothers are History"
 
 
Gary Wright
      08-23-2009

On Aug 22, 2009, at 5:02 PM, Pito Salas wrote:
>
> A note: the files being processed are quite large and numerous. So
> there's also plenty of file IO that has to happen. In the vanilla 'green
> thread' case, would you expect performance improvements, because while
> one thread was blocked for IO the other one could run?


Whether you use threads or processes your CPU-bound tasks will run while
your IO-bound tasks are waiting for the disk.
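One way to arrange that overlap with threads is sketched below: a background thread reads files ahead while the calling thread processes the one already loaded. `each_file_prefetched` and `buffer_size` are hypothetical names. How much overlap you actually get is an assumption about the interpreter: it relies on a blocking file read in one thread not stalling the others, which holds where threads are backed by OS threads but may not hold for every green-thread implementation.

```ruby
require 'thread'

# Reads each file on a background thread and hands [path, contents] pairs
# to the caller's block through a bounded queue, so reading the next file
# can overlap with processing the current one.
def each_file_prefetched(paths, buffer_size = 2)
  queue = SizedQueue.new(buffer_size) # bounds how far the reader runs ahead
  reader = Thread.new do
    begin
      paths.each { |path| queue << [path, File.binread(path)] }
    ensure
      queue << :done # always signal the end, even if a read failed
    end
  end
  while (item = queue.pop) != :done
    yield(*item)
  end
  reader.join # re-raises any exception from the reader thread
end
```

A caller would use it as `each_file_prefetched(paths) { |path, data| ... }`, doing the CPU-heavy work inside the block. The bounded queue keeps memory use limited to a few files at a time, which matters when the files are large.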

Gary Wright

 
 
Robert Klemme
      08-23-2009
On 23.08.2009 01:28, Kent Friis wrote:

>> I am interested in ease of implementation and debugging.

>
> Debugging is lots easier with processes, as one process cannot
> accidentally overwrite data of another (shared memory is possible,
> but needs to be allocated explicitly).


IMHO a multitude of processes does not necessarily ease debugging. If
you need to find out which process is running berserk or exhibiting a
bug that may be more difficult than debugging of a single interpreter
process. Also, if there are communication issues between two processes
that may be difficult to debug as well.

Having said that, both approaches are pretty easy to implement, given
that DRb is a full fledged remote object call feature (similar to RMI
and CORBA).
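For readers who have not used DRb, a minimal sketch of what a remote object call looks like. `ImageWorker`, `start_worker_service` and `remote_worker` are hypothetical names, and the worker method only returns a string where real code would process the image. DRb shipped with Ruby's standard library at the time of this thread; whether it is installed by default on a newer Ruby is something to check for your version.

```ruby
require 'drb/drb'

# Hypothetical worker object; a real one would do the image processing.
class ImageWorker
  def process(path)
    "processed #{path}"
  end
end

# Worker side: publish the object over DRb and return the URI it is
# reachable at. Port 0 asks the OS for any free port.
def start_worker_service
  DRb.start_service('druby://127.0.0.1:0', ImageWorker.new)
  DRb.uri
end

# Controller side: get a proxy for the published object. Method calls on
# the proxy are forwarded to the worker.
def remote_worker(uri)
  DRbObject.new_with_uri(uri)
end
```

In real use the two sides would run in separate processes: the worker would call `start_worker_service` and then `DRb.thread.join` to stay alive, and the controller would be given the URI (on the command line, for example) and call `remote_worker(uri).process(path)`.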

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
 
Charles Oliver Nutter
      09-03-2009
On Sat, Aug 22, 2009 at 2:07 PM, Gary Wright wrote:
> For a CPU intensive task (image processing), I doubt that two OS
> threads running on two cores are going to be any more efficient
> than two processes running on two cores. Multi-threading introduces
> complications that are neatly avoided by using multiple processes.
> I'd much rather deal with a multi-process architecture than a
> multi-threaded architecture.


You're correct, if the processes don't talk to each other. But if you
have to pass information across processes, things suddenly get a lot
more tangled and IPC-bound than with threads. It's a tradeoff, as
always.
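A minimal Ruby sketch of what passing information across processes involves, assuming a Unix-like system where `Process.fork` is available. `map_in_children` is a hypothetical name. Each child has to serialize its results and write them to a pipe, and the parent has to read and deserialize them; with threads the results could simply be left in a shared array.

```ruby
# Unix-only sketch: forks one child per set. Each child maps the given
# block over its set and sends the results back to the parent, marshalled
# over its own pipe. Returns one result array per set, in order.
def map_in_children(sets)
  children = sets.map do |set|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = Process.fork do
      reader.close
      Marshal.dump(set.map { |item| yield(item) }, writer)
      writer.close
      exit!(0) # leave without running at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  # All children are running by now; collect their results in order.
  children.map do |pid, reader|
    results = Marshal.load(reader)
    reader.close
    Process.wait(pid)
    results
  end
end
```

For example, `map_in_children([[1, 2], [3]]) { |n| n * n }` squares each number in a child process and returns the results grouped by set. Anything sent back must be marshallable, and large results cost time to copy through the pipe, which is the kind of IPC overhead described above.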

- Charlie

 