IPC looking for simple/best way to communicate

 
 
whansen_at_corporate-image_dot_com@us.com
 
      01-07-2005
I'm trying to find the best way to do something. I've got 50 processes
(may have 200 soon) that need to broadcast simple messages to each
other. I tried doing this with sockets, but although I was able to get
them to read a socket without blocking, I found that when one process
reads a socket it removes the message from the socket and the other
processes don't see it. My current solution works using
IPC::Shareable, but it is slow and hogs memory as well as the CPU.
Shareable lets you set a variable that multiple programs can read and
write to. In my case they read and write to the list that they all run
off.

Basically each process is iterating over a list (array) and every so
often a process gets a result that means that item no longer needs to
be run, so it should remove it from its list and notify the other
processes so that they can remove it from theirs as well. With
IPC::Shareable it works nicely, as when one process removes the item,
all the others have it removed also, but it appears that the Shareable
module is slowing things down considerably (CPU usage doubled).

If someone could point me in the right direction, that would be great.
I have an idea for speeding up shareable a little, but it's still not
going to be fast.
 
 
 
 
 
xhoster@gmail.com
 
      01-07-2005
(E-Mail Removed) wrote:
> I'm trying to find the best way to do something. I've got 50 processes
> (may have 200 soon) that need to broadcast simple messages to each
> other. I tried doing this with sockets, but although I was able to get
> them to read a socket without blocking, I found that when one process
> reads a socket it removes the message from the socket and the other
> processes don't see it.


You need a separate socket to each process, and send the message to each of
those sockets. I'd probably have one house-keeper process which collects
messages from all 50 processes and redistributes them to all 50 processes,
although I don't know that that is necessary.
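
A minimal sketch of that fan-out arrangement, with one pipe pair per
worker (the worker count, message text, and variable names are just
placeholders, not anything from your code):

#!/usr/bin/perl
# House-keeper sketch: the parent holds a separate pipe to every worker
# and copies each removal notice to all of them, so no worker "steals"
# a message from the others the way a single shared socket does.
use strict;
use warnings;
use IO::Handle;

my $workers = 3;                     # you'd use 50 or more
my (@to_worker, @from_worker);

for my $n (1 .. $workers) {
    pipe(my $w_read, my $p_write) or die "pipe: $!";   # parent -> worker
    pipe(my $p_read, my $w_write) or die "pipe: $!";   # worker -> parent
    defined(my $pid = fork()) or die "fork: $!";
    if ($pid == 0) {                                   # worker
        close $p_write; close $p_read;
        $w_write->autoflush(1);
        print {$w_write} "remove item$n\n";            # report a dead item
        for (1 .. $workers) {                          # see every broadcast
            my $notice = <$w_read>;
            warn "worker $n saw: $notice";
        }
        exit 0;
    }
    close $w_read; close $w_write;
    $p_write->autoflush(1);
    push @to_worker,   $p_write;
    push @from_worker, $p_read;
}

# House-keeper: collect one notice from each worker, rebroadcast to all.
for my $rh (@from_worker) {
    my $notice = <$rh>;
    print {$_} $notice for @to_worker;
}
wait() for 1 .. $workers;            # reap the workers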

> My current solution works using
> IPC::Shareable, but it is slow and hogs memory as well as the CPU.
> Shareable lets you set a variable that multiple programs can read and
> write to. In my case they read and write to the list that they all run
> off.


You probably shouldn't do it that way. Make just an array holding
exceptions to the main array be shared. Then periodically update the
(local) main array based on the shared exception array.
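
Something along these lines, for instance (the glue key, tie options,
and item IDs here are made up for the sketch):

use strict;
use warnings;
use IPC::Shareable;

my @tasks = (1001 .. 1050);          # each process's private working list

# Share only the small exception list, not the whole task array.
tie my @removed, 'IPC::Shareable', 'xcpt', { create => 1, mode => 0666 }
    or die "cannot tie shared exception list";

# When a process learns an item is gone, record the exception:
sub mark_removed {
    my ($id) = @_;
    (tied @removed)->shlock;
    push @removed, $id;
    (tied @removed)->shunlock;
}

# Every so often, fold the exceptions into the private list:
sub refresh_tasks {
    (tied @removed)->shlock;
    my %gone = map { $_ => 1 } @removed;
    (tied @removed)->shunlock;
    @tasks = grep { !$gone{$_} } @tasks;
}

mark_removed(1007);                  # pretend item 1007 was taken
refresh_tasks();
print scalar(@tasks), " tasks left\n";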

> Basically each process is iterating over a list (array) and every so
> often a process gets a result that means that item no longer needs to
> be run, so it should remove it from its list and notify the other
> processes so that they can remove it from theirs as well.


What are the consequences if a process doesn't get the message and runs
that task anyway? Is it just a waste of resources, or is it fatal to the
whole thing you are trying to do?

How many such removal messages do you generate, in relation to the full
size of the array to iterate over? If small, it would probably be most
efficient to just run even the "removed" tasks and then filter them out in
post-processing.

How does it work that your processes are iterating over an array which
is changing during the iteration? That seems like a problem waiting to
happen.


> With
> IPC::Shareable it works nicely, as when one process removes the item,
> all the others have it removed also, but it appears that the Shareable
> module is slowing things down considerably (CPU usage doubled).
>
> If someone could point me in the right direction, that would be great.
> I have an idea for speeding up shareable a little, but it's still not
> going to be fast.


Each process could keep their own private version of the array, and
only refresh it against the shared version (or against a shared
exception list) every now and then. How often it would do this refresh
would depend on the cost of the refresh vs. the wasted effort that goes
into processing tasks that have been removed since the last refresh.


When I had to do something sort of like this I used a much simpler
approach. Each parallel child process was given a batch to do, then
reported back to the house-keeper on which things should be removed, and
then exited. The house-keeper would process those exceptions to make a new
set of batches, and spawn another round of child processes.

Xho

 
 
 
 
 
whansen_at_corporate-image_dot_com@us.com
 
      01-10-2005
Interspersed below:

On 07 Jan 2005 20:24:52 GMT, (E-Mail Removed) wrote:

>(E-Mail Removed) wrote:
>> I'm trying to find the best way to do something. I've got 50 processes
>> (may have 200 soon) that need to broadcast simple messages to each
>> other. I tried doing this with sockets, but although I was able to get
>> them to read a socket without blocking, I found that when one process
>> reads a socket it removes the message from the socket and the other
>> processes don't see it.

>
>You need a separate socket to each process, and send the message to each of
>those sockets. I'd probably have one house-keeper process which collects
>messages from all 50 processes and redistributes them to all 50 processes,
>although I don't know that that is necessary.


If I go this way I probably should set up a server on a socket and
have it maintain the list centrally, but I'm a little unsure of the
speed. The 50 processes can each iterate as fast as 3 times a second,
or 150 times a second for all of them combined. A central server
process is going to need to be able to keep up with this rate.


>> My current solution works using
>> IPC::Shareable, but it is slow and hogs memory as well as the CPU.
>> Shareable lets you set a variable that multiple programs can read and
>> write to. In my case they read and write to the list that they all run
>> off.

>
>You probably shouldn't do it that way. Make just an array holding
>exceptions to the main array be shared. Then periodically update the
>(local) main array based on the shared exception array.


I actually did something like that, but used a string instead of an
array. It greatly sped things up. Each process merely looks for a
change to the string and, if there is one, decodes the string
and modifies its internal list.
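
Stripped down, the idea looks something like this (a simplified sketch
rather than my real code; the glue key, delimiter, and names are
placeholders):

use strict;
use warnings;
use IPC::Shareable;

# One shared scalar holds the removed item IDs joined with commas.
tie my $removed_str, 'IPC::Shareable', 'rmls', { create => 1, mode => 0666 };

(tied $removed_str)->shlock;
$removed_str = '' unless defined $removed_str;
(tied $removed_str)->shunlock;

my $last_seen = '';
my %skip;                                   # this process's private view

sub note_removal {                          # whoever finds a dead item calls this
    my ($id) = @_;
    (tied $removed_str)->shlock;
    $removed_str = length($removed_str) ? "$removed_str,$id" : $id;
    (tied $removed_str)->shunlock;
}

sub sync_removals {                         # decode only when the string changes
    my $now = $removed_str;                 # one read of the tied scalar
    return if $now eq $last_seen;
    $skip{$_} = 1 for split /,/, $now;
    $last_seen = $now;
}

note_removal(17);
sync_removals();
print join(',', sort keys %skip), "\n";     # prints 17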

>> Basically each process is iterating over a list (array) and every so
>> often a process gets a result that means that item no longer needs to
>> be run, so it should remove it from its list and notify the other
>> processes so that they can remove it from theirs as well.

>
>What are the consequences if a process doesn't get the message and runs
>that task anyway? Is it just a waste of resources, or is it fatal to the
>whole thing you are trying to do?
>
>How many such removal messages do you generate, in relation to the full
>size of the array to iterate over? If small, it would probably be most
>efficient to just run even the "removed" tasks and then filter them out in
>post-processing.


It's not fatal, but it wastes the request. Each request has a chance
of getting a product for my company. So, concentrating on products
that are still available and not wasting requests on taken products
will improve our chances of getting products 10-20% or so.

>How does it work that your processes are iterating over an array which
>is changing during the iteration? That seems like a problem waiting to
>happen.


I didn't like that either, so instead of removing an element I set it
to "". Then the main loop has a line:
unless ($number) {next}
to skip over removed entries. The extra time for the partial iteration
is tiny compared to the other factors so this seemed the most elegant
solution.
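
So the relevant part of the loop is just this (generic names, not my
real ones):

my @products = (101, 102, "", 104);  # "" marks an entry some process removed
for my $number (@products) {
    unless ($number) { next }        # skip entries blanked out as removed
    print "requesting product $number\n";
}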

>> With
>> IPC::Shareable it works nicely, as when one process removes the item,
>> all the others have it removed also, but it appears that the Shareable
>> module is slowing things down considerably (CPU usage doubled).
>>
>> If someone could point me in the right direction, that would be great.
>> I have an idea for speeding up shareable a little, but it's still not
>> going to be fast.

>
>Each process could keep their own private version of the array, and
>only refresh it against the shared version (or against a shared
>exception list) every now and then. How often it would do this refresh
>would depend on the cost of the refresh vs. the wasted effort that goes
>into processing tasks that have been removed since the last refresh.


I have it do it for every request. During actual conditions the
requests slow down to a 10 second response which is all the time in
the world for this use.

>When I had to do something sort of like this I used a much simpler
>approach. Each parallel child process was given a batch to do, then
>reported back to the house-keeper on which things should be removed, and
>then exited. The house-keeper would process those exceptions to make a new
>set of batches, and spawn another round of child processes.


That would work nicely, but timing is very important and just bringing
up 50 processes can take up to ten minutes if they are really active.
I thought of using the parent/child communication methods, but I'd
rather do it outright as I am already down this design leg.

>Xho


 
 
whansen_at_corporate-image_dot_com@us.com
 
      01-10-2005

So your advice boils down to, don't ask questions on Usenet?

some stats:
1.7 GHz processor
1024 MB RAM
2x80 GB RAID 1
500 GB monthly bandwidth
10 Mbps internet connection
Mandrake 9.2
Perl 5.8.1


Speed is important: each process should be able to iterate at 3 per
second, with the entire lot coming in at 150 iterations per second,
although under actual war conditions this slows down to 10 seconds per
cycle, or about 5 iterations per second for all processes.

CPU utilization is important, as the current Shareable implementation
uses about 2% of CPU running at 3 requests per second; with 50
processes that means we're at very high CPU usage. However, this drops
considerably during the actual run time, when things slow down to 10
seconds per response.

Memory is important as it is limited. The current processes each use
less than 1% of the 1 GB of memory. There is about 400 MB free on the
system when they are all running. This could be a factor if we
increase the 50 processes to 200 or so. We are unable to upgrade
memory without a new server and additional costs, so memory is the
major factor limiting the number of processes we run.

> - portability,

should work on any reasonable Linux box with Perl
> - maintenance,

the code you write needs its oil changed every once in a while?
> - scalability,

more processes doing the same thing as described above.
> - development time,

not a huge factor. Getting it done right is more important
> - development costs,

not really a factor. I'm getting paid
> - ease of deployment,

sftp
> - simplicity,

so long as I can figure it out
> - operational costs,

not even going to try to figure that out; doesn't matter or is
insignificant.


On 08 Jan 2005 20:33:03 GMT, Abigail <(E-Mail Removed)> wrote:

>(E-Mail Removed)
>((E-Mail Removed)) wrote on MMMMCXLVII September MCMXCIII in <URL:news:(E-Mail Removed)>:
>() I'm trying to find the best way to do something.
>
>This is an example of a really bad question. Who's going to decide what
>the "best way" to do something is? All you do is describe a problem,
>but you don't say *ANYTHING* at all which even remotely hints of what
>will be a good way for you, let alone the best way.
>
>Do you want a solution that is optimized for:
>
> - speed,
> - memory,
> - portability,
> - maintenance,
> - scalability,
> - development time,
> - development costs,
> - ease of deployment,
> - simplicity,
> - operational costs,
> - ...
>
>?
>
>Note that none of these can be answered without knowing a lot of your
>development and operation environments.
>
>If you want to know the "best way" (of anything, not this problem),
>hire a good consultant. Don't go to Usenet for answers.
>
>
>
>Abigail
>--
>$_ = "\nrekcaH lreP rehtona tsuJ"; my $chop; $chop = sub {print chop; $chop};
>$chop -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> ()
>-> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> () -> ()


 
 
whansen_at_corporate-image_dot_com@us.com
 
      01-10-2005
I'm starting to think that Shareable may be the way to go unless I
want to go to a server/client connection. I already mentioned that I
have found a way to use Shareable for communication instead of for the
actual list and that sped things up a great deal.

A server would work like this:

Server accepts connections on a given port. A connection can be just a
request or a request with a removal. The server accepts any removals,
removes them from its list, and then replies with the next data item
from its list for the client to run. Disadvantage: each process must
connect to the server once for each iteration. What happens if the
server is busy with another process? Advantage: very exact control
over the list. The server can easily make sure the list is run evenly,
where the current model uses randomization so that the processes
won't all be running the same list in sync.

Now the server process would be very simple. It just maintains the
list and gives out the next element. But I wonder if a single process
could deal well with 150 or more requests per second. I think I'd just
have to program it and give it a try. Can anyone comment on this? What
would happen if the server was not available? I'm assuming the
requesting process would just wait for the port to be clear. Again
this would be much less important in the actual use when it slows down
to 5 requests per second.
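
If I do try it, the core of the server might look something like this
(the port, the NEXT/REMOVE line protocol, and the item numbers are just
placeholders for the sketch):

#!/usr/bin/perl
# Central-list server sketch: clients connect, send "NEXT" or
# "REMOVE <id>", and get back the next item nobody has reported taken.
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

my @items   = (1 .. 1000);          # the work list the server hands out
my %removed;                        # IDs reported as taken
my $next_ix = 0;

my $listen = IO::Socket::INET->new(
    LocalPort => 7777,
    Listen    => 50,
    Reuse     => 1,
) or die "listen: $!";
my $sel = IO::Select->new($listen);

while (my @ready = $sel->can_read) {
    for my $fh (@ready) {
        if ($fh == $listen) {                       # new client connecting
            $sel->add(scalar $listen->accept);
            next;
        }
        my $line = <$fh>;
        unless (defined $line) { $sel->remove($fh); close $fh; next }
        $removed{$1} = 1 if $line =~ /^REMOVE (\S+)/;
        # reply with the next item nobody has reported as taken
        my $sent = 0;
        for (1 .. @items) {
            my $item = $items[ $next_ix++ % @items ];
            next if $removed{$item};
            print {$fh} "$item\n";
            $sent = 1;
            last;
        }
        print {$fh} "NONE\n" unless $sent;
    }
}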



On Fri, 07 Jan 2005 12:29:16 -0800, Jim Gibson
<(E-Mail Removed)> wrote:

>In article <(E-Mail Removed)>,
><(E-Mail Removed)> wrote:
>
>> I'm trying to find the best way to do something. I've got 50 processes
>> (may have 200 soon) that need to broadcast simple messages to each
>> other. I tried doing this with sockets, but although I was able to get
>> them to read a socket without blocking, I found that when one process
>> reads a socket it removes the message from the socket and the other
>> processes don't see it. My current solution works using
>> IPC::Shareable, but it is slow and hogs memory as well as the CPU.
>> Shareable lets you set a variable that multiple programs can read and
>> write to. In my case they read and write to the list that they all run
>> off.

>
>Sockets implement a data connection between two processes, potentially
>on different systems. Three or more processes cannot share a socket.
>When you write data to a socket that connects two processes, only the
>reading process will get the data. Any other process will be reading
>from its own unique socket and will not receive the data. You would
>have to connect each pair of processes with a unique socket pair. For
>200 processes, that would mean 39800 separate sockets.
>
>A better approach using sockets would implement a single dispatcher
>process that sends the current list to each of the 200 analysis
>processes (or whatever you want to call them). Each analysis process
>would send the updated list to the central dispatching process.
>
>If all of your processes are on the same system, then internet protocol
>(IP) domain sockets impose a network overhead that is unnecessary. You
>are better off using Unix domain sockets (assuming you are on Unix,
>that is).
>
>However, it would seem for this application that shared memory is the
>fastest way to go, and IPC::Shareable gives you access to shared memory
>(disclaimer: I have not used it). If you are having performance
>problems, then you can try to optimize your use of shared memory. One
>suggestion would be to make a local copy of the shared memory list and
>iterate over the local copy. This works if you can use an old list when
>the list gets modified for a short period of time. You can check
>periodically (using another shared variable perhaps) if the list has
>been modified and fetch the new version. You can use a simple counter
>to indicate when the list has been updated.
>
>If you can't get IPC::Shareable to work, you can put the list into a
>file, periodically read the file if it is unlocked, and have any
>process that wants to update the file lock it and re-write it.
>
>


 
 
xhoster@gmail.com
 
      01-11-2005
(E-Mail Removed) wrote:

> I actually did something like that, but used a string instead of an
> array. It greatly sped things up. Each process merely looks for a
> change to the string and, if there is one, decodes the string
> and modifies its internal list.


When you have your variable tied to IPC::Shareable, merely looking for a
change isn't just "merely". It has a lot of overhead. (Not to mention
potential for corruption if you aren't careful about locking).

>
> >> Basically each process is iterating over a list (array) and every so
> >> often a process gets a result that means that item no longer needs to
> >> be run, so it should remove it from its list and notify the other
> >> processes so that they can remove it from theirs as well.

> >
> >What are the consequences if a process doesn't get the message and runs
> >that task anyway? Is it just a waste of resources, or is it fatal to
> >the whole thing you are trying to do?
> >
> >How many such removal messages do you generate, in relation to the full
> >size of the array to iterate over? If small, it would probably be most
> >efficient to just run even the "removed" tasks and then filter them out
> >in post-processing.

>
> It's not fatal, but it wastes the request.


I'm not sure what the "request" is that you are talking about. That
sounds like you are doing some kind of HTTP or other network processing,
rather than the parallel computational processing in an SMP environment
that I had originally thought you were talking about. If you just have one
CPU and are issuing many slow IOs, maybe you should look at using
non-blocking IO in just one process rather than spawning an extravagant
number of processes.

Anyway, unless someone is charging you per request, a request is not
something that can be wasted. Only the resources associated with it can be
wasted, and you should weigh those resources against the resources that, as
you have discovered, are used by excessive IPC::Shareable (or any other
synchronization method).

> Each request has a chance
> of getting a product for my company. So, concentrating on products
> that are still available and not wasting requests on taken products
> will improve our chances of getting products 10-20% or so.


Let us say that the overhead of extremely fine-grained synchronization
means that you can only perform 50 requests per second, with none of them
wasted, while the lowered overhead of looser synchronization means you
can do 150 requests per second, with 5 of them wasted. Would it be
preferable to have 50 good requests per second or 145 good requests per
second?

> >Each process could keep their own private version of the array, and
> >only refresh it against the shared version (or against a shared
> >exception list) every now and then. How often it would do this refresh
> >would depend on the cost of the refresh vs. the wasted effort that goes
> >into processing tasks that have been removed since the last refresh.

>
> I have it do it for every request. During actual conditions the
> requests slow down to a 10 second response which is all the time in
> the world for this use.


If you already have all the time in the world, why are you worried about
further optimizing it?

Xho

 
 
xhoster@gmail.com
 
      01-11-2005
(E-Mail Removed) wrote:
>
> A server would work like this:
>
> Server accepts connections on a given port. A connection can be just a
> request or a request with a removal. The server accepts any removals,
> removes them from its list, and then replies with the next data item
> from its list for the client to run. Disadvantage: each process must
> connect to the server once for each iteration.


Why would it have to connect once for each iteration? Just connect at
the beginning, and keep reusing that connection.
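
A client doing that is just a loop over one long-lived socket,
something like this (using the same made-up port and NEXT/REMOVE
protocol as the server sketch earlier in the thread):

use strict;
use warnings;
use IO::Socket::INET;

my $server = IO::Socket::INET->new(PeerAddr => 'localhost:7777')
    or die "connect: $!";

while (1) {
    print {$server} "NEXT\n";       # one line per iteration,
    my $item = <$server>;           # but only one connect, ever
    last if !defined $item or $item =~ /^NONE/;
    chomp $item;
    # work on $item here; if it turns out to be taken, send
    # "REMOVE $item\n" instead of "NEXT\n" on the next pass
}
close $server;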

> Now the server process would be very simple. It just maintains the
> list and gives out the next element. But I wonder if a single process
> could deal well with 150 or more requests per second. I think I'd just
> have to program it and give it a try. Can anyone comment on this?


My very simple server can process 100 times that much: 1,000,000 requests
per minute. See below. I'm sure the code is lousy in many ways, but it
is just a quick and dirty benchmark.


Xho



#!/usr/bin/perl -w
use strict;
use IO::Select;
use IO::Handle;

my $s = IO::Select->new();   # read handles the parent watches
my %s;                       # maps each read handle to its matching write handle

foreach (1..50) {
    pipe my ($pin, $cout);   # child -> parent
    pipe my ($cin, $pout);   # parent -> child
    my $pid = fork(); defined $pid or die;
    unless ($pid) { # In the child, interrogate the parent.
        close $pin; close $pout;
        select $cout; $|=1;
        foreach (1..20000) {
            print "giveme!\n";
            my $x = scalar <$cin>;
            # warn "$$: received $x" if rand()<0.001;
        };
        exit;
    };
    close $cout; close $cin;
    $pout->autoflush();
    $s{$pin} = $pout;
    $s->add($pin);
};

my $serial = 0;

# Parent: answer each "giveme!" with the next serial number until every
# child has exited and its pipe has closed.
while ($s->count()) {
    my @read = $s->can_read();
    foreach (@read) {
        my $x = <$_>;
        unless (defined $x) { $s->remove($_); next};
        die "'$x' ne giveme!" unless $x eq "giveme!\n";
        print {$s{$_}} "you get ". $serial++ . "\n";
    };
};

 
 
whansen_at_corporate-image_dot_com@us.com
 
      01-20-2005

I'm going to go over your quick and dirty server code. I haven't been
able to work on this much recently due to other related issues being
more important.

I think my largest concern is how many processes can work well
together with memory being finite. Each process is maintaining a
telnet session with a mainframe. Current tests with 50 processes
operate at around 150 requests per second combined. A request is
wasted if it is made for a product that is known to already be
'taken', and yes, it would be better to make 145 valid requests per
second while wasting 5 than to make only 50 valid requests. That's
why I think it's just something that I'll have to experiment with.

IPC::Shareable has a lot of overhead, so a server solution may be able
to decrease memory usage significantly, increasing the number of
processes that can be run at the same time.

I hope I haven't ****ed everyone off. Talking about things like this
is very helpful for thinking this through and I value all of your
suggestions.




 
 
 
 