IPC::Shareable

 
 
Snorik
      09-18-2008
Hello everyone,

I am trying to speed up a few perl scripts by forking them.
Unfortunately, I need to pass data back to the parent.
I am using named pipes at other places, but this time, I wanted to use
shared memory (this being on *nix).

The first case is basically traversing a HUGE directory tree, looking
for certain files and returning them.

The idea is to fork finds from a specific point of the directory tree,
gather all the files in an array for each child process and then store
a reference to that array as a value in the hash.


For this, I have done:

sub get_fbas_for_rg
{
    my $rg = shift;
    my @children;
    use IPC::Shareable;
    use Data::Dumper;
    use constant MYGLUE => 'Test';
    my %fba_hash;
    my $handle = tie (%fba_hash, IPC::Shareable, MYGLUE, {create => 1, mode => 0666})
        or die "cannot tie to shared memory: $! \n";

    my @ggs = qx (ls /default/main/www/$rg | grep -v STAGING | grep -v WORKAREA | grep -v EDITION);

    foreach my $gg (@ggs)
    {
        chomp $gg;
        my $gg_fba = $gg."_FBAs";

        my $pid = fork();

        if ($pid)
        {
            # parent: remember the child so we can wait for it later
            push(@children, $pid);
        }
        elsif (defined $pid)
        {
            # child: fork() returns undef on failure, so test definedness
            # rather than $pid == 0 (undef == 0 is true in Perl)
            my @fbas = (qx (/usr/bin/find /default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));
            $handle->shlock();
            push (@{$fba_hash{$gg}}, @fbas);
            $handle->shunlock();
            exit (0);
        }
        else
        {
            print STDERR "\nERROR: fork failed: $!\n";
        }
    }

    foreach (@children)
    {
        waitpid($_, 0);
    }
    return %fba_hash;
}

Now, if I call this function, it seems to work fine, except that the
hash values contain only the sizes of the arrays; at least that is what
Data::Dumper tells me:

$VAR1 = {
'dir1' => 3858,
'dir2' => 2394,
'dir3' => 2075
};


This is what I do in the script:

my %fbas = TestPackage::get_fbas_for_rg("test");

print Dumper \%fbas;

foreach my $gg (keys %fbas)
{
    print $gg."\n";
    foreach my $fba (sort @{$fbas{$gg}})
    {
        print $fba."\n";
    }
}


The foreach loop does not return anything (understandable since the
hash value only contains the scalar of the array).
Again, my question is: how do I manage to receive the actual array in
the calling script instead of just a hash containing my designated
keys and the sizes of the arrays as values?

I would be very grateful for some help.
 
Ben Morrow
      09-18-2008

Quoth Snorik <(E-Mail Removed)>:
> Hello everyone,
>
> I am trying to speed up a few perl scripts by forking them.
> Unfortunately, I need to pass back to the parent.
> I am using named pipes at other places, but this time, I wanted to use
> shared memory (this being on *nix).
>
> The first case is basically traversing a HUGE directory tree, looking
> for certain files and returning them.
>
> The idea is to fork finds from a specific point of the directory tree,
> gather all the files in an array for each child process and then store
> that array as reference as value of the hash.
>
>
> For this, I have done:
>
> sub get_fbas_for_rg
> {
> my $rg = shift;
> my @children;
> use IPC::Shareable;
> use Data::Dumper;


It's best not to 'use' modules inside a sub (except for lowercase
pragmata which have lexical effect). It gives the false impression that
the exported subs are only available in that sub.
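
A minimal sketch of why this matters, assuming only the core List::Util module (the sub names here are made up for illustration). A 'use' is processed at compile time and imports into the surrounding package, so a sub that never mentions the module can still call the imported function:

```perl
use strict;
use warnings;

sub total_inside {
    # 'use' runs at compile time and imports sum() into the enclosing
    # package, not into this sub, despite where it is written.
    use List::Util qw(sum);
    return sum(@_);
}

sub total_outside {
    # No 'use' in this sub, yet sum() is available here as well.
    return sum(@_);
}

my $inside  = total_inside(1, 2, 3);
my $outside = total_outside(4, 5, 6);
print "$inside $outside\n";
```

Putting the 'use' lines at the top of the file makes that package-wide scope visible to the reader.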

> use constant MYGLUE => 'Test';
> my %fba_hash;
> my $handle = tie (%fba_hash, IPC::Shareable, MYGLUE, {create =>
> 1, mode => 0666}) or die "cannot tie to shared memory: $! \n";
>
> my @ggs = qx (ls /default/main/www/$rg | grep -v STAGING | grep -v WORKAREA | grep -v EDITION);


use File::Slurp qw/read_dir/;

my @ggs =
grep !/EDITION/,
grep !/WORKAREA/,
grep !/STAGING/,
read_dir "/default/main/www/$rg";

or

grep !/EDITION|WORKAREA|STAGING/,

of course.
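
A small sketch comparing the two forms, using a hard-coded list as a stand-in for whatever read_dir would return (the entry names are hypothetical). Both variants should keep the same entries:

```perl
use strict;
use warnings;

# Hypothetical directory entries standing in for read_dir's return value.
my @entries = qw(dir1 dir2 STAGING WORKAREA EDITION dir3);

# Three chained greps, each dropping one unwanted name.
my @chained =
    grep !/EDITION/,
    grep !/WORKAREA/,
    grep !/STAGING/,
    @entries;

# One grep with an alternation does the same filtering in a single pass.
my @single = grep !/EDITION|WORKAREA|STAGING/, @entries;

print "@chained\n";
print "@single\n";
```

Note that these are substring matches, just like the original grep -v pipeline, so a directory named MY_STAGING_AREA would be dropped too.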

> my @fbas = (qx (/usr/bin/find /default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));


I would use File::Find::Rule for this.

my @fbas = File::Find::Rule->file
->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");

> $handle->shlock();
> push (@{$fba_hash{$gg}}, @fbas);


You cannot assign a ref to an IPC::Shareable tied hash. The other
process has no way of following that ref: it refers to data structures
that aren't in shared memory. I would suggest using Storable:

use Storable qw/freeze/;

$fba_hash{$gg} = freeze \@fbas;

and then retrieve it with

use Storable qw/thaw/;

my @fbas = @{ thaw $fba_hash{$gg} };
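
A self-contained sketch of that round trip, with a plain hash standing in for the tied one so it runs without IPC::Shareable (the paths and the key are made up):

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Plain hash used as a stand-in for the tied, shared hash.
my %fba_hash;

# Hypothetical find output; qx() would leave the trailing newlines on.
my @fbas = ("/tmp/a.fba\n", "/tmp/b.fba\n");

# freeze() turns the array into one string, which fits in a scalar slot.
$fba_hash{dir1} = freeze(\@fbas);

# thaw() rebuilds an array reference from that string.
my @restored = @{ thaw($fba_hash{dir1}) };

print scalar(@restored), " entries restored\n";
```

The stored value is an ordinary string rather than a reference, which is the property that lets it cross a process boundary.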

Ben

--
Although few may originate a policy, we are all able to judge it.
Pericles of Athens, c.430 B.C.
Ted Zlatanov
      09-18-2008
On Thu, 18 Sep 2008 06:55:15 -0700 (PDT) Snorik <(E-Mail Removed)> wrote:

S> use IPC::Shareable;

Try IPC::ShareLite or even Tie::ShareLite (easiest, hash interface).
They work better for me.

S> Now, If I call this function, it seems to work fine, only the hash
S> values contain only the scalars of the array, at least that is what
S> Data Dumper tells me:

S> $VAR1 = {
S> 'dir1' => 3858,
S> 'dir2' => 2394,
S> 'dir3' => 2075
S> };

You're assigning @array to the hash value; the value can only be a
scalar so you get the size of the array instead of its contents.
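
A minimal sketch of the difference between the two assignments, using a plain hash and made-up file names:

```perl
use strict;
use warnings;

my @fbas = qw(a.fba b.fba c.fba);
my %h;

# Assigning an array to a hash slot puts it in scalar context,
# so the slot receives the number of elements.
$h{count} = @fbas;

# Assigning a reference keeps access to the elements themselves.
$h{list} = \@fbas;

print "$h{count} ", scalar @{ $h{list} }, "\n";
```

With an ordinary hash the reference works within one process; whether it survives in a tied shared-memory hash depends on the module doing the tying.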

See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
better solution.

Ted
 
Snorik
      09-19-2008
On Sep 18, 5:18 pm, Ben Morrow <(E-Mail Removed)> wrote:
> Quoth Snorik <(E-Mail Removed)>:
>
>
>
> > Hello everyone,

>
> > I am trying to speed up a few perl scripts by forking them.
> > Unfortunately, I need to pass back to the parent.
> > I am using named pipes at other places, but this time, I wanted to use
> > shared memory (this being on *nix).

>
> > The first case is basically traversing a HUGE directory tree, looking
> > for certain files and returning them.

>
> > The idea is to fork finds from a specific point of the directory tree,
> > gather all the files in an array for each child process and then store
> > that array as reference as value of the hash.

>
> > For this, I have done:

>
> > sub get_fbas_for_rg
> > {
> >     my $rg = shift;
> >     my @children;
> >     use IPC::Shareable;
> >     use Data::Dumper;

>
> It's best not to 'use' modules inside a sub (except for lowercase
> pragmata which have lexical effect). It gives the false impression that
> the exporter subs are only available in that sub.


Ok, that is a useful remark, I will keep that in mind.

> > use constant MYGLUE => 'Test';
> > my %fba_hash;
> > my $handle = tie (%fba_hash, IPC::Shareable, MYGLUE, {create =>
> > 1, mode => 0666}) or die "cannot tie to shared memory: $! \n";

>
> > my @ggs = qx (ls /default/main/www/$rg | grep -v STAGING | grep -v WORKAREA | grep -v EDITION);

>
>     use File::Slurp qw/read_dir/;


*snip usage of File::Slurp*

Thanks for pointing that out, that module really helps a lot!
I never knew it existed.

>
> > my @fbas = (qx (/usr/bin/find /default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba -type f ));

>
> I would use File::Find::Rule for this.
>
>     my @fbas = File::Find::Rule->file
>         ->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");


Again, thanks for pointing that out - This is so much more elegant
than the normal File::Find way.

> > $handle->shlock();
> > push (@{$fba_hash{$gg}}, @fbas);

>
> You cannot assign a ref to an IPC::Shareable tied hash. The other
> process has no way of following that ref: it refers to data structures
> that aren't in shared memory. I would suggest using Storable:


OK, so if I may rephrase in order to check whether I have actually
understood:
All that can be tied in that hash is the scalar of the array (its
size), I cannot use it to follow a ref to the actual array.

>     use Storable qw/freeze/;
>
>     $fba_hash{$gg} = freeze \@fbas;
>
> and then retrieve it with
>
>     use Storable qw/thaw/;
>
>     my @fbas = @{ thaw $fba_hash{$gg} };


I have a question concerning this (I just had a look at the Storable
documentation, but this does not really clear things up):

So Storable persists (and of course serializes) any data structure; that
means I can store the hash to disk (or memory, hopefully memory?).
How can I retrieve this in the calling script, as this sub is going to
live in a module itself? I must admit, this is my first attempt at IPC
myself.

I would be very grateful for an answer.

Snorik

 
Ben Morrow
      09-19-2008

Quoth Snorik <(E-Mail Removed)>:
> On Sep 18, 5:18 pm, Ben Morrow <(E-Mail Removed)> wrote:
> >
> > You cannot assign a ref to an IPC::Shareable tied hash. The other
> > process has no way of following that ref: it refers to data structures
> > that aren't in shared memory. I would suggest using Storable:

>
> OK, so if I may rephrase in order to check whether I have actually
> understood:
> All that can be tied in that hash is the scalar of the array (its
> size), I cannot use it to follow a ref to the actual array.


Yes. I don't entirely understand why the value stored was
scalar(@array): I would have expected it to be the stringification of
the ref. I guess it's to do with how IPC::Shareable interprets its
arguments.

>
> >     use Storable qw/freeze/;
> >
> >     $fba_hash{$gg} = freeze \@fbas;
> >
> > and then retrieve it with
> >
> >     use Storable qw/thaw/;
> >
> >     my @fbas = @{ thaw $fba_hash{$gg} };

>
> I have a question concerning this (I just had a look at the Storable
> documentation, but this does not really clear things up):
>
> So Storable persists (and of course serializes) any data structure; that
> means I can store the hash to disk (or memory, hopefully memory?).


Yes. You use store/retrieve to save to and load from disk; you use
freeze/thaw to save to and load from memory.
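
A short sketch showing both pairs side by side. The data is made up, and a temporary file from the core File::Temp module stands in for wherever the structure would really be saved:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Storable qw(store retrieve freeze thaw);

# Hypothetical result set: directory name => list of files.
my %fbas = (dir1 => ['a.fba', 'b.fba'], dir2 => ['c.fba']);

# Disk: store() writes the structure to a file, retrieve() reads it back.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;
store(\%fbas, $path);
my $from_disk = retrieve($path);

# Memory: freeze() returns a string, thaw() rebuilds the structure from it.
my $from_memory = thaw(freeze(\%fbas));

print "disk: @{ $from_disk->{dir1} }\n";
print "memory: @{ $from_memory->{dir2} }\n";
```

Both retrieve() and thaw() hand back a reference to a new copy of the data, not the original structure.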

> How can I retrieve this in the calling script, as this sub is going to
> live in a module itself? I must admit, this is my first attempt at IPC
> myself.


If you store it with 'freeze', you get it out again with 'thaw'.

Ben

--
I touch the fire and it freezes me, [(E-Mail Removed)]
I look into it and it's black.
Why can't I feel? My skin should crack and peel---
I want the fire back... BtVS, 'Once More With Feeling'
 
Snorik
      09-19-2008
On 19 Sep., 18:21, Ben Morrow <(E-Mail Removed)> wrote:

> > So Storable persists (and of course serializes) any data structure; that
> > means I can store the hash to disk (or memory, hopefully memory?).

>
> Yes. You use store/retrieve to save to and load from disk; you use
> freeze/thaw to save to and load from memory.


Ok, thanks for that, I will read the documentation and actually try to
understand it.

> > How can I retrieve this in the calling script, as this sub is going to
> > live in a module itself? I must admit, this is my first attempt at IPC
> > myself.

>
> If you store it with 'freeze', you get it out again with 'thaw'.


Yes, I have understood that, but if I freeze a hash in one script, how
can I thaw it in the other script? I do not have the reference?
I tried to use a tied variable for that, figuring that this should
work this time, but this failed unfortunately.

 
xhoster@gmail.com
      09-19-2008
Snorik <(E-Mail Removed)> wrote:
> On 19 Sep., 18:21, Ben Morrow <(E-Mail Removed)> wrote:
>
> >
> > If you store it with 'freeze', you get it out again with 'thaw'.

>
> Yes, I have understood that, but if I freeze a hash in one script, how
> can I thaw it in the other script?


When you freeze, you get serialized data, which is just a string. You
pass that string to the other script using shared memory (or pipes).

> I do not have the reference?


That is what thaw does. It makes a reference again out of the serialized
data. Obviously it isn't the same reference, but a deep copy of the
referenced data.
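
A minimal sketch of that idea using a pipe rather than shared memory, since a pipe needs only core Perl. The file names are made up, and the child's array stands in for the output of a real find:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: serialize the array and send the resulting string to the parent.
    close $reader;
    binmode $writer;
    my @fbas = ('a.fba', 'b.fba');   # hypothetical result of a find
    print {$writer} freeze(\@fbas);
    close $writer;
    exit 0;
}

# Parent: read the whole string, then rebuild a copy of the child's array.
close $writer;
binmode $reader;
my $frozen = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my @received = @{ thaw($frozen) };
print "@received\n";
```

The parent never sees the child's reference, only the string, and thaw() builds a fresh array from it.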

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Snorik
      09-22-2008
On Sep 18, 5:19 pm, Ted Zlatanov <(E-Mail Removed)> wrote:
> On Thu, 18 Sep 2008 06:55:15 -0700 (PDT) Snorik <(E-Mail Removed)> wrote:
>
> S> use IPC::Shareable;
>
> Try IPC::ShareLite or even Tie::ShareLite (easiest, hash interface).
> They work better for me.
>
> S> Now, If I call this function, it seems to work fine, only the hash
> S> values contain only the scalars of the array, at least that is what
> S> Data Dumper tells me:
>
> S> $VAR1 = {
> S> 'dir1' => 3858,
> S> 'dir2' => 2394,
> S> 'dir3' => 2075
> S> };
>
> You're assigning @array to the hash value; the value can only be a
> scalar so you get the size of the array instead of its contents.
>
> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
> better solution.


Hello,

ok, this works - however, it is very slow if I use a normal hash (the
hash has about 10000 entries).
I have yet to get it to work with hash references.

Thanks for your help!
 
Ted Zlatanov
      09-22-2008
On Mon, 22 Sep 2008 05:30:53 -0700 (PDT) Snorik <(E-Mail Removed)> wrote:

>> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
>> better solution.


S> ok, this works - however, it is very slow if I use a normal hash (the
S> hash has about 10000 entries).
S> I have yet to get it to work with hash references.

I've only used it with hashes of up to 1000 entries, but I'm surprised
it's very slow. Can you show your code so we can see if the problem is
in the module or in your code?

Ted
 
Snorik
      09-22-2008
On Sep 22, 4:11 pm, Ted Zlatanov <(E-Mail Removed)> wrote:
> On Mon, 22 Sep 2008 05:30:53 -0700 (PDT) Snorik <(E-Mail Removed)> wrote:
>
> >> See the Tie::ShareLite docs, especially section 'REFERENCES,' for a
> >> better solution.

>
> S> ok, this works - however, it is very slow if I use a normal hash (the
> S> hash has about 10000 entries).
> S> I have yet to get it to work with hash references.
>
> I've only used it with hashes of up to 1000 entries, but I'm surprised
> it's very slow. Can you show your code so we can see if the problem is
> in the module or in your code?


Hello,

okay, it appears even a Solaris system needs a reboot some time - now
it is pretty fast: 19 seconds for 16k entries (and that includes
Data::Dumper printing out the hash and forking about 30 child
processes).

================================
I do the following now:

    if ($pid)
    {
        push(@children, $pid);
    }
    elsif (defined $pid)
    {
        # child: fork() returns undef on failure, so test definedness
        # rather than $pid == 0 (undef == 0 is true in Perl)
        use File::Find::Rule;
        my @fbas = File::Find::Rule->file->in("/default/main/www/$rg/$gg/WORKAREA/workarea/$gg_fba");
        $ipc->lock(LOCK_EX);
        $shared{$gg} = \@fbas;
        $ipc->unlock();
        exit (0);
    }
    else
    {
        print STDERR "\nERROR: fork failed: $!\n";
    }
}

foreach (@children)
{
    waitpid($_, 0);
}
return %shared;

And in the calling script:

my %fba_ref = Package::get_fbas_for_rg("dir1");
print Dumper \%fba_ref;

=================================
I am wondering: Do I even need the locks for the hash reference, does
this lock the entire hash, or solely the key in question? Does this
still go faster?

 