Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Devel::SmallProf claims "return 1" needs much time !?

Reply
Thread Tools

Devel::SmallProf claims "return 1" needs much time !?

 
 
whumann
Guest
Posts: n/a
 
      04-30-2008
I'm trying to do some profiling on DBM:eep. I started with
Devel:Prof and Devel:rofile but for some subs I had no idea *why*
they took long, so I tried Devel::SmallProf for additional detail.
Much of what I see makes sense but results like this puzzle me (these
are the subs that do the low-level disc read and write):

count wall tm cpu time line
0 0.00000 0.00000 103:sub print_at {
314000 0.96161 3.67000 104: my $self = shift;
314000 0.88398 3.34000 105: my $loc = shift;
0 0.00000 0.00000 106:
314000 1.51351 3.89000 107: local ($/,$\);
0 0.00000 0.00000 108:
314000 1.07561 3.94000 109: my $fh = $self->{fh};
314000 4.39820 7.83000 110: if ( defined $loc ) {
0 0.00000 0.00000 111: seek( $fh, $loc + $self-
0 0.00000 0.00000 112: }
0 0.00000 0.00000 113:
314000 3.46850 6.32000 114: print( $fh @_ ) or die
"Internal Error
0 0.00000 0.00000 115:
314000 6.65324 9.86000 116: return 1;
0 0.00000 0.00000 117:}
0 0.00000 0.00000 118:
0 0.00000 0.00000 119:sub read_at {
500507 1.43996 6.05000 120: my $self = shift;
500507 1.60207 5.78000 121: my ($loc, $size) = @_;
0 0.00000 0.00000 122:
500507 2.44265 7.01000 123: local ($/,$\);
0 0.00000 0.00000 124:
500507 1.43653 5.88000 125: my $fh = $self->{fh};
500507 7.14111 11.73000 126: if ( defined $loc ) {
0 0.00000 0.00000 127: seek( $fh, $loc + $self-
0 0.00000 0.00000 128: }
0 0.00000 0.00000 129:
500507 1.41859 6.08000 130: my $buffer;
500507 4.36185 8.77000 131: read( $fh, $buffer, $size);
0 0.00000 0.00000 132:
500507 7.96107 12.04000 133: return $buffer;
0 0.00000 0.00000 134:}

Much of the time is spent in "seek", "print" and "read" -- as expected
(although I'm surprised it's only tenths of microseconds per call -- I
suppose disc caches work their wonders...). But even more time is
spent in the two "return" statements, and I don't know why! I thought
that maybe they contain the overhead of function calling/returning but
coudn't verify that in a simple testscript. Is this an artifact of the
perl debugger? Or what else is the cause for this?
Thanks for any help understanding this.

Wolfram
 
Reply With Quote
 
 
 
 
xhoster@gmail.com
Guest
Posts: n/a
 
      04-30-2008
whumann <> wrote:
> I'm trying to do some profiling on DBM:eep. I started with
> Devel:Prof and Devel:rofile but for some subs I had no idea *why*
> they took long, so I tried Devel::SmallProf for additional detail.
> Much of what I see makes sense but results like this puzzle me (these
> are the subs that do the low-level disc read and write):


I often find SmallProf to be unreliable, especially when trying to profile
code portions which are fast on each execution but are executed very often.

>
> count wall tm cpu time line
> 0 0.00000 0.00000 103:sub print_at {
> 314000 0.96161 3.67000 104: my $self = shift;
> 314000 0.88398 3.34000 105: my $loc = shift;
> 0 0.00000 0.00000 106:
> 314000 1.51351 3.89000 107: local ($/,$\);
> 0 0.00000 0.00000 108:
> 314000 1.07561 3.94000 109: my $fh = $self->{fh};
> 314000 4.39820 7.83000 110: if ( defined $loc ) {


This just seems weird. My 3GHz machine does an if defined test 32 times
faster, so unless you have an old computer I would say that this casts
doubt on the entire reliability of the SmallProf output.
....
> 0 0.00000 0.00000 115:
> 314000 6.65324 9.86000 116: return 1;
> 0 0.00000 0.00000 117:}

....
>
> Much of the time is spent in "seek", "print" and "read" -- as expected
> (although I'm surprised it's only tenths of microseconds per call -- I
> suppose disc caches work their wonders...). But even more time is
> spent in the two "return" statements, and I don't know why! I thought
> that maybe they contain the overhead of function calling/returning but
> coudn't verify that in a simple testscript.


Your return is only taking ~50-100% longer than your "if defined". While
both of them are taking absurdly long, that ratio is about what I find in a
simple test script. Is your machine old and slow? Is it perhaps swapping
itself to death during this test, or heavily loaded with other people's
processes?

If you can post the entire harness (provided it is small) you are using to
profile DBM:eep, I'll play with it a bit.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
 
 
 
Wolfram Humann
Guest
Posts: n/a
 
      04-30-2008
On Apr 30, 6:38*pm, xhos...@gmail.com wrote:
>
> This just seems weird. *My 3GHz machine does an if defined test 32 times
> faster, so unless you have an old computer I would say that this casts
> doubt on the entire reliability of the SmallProf output.


I think you got somthing wrong here. The profiler runs on the comiled
and optimized code where several source-lines may have become one. If
you look at the "count"-column, you will see that line 110 includes
the time for seek() in the next line. Im my experience, the shortest-
running lines in a script (e.g. "my $self = shift;") will indeed vary
a lot from run to run because of the nulltime-compensation im
SmallProf. The lines we're talking about are fairly stable.

> If you can post the entire harness (provided it is small) you are using to
> profile DBM:eep, I'll play with it a bit.


Every script that imports heavily in DBM:eep will do. Mine looks
like this:

BEGIN
{
$DB:rofile = 0;
%DB:ackages = ( 'DBM:eep::Engine' => 1,
'DBM:eep::Engine::Sector' => 1, 'DBM:eep::File' => 1 );
}

use strict;
use warnings;
use DBM:eep;

my $text = 'this is always the same dummy text';
my $inner = 500;
my $outer = 2;
my $profilefile = "prof.out";

my $db = DBM:eep->new( "deeptest2.db" );

for my $j (1..$outer)
{
my $data;

for my $i (1..$inner)
{
my %hash = map { +"subkey$_" => $text } (12..30);
$data->{"key$i"} = \%hash;
}

$DB:rofile = 1 if $j == $outer;

my $t = time();
$db->import($data);
print "Import duration: ", time() - $t, "\n";
}

But I can't say exactly how $inner and $outer were set when I created
the posted profile. Also, I already did some changes in DBM:eep that
might affect performance.

Wolfram
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      04-30-2008
Wolfram Humann <> wrote:
> On Apr 30, 6:38=A0pm, xhos...@gmail.com wrote:
> >
> > This just seems weird. =A0My 3GHz machine does an if defined test 32
> > times=

>
> > faster, so unless you have an old computer I would say that this casts
> > doubt on the entire reliability of the SmallProf output.

>
> I think you got somthing wrong here. The profiler runs on the comiled
> and optimized code where several source-lines may have become one. If
> you look at the "count"-column, you will see that line 110 includes
> the time for seek() in the next line.


While I can't rule that out, I've never seen that done in this setting.
I've seen the condition of an elsif reported as if it were the condition of
the preceding if, but I've never seen that type of conflation happen
between the if condition and if block (except when the block contents are
on the same line as the condition). And I can't reproduce it with test
cases.

I assumed the behavior of the count column was simply because $loc is never
defined in the use-case you used, and therefore line 111 is never executed.
But I could be wrong.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
Wolfram Humann
Guest
Posts: n/a
 
      04-30-2008
On Apr 30, 10:27*pm, xhos...@gmail.com wrote:
>
> I assumed the behavior of the count column was simply because $loc is never
> defined in the use-case you used, and therefore line 111 is never executed..
> But I could be wrong.


Well, I'm also not *that* sure about my claim, I should better check
that

Wolfram
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      04-30-2008
Wolfram Humann <> wrote:
>
> Every script that imports heavily in DBM:eep will do. Mine looks
> like this:
>
> BEGIN
> {
> $DB:rofile = 0;
> %DB:ackages = ( 'DBM:eep::Engine' => 1,
> 'DBM:eep::Engine::Sector' => 1, 'DBM:eep::File' => 1 );
> }


I'm not sure, but I think that by restricting your packages that way, all
the time spent in a non-monitored package will get attributed to the
most recently executed statement which is in one of the monitored packages.
That statement is likely to be a "return".

I tried profiling your program, but the profiled code looked nothing like
yours. I realized I have an old DBM:eep. I installed the current
version in a test directory, and holy cow is it slow compared to the old
version. If it ever finishes, I'll see what the profile looks like.

It looks like DBM:eep is trying to change from a module to tie hashes to
disk with as little differences as possible (behvior-wise) from a regular
Perl hash; and instead turn into a full-fledged ACID database. I think
that that is unfortunate. Perhaps a code fork is in order.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
Wolfram Humann
Guest
Posts: n/a
 
      05-01-2008
On 30 Apr., 23:52, xhos...@gmail.com wrote:
> Wolfram Humann <w.c.hum...@arcor.de> wrote:
>
> I'm not sure, but I think that by restricting your packages that way, all
> the time spent in a non-monitored package will get attributed to the
> most recently executed statement which is in one of the monitored packages..
> That statement is likely to be a "return".


I never thought of that but it sounds quite possible. Thanks for the
hint.

> It looks like DBM:eep is trying to change from a module to tie hashes to
> disk with as little differences as possible (behvior-wise) from a regular
> Perl hash; and instead turn into a full-fledged ACID database. *I think
> that that is unfortunate. *Perhaps a code fork is in order.


I rather think that during the major rework to version 1.000 a few
things got introduced that make the module unnecessary slow. I posted
here http://groups.google.com/group/DBM-D...e2ae30130a49f7
what I think is one major problem. I would appreciate comments if you
have the time to look into that.

If my profiling results are worth anything, code like "sub engine
{ $_[0]{engine} }" (in Engine.pm) is also a bad idea because it's
called so often. And then "$e = $self->engine" is just two characters
less than "$e = $self->{engine}" and probably much slower. But I still
need to ask Rob Kinyon if there are any reasons I don't know for these
subs.

Wolfram
 
Reply With Quote
 
salva
Guest
Posts: n/a
 
      05-02-2008
On Apr 30, 2:51 pm, whumann <w.c.hum...@arcor.de> wrote:

> Much of the time is spent in "seek", "print" and "read" -- as expected
> (although I'm surprised it's only tenths of microseconds per call -- I
> suppose disc caches work their wonders...). But even more time is
> spent in the two "return" statements, and I don't know why! I thought
> that maybe they contain the overhead of function calling/returning but
> coudn't verify that in a simple testscript. Is this an artifact of the
> perl debugger? Or what else is the cause for this?
> Thanks for any help understanding this.


Hi, I am the current maintainer of Devel::SmallProf and I would be
grateful if you could send to me a bug report for that, including the
details about how to exactly reproduce the problem

Cheers,

- Salva
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      05-02-2008
salva <> wrote:
> On Apr 30, 2:51 pm, whumann <w.c.hum...@arcor.de> wrote:
>
> > Much of the time is spent in "seek", "print" and "read" -- as expected
> > (although I'm surprised it's only tenths of microseconds per call -- I
> > suppose disc caches work their wonders...). But even more time is
> > spent in the two "return" statements, and I don't know why! I thought
> > that maybe they contain the overhead of function calling/returning but
> > coudn't verify that in a simple testscript. Is this an artifact of the
> > perl debugger? Or what else is the cause for this?
> > Thanks for any help understanding this.

>
> Hi, I am the current maintainer of Devel::SmallProf and I would be
> grateful if you could send to me a bug report for that, including the
> details about how to exactly reproduce the problem
>
> Cheers,


Hi Salva,

Here is an example:

$ cat foo.pl
use strict;
use warnings;
use constant foobar =>1;

BEGIN
{
%DB:ackages = ( 'package' => 1 );
}

foreach (1..5) {
package::bar();
foo();
};

exit;

sub foo {
sleep 1;
return 1;
};

package package;
sub bar {
sleep 1;
return 1;
};
__END__

All the time spent in the (unmonitored) foo is attributed
to the last statement in (monitored) bar.

0 0.00000 0.00000 16:
5 0.00000 0.00000 17:sub foo {
0 0.00000 0.00000 18: sleep 1;
0 0.00000 0.00000 19: return 1;
0 0.00000 0.00000 20:};
0 0.00000 0.00000 21:
0 0.00000 0.00000 22ackage package;
5 0.00000 0.00000 23:sub bar {
5 5.01095 0.00000 24: sleep 1;
5 5.00817 0.00000 25: return 1;
0 0.00000 0.00000 26:};
0 0.00000 0.00000 27:

Posted and mailed.

Cheers,

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
xhoster@gmail.com
Guest
Posts: n/a
 
      05-02-2008
Wolfram Humann <> wrote:
> On 30 Apr., 23:52, xhos...@gmail.com wrote:
> > Wolfram Humann <w.c.hum...@arcor.de> wrote:
> >
> > I'm not sure, but I think that by restricting your packages that way,
> > all the time spent in a non-monitored package will get attributed to
> > the most recently executed statement which is in one of the monitored
> > packages=

> .
> > That statement is likely to be a "return".

>
> I never thought of that but it sounds quite possible. Thanks for the
> hint.


And I've verified that this is indeed the case.

>
> > It looks like DBM:eep is trying to change from a module to tie hashes
> > to disk with as little differences as possible (behvior-wise) from a
> > regular Perl hash; and instead turn into a full-fledged ACID database.
> > I think that that is unfortunate. Perhaps a code fork is in
> > order.

>
> I rather think that during the major rework to version 1.000 a few
> things got introduced that make the module unnecessary slow. I posted
> here
> http://groups.google.com/group/DBM-D...ad/9ae2ae3013=
> 0a49f7 what I think is one major problem. I would appreciate comments if
> you have the time to look into that.


It looks like your proposed change should work, unless $orig_loc or
$old_loc are objects with operator overloading, which doesn't seem to be
the case. (But I do have to wonder why it was done the way it was in the
first place. It is certainly an oddly contorted way of doing things if it
wasn't done like that for a reason.)

Anyway, I ran the self-tests that come with DBM-Deep with your proposed
change in place and it passes all tests, so I'd call it good. And it does
make it much faster, but as you note, it still isn't all that speedy.
Reverting back to DBM version 0.983 is 20 times faster than even your
improved version.

>
> If my profiling results are worth anything, code like "sub engine
> { $_[0]{engine} }" (in Engine.pm) is also a bad idea because it's
> called so often. And then "$e =3D $self->engine" is just two characters
> less than "$e =3D $self->{engine}" and probably much slower. But I still
> need to ask Rob Kinyon if there are any reasons I don't know for these
> subs.


That is classic OO programming to support inheritance. Some subclass might
want to override engine(), which of course it can't do if
"$e = $self->{engine}" is used instead of "$e = $self->engine"

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
DVD Soon or much much later? anthony DVD Video 10 07-08-2005 07:13 PM
time.strftime in 2.4.1 claims data out of range when not Sheila King Python 5 04-22-2005 03:39 PM
Simulation questions...how much is too much? =?Utf-8?B?VGlwcHk=?= Microsoft Certification 0 04-16-2005 04:47 AM
CPU Heat--how much is too much? PowerPost2000 Computer Support 4 12-22-2003 12:40 AM
paranoia... much too much adcl Computer Support 14 11-08-2003 05:18 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57