Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Speed of reading some MB of data using qx(...)

Reply
Thread Tools

Speed of reading some MB of data using qx(...)

 
 
Ilya Zakharevich
Guest
Posts: n/a
 
      07-28-2010
On 2010-07-27, Ben Morrow <(E-Mail Removed)> wrote:
>> By the way: I estimate the time required for each realloc to be around
>> 3 ms for 10e6 chars, growing linearly with the amount of data -- I
>> consider that a fair speed and no reason to blame win32.

>
> If you timed perl's own realloc, you would (I believe) find it does much
> better than this. AFAICS from the code, it has a fixed set of block
> sizes it actually allocates. Enlarging a block such that it doesn't go
> over the block size actually allocated is *free*, not even linear in the
> size of the block, since all it does is adjust the end marker. This is
> the logic you are expecting sv_grow to implement, but perl has decided
> that this is the allocator's responsibility.


Wrong analysis. *In this particular situation* one would find Perl's
realloc() as slow as M$'s one - but it would be called MUCH more
rarely. With my malloc(), I implemented an extra call:
malloced_size(). So on the 1st iteration, things would go similarly
to "their-malloc", and realloc() would be called - and your analysis
would apply.

But on the 2nd iteration of appending, Perl would discover the new
*really* allocated size for the string, and would update the buffer
size stored in SV. After this, for many iteractions the realloc() is
out-of-the-loop altogether.

BTW, IIRC, my malloc() does not store the "end marker" at all (unless
a debugging version is used). (The bucket size of short strings
depends only on which page of memory they belong to - well, actually,
half-page.)

Hope this helps,
Ilya
 
Reply With Quote
 
 
 
 
Ilya Zakharevich
Guest
Posts: n/a
 
      07-28-2010
On 2010-07-27, Peter J. Holzer <(E-Mail Removed)> wrote:
> I'm pretty sure that GNU malloc doesn't round up to powers of two or
> something like that. However, the performance difference between GNU
> malloc and Perl malloc is rather small:


Yeah, right. Not more than 3 times.

> perl 5.12.1, default config, EGLIBC 2.11.2-2:
>
> 1E5 chars + 1E4 x 1E2 chars: 3.9 ms
> 1E6 chars + 1E4 x 1E2 chars: 3.8 ms
> 1E7 chars + 1E4 x 1E2 chars: 4.4 ms
>
> 1E7 chars + 1E5 x 1E1 chars: 28.4 ms
> 1E7 chars + 1E4 x 1E2 chars: 4.5 ms
> 1E7 chars + 1E3 x 1E3 chars: 2.6 ms
> 1E7 chars + 1E2 x 1E4 chars: 2.0 ms
> 1E7 chars + 1E1 x 1E5 chars: 1.9 ms
>
> 1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 2.0 ms
> 1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 4.4 ms
>
> perl 5.12.1, usemymalloc=y, EGLIBC 2.11.2-2:
>
> 1E5 chars + 1E4 x 1E2 chars: 2.6 ms
> 1E6 chars + 1E4 x 1E2 chars: 3.8 ms
> 1E7 chars + 1E4 x 1E2 chars: 2.5 ms
>
> 1E7 chars + 1E5 x 1E1 chars: 18.8 ms
> 1E7 chars + 1E4 x 1E2 chars: 2.5 ms
> 1E7 chars + 1E3 x 1E3 chars: 0.9 ms
> 1E7 chars + 1E2 x 1E4 chars: 0.9 ms
> 1E7 chars + 1E1 x 1E5 chars: 1.1 ms
>
> 1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.9 ms
> 1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 3.4 ms


Yours,
Ilya
 
Reply With Quote
 
 
 
 
Peter J. Holzer
Guest
Posts: n/a
 
      07-28-2010
On 2010-07-28 01:02, Ilya Zakharevich <(E-Mail Removed)> wrote:
> On 2010-07-27, Peter J. Holzer <(E-Mail Removed)> wrote:
>> I'm pretty sure that GNU malloc doesn't round up to powers of two or
>> something like that. However, the performance difference between GNU
>> malloc and Perl malloc is rather small:

>
> Yeah, right. Not more than 3 times.


Compared to the factor of about 1000 that Wolfram and Ben observed this
is small. More importantly, growing a string to length n in constant
increments appears to be an O(n) operation with both your malloc and GNU
malloc, but O(nē) with Win32 malloc and BSD malloc.

Your malloc is certainly faster than GNU malloc. OTOH, AFAICS from
quickly scanning the comments in malloc.c, it always pads allocation to
the next power of two (minus 4) and it doesn't return unused memory back
to the OS. So it might use quite a bit more memory. Which of these
effects is more important depends on the program and is hard to
determine analytically. Which means that I should benchmark some of my
more performance-critical scripts with both allocators .

hp

 
Reply With Quote
 
Wolfram Humann
Guest
Posts: n/a
 
      07-28-2010
On Jul 28, 2:50*am, Ilya Zakharevich <(E-Mail Removed)> wrote:
> On 2010-07-27, Wolfram Humann <(E-Mail Removed)> wrote:
>
> > sv.c). Perl_sv_grow then needs to decide if the string's memory is
> > already sufficient or really needs to grow. In the latter case,
> > safesysrealloc -> Perl_safesysrealloc -> realloc is called. The
> > interesting point is: how much memory does it request? The answer is:

>
> > newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */

>
> > I.e. it requests 10 times as much memory as is required for the
> > current append operation. So when I loop 10000 times and each time
> > append 100 chars to an initial string size of 10 million, the memory
> > grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till
> > it ends at 11.000e6.

>
> Good l*rd!
>
> The current algorithm is optimized to work in tandem with "my"
> malloc(), which would round up to a certain geometric progression
> anyway. *So if one use as different malloc()s, one should better use
>
> * newlen += (newlen >> 4) + 10; /* avoid copy each time */


I finally managed to compile my own win32 perl. (Actually it was quite
easy once I refrained from doing mistakes so stupid I do not dare to
talk about them...)
Now I could modify Perl_sv_grow() and insert debugging prints and I
found good and bad news.

The bad news: Looks like I was *overly optimistic* (LOL!) concerning
the efficiency of the current string memory allocation on win32. The
"newlen += 10 * (newlen - SvCUR(sv))" line is only executed if
SvOOK(sv) -- i.e. in most cases it is *not* executed. Therefore win32
system realloc is not called every tenth string-append operation but
*every* time something gets appended to a string.

The good news: A single additional line of code makes win32 perl
100...1000 times faster!
(for code that appends to strings very frequently)

I went with Ilya's proposal but inserted the line a little further
down, just after
if (newlen > SvLEN(sv)) { /* need more room? */

So now we have:
if (newlen > SvLEN(sv)) { /* need more room? */
newlen += (newlen >> 2) + 10;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

The remaining question is by what ratio a string's memory should grow.
I tried several values from (newlen >> 0) to (newlen >> 6) for the
best compromise between execution time and memory usage and my
personal favorite is (newlen >> 2). What do others here think? At the
end of this post I will attach the results for my benchmark script
starting with Cygwin Perl followed by several versions of (newlen >>
x) and finally the unpatched Strawberry Perl. These reports now also
include memory footprint info (courtesy of pslist from the
Sysinternals suite). I also went back to my original task of reading a
12 MB postscript file using qx(cat ...) and in some cases I also
report times for that -- here Cygwin (70 ms) still beats my modified
perl (210 ms), but that's still waaaaay better than the original 18000
ms

I will also report to p5p.

Wolfram

################################################## #########

c:\cygwin\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 1.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms

1E7 chars + 1E5 x 1E1 chars: 12.2 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.8 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.9 ms

Private MB: 326.5
Peak Private MB: 326.5

--------------

qx(cat postscriptfile.ps): 68.7 ms
Private MB: 38.5
Peak Private MB: 38.5

################################################## #########

newlen += (newlen >> 0) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 2.2 ms
1E6 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms

1E7 chars + 1E5 x 1E1 chars: 10.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.6 ms
1E7 chars + 1E2 x 1E4 chars: 0.6 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.0 ms

Private MB: 378.3
Peak Private MB: 418.0

--------------

qx(cat postscriptfile.ps): 181.2 ms
Private MB: 25.1
Peak Private MB: 40.3

################################################## #########

newlen += (newlen >> 1) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 2.5 ms
1E6 chars + 1E4 x 1E2 chars: 2.4 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms

1E7 chars + 1E5 x 1E1 chars: 9.6 ms
1E7 chars + 1E4 x 1E2 chars: 1.3 ms
1E7 chars + 1E3 x 1E3 chars: 0.7 ms
1E7 chars + 1E2 x 1E4 chars: 0.7 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.4 ms

Private MB: 290.2
Peak Private MB: 319.5

################################################## #########

newlen += (newlen >> 2) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 9.2 ms
1E6 chars + 1E4 x 1E2 chars: 5.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.5 ms

1E7 chars + 1E5 x 1E1 chars: 9.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.6 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.4 ms

Private MB: 244.9
Peak Private MB: 270.1

--------------

qx(cat postscriptfile.ps): 209.8 ms
Private MB: 16.2
Peak Private MB: 29.0

################################################## #########

newlen += (newlen >> 3) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 12.1 ms
1E6 chars + 1E4 x 1E2 chars: 6.9 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms

1E7 chars + 1E5 x 1E1 chars: 10.3 ms
1E7 chars + 1E4 x 1E2 chars: 1.4 ms
1E7 chars + 1E3 x 1E3 chars: 0.5 ms
1E7 chars + 1E2 x 1E4 chars: 0.5 ms
1E7 chars + 1E1 x 1E5 chars: 0.5 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.6 ms

Private MB: 221.9
Peak Private MB: 244.3

################################################## #########

newlen += (newlen >> 4) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 17.0 ms
1E6 chars + 1E4 x 1E2 chars: 13.8 ms
1E7 chars + 1E4 x 1E2 chars: 11.2 ms

1E7 chars + 1E5 x 1E1 chars: 19.4 ms
1E7 chars + 1E4 x 1E2 chars: 10.1 ms
1E7 chars + 1E3 x 1E3 chars: 10.9 ms
1E7 chars + 1E2 x 1E4 chars: 11.1 ms
1E7 chars + 1E1 x 1E5 chars: 11.0 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.3 ms

Private MB: 219.4
Peak Private MB: 233.8

--------------

qx(cat postscriptfile.ps): 312.0 ms
Private MB: 14.0
Peak Private MB: 25.8

################################################## #########

newlen += (newlen >> 6) + 10;

C:\wh_fast_perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 57.7 ms
1E6 chars + 1E4 x 1E2 chars: 59.8 ms
1E7 chars + 1E4 x 1E2 chars: 67.9 ms

1E7 chars + 1E5 x 1E1 chars: 69.4 ms
1E7 chars + 1E4 x 1E2 chars: 71.6 ms
1E7 chars + 1E3 x 1E3 chars: 69.6 ms
1E7 chars + 1E2 x 1E4 chars: 64.8 ms
1E7 chars + 1E1 x 1E5 chars: 53.8 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.2 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 5.7 ms

Private MB: 219.8
Peak Private MB: 230.0

################################################## #########

unpatched Strawberry Perl

c:\strawberry\perl\bin\perl d:\exe\LongStrings.pl

1E5 chars + 1E4 x 1E2 chars: 96.2 ms
1E6 chars + 1E4 x 1E2 chars: 325.7 ms
1E7 chars + 1E4 x 1E2 chars: 2655.9 ms

1E7 chars + 1E5 x 1E1 chars: 2687.3 ms
1E7 chars + 1E4 x 1E2 chars: 2687.4 ms
1E7 chars + 1E3 x 1E3 chars: 2656.1 ms
1E7 chars + 1E2 x 1E4 chars: 1093.6 ms
1E7 chars + 1E1 x 1E5 chars: 108.3 ms

1E7 chars (pre-extend to 2E7) + 1E4 x 1E2 chars: 1.1 ms
1E7 (1E5 x 1E2 chars) array + 1E4 x 1E2 chars : 6.1 ms

Private MB: 200.4
Peak Private MB: 210.2

--------------

qx(cat postscriptfile.ps): 18187.5 ms
Private MB: 13.2
Peak Private MB: 24.9
 
Reply With Quote
 
Peter J. Holzer
Guest
Posts: n/a
 
      07-28-2010
On 2010-07-28 15:09, Wolfram Humann <(E-Mail Removed)> wrote:
> I went with Ilya's proposal but inserted the line a little further
> down, just after
> if (newlen > SvLEN(sv)) { /* need more room? */
>
> So now we have:
> if (newlen > SvLEN(sv)) { /* need more room? */
> newlen += (newlen >> 2) + 10;
> #ifndef Perl_safesysmalloc_size
> newlen = PERL_STRLEN_ROUNDUP(newlen);
> #endif
> if (SvLEN(sv) && s) {
> s = (char*)saferealloc(s, newlen);
> }
>
> The remaining question is by what ratio a string's memory should grow.
> I tried several values from (newlen >> 0) to (newlen >> 6) for the
> best compromise between execution time and memory usage and my
> personal favorite is (newlen >> 2). What do others here think?


That sounds about right. I've used factors between 1.2 and 1.5 in the
past for similar problems. I suggest you base the growth on the old
size, though, something like:

if (newlen > SvLEN(sv)) { /* need more room? */
size_t min = SvLEN(sv) * 5/4 + 10;
if (newlen < min) newlen = min;
...

This gives you the same growth pattern if the increments are small, but
it doesn't allocate extra memory if you append a large chunk.

hp

 
Reply With Quote
 
Ilya Zakharevich
Guest
Posts: n/a
 
      07-29-2010
On 2010-07-28, Wolfram Humann <(E-Mail Removed)> wrote:
> So now we have:
> if (newlen > SvLEN(sv)) { /* need more room? */
> newlen += (newlen >> 2) + 10;
> #ifndef Perl_safesysmalloc_size
> newlen = PERL_STRLEN_ROUNDUP(newlen);
> #endif
> if (SvLEN(sv) && s) {
> s = (char*)saferealloc(s, newlen);
> }


I think you make your life too simple. What you must do is find the
chain of events which sets
Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain.

> I tried several values from (newlen >> 0) to (newlen >> 6) for the
> best compromise between execution time and memory usage and my
> personal favorite is (newlen >> 2). What do others here think?


My approach is never to take responsibility for such decisions.
Make a default value, and shift responsibility to the user.

#ifndef PERL_STRLEN_ROUNDUP_SHIFT
# define PERL_STRLEN_ROUNDUP_SHIFT 2
#endif

The suggestion to use the OLD length is also very viable...

> 1E5 chars + 1E4 x 1E2 chars: 1.5 ms


I hope your `ms' are actually seconds. It does not make sense to
measure performance on runs shorter than a second (maybe more on Win,
which is doing more unknown stuff in background)...

Yours,
Ilya
 
Reply With Quote
 
Wolfram Humann
Guest
Posts: n/a
 
      07-29-2010
On Jul 29, 6:30*am, Ilya Zakharevich <(E-Mail Removed)> wrote:
> On 2010-07-28, Wolfram Humann <(E-Mail Removed)> wrote:
>
> > So now we have:
> > * * if (newlen > SvLEN(sv)) { * * * * * /* need more room? */
> > * *newlen += (newlen >> 2) + 10;
> > #ifndef Perl_safesysmalloc_size
> > * *newlen = PERL_STRLEN_ROUNDUP(newlen);
> > #endif
> > * *if (SvLEN(sv) && s) {
> > * * * *s = (char*)saferealloc(s, newlen);
> > * *}

>
> I think you make your life too simple. *What you must do is find the
> chain of events which sets
> Perl_safesysmalloc_size/PERL_STRLEN_ROUNDUP, and modify this chain. *


I'm traveling territory that's fairly unknown to me already, so if
this is too simple I should leave it to someone more knowledgeable to
do it right. What I *think* is, that PERL_STRLEN_ROUNDUP is just
concerned with memory boundary alignment, e.g. roundup to the next
multiple of 4. If this is the case, it should be independent of any
string memory expansion strategy.

>
> > I tried several values from (newlen >> 0) to (newlen >> 6) for the
> > best compromise between execution time and memory usage and my
> > personal favorite is (newlen >> 2). What do others here think?

>
> My approach is never to take responsibility for such decisions.
> Make a default value, and shift responsibility to the user. *
>
> #ifndef PERL_STRLEN_ROUNDUP_SHIFT
> # *define PERL_STRLEN_ROUNDUP_SHIFT 2
> #endif


Given that nobody stumbled across the devastating current state, I
don't envision hoards of users trying to optimize this. But I agree
that it serves well as a reminder for someone reading the code that
this is not *the* correct value but a trade-off and could be decided
differently. However, given that IMO it's a different concept from the
existing PERL_STRLEN_ROUNDUP, I would prefer to give it a different
name. How about PERL_STRLEN_EXPAND_SHIFT?

> The suggestion to use the OLD length is also very viable...


Agreed.

> > 1E5 chars + 1E4 x 1E2 chars: * *1.5 ms

>
> I hope your `ms' are actually seconds. *It does not make sense to
> measure performance on runs shorter than a second (maybe more on Win,
> which is doing more unknown stuff in background)...


No, these are milliseconds. Yes, this makes it a lousy benchmark. Yes,
these numbers do vary easily by +-30% and sometimes more from run to
run. I tried to post "average" runs, but that's even more subjective,
of course. However the time differences encountered between different
cases are often a factor of 10 or even much more (in the unmodified
case, many of these do run for several seconds ), so I think this
is a valid base for comparison.

So my current proposal reads like this:

#ifndef PERL_STRLEN_EXPAND_SHIFT
# define PERL_STRLEN_EXPAND_SHIFT 2
#endif

if (newlen > SvLEN(sv)) { /* need more room? */
size_t minlen = SvCUR(sv);
minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;
if (newlen < minlen) newlen = minlen;
#ifndef Perl_safesysmalloc_size
newlen = PERL_STRLEN_ROUNDUP(newlen);
#endif
if (SvLEN(sv) && s) {
s = (char*)saferealloc(s, newlen);
}

My benchmark script does run slower now. The previous version did
expand allocated memory beyond minimum requirements during the initial
assignment, so that the reallocation count during append was 0 in many
cases. The new version does not do that so that at least 1 realloc is
required when one starts appending to the string.

Wolfram
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Why the speed of reading data from ramdisk is the same as harddisk? sonet Perl Misc 2 06-09-2006 10:20 AM
Need speed increase while reading large chunk of data. Darsant C++ 8 06-11-2005 03:23 AM
Reported Wireless speed w/ repeater 7-9x Measured Speed Lance Wireless Networking 0 10-31-2004 09:31 PM
I need speed Mr .Net....speed Ham ASP .Net 6 10-29-2004 08:04 AM
speed speed speed a.metselaar Computer Support 14 12-30-2003 03:34 AM



Advertisments