Re: Problem with splitting data

 
 
Peter J. Holzer
      03-25-2012
On 2012-03-25 05:02, Uri Guttman <(E-Mail Removed)> wrote:
>>>>>> "PJH" == Peter J Holzer <(E-Mail Removed)> writes:

>
> PJH> On 2012-03-21 16:33, Uri Guttman <(E-Mail Removed)> wrote:
>
> >> my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
> >>
> >> that is the (im)proper idiom for slurping in a file. no open needed as
> >> it is done by the <> on the values in @ARGV. slow as hell too!

>
> PJH> Have you actually benchmarked this in the last 10 years?
>
> PJH> On my systems
> PJH> my $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
> PJH> and
> PJH> my $text = read_file($filename);
> PJH> are almost exactly the same speed for largish files (for very small
> PJH> files the former is even a bit faster).
>
> PJH> However,
> PJH> read_file($filename, buf_ref => \$text);
> PJH> is a lot (factor 3-4) faster, since it avoids the extra copy.
>
> yes. and that is mentioned in the docs as the fastest style of slurp.


It is not, however, mentioned in the synopsis.

I bet most users just use
my $text = read_file($filename);

OTOH, performance probably isn't an issue for most users.
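
For reference, a minimal sketch of the three variants compared above. It assumes a throwaway temp file (the variable names are made up for illustration) and that File::Slurp may or may not be installed; the two read_file calls are skipped if it is missing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the sketch is self-contained.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 100;
close $fh or die "close failed: $!";

# 1. The @ARGV idiom: <> opens the file named in @ARGV, and the
#    localized (undefined) $/ makes it return the whole file at once.
my $via_argv = do { local( @ARGV, $/ ) = $filename; <> };

# 2. and 3. need File::Slurp from CPAN (an assumption about the
#    environment); they are skipped if the module is not available.
my ($via_copy, $via_buf_ref);
if (eval { require File::Slurp; 1 }) {
    # 2. Plain call: the contents are returned and copied into the variable.
    $via_copy = File::Slurp::read_file($filename);

    # 3. buf_ref: read_file fills the caller's scalar directly.
    File::Slurp::read_file($filename, buf_ref => \$via_buf_ref);
}

printf "%-8s %d bytes\n", 'argv',    length $via_argv;
printf "%-8s %d bytes\n", 'copy',    length $via_copy    if defined $via_copy;
printf "%-8s %d bytes\n", 'buf_ref', length $via_buf_ref if defined $via_buf_ref;
```

All three should report the same byte count; only the number of copies made along the way differs.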

> and the benchmark script shows that as well. given that i rewrote the
> benchmark script last year (to improve the structure, options and
> such), you know i benchmarked all the slurps recently.


Your benchmark script doesn't include the case
$text = do { local( @ARGV, $/ ) = $filename ; <> } ;

It includes a case
my $text = orig_slurp_scalar( $file_name )

where orig_slurp_scalar then calls orig_slurp, which does the above. So
that adds two function calls and at least one, more likely several extra
copies (I don't know how scalar returns are implemented in perl).
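
The effect of the extra call layers can be sketched roughly as follows. The two wrapper subs are hypothetical stand-ins, not the actual code from the benchmark script, and the 64 kB test file and iteration count are arbitrary choices.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Throwaway 64 kB test file (size chosen arbitrarily for the sketch).
my ($fh, $file_name) = tempfile(UNLINK => 1);
print {$fh} 'x' x (64 * 1024);
close $fh or die "close failed: $!";

# HYPOTHETICAL wrappers: two layers of sub calls around the idiom,
# each handing the file contents back to its caller by value.
sub inner_slurp {
    local( @ARGV, $/ ) = $_[0];
    my $text = <>;
    return $text;
}
sub outer_slurp_scalar { return scalar inner_slurp( $_[0] ) }

# Compare the wrapped call against the idiom written inline.
cmpthese( 2000, {
    wrapped => sub { my $text = outer_slurp_scalar( $file_name ) },
    direct  => sub { my $text = do { local( @ARGV, $/ ) = $file_name; <> } },
} );
```

Both entries read the same file and assign the result to a lexical, so any difference in the reported rates comes from the call and return overhead of the wrappers.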

I have added this to the end of bench_scalar_slurp and rerun the script:

direct_slurp_scalar =>
sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },

The result is surprising. I would have expected that to be about as fast
as FS::read_file (because that's what I've seen in my own benchmarks),
but it's a lot faster, even faster than FS::read_file_buf_ref2:

                           Rate orig_slurp FS::read_file FS::read_file_buf_ref2 direct_slurp_scalar
file_contents             169/s       -76%          -81%                   -90%                -92%
file_contents_no_OO       170/s       -75%          -81%                   -90%                -92%
orig_read_file            560/s       -19%          -39%                   -67%                -73%
orig_slurp                694/s         --          -24%                   -59%                -66%
FS12::read_file           907/s        31%           -0%                   -46%                -56%
FS::read_file             910/s        31%            --                   -46%                -55%
old_sysread_file          919/s        32%            1%                   -45%                -55%
FS::read_file_scalar_ref 1047/s        51%           15%                   -37%                -49%
FS::read_file_buf_ref    1051/s        52%           15%                   -37%                -49%
old_read_file            1232/s        78%           35%                   -26%                -40%
FS::read_file_buf_ref2   1673/s       141%           84%                     --                -18%
direct_slurp_scalar      2043/s       195%          124%                    22%                  --

(irrelevant columns omitted)

I wonder if there is a systematic error here ...

> PJH> All tests were made with files which were already cached in memory -
> PJH> when the files have to be read from disk, all differences will probably
> PJH> be negligible.
>
> the benchmark script uses Benchmark.pm and so it runs on the same files
> many times. if you run the script twice in a row it will almost for sure
> have the files cached in ram.


Yes, I know. I just wanted to mention that in real life the files you
have to read are not always already in memory, but often on disk, which
is a lot slower. So my benchmarks (like yours) exaggerate the
differences (If you have to wait for 20 disk seeks it doesn't matter if
you save 1 millisecond or not).
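
To put rough numbers on that, here is a small worked calculation. The two rates are taken from the table above; the 8 ms per seek is an assumed, illustrative figure, not a measurement.

```perl
use strict;
use warnings;

# Rates from the table above (reads per second, file cached in RAM).
my %rate = (
    'FS::read_file'       => 910,
    'direct_slurp_scalar' => 2043,
);

# ASSUMPTION: 8 ms per disk seek. Illustrative only, not measured.
my $seek_ms = 8;
my $seeks   = 20;
my $disk_ms = $seeks * $seek_ms;

# Time per cached read in milliseconds, and the saving between the two.
my %cpu_ms   = map { $_ => 1000 / $rate{$_} } keys %rate;
my $saved_ms = $cpu_ms{'FS::read_file'} - $cpu_ms{'direct_slurp_scalar'};

printf "%-20s %.2f ms per cached read\n", $_, $cpu_ms{$_} for sort keys %cpu_ms;
printf "saving per read:     %.2f ms\n", $saved_ms;
printf "share of a %d ms disk wait: %.1f%%\n",
    $disk_ms, 100 * $saved_ms / $disk_ms;
```

Under that assumption the saving between the two methods is well under one percent of the time spent waiting for the disk.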

hp


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 
 
Dr.Ruud
      03-25-2012
On 2012-03-25 13:25, Peter J. Holzer wrote:

> direct_slurp_scalar =>
> sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<> } },


What is the role of the "my $text = do {...}" wrapper?

I would expect just:

direct_slurp_scalar =>
sub { local( @ARGV, $/ ) = $file_name; <> },

--
Ruud
 
 
Rainer Weikusat
      03-25-2012
"Dr.Ruud" <(E-Mail Removed)> writes:
> On 2012-03-25 13:25, Peter J. Holzer wrote:
>
>> direct_slurp_scalar =>
>> sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<> } },

>
> What is the role of the "my $text = do {...}" wrapper?


Make the code appear more complicated than it actually is.
 
 
Peter J. Holzer
      03-25-2012
On 2012-03-25 13:13, Dr.Ruud <(E-Mail Removed)> wrote:
> On 2012-03-25 13:25, Peter J. Holzer wrote:
>> direct_slurp_scalar =>
>> sub { my $text = do { local( @ARGV, $/ ) = $file_name ;<> } },

>
> What is the role of the "my $text = do {...}" wrapper?
>
> I would expect just:
>
> direct_slurp_scalar =>
> sub { local( @ARGV, $/ ) = $file_name; <> },


All the other benchmarks assign the result to a variable. So I
have to do that here, too, to make the results comparable.

There are various ways in which the assignments can happen, so it makes
sense to benchmark the effect of those ways. Just throwing away the
result doesn't make much sense, however.
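
A minimal sketch of the two variants side by side, assuming a throwaway test file. The entry names "bare" and "assigned" are made up here. In the bare version, what `<>` does with its result depends on the context the sub is called in, which is exactly why it is not comparable with entries that always assign to a scalar.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Throwaway 64 kB test file (size chosen arbitrarily for the sketch).
my ($fh, $file_name) = tempfile(UNLINK => 1);
print {$fh} 'y' x (64 * 1024);
close $fh or die "close failed: $!";

# Bare variant: <> is the last expression, so it inherits the
# context of whoever calls the sub.
my $bare = sub { local( @ARGV, $/ ) = $file_name; <> };

# Assigned variant: <> always runs in scalar context and the
# contents are always stored in a lexical, as in the other entries.
my $assigned = sub { my $text = do { local( @ARGV, $/ ) = $file_name; <> } };

# Called in scalar context, both return the same data.
my $from_bare     = $bare->();
my $from_assigned = $assigned->();
print "same data: ", ( $from_bare eq $from_assigned ? "yes" : "no" ), "\n";

cmpthese( 2000, { bare => $bare, assigned => $assigned } );
```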

hp


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 
 
Uri Guttman
      03-26-2012
>>>>> "PJH" == Peter J Holzer <(E-Mail Removed)> writes:

PJH> Your benchmark script doesn't include the case
PJH> $text = do { local( @ARGV, $/ ) = $filename ; <> } ;

PJH> It includes a case
PJH> my $text = orig_slurp_scalar( $file_name )

PJH> where orig_slurp_scalar then calls orig_slurp, which does the above. So
PJH> that adds two function calls and at least one, more likely several extra
PJH> copies (I don't know how scalar returns are implemented in perl).

true. i didn't account for the overhead in the extra sub calls.

PJH> I have added this to the end of bench_scalar_slurp and rerun the script:

PJH> direct_slurp_scalar =>
PJH> sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },

PJH> The result is surprising. I would have expected that to be about as fast
PJH> as FS::read_file (because that's what I've seen in my own benchmarks),
PJH> but it's a lot faster, even faster than FS::read_file_buf_ref2:

what size file are you testing? the script has the option of selecting
multiple file sizes. slurp's speed wins more for larger files as it has
less overhead (much of that is in arg processing and error checking).

PJH>                            Rate orig_slurp FS::read_file FS::read_file_buf_ref2 direct_slurp_scalar
PJH> file_contents             169/s       -76%          -81%                   -90%                -92%
PJH> file_contents_no_OO       170/s       -75%          -81%                   -90%                -92%
PJH> orig_read_file            560/s       -19%          -39%                   -67%                -73%
PJH> orig_slurp                694/s         --          -24%                   -59%                -66%
PJH> FS12::read_file           907/s        31%           -0%                   -46%                -56%
PJH> FS::read_file             910/s        31%            --                   -46%                -55%
PJH> old_sysread_file          919/s        32%            1%                   -45%                -55%
PJH> FS::read_file_scalar_ref 1047/s        51%           15%                   -37%                -49%
PJH> FS::read_file_buf_ref    1051/s        52%           15%                   -37%                -49%
PJH> old_read_file            1232/s        78%           35%                   -26%                -40%
PJH> FS::read_file_buf_ref2   1673/s       141%           84%                     --                -18%
PJH> direct_slurp_scalar      2043/s       195%          124%                    22%                  --

i wouldn't call that much faster. also as i said, file sizes matter
too. and perl could have improved the guts of <> since i first wrote
that (it needed it badly). even so, it is such a fugly idiom that i
would never teach it.

PJH> I wonder if there is a systematic error here ...

PJH> All tests were made with files which were already cached in memory -
PJH> when the files have to be read from disk, all differences will probably
PJH> be negligible.

not exactly as requesting larger reads is still faster than what stdio
would do. but sure, disk is much slower than ram as we all know.

when i get to the next version (maybe in a couple of weeks) i will add
your entry to the benchmark. i have a couple of other minor fixes to
make.

uri
 
 
Peter J. Holzer
      03-26-2012
On 2012-03-26 00:21, Uri Guttman <(E-Mail Removed)> wrote:
>>>>>> "PJH" == Peter J Holzer <(E-Mail Removed)> writes:

>
> PJH> Your benchmark script doesn't include the case
> PJH> $text = do { local( @ARGV, $/ ) = $filename ; <> } ;
>
> PJH> It includes a case
> PJH> my $text = orig_slurp_scalar( $file_name )
>
> PJH> where orig_slurp_scalar then calls orig_slurp, which does the above. So
> PJH> that adds two function calls and at least one, more likely several extra
> PJH> copies (I don't know how scalar returns are implemented in perl).
>
> true. i didn't account for the overhead in the extra sub calls.
>
> PJH> I have added this to the end of bench_scalar_slurp and rerun the script:
>
> PJH> direct_slurp_scalar =>
> PJH> sub { my $text = do { local( @ARGV, $/ ) = $file_name ; <> } },
>
> PJH> The result is surprising. I would have expected that to be about as fast
> PJH> as FS::read_file (because that's what I've seen in my own benchmarks),
> PJH> but it's a lot faster, even faster than FS::read_file_buf_ref2:
>
> what size file are you testing?


Sorry, I accidentally deleted that line. These times are from the 1MB
scalar read test case (on a 3GHz Core2).

For the smaller sizes (512B, 10kB) orig_slurp is *faster* than
FS::read_file and direct_slurp_scalar is still faster, but
old_sysread_file beats them all.

> PJH>                            Rate orig_slurp FS::read_file FS::read_file_buf_ref2 direct_slurp_scalar
> PJH> file_contents             169/s       -76%          -81%                   -90%                -92%
> PJH> file_contents_no_OO       170/s       -75%          -81%                   -90%                -92%
> PJH> orig_read_file            560/s       -19%          -39%                   -67%                -73%
> PJH> orig_slurp                694/s         --          -24%                   -59%                -66%
> PJH> FS12::read_file           907/s        31%           -0%                   -46%                -56%
> PJH> FS::read_file             910/s        31%            --                   -46%                -55%
> PJH> old_sysread_file          919/s        32%            1%                   -45%                -55%
> PJH> FS::read_file_scalar_ref 1047/s        51%           15%                   -37%                -49%
> PJH> FS::read_file_buf_ref    1051/s        52%           15%                   -37%                -49%
> PJH> old_read_file            1232/s        78%           35%                   -26%                -40%
> PJH> FS::read_file_buf_ref2   1673/s       141%           84%                     --                -18%
> PJH> direct_slurp_scalar      2043/s       195%          124%                    22%                  --
>
> i wouldn't call that much faster.


Well, you called orig_slurp "slow as hell", but FS::read_file is only
31% faster, while direct_slurp_scalar is 124% faster than FS::read_file.


> also as i said, file sizes matter too.


Yes, of course.


> and perl could have improved the guts of <> since i first wrote that
> (it needed it badly).


That's why I asked whether you had repeated your benchmarks in the last
ten years. Perl I/O has been significantly revamped for 5.8.x and it
hasn't used stdio by default for a long time (it's still available as a
compile time option I think). Oh and the last time we had this
discussion (about 2 years ago) you quoted benchmark results from a 300
MHz SPARC (IIRC), which wasn't exactly bleeding edge at the time.


> even so, it is such a fugly idiom that i would never teach it.


That I agree with.


> PJH> I wonder if there is a systematic error here ...
>
> PJH> All tests were made with files which were already cached in memory -
> PJH> when the files have to be read from disk, all differences will probably
> PJH> be negligible.
>
> not exactly as requesting larger reads is still faster than what stdio
> would do.


Even stdio is much faster than disk and has been for a long time (at
least on Linux). A CPU can burn an awful lot of cycles while waiting for
the next block. And perl doesn't use stdio anyway.

hp


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 