$& imposes a considerable performance penalty they say

 
 
Dan Jacobson
 
      11-05-2004
$ man perlvar
$& The string matched by the last successful pattern match...
The use of this variable anywhere in a program imposes a con-
siderable performance penalty on all regular expression
matches. See "BUGS".
$ time echo x|perl -wpe 's/(x)/a$1y/'
axy
real 0m0.011s
user 0m0.003s
sys 0m0.004s
$ time echo x|perl -wpe 's/x/a$&y/'
axy
real 0m0.007s
user 0m0.001s
sys 0m0.006s

I'm not sure which of the times means money, but if it is real, then
what's the deal?
 
 
Gunnar Hjalmarsson
 
      11-05-2004
Dan Jacobson wrote:
> $ man perlvar
> $& The string matched by the last successful pattern match...
> The use of this variable anywhere in a program imposes a con-
> siderable performance penalty on all regular expression
> matches. See "BUGS".
> $ time echo x|perl -wpe 's/(x)/a$1y/'
> axy
> real 0m0.011s
> user 0m0.003s
> sys 0m0.004s
> $ time echo x|perl -wpe 's/x/a$&y/'
> axy
> real 0m0.007s
> user 0m0.001s
> sys 0m0.006s
>
> I'm not sure which of the times means money, but if it is real, then
> what's the deal?


Even if I have never tried to quantify the claimed performance penalty
caused by $&, I realize that your above examples are not sufficient for
drawing any conclusions. The point, if I have understood it correctly,
is that the use of $& *once* enables capturing for *all* regular
expressions in the program, also those without capturing parentheses or
capturing through $&.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
 
Uri Guttman
 
      11-05-2004
>>>>> "GH" == Gunnar Hjalmarsson <(E-Mail Removed)> writes:

GH> Even if I have never tried to quantify the claimed performance
GH> penalty caused by $&, I realize that your above examples are not
GH> sufficient for drawing any conclusions. The point, if I have
GH> understood it correctly, is that the use of $& *once* enables
GH> capturing for *all* regular expressions in the program, also those
GH> without capturing parentheses or capturing through $&.

to clarify that, $& is a way to capture the entire match. it is similar
to enclosing the regex in () and using $1. so by itself it is useful
(golfers like it). but in order to work properly it has a global side
effect. since it always has the full match from the last regex, and it
is a global var, if you use it once ANYWHERE in your code, the matched
string (btw, this really only matters with s/// since it can change the
original string) must be copied for all s/// even if you don't have any
capturing parens. so in general, don't use it, use explicit capturing
parens which will only cause the s/// with them to copy the original
string.

the OP's wimpy test didn't even come close to showing this issue. it
would need to be something which did s/// without capturing and either
$& being mentioned or not. and it would need many more runs than 1 to
show the difference. of course Benchmark.pm is the way to do that as
timing a script will show nothing but compiler time and has no accuracy
at the required level.

uri
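
A minimal sketch of the kind of Benchmark.pm test described above, not a definitive measurement. The `amp` command-line argument is a hypothetical switch introduced here: since one mention of $& affects the whole program, the script has to be run twice (once plain, once with `amp`) and the two printed timings compared. It uses m//g rather than s/// to keep the loop simple, and the string length and iteration count are arbitrary assumptions. On perl 5.20 and later, perlvar says copy-on-write removes most of this cost, so the two runs may look alike there.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Hypothetical switch: run once as "script.pl" and once as "script.pl amp".
my $use_amp = @ARGV && $ARGV[0] eq 'amp';

# $& is hidden in a string eval, so perl only ever sees it when the
# switch is given. The timed code below is identical in both runs.
eval q{ my $seen = $&; 1 } if $use_amp;

my $text = 'x' x 10_000;    # assumed size; a long string makes copying visible

# A match with no capturing parentheses and no direct use of $&.
my $t = timeit(20, sub {
    my $count = 0;
    $count++ while $text =~ /x/g;
});

printf "%s \$&: %s\n", ($use_amp ? 'with' : 'without'), timestr($t);
```

Comparing the two printed lines shows whether merely mentioning $& changes the cost of matches that never use it.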
 
 
Eric Schwartz
 
      11-05-2004
Uri Guttman <(E-Mail Removed)> writes:
> so in general, don't use [$&], use explicit capturing
> parens which will only cause the s/// with them to copy the original
> string.


I don't have such an old perl to hand, but perlre points out that:

As of 5.005, $& is not so costly as the other two.

(meaning $' and $`)

How much less costly is it?

As a side note: Thanks to Abigail, mostly, one alteration I've made to
my personal programming practises lately is that I've started using
things like $&, shelling out, etc., more often in cases where the code
isn't time-critical (which is, frankly, most of the time). I've found
that it will often save me mental effort and time, and in many cases makes
the code clearer than a more conventional approach might dictate.

Recently, for instance, I replaced a shell script that examined a
Linux system, and printed out what cards it thought were in which
slots, with a Perl program that does all sorts of conventionally 'bad'
things, like using $&, lots of `find -name ... | grep | sort -u`, and
the like because I was trying, as much as possible, to stick with the
logic of the shell script, and I figured "Heck, I'll optimize it
later, and pass around arrayrefs instead of calling `lspci`
everywhere, and use File::Find, and stop with the $&."

Before I even got around to it, I ran some benchmarks, and I still cut
down the average run time from 10 seconds to 3, so I give myself a
free pass for using those constructs in that context. I realize that
is not disagreeing with you, just that sometimes, the performance hit
of using $&, or shelling out even when there's a perfectly good module
available, isn't significant.

My advice would be to use them wherever you like, but be aware that
they can indeed cause performance problems. Even so, I'd still
profile your program before rushing to those as the first cure to poor
performance-- you may well find, as I have, that poor algorithms or
inefficient data structures are far more detrimental to your program's
run than $& could ever be.

-=Eric
--
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
-- Blair Houghton.
 
 
Anno Siegel
 
      11-09-2004
Uri Guttman <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> >>>>> "GH" == Gunnar Hjalmarsson <(E-Mail Removed)> writes:

>
> GH> Even if I have never tried to quantify the claimed performance
> GH> penalty caused by $&, I realize that your above examples are not
> GH> sufficient for drawing any conclusions. The point, if I have
> GH> understood it correctly, is that the use of $& *once* enables
> GH> capturing for *all* regular expressions in the program, also those
> GH> without capturing parentheses or capturing through $&.
>
> to clarify that, $& is a way to capture the entire match. it is similar
> to enclosing the regex in () and using $1. so by itself it is useful
> (golfers like it). but in order to work properly it has a global side
> effect. since it always has the full match from the last regex, and it
> is a global var, if you use it once ANYWHERE in your code, the matched
> string (btw, this really only matters with s/// since it can change the
> original string) must be copied for all s/// even if you don't have any
> capturing parens. so in general, don't use it, use explicit capturing
> parens which will only cause the s/// with them to copy the original
> string.
>
> the OP's wimpy test didn't even come close to showing this issue. it


Here's a similarly wimpy test that does show the difference:

time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
0.290u 0.030s 0:00.32 100.0%

time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
2.910u 0.030s 0:02.96 99.3%

You want a long string to match over to see the difference. The
point is that after use of $&, all of $`, $& and $' are active, and
so the whole string is copied on every match, as opposed to only
the match itself with "()".

Some weeks ago we had a case here where someone did that with a
multi-gigabyte string...

Anno
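
To make the "whole string" point concrete, here is a small sketch of what the three variables hold after one match. The sample string and the variable names `$before`, `$matched` and `$after` are made up for illustration.

```perl
use strict;
use warnings;

my $string = 'hello world';
my ($before, $matched, $after);

if ($string =~ /o w/) {
    # Copy the three match variables right away; they all describe the
    # string as it was when this match succeeded.
    ($before, $matched, $after) = ($`, $&, $');

    # prints: before=[hell] matched=[o w] after=[orld]
    printf "before=[%s] matched=[%s] after=[%s]\n", $before, $matched, $after;

    # The three pieces together cover the entire original string, so perl
    # has to keep all of it around, not only the part that matched.
    print "pieces cover the whole string\n"
        if $before . $matched . $after eq $string;
}
```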
 
 
Uri Guttman
 
      11-09-2004
>>>>> "AS" == Anno Siegel <(E-Mail Removed)-berlin.de> writes:

>> the OP's wimpy test didn't even come close to showing this issue. it


AS> Here's a similarly wimpy test that does show the difference:

AS> time perl -e '$_ = "x" x 10_000; $1 while /(x)/g'
AS> 0.290u 0.030s 0:00.32 100.0%

AS> time perl -e '$_ = "x" x 10_000; $& while /(x)/g'
AS> 2.910u 0.030s 0:02.96 99.3%

AS> You want a long string to match over to see the difference. The
AS> point is that after use of $&, all of $`, $& and $' are active, and
AS> so the whole string is copied on every match, as opposed to only
AS> the match itself with "()".

try some minor changes. move the $& to somewhere else and use $1 in both
cases. that will show its global nature. and another variant would be to
not even grab when using $& and it will also do a full copy.

AS> Some weeks ago we had a case here where someone did that with a
AS> multi-gigabyte string...

yow!

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
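
One possible way to try the first variant suggested above (use $1 in both cases and put the $& somewhere else), sketched under a few assumptions. Each variant runs in its own child perl via `$^X`, because a single mention of $& affects every match in the process that compiled it. The sub name `never_called` is hypothetical, and the wall-clock figures include perl startup time, so treat them as rough. On perl 5.20 and later the two figures may be close, since perlvar says copy-on-write removes most of the penalty.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Anno's loop, reading only $1.
my $loop = q{$_ = "x" x 10_000; $1 while /(x)/g;};

# Same loop twice; the second child also compiles a sub that mentions $&
# but is never called.
my @variants = (
    [ 'no $& anywhere'   => $loop ],
    [ '$& in unused sub' => $loop . q{ sub never_called { return $& } } ],
);

my %elapsed;
for my $variant (@variants) {
    my ($name, $code) = @$variant;
    my $start = time;
    system($^X, '-e', $code) == 0
        or die "child perl failed for '$name': $?";
    $elapsed{$name} = time - $start;
    printf "%-18s %.3fs\n", $name, $elapsed{$name};
}
```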
 