Backreferences: alias vs copy

 
 
Michael Carman
 
      08-10-2008
In a separate thread someone recently asked what happens if they modify
the variable in a 'while ($var =~ /pattern/g)' loop. In crafting a
sample program I noticed something that surprised me a little:

my $s = 'abc';

while ($s =~ /(\w)/g) {
    print "$1 - ";
    $s = 'xyz' if $1 eq 'b';
    print "$1\n";
}
__END__
a - a
b - y
x - x
y - y
z - z

In the second result, you can see that the value of $1 changes after
reassigning $s. Its value becomes the text from the new string at the
position corresponding to the match against the old one. This makes it
pretty clear that $1 is actually an alias instead of a copy but I can't
find this documented anywhere.

That made me wonder what would happen if the new string was shorter than
the match position in the old one. Consider

my $s = 'abc';

while ($s =~ /(\w)/g) {
    print "$1 - ";
    $s = 'x' if $1 eq 'c';
    print "$1\n";
}
__END__
a - a
b - b
c - c # <--
x - x

as well as:

my $s = 'abc';

while ($s =~ /(\w)/g) {
    print "$1 - ";
    $s = 'xy' if $1 eq 'c';
    print "$1\n";
}
__END__

a - a
b - b
c - # <--
x - x
y - y

If that doesn't scream "NUL terminated C string!" I don't know what does.

Is this documented anywhere, preferably with a caveat about using $1 and
kin after you've changed the match string?

-mjc
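
Whatever the underlying mechanism turns out to be, one defensive pattern is to copy the capture into a lexical before touching the matched string, so that later reads do not depend on whether $1 is an alias or a copy. A minimal sketch of the first sample rewritten that way; the names $char and @lines are illustrative only and are not from the original post:

```perl
use strict;
use warnings;

my $s = 'abc';
my @lines;    # one entry per loop iteration

while ($s =~ /(\w)/g) {
    my $char = $1;                 # take a real copy before $s can change
    my $line = "$char - ";
    $s = 'xyz' if $char eq 'b';    # assigning to $s also resets pos($s)
    $line .= $char;                # the copy is unaffected by the assignment
    push @lines, $line;
}

print "$_\n" for @lines;
```

With the copy in place the second line reads "b - b" rather than "b - y", because $char no longer refers to anything inside $s.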
 
comp.lang.c++
 
      08-14-2008
On Aug 10, 8:04 am, Michael Carman <(E-Mail Removed)> wrote:
> In a separate thread someone recently asked what happens if they modify
> the variable in a 'while ($var =~ /pattern/g)' loop. In crafting a
> sample program I noticed something that surprised me a little:
>
> my $s = 'abc';
>
> while ($s =~ /(\w)/g) {
> print "$1 - ";
> $s = 'xyz' if $1 eq 'b';
> print "$1\n";
> }
> __END__
> a - a
> b - y
> x - x
> y - y
> z - z
>
> In the second result, you can see that the value of $1 changes after
> reassigning $s. Its value becomes the text from the new string at the
> position corresponding to the match against the old one. This makes it
> pretty clear that $1 is actually an alias instead of a copy but I can't
> find this documented anywhere.
>


I can't find anything completely explicit but the performance penalty
would be prohibitive. There'd be a double whammy if the backref. was
captured but not used afterwards.

> That made me wonder what would happen if the new string was shorter than
> the match position in the old one. Consider
>
> my $s = 'abc';
>
> while ($s =~ /(\w)/g) {
> print "$1 - ";
> $s = 'x' if $1 eq 'c';
> print "$1\n";
> }
> __END__
> a - a
> b - b
> c - c # <--
> x - x
>
> as well as:
>
> my $s = 'abc';
>
> while ($s =~ /(\w)/g) {
> print "$1 - ";
> $s = 'xy' if $1 eq 'c';
> print "$1\n";
> }
> __END__
>
> a - a
> b - b
> c - # <--
> x - x
> y - y
>
> If that doesn't scream "NUL terminated C string!" I don't know what does.
>
> Is this documented anywhere, preferably with a caveat about using $1 and
> kin after you've changed the match string?
>


The only hint I saw was perlre's
warning that once $& is seen, the copy price tag extends to $1, $2,
etc as well:


WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in
the program, it has to provide them for every pattern match. This may
substantially slow your program. Perl uses the same mechanism to
produce $1, $2, etc, so you also pay a price for each pattern that
contains capturing parens...


That seems like a clear inference could be made that no copy occurs in
the absence of $&.

--
Charles DeRykus

 
xhoster@gmail.com
 
      08-14-2008
"comp.lang.c++" <(E-Mail Removed)> wrote:
> >
> > Is this documented anywhere, preferably with a caveat about using $1
> > and kin after you've changed the match string?
> >

>
> The only hint I saw was perlre's
> warning that once $& is seen, the copy price tag extends to $1, $2,
> etc as well:
>
> WARNING: Once Perl sees that you
> need one of $&, $`, or $'
> anywhere in the program, it has
> to provide them for every
> pattern match. This may
> substantially slow your program.
> Perl uses the same mechanism to
> produce $1, $2, etc, so you
> also pay a price for each pattern
> that contains capturing parens...


I think you are misinterpreting that. It goes on to say:

> But if you never use $&, $` or $', then patterns without capturing
> parentheses will not be penalized.


This seems to imply that patterns *with* capturing parentheses will be
penalized, even in the absence of $&, $` or $'.


> That seems like a clear inference could be made that no copy occurs in
> the absence of $&.


Maybe that is what is actually happening, but it seems far from clear based
on the documents.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
comp.lang.c++
 
      08-14-2008
On Aug 14, 11:42 am, (E-Mail Removed) wrote:
> "comp.lang.c++" <(E-Mail Removed)> wrote:
>
> > > Is this documented anywhere, preferably with a caveat about using $1
> > > and kin after you've changed the match string?

>
> > The only hint I saw was perlre's
> > warning that once $& is seen, the copy price tag extends to $1, $2,
> > etc as well:

>
> > WARNING: Once Perl sees that you
> > need one of $&, $`, or $'
> > anywhere in the program, it has
> > to provide them for every
> > pattern match. This may
> > substantially slow your program.
> > Perl uses the same mechanism to
> > produce $1, $2, etc, so you
> > also pay a price for each pattern
> > that contains capturing parens...

>
> I think you are misinterpreting that. It goes on to say:
>
> > But if you never use $&, $` or $', then patterns without capturing
> > parentheses will not be penalized.

>
> This seems to imply that patterns *with* capturing parentheses will be
> penalized, even in the absence of $&, $` or $'.
>


No, I think capturing parens actually copy if $& is in the picture.
Compare below with orig. output:

my $s = 'abc';
while ($s =~ /(\w)/g) {
    print "$&: $1 - ";
    print "$1 - ";
    $s = 'xyz' if $1 eq 'b';
    print "$1\n";
}
__END__
a: a - a
b: b - b
x: x - x
y: y - y
z: z - z

--
Charles DeRykus


 
xhoster@gmail.com
 
      08-14-2008
"comp.lang.c++" <(E-Mail Removed)> wrote:
> On Aug 14, 11:42 am, (E-Mail Removed) wrote:
> > "comp.lang.c++" <(E-Mail Removed)> wrote:
> >
> > > > Is this documented anywhere, preferably with a caveat about using
> > > > $1 and kin after you've changed the match string?

> >
> > > The only hint I saw was perlre's
> > > warning that once $& is seen, the copy price tag extends to $1, $2,
> > > etc as well:

> >
> > > WARNING: Once Perl sees that you
> > > need one of $&, $`, or $'
> > > anywhere in the program, it has
> > > to provide them for every
> > > pattern match. This may
> > > substantially slow your program.
> > > Perl uses the same mechanism to
> > > produce $1, $2, etc, so you
> > > also pay a price for each pattern
> > > that contains capturing parens...

> >
> > I think you are misinterpreting that. It goes on to say:
> >
> > > But if you never use $&, $` or $', then patterns without capturing
> > > parentheses will not be penalized.

> >
> > This seems to imply that patterns *with* capturing parentheses will be
> > penalized, even in the absence of $&, $` or $'.
> >

>
> No, I think capturing parens
> actually copy if $& is in the
> picture. Compare below with
> orig. output:
>
> my $s = 'abc';
> while ($s =~ /(\w)/g) {
> print "$&: $1 - ";
> print "$1 - ";
> $s = 'xyz' if $1 eq 'b';
> print "$1\n";
> }


Based on my experimentation:

In the absence of /g, capturing parentheses always copy.

In the presence of $&, capturing parentheses always copy.

They alias only if they are used with a /g and only if $& (etc) has not
been seen.

Odd.

If you use a string eval to inspect $&, $' or $` (so that Perl doesn't
see them coming), then those variables are set by alias vs. copy under the
same conditions the capturing parentheses are. And if the regex doesn't
have any capturing parentheses, then $& etc are set by alias. That was a
surprise; I figured they wouldn't get set at all when Perl doesn't see them
coming and there were no capturing parentheses.
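
For anyone who wants to repeat this kind of experiment on their own perl, here is a rough probe, offered as a sketch rather than a definitive test. The sub name capture_around_reassign is made up for illustration. The script deliberately never mentions $&, $` or $', since their mere presence changes what is being measured. The result depends on the perl version and build, so no output is shown.

```perl
use strict;
use warnings;

# Returns ($before, $after): the value of $1 right after matching 'abc',
# and its value again after the matched variable is overwritten with 'xyz'.
# (Hypothetical helper, not taken from the thread.)
sub capture_around_reassign {
    my ($use_g) = @_;
    my $s = 'abc';

    # Scalar context in both branches, so /g matches once and sets pos().
    my $ok = $use_g ? scalar($s =~ /(\w)/g) : scalar($s =~ /(\w)/);
    return unless $ok;

    my $before = $1;    # copied out immediately after the match
    $s = 'xyz';
    my $after = $1;     # read again after the assignment
    return ($before, $after);
}

for my $use_g (0, 1) {
    my ($before, $after) = capture_around_reassign($use_g);
    my $shown = defined $after ? $after : '(undef)';
    printf "%-12s before=[%s] after=[%s]\n",
        ($use_g ? 'with /g:' : 'without /g:'), $before, $shown;
}
```

If "after" differs from "before", $1 was tracking the variable rather than holding a private copy of the matched text.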


Xho

 
Michael Carman
 
      08-15-2008
comp.lang.c++ wrote:
> On Aug 10, 8:04 am, Michael Carman <(E-Mail Removed)> wrote:
>> This makes it pretty clear that $1 is actually an alias instead of
>> a copy but I can't find this documented anywhere.

>
> I can't find anything completely explicit but the performance penalty
> would be prohibitive.


Yes, the behavior isn't surprising at all if you think about the
implementation a little.

> WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere
> in the program, it has to provide them for every pattern match. This
> may substantially slow your program. Perl uses the same mechanism to
> produce $1, $2, etc, so you also pay a price for each pattern that
> contains capturing parens...
>
> That seems like a clear inference could be made that no copy occurs
> in the absence of $&.


All that says is that if you use those variables perl has to track the
prematch, match, and postmatch for every regular expression. This is
because they're set after a successful match, and when you use them perl
has no way of knowing which regex will have been the last successful one.

Capturing parens only introduce the overhead for the regexes in which
they are used because it's clear that they only apply there.
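
For reference, Perl 5.10 and later offer two ways to get at the matched text without mentioning $&, $` or $' anywhere in the program: the /p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, or substr() with the @- and @+ offset arrays. A minimal sketch; the variable names are made up for illustration:

```perl
use 5.010;
use strict;
use warnings;

my $text = 'abc123xyz';

# /p asks perl to preserve the match data for this one pattern only.
$text =~ /\d+/p or die "no match";
my ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});

# The same three pieces, rebuilt from the match offsets.
my $pre2   = substr($text, 0, $-[0]);
my $match2 = substr($text, $-[0], $+[0] - $-[0]);
my $post2  = substr($text, $+[0]);

print "[$pre][$match][$post]\n";       # [abc][123][xyz]
print "[$pre2][$match2][$post2]\n";    # [abc][123][xyz]
```

Both forms confine any cost to the pattern that actually needs the information.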

After poking around a bit more, I noticed that perlvar has this to say
in the entry for @- (@LAST_MATCH_START):

$1 is the same as "substr($var, $-[1], $+[1] - $-[1])"

I had always read that as "is equivalent to" but it would appear that a
literal interpretation is warranted. They really are the exact same.

-mjc
 
Michael Carman
 
      08-15-2008
(E-Mail Removed) wrote:
> In the absence of /g, capturing parentheses always copy.
>
> In the presence of $&, capturing parentheses always copy.
>
> They alias only if they are used with a /g and only if $& (etc) has
> not been seen.


I see the same behavior, though I wonder if in the presence of $& it's
actually $& that's the copy and then $1 and friends alias to it instead
of to the original string. There's probably no way of knowing without
mucking through the guts.

> If you use a string eval to inspect $&, $' or $` (so that Perl
> doesn't see them coming), then those variables are set by alias vs.
> copy under the same conditions the capturing parentheses are.


Actually, it's weirder than that:

perl -e "$_ = 'abc123'; /\d/; $_ = 'xyz789'; print qq{[$&]}"
[1]

perl -e "$_ = 'abc123'; /1/; $_ = 'xyz789'; print qq{[$&]}"
[1]

perl -e "$_ = 'abc123'; /\d/; $_ = 'xyz789'; eval 'print qq{[$&]}'"
[7]

perl -e "$_ = 'abc123'; /1/; $_ = 'xyz789'; eval 'print qq{[$&]}'"
[]

perl -e "$_ = 'abc123'; /[0-9]/; $_ = 'xyz789'; eval 'print qq{[$&]}'"
[7]

perl -e "$_ = 'abc123'; /\w1/; $_ = 'xyz789'; eval 'print qq{[$&]}'"
[z7]

> And if the regex doesn't have any capturing parentheses, then $& etc
> are set by alias. That was a surprise; I figured they wouldn't get
> set at all when Perl doesn't see them coming and there were no
> capturing parentheses.


Agreed. I was particularly surprised by that as well, although it
depends on the pattern. If you match literal text $& isn't set; you'll
get an uninitialized value warning if you add -w.

If you match against things like /\d/, /[0-9]/, or /(?:1|2)/ then $&
does get set. Patterns such as /[1]/ and /(?:1)/ don't set it,
presumably because they can be simplified to a literal /1/.

It appears that the aliasing (at least for a stealth $&) is a side
effect of the regex engine potentially needing to backtrack. I suspect
that for the literal matches perl is calling index() to look for a
substring instead of invoking the regex engine.

It's possible that the behavior of $1 is the result of a similar
implementation detail/optimization. I'm hesitant to call it a bug,
though it might be.

-mjc
 
John W. Krahn
 
      08-15-2008
Michael Carman wrote:
>
> After poking around a bit more, I noticed that perlvar has this to say
> in the entry for @- (@LAST_MATCH_START):
>
> $1 is the same as "substr($var, $-[1], $+[1] - $-[1])"
>
> I had always read that as "is equivalent to" but it would appear that a
> literal interpretation is warranted. They really are the exact same.


They are not *exactly* the same. You can assign to substr($var, $-[1],
$+[1] - $-[1]) but you cannot assign to $1.
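
The difference can be shown in a few lines. This is only an illustrative sketch with made-up variable names: right after the match the two expressions yield the same text, assigning to $1 dies with a "Modification of a read-only value attempted" error, and the lvalue substr() replaces the captured region in place.

```perl
use strict;
use warnings;

my $var = 'hello world';
$var =~ /(world)/ or die "no match";

# Immediately after the match the two expressions agree.
my $same = $1 eq substr($var, $-[1], $+[1] - $-[1]);

# $1 is read-only: assigning to it dies at run time, so trap it with eval.
my $ok    = eval { $1 = 'perl'; 1 };
my $error = $ok ? '' : $@;

# substr() is an lvalue, so the same region can be replaced through it.
substr($var, $-[1], $+[1] - $-[1]) = 'perl';

print "same text: ", ($same ? 'yes' : 'no'), "\n";
print "assigning to \$1 failed: $error";
print "var is now: $var\n";    # var is now: hello perl
```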



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
Willem
 
      08-15-2008
John W. Krahn wrote:
) Michael Carman wrote:
)>
)> After poking around a bit more, I noticed that perlvar has this to say
)> in the entry for @- (@LAST_MATCH_START):
)>
)> $1 is the same as "substr($var, $-[1], $+[1] - $-[1])"
)>
)> I had always read that as "is equivalent to" but it would appear that a
)> literal interpretation is warranted. They really are the exact same.
)
) They are not *exactly* the same. You can assign to substr($var, $-[1],
) $+[1] - $-[1]) but you cannot assign to $1.

Which is a pity, IMHO. Assigning to $1 would be a good feature.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Dr.Ruud
 
      08-15-2008
(E-Mail Removed) wrote:
> "comp.lang.c++" <(E-Mail Removed)> wrote:


>> But if you never use $&, $` or $', then patterns without capturing
>> parentheses will not be penalized.

>
> This seems to imply that patterns *with* capturing parentheses will be
> penalized, even in the absence of $&, $` or $'.


No.

Without the special variables, this penalisation just doesn't occur.
This penalisation is only there when the special variables are there.
A single occurrence of those variables makes Perl do something extra (like
capturing) for every regex, but if a regex is already capturing anyway,
the penalisation is less personal.
(etc., like the Parrot sketch)

--
Affijn, Ruud

"Gewoon is een tijger."

 