Velocity Reviews

OwlHoot 11-12-2010 02:38 PM

if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
To repeat the title, in case it is munged by Google Groups:

if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'

I've been developing with Perl for years, but even simple things in it
still sometimes throw up surprises.

The regexp /:(.*?)$/ is anchored on the right by $, then comes a
non-greedy match which, AIUI, is the "shortest string it can get away
with", preceded by a colon. So I would expect this to pick up just the
"C", as it does with
/([^:]*)$/.

Am I assuming/doing something silly? It is Friday afternoon after all.


Cheers

John R Ramsden

Wolf Behrenhoff 11-12-2010 02:56 PM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
On 12.11.2010 15:38, OwlHoot wrote:
> To repeat the title, in case it is munged by Google Groups:
>
> if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just
> 'C'
>
> I've been developing with perl for years; but even simple things in it
> still sometimes throw up surprises.
>
> The regexp /:(.*?)$/ is anchored on the right by $, then comes a
> non-greedy match which, AIUI, is the "shortest string it can get away
> with", preceded by a colon. So I would expect this to pick up just the
> "C", as it does with
> /([^:]*)$/.


The regexp matches from the left to the right, even if there is an
anchor on the right side of the string.

Thus the : first tries to match the first : in your string, i.e. the one
between A and B. Then .*? tries to match any number of chars, starting
from zero because of the ?. But if zero chars are matched, the $ fails.
So the regexp engine makes the number of characters matched by the .*?
longer and longer until finally the $ matches. The regexp does not need
to go back and select the next : in this case.

.*? means: take as few chars as possible _at this position_.
It does not mean: backtrack and check whether it could match
fewer chars at some other place in the string.
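
The steps described above can be imitated by hand. This is only a rough
sketch: lazy_after_colon() is a made-up helper, it treats $ as "true end
of string" only, and it ignores everything the real engine optimizes.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): a hand-rolled imitation of
# how a backtracking engine works through /:(.*?)$/.
sub lazy_after_colon {
    my ($s) = @_;
    my $n = length $s;
    for my $start (0 .. $n - 1) {                  # leftmost start first
        next unless substr($s, $start, 1) eq ':';  # the literal ':' must match here
        for my $len (0 .. $n - $start - 1) {       # .*? grows one char at a time
            # simplified $: succeed only when we have reached the end of the string
            return substr($s, $start + 1, $len) if $start + 1 + $len == $n;
        }
    }
    return;    # no match at any start position
}

print lazy_after_colon('A:B:C'), "\n";             # B:C
print "$1\n" if 'A:B:C' =~ /:(.*?)$/;              # B:C (the real engine agrees)
```

The outer loop never gets to the second ':' because the first start
position already produces a match.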

So if you add .* to the beginning, you will get the last : in your string.
/.*:(.*?)$/
In this case the .* would try to eat as many chars as possible, then
search for a :. So this would try the last : first.

Anyway, you could also use (split /:/, 'A:B:C')[-1] here.
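
A quick side-by-side of the three variants, as a sketch (the variable
names are only for illustration):

```perl
use strict;
use warnings;

my $s = 'A:B:C';

my ($first)  = $s =~ /:(.*?)$/;      # match starts at the first ':'
my ($last)   = $s =~ /.*:(.*?)$/;    # greedy .* pushes the ':' to the last one
my $by_split = (split /:/, $s)[-1];  # last field after splitting on ':'

print "$first\n";      # B:C
print "$last\n";       # C
print "$by_split\n";   # C

# The two are not interchangeable on a trailing ':' because split drops
# trailing empty fields.
my ($re_tail)  = 'A:B:' =~ /.*:(.*?)$/;     # empty string
my $split_tail = (split /:/, 'A:B:')[-1];   # 'B'
```

Which one fits depends on whether an empty last field should count.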

Cheers, Wolf

sln@netherlands.com 11-12-2010 03:28 PM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
On Fri, 12 Nov 2010 06:38:08 -0800 (PST), OwlHoot <ravensdean@googlemail.com> wrote:

>To repeat the title, in case it is munged by Google Groups:
>
> if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just
>'C'
>
>I've been developing with perl for years; but even simple things in it
>still sometimes throw up surprises.
>
>The regexp /:(.*?)$/ is anchored on the right by $, then comes a
>non-greedy match which, AIUI, is the "shortest string it can get away
>with", preceded by a colon. So I would expect this to pick up just the
>"C", as it does with
> /([^:]*)$/.
>


It's not the shortest, it's the first to satisfy it.
The match is pinned on the left by the first ':' and on the right by $.
The regex is allowing another ':' when it traverses the string from the
left.
/:(.*)$/ has the same result, without stepping one char at a time
through what lies between the first ':' and the end of string.

Notice that /:(.*?):/ does the same thing, it says get all between
the first ':' and the next ':'. However,
'A:B:C:D' =~ /:(.*):/
greedily grabs all between the first and last ':', but
'A:B:C:D' =~ /:(.*?):/
grabs only that between the first 2 ':'s.

Since there is only one end of line, it gets all between the first ':'
and end of line regardless of ?.
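
The four cases above, run as a small sketch (variable names are only
illustrative):

```perl
use strict;
use warnings;

my $s = 'A:B:C:D';

my ($greedy_mid) = $s =~ /:(.*):/;    # first ':' to the last ':'
my ($lazy_mid)   = $s =~ /:(.*?):/;   # first ':' to the next ':'
my ($greedy_end) = $s =~ /:(.*)$/;    # first ':' to end of string
my ($lazy_end)   = $s =~ /:(.*?)$/;   # same span: there is only one end

print "$greedy_mid\n";   # B:C
print "$lazy_mid\n";     # B
print "$greedy_end\n";   # B:C:D
print "$lazy_end\n";     # B:C:D
```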

-sln

Keith Thompson 11-12-2010 04:44 PM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
Wolf Behrenhoff <NoSpamPleaseButThisIsValid3@gmx.net> writes:
> On 12.11.2010 15:38, OwlHoot wrote:
>> To repeat the title, in case it is munged by Google Groups:
>>
>> if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just
>> 'C'


You should ask your question in the body of your message anyway.
Newsreaders vary in how they display subject lines.

>> I've been developing with perl for years; but even simple things in
>> it still sometimes throw up surprises.
>>
>> The regexp /:(.*?)$/ is anchored on the right by $, then comes a non-
>> greedy match which, AIUI, is the "shortest string it can get away
>> with", preceded by a colon. So I would expect this to pick up just
>> the "C", as it does with
>> /([^:]*)$/.

>
> The regexp matches from the left to the right, even if there is an
> anchor on the right side of the string.
>

[more explanation snipped]
>
> Anyway, you could also use (split /:/, 'A:B:C')[-1] here.


Another possibility is
if ('A:B:C' =~ /:([^:]*)$/)
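
A small sketch of how that pattern behaves on a few inputs; last_field()
is a made-up wrapper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text after the last ':' or undef.
sub last_field {
    my ($s) = @_;
    # [^:]* cannot cross a ':', so only the last ':' can reach the $
    return $s =~ /:([^:]*)$/ ? $1 : undef;
}

for my $s ('A:B:C', 'A:B:', 'ABC') {
    my $f = last_field($s);
    printf "%-7s -> %s\n", "'$s'", defined $f ? "'$f'" : 'no match';
}
```

This gives 'C' for the first string, an empty string for the second, and
no match for the third.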

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

C.DeRykus 11-12-2010 08:55 PM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
On Nov 12, 8:44 am, Keith Thompson <ks...@mib.org> wrote:

....

>
> > Anyway, you could also use (split /:/, 'A:B:C')[-1] here.

>
> Another possibility is
>     if ('A:B:C' =~ /:([^:]*)$/)
>


Yet another:

'A:B:C' =~ /.*:(.*)/;
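
As a sketch of why this works: the leading greedy .* swallows as much as
it can and backs off only as far as the last ':'. Making that prefix
non-greedy (a variation added here for contrast) lands on the first ':'
instead.

```perl
use strict;
use warnings;

my $s = 'A:B:C';

my ($after_last)  = $s =~ /.*:(.*)/;    # greedy prefix ends at the last ':'
my ($after_first) = $s =~ /.*?:(.*)/;   # lazy prefix ends at the first ':'

print "$after_last\n";    # C
print "$after_first\n";   # B:C
```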



--
Charles DeRykus



Uri Guttman 11-12-2010 09:19 PM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
>>>>> "O" == OwlHoot <ravensdean@googlemail.com> writes:

O> The regexp /:(.*?)$/ is anchored on the right by $, then comes a
O> non-greedy match which, AIUI, is the "shortest string it can get away
O> with", preceded by a colon. So I would expect this to pick up just
O> the "C", as it does with
O> /([^:]*)$/.

as others have said, you didn't get what ? does for quantifiers. perl
will match the leftmost working match. with a greedy quantifier, it will
continue to match chars until it fails and then stop. with the
non-greedy modifier ? it will stop after the first (and locally
shortest) match. it will not globally find the shortest possible match
anywhere in the string. so the key is remembering leftmost correct match
first and then short or greedy based on the modifier.
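
one way to see this is to list every place the pattern could match. a
sketch, using a lookahead with /g so that overlapping candidates are all
reported; @candidates and $shortest are just illustrative names:

```perl
use strict;
use warnings;

my $s = 'A:B:C';

# The lookahead consumes nothing, so /g retries at each later position
# and collects one capture per ':' that can reach the end of the string.
my @candidates = $s =~ /(?=:(.*?)$)/g;

# A plain match reports only the leftmost candidate.
my ($plain) = $s =~ /:(.*?)$/;

# Picking the globally shortest has to be done by hand.
my ($shortest) = sort { length($a) <=> length($b) } @candidates;

print join(', ', @candidates), "\n";   # B:C, C
print "$plain\n";                      # B:C
print "$shortest\n";                   # C
```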

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Xho Jingleheimerschmidt 11-13-2010 03:37 AM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
OwlHoot wrote:
> To repeat the title, in case it is munged by Google Groups:
>
> if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just
> 'C'
>
> I've been developing with perl for years; but even simple things in it
> still sometimes throw up surprises.
>
> The regexp /:(.*?)$/ is anchored on the right by $, then


There is no "then". Being anchored at the end does not change the order
of evaluation (or at least, does not do so in a way that affects the
outcome; the optimized engine can do things in whatever order it wants,
as long as it behaves as if it were done left to right).


> comes a non-greedy


Really it is not non-greedy. It is still greedy, it is just greedy for
less, rather than greedy for more. It is still greedy because it
satisfies itself, without looking around at the "wants" of others.

> match which, AIUI, is the "shortest string it can get away with",
> preceded by a colon.


The colon is also greedy. It is greedy to match as far left as it can
get away with. And because it comes before the .*? does, its greed wins.
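
The match offsets show this. A minimal sketch using Perl's @- and @+
arrays (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $s = 'A:B:C';

$s =~ /:(.*?)$/ or die "no match";
my ($lazy_start, $cap_start, $cap_end) = ($-[0], $-[1], $+[1]);

$s =~ /:(.*)$/ or die "no match";
my $greedy_start = $-[0];

# Both matches begin at offset 1, the first ':' in the string. The
# quantifier only decides how far the capture runs from there.
print "$lazy_start $cap_start $cap_end\n";   # 1 2 5
print "$greedy_start\n";                     # 1
print index($s, ':'), "\n";                  # 1
```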

Xho

Peter J. Holzer 11-14-2010 10:49 AM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
On 2010-11-13 03:37, Xho Jingleheimerschmidt <xhoster@gmail.com> wrote:
> Really it is not non-greedy. It is still greedy, it is just greedy for
> less, rather than greedy for more. It is still greedy because it
> satisfies itself, without looking around at the "wants" of others.


> The colon is also greedy. It is greedy to match as far left as it can
> get away with. And because it comes before the .*? does, its greed wins.


Please. "Greedy" in the context of regular expressions is a technical
term with a precisely defined meaning. You are not helping by inventing
a different meaning for the word based on its meaning in common English.

hp


Xho Jingleheimerschmidt 11-15-2010 12:33 AM

Re: if ('A:B:C' =~ /:(.*?)$/) then why the heck is $1 'B:C' and not just 'C'
 
Peter J. Holzer wrote:
> On 2010-11-13 03:37, Xho Jingleheimerschmidt <xhoster@gmail.com> wrote:
>> Really it is not non-greedy. It is still greedy, it is just greedy for
>> less, rather than greedy for more. It is still greedy because it
>> satisfies itself, without looking around at the "wants" of others.

>
>> The colon is also greedy. It is greedy to match as far left as it can
>> get away with. And because it comes before the .*? does, its greed wins.

>
> Please. "Greedy" in the context of regular expressions is a technical
> term with a precisely defined meaning. You are not helping by inventing
> a different meaning for the word based on its meaning in common English.


Greedy is well defined in the field of computer science, and I am not
the one inventing new meanings for it.

Xho


All times are GMT.