greedy v. non-greedy matching

 
 
Matt Garrish
      02-16-2004
Would anyone care to enlighten me as to why the (.*?) pattern matches
greedily in the following example:

my $text =<<TEXT;
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

$text =~ s/(xyz(.*?)abc\s*)+$//s;

print $text;


But if I change the regex to:

$text =~ s/(xyz(.*?)abc\s*)\1+$//s;

It works as expected.

Matt


 
 
Gunnar Hjalmarsson
      02-16-2004
Matt Garrish wrote:
> Would anyone care to enlighten me as to why the (.*?) pattern
> matches greedily in the following example:
>
> my $text =<<TEXT;
> I wouldn't expect the following text to match
> xyz 12345 abc
> but it does and I lose this text as well
> xyz 12345 abc
> xyz 12345 abc
> xyz 12345 abc
> TEXT
>
> $text =~ s/(xyz(.*?)abc\s*)+$//s;


It doesn't. Making it non-greedy does not change the fact that it
matches the *first occurrence* of the pattern.
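
A minimal sketch of that point, using a made-up sample string (not from the thread): the match always starts at the leftmost position where the whole pattern can succeed, and a non-greedy quantifier only prefers the shorter alternative from that starting point.

```perl
use strict;
use warnings;

# Hypothetical sample string, not taken from the thread.
my $str = "a1b a2b";

# Unanchored: the lazy quantifier stops at the first "b" it can.
my ($unanchored) = $str =~ /(a.*?b)/;

# Anchored with $: the match still starts at the leftmost "a"; the lazy
# quantifier is simply forced to expand until "b" is followed by the end.
my ($anchored) = $str =~ /(a.*?b)$/;
my $start = $-[0];    # offset where the last successful match began

print "unanchored: '$unanchored'\n";
print "anchored:   '$anchored' (starts at offset " . $start . ")\n";
```

The anchored match is the whole string "a1b a2b" rather than just "a2b", because the engine never abandons the leftmost starting position while some way of matching from it still exists.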

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Anno Siegel
      02-16-2004
Matt Garrish <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Would anyone care to enlighten me as to why the (.*?) pattern matches
> greedily in the following example:
>
> my $text =<<TEXT;
> I wouldn't expect the following text to match


[...]

Greedy vs. non-greedy never decides *if* a pattern matches, it can only
modify *what* it matches. So your expectation is unjustified.
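
As a small illustration (the sample strings are invented for this sketch), the greedy and non-greedy forms of the same pattern succeed or fail on exactly the same inputs here and differ only in what they capture:

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = ("xay", "xayby", "xa", "yx");

my %result;
for my $s (@samples) {
    # In list context a failed match returns an empty list, so the
    # variable stays undefined when the pattern does not match.
    my ($greedy) = $s =~ /x(.*)y/;
    my ($lazy)   = $s =~ /x(.*?)y/;
    $result{$s} = [$greedy, $lazy];

    printf "%-6s greedy=%-10s lazy=%s\n", $s,
        defined $greedy ? "'$greedy'" : "no match",
        defined $lazy   ? "'$lazy'"   : "no match";
}
```

For "xayby" the greedy form captures "ayb" and the lazy form captures "a"; for "xa" and "yx" neither form matches.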

Anno
 
 
fifo
      02-16-2004
At 2004-02-16 07:52 -0500, Matt Garrish wrote:
> Would anyone care to enlighten me as to why the (.*?) pattern matches
> greedily in the following example:
>
> my $text =<<TEXT;
> I wouldn't expect the following text to match
> xyz 12345 abc
> but it does and I lose this text as well
> xyz 12345 abc
> xyz 12345 abc
> xyz 12345 abc
> TEXT
>
> $text =~ s/(xyz(.*?)abc\s*)+$//s;
>
> print $text;
>


You're trying to match the sub-expression /(xyz(.*?)abc\s*)/ repeatedly,
up to the end of the string.

This initially matches the first "xyz 12345 abc\n", but that is followed
neither by the end of the string nor by something that matches the
expression again. Hence we have to backtrack, and we find that if we let
the /(.*?)/ part match a bit more of the string (under /s, "." also
matches newlines), the sub-expression will instead match this:

xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc

Now this _is_ followed by two more "xyz 12345 abc\n" strings, each of
which also matches the above sub-expression so we're done.
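
To see this backtracking result directly, here is a rough sketch. It rewrites the pattern so that the first repetition sits in its own group (an illustrative device of this sketch, not something from the original post), which lets $1 show what that first repetition ended up consuming:

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# First repetition captured separately; the remaining repetitions go
# into the third group. The variable names are made up for this sketch.
my ($first, $lazy_part, $rest) =
    $text =~ /(xyz(.*?)abc\s*)((?:xyz.*?abc\s*)*)$/s;

print "first repetition:\n$first";
print "remaining repetitions:\n$rest";

# The original substitution, applied to a copy of the text.
(my $stripped = $text) =~ s/(xyz(.*?)abc\s*)+$//s;
print "after substitution:\n$stripped";
```

The first repetition spans three lines (the first "xyz 12345 abc", the "but it does" line, and the second "xyz 12345 abc"), and only the opening line of the text survives the substitution.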

>
> But if I change the regex to:
>
> $text =~ s/(xyz(.*?)abc\s*)\1+$//s;
>
> It works as expected.
>


This expression requires that whatever matches /(xyz(.*?)abc\s*)/ is
repeated verbatim (at least once) up to the end of the string. That
doesn't happen when the sub-expression's match includes the "but it
does" line, since that text doesn't occur again later.
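
A short sketch of the backreference version on the same text, showing what group 1 ends up holding and what remains after the substitution (variable names are chosen for this sketch only):

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# Group 1 must be followed by one or more exact copies of itself,
# reaching the end of the string.
my ($unit) = $text =~ /(xyz(.*?)abc\s*)\1+$/s;
my $start  = $-[0];    # offset where the successful match began

print "repeated unit: $unit";

(my $stripped = $text) =~ s/(xyz(.*?)abc\s*)\1+$//s;
print "after substitution:\n$stripped";
```

Here the match begins at the second "xyz" line, so the first three lines of the text are kept. Because \1 demands a verbatim copy, the same pattern would presumably stop matching if the trailing lines differed from one another, for example if the numbers between "xyz" and "abc" varied.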
 
 
Matt Garrish
      02-16-2004

"Anno Siegel" <(E-Mail Removed)-berlin.de> wrote in message
news:c0qgod$for$(E-Mail Removed)-Berlin.DE...
> Matt Garrish <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> > Would anyone care to enlighten me as to why the (.*?) pattern matches
> > greedily in the following example:
> >
> > my $text =<<TEXT;
> > I wouldn't expect the following text to match

>
> [...]
>
> Greedy vs. non-greedy never decides *if* a pattern matches, it can only
> modify *what* it matches. So your expectation is unjustified.
>


Yeah, it was too early in the morning to be thinking about regexes. I was
thinking that the outer grouping would limit the match to multiple instances
of "xyz...abc" up to the end of the string, instead of still finding the first
"xyz" to the last "abc".
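
For the record, one possible way to get that behaviour without a backreference is to drop the /s modifier so that "." cannot cross a newline. This is only a sketch and assumes each "xyz...abc" record sits on a single line, which the thread does not state; the second sample text is invented to show records that differ from one another.

```perl
use strict;
use warnings;

my $text = <<'TEXT';
I wouldn't expect the following text to match
xyz 12345 abc
but it does and I lose this text as well
xyz 12345 abc
xyz 12345 abc
xyz 12345 abc
TEXT

# Hypothetical second sample whose trailing records are not identical.
my $varied = "keep\nxyz 1 abc\nkeep too\nxyz 2 abc\nxyz 3 abc\n";

# Without /s, "." does not match "\n", so a single repetition can no
# longer stretch across the intervening non-matching line.
(my $stripped_text   = $text)   =~ s/(?:xyz.*?abc\s*)+$//;
(my $stripped_varied = $varied) =~ s/(?:xyz.*?abc\s*)+$//;

print $stripped_text;
print "---\n";
print $stripped_varied;
```

In both samples only the trailing run of "xyz...abc" lines is removed, and the earlier lines are left alone.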

Matt


 