Difference of * and + in regular expression

 
 
Peng Yu
      06-22-2008
Hi,

If I used the uncommented if-statement, I would get no match. If I
used the commented-out if-statement instead, I would get the following
string as the output. I'm wondering why the regular expression with *
does not match anything.

namespace a { namespace b { namespace c {

Thanks,
Peng

$string = "a namespace a { namespace b { namespace c { ";

#if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
    print "$1\$\n";
}
 
Gunnar Hjalmarsson
      06-22-2008
Peng Yu wrote:
> If I used the uncommented if-statement, I would get no match.


Not true. $1 is defined, so the regex does match.

> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }


With the * quantifier, the regex seems to behave non-greedy, though.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
John W. Krahn
      06-22-2008
Peng Yu wrote:
> Hi,
>
> If I used the uncommented if-statement, I would get no match. If I
> used the commented-out if-statement instead, I would get the following
> string as the output. I'm wondering why the regular expression with *
> does not match anything.


It does match, it just doesn't match what you expected it to match.

> namespace a { namespace b { namespace c {
>
> Thanks,
> Peng
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> print "$1\$\n";
> }


$ perl -e'
use re qw/ debug /;

my $string = "a namespace a { namespace b { namespace c { ";

if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
print "$1\$\n";
}
'
Compiling REx `\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)'
size 40 Got 324 bytes for offset annotations.
first at 1
1: STAR(3)
2: SPACE(0)
3: OPEN1(5)
5: CURLYX[0] {0,32767}(37)
7: OPEN2(9)
9: EXACT <namespace>(13)
13: PLUS(15)
14: SPACE(0)
15: ALNUM(16)
16: CURLYM[3] {0,32767}(28)
20: BRANCH(22)
21: ALNUM(26)
22: BRANCH(24)
23: DIGIT(26)
26: SUCCEED(0)
27: NOTHING(28)
28: STAR(30)
29: SPACE(0)
30: EXACT <{>(32)
32: STAR(34)
33: SPACE(0)
34: CLOSE2(36)
36: WHILEM[1/2](0)
37: NOTHING(38)
38: CLOSE1(40)
40: END(0)
minlen 0
Offsets: [40]
3[1] 1[2] 4[1] 0[0] 37[1] 0[0] 5[1] 0[0] 6[9] 0[0] 0[0] 0[0]
17[1] 15[2] 18[2] 27[1] 0[0] 20[1] 0[0] 20[1] 21[2] 23[1] 24[2] 26[1]
0[0] 27[0] 27[0] 30[1] 28[2] 31[2] 0[0] 35[1] 33[2] 36[1] 0[0] 37[0]
37[0] 38[1] 0[0] 39[0]
Matching REx "\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)" against "a namespace a { namespace b { namespace c { "
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 1: STAR
SPACE can match 0 times out of 2147483647...
Setting an EVAL scope, savestack=5
0 <> <a namespace > | 3: OPEN1
0 <> <a namespace > | 5: CURLYX[0] {0,32767}
0 <> <a namespace > | 36: WHILEM[1/2]
0 out of 0..32767 cc=bfa0d330
Setting an EVAL scope, savestack=15
0 <> <a namespace > | 7: OPEN2
0 <> <a namespace > | 9: EXACT <namespace>
failed...
restoring \1 to -1(0)..-1(no)
restoring \1..\3 to undef
failed, try continuation...
0 <> <a namespace > | 37: NOTHING
0 <> <a namespace > | 38: CLOSE1
0 <> <a namespace > | 40: END
Match successful!
$
Freeing REx: `"\\s*((namespace\\s+\\w(\\w|\\d)*\\s*\\{\\s*)*)"'


Note the line that says "Match successful!": the expression
(namespace\s+\w(\w|\d)*\s*\{\s*)* matched zero times at offset 0, so the
overall match is the empty string before the first 'a'.

Also, the expression \w(\w|\d)* could be simplified to \w+.
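
To make the suggested simplification concrete, here is a minimal sketch that compares the two identifier patterns on a handful of made-up strings. The sample inputs and variable names are illustrative assumptions, not from the thread. Since \d matches a subset of what \w matches, both patterns should accept and reject exactly the same inputs:

```perl
use strict;
use warnings;

# Illustrative inputs only; they are not taken from the thread.
my @samples = ( 'a', 'abc123', 'x_9', '9lives', '', '{', 'a b' );

my $mismatches = 0;
for my $s (@samples) {
    # Anchor both patterns so the whole string must be an identifier.
    my $long  = ( $s =~ /^\w(\w|\d)*$/ ) ? 1 : 0;
    my $short = ( $s =~ /^\w+$/ )        ? 1 : 0;
    $mismatches++ if $long != $short;
    printf "%-10s long=%d short=%d\n", "'$s'", $long, $short;
}
print "mismatches: $mismatches\n";
```

One side effect worth keeping in mind: dropping the inner (\w|\d) group also removes a capture group, so any capture groups that come after it would be renumbered.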


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
Ben Morrow
      06-22-2008

Quoth Peng Yu <(E-Mail Removed)>:
>
> If I used the uncommented if-statement, I would get no match. If I
> used the commented-out if-statement instead, I would get the following
> string as the output. I'm wondering why the regular expression with *
> does not match anything.
>
> namespace a { namespace b { namespace c {
>
> $string="a namespace a { namespace b { namespace c { ";
>
> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {


'Match earlier in the string' beats 'match longest', even with greedy
matching, and since your regex will match the empty string the first
match is right before the first 'a'.
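
The "earlier beats longer" rule can be checked directly by asking Perl where each match starts and ends. The following is a sketch, not part of the original post: it reuses the string and both patterns from the question and reads the offsets from the built-in @- and @+ arrays. The hash names are illustrative.

```perl
use strict;
use warnings;

my $string = "a namespace a { namespace b { namespace c { ";

# The two patterns from the original post, differing only in the outer quantifier.
my %pattern = (
    star => qr/\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/,
    plus => qr/\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/,
);

my %result;
for my $name (qw(star plus)) {
    next unless $string =~ $pattern{$name};
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    $result{$name} = { start => $-[0], end => $+[0], captured => $1 };
    printf "%-4s start=%d end=%d captured='%s'\n",
        $name, $-[0], $+[0], $1;
}
```

The * version reports a zero-length match at offset 0, while the + version cannot succeed there and only starts matching at the space after the leading 'a'.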

Ben

--
You poor take courage, you rich take care:
The Earth was made a common treasury for everyone to share
All things in common, all people one.
'We come in peace'---the order came to cut them down. [(E-Mail Removed)]
 
Peng Yu
      06-22-2008
On Jun 21, 9:39 pm, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
> Peng Yu wrote:
> > If I used the uncommented if-statement, I would get no match.
>
> Not true. $1 is defined, so the regex does match.
>
> > $string="a namespace a { namespace b { namespace c { ";
> >
> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> > print "$1\$\n";
> > }
>
> With the * quantifier, the regex seems to behave non-greedy, though.


According to the manual, *? is non-greedy.
Why is * also non-greedy?

Thanks,
Peng
 
Gunnar Hjalmarsson
      06-22-2008
Peng Yu wrote:
> On Jun 21, 9:39 pm, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
>> Peng Yu wrote:
>>> If I used the uncommented if-statement, I would get no match.
>>
>> Not true. $1 is defined, so the regex does match.
>>
>>> $string="a namespace a { namespace b { namespace c { ";
>>> #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
>>> if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
>>> print "$1\$\n";
>>> }
>>
>> With the * quantifier, the regex seems to behave non-greedy, though.
>
> According to the manual, *? is non-greedy.
> Why * is also non-greedy?


I don't know, sorry. Maybe the answer can be derived from John's more
extensive explanation.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Tad J McClellan
      06-22-2008
Peng Yu <(E-Mail Removed)> wrote:
> On Jun 21, 9:39 pm, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
>> Peng Yu wrote:
>> > If I used the uncommented if-statement, I would get no match.
>>
>> Not true. $1 is defined, so the regex does match.
>>
>> > $string="a namespace a { namespace b { namespace c { ";
>> >
>> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
>> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
>> > print "$1\$\n";
>> > }
>>
>> With the * quantifier, the regex seems to behave non-greedy, though.
>
> According to the manual, *? is non-greedy.
> Why * is also non-greedy?



Greediness is not involved here.

(Greedy vs. non-greedy never changes whether a match will succeed or fail.
It is simply a "tie breaker" used when the regex engine can match more
than one way at the current pos()ition.)
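
A small illustration of the tie-breaker idea, using toy strings that are not from the thread (all names here are assumptions for the sketch): a* and a*? both succeed on the same inputs and start at the same offset; only the amount of text they capture can differ.

```perl
use strict;
use warnings;

my %seen;
for my $text ( 'aaa', 'baaa' ) {
    for my $case ( [ greedy => qr/(a*)/ ], [ lazy => qr/(a*?)/ ] ) {
        my ( $name, $re ) = @$case;
        my $matched = ( $text =~ $re ) ? 1 : 0;
        # Record whether it matched, where it started, and what it captured.
        $seen{$text}{$name} = {
            matched  => $matched,
            start    => $-[0],
            captured => $1,
        };
        printf "%-5s %-6s matched=%d start=%d captured='%s'\n",
            $text, $name, $matched, $-[0], $1;
    }
}
```

On 'baaa' even the greedy a* settles for the empty string at offset 0, because a match there is available before the engine ever looks at the run of a's.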

There are 2 primary issues with this OP's problem: writing a pattern
where everything is optional, and that regexes match as early as possible
from left to right.

If you write a pattern where everything is optional, then it will match
the empty string, which in turn means that it would match *every* string
you can think of.
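
As a rough check of that claim, the sketch below runs an all-optional pattern and its + counterpart over a few made-up inputs. The inputs and variable names are assumptions for illustration, and the identifier part uses the \w+ simplification suggested earlier in the thread.

```perl
use strict;
use warnings;

# Toy inputs for illustration; most contain no "namespace" at all.
my @inputs = ( '', 'hello world', '12345', 'namespace a { ' );

my $all_optional = qr/\s*((namespace\s+\w+\s*\{\s*)*)/;
my $one_required = qr/\s*((namespace\s+\w+\s*\{\s*)+)/;

my ( $optional_hits, $required_hits ) = ( 0, 0 );
for my $input (@inputs) {
    $optional_hits++ if $input =~ $all_optional;
    $required_hits++ if $input =~ $one_required;
}
print "all-optional pattern matched $optional_hits of ", scalar(@inputs), " inputs\n";
print "pattern with + matched $required_hits of ", scalar(@inputs), " inputs\n";
```

A successful match of the all-optional pattern therefore says nothing about whether the string contains a namespace; requiring at least one repetition, as the + version does, makes the test meaningful.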

The left-to-right evaluation of the pattern seems to be buried
a bit in perlre.pod:

The above recipes describe the ordering of matches I<at a given position>.
One more rule is needed to understand how a match is determined for the
whole regular expression: a match at an earlier position is always better
than a match at a later position.


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
comp.llang.perl.moderated
      06-23-2008
On Jun 22, 8:00 am, Tad J McClellan <(E-Mail Removed)> wrote:
> Peng Yu <(E-Mail Removed)> wrote:
> > On Jun 21, 9:39 pm, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
> >> Peng Yu wrote:
> >> > If I used the uncommented if-statement, I would get no match.
> >>
> >> Not true. $1 is defined, so the regex does match.
> >>
> >> > $string="a namespace a { namespace b { namespace c { ";
> >> >
> >> > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> >> > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> >> > print "$1\$\n";
> >> > }
> >>
> >> With the * quantifier, the regex seems to behave non-greedy, though.
> >
> > According to the manual, *? is non-greedy.
> > Why * is also non-greedy?
>
> Greediness is not involved here.
>
> (Greedy vs. non-greedy never changes whether a match will succeed or fail.
> It is simply a "tie breaker" used when the regex engine can match more
> than one way at the current pos()ition.)
>
> There are 2 primary issues with this OP's problem: writing a pattern
> where everything is optional, and that regexes match as early as possible
> from left to right.
>
> If you write a pattern where everything is optional, then it will match
> the empty string, which in turn means that it would match *every* string
> you can think of.
>
> The left-to-right evaluation of the pattern seems to be buried
> a bit in perlre.pod:
>
> The above recipes describe the ordering of matches I<at a given position>.
> One more rule is needed to understand how a match is determined for the
> whole regular expression: a match at an earlier position is always better
> than a match at a later position.


I still prefer to think of this as another aspect of greediness: * can be
greedy, but only as greedy as needed to get the earliest match. Thus, even
greed embraces the cardinal Perl virtue of laziness....

--
Charles DeRykus
 
Ted Zlatanov
      06-23-2008
On Sun, 22 Jun 2008 20:41:02 -0700 (PDT) "comp.llang.perl.moderated" <(E-Mail Removed)> wrote:

clpm> I still prefer to think of this as another aspect of greediness: *
clpm> can be greedy but only as greedy as needed to get the earliest
clpm> match. Thus, even greed embraces the cardinal Perl virtue of
clpm> laziness....

I'd call that opportunism, not laziness.

"The two cardinal virtues of Perl are TMTOWTDI and laziness and
opportunism... No, no. The THREE cardinal virtues of Perl are TMTOWTDI
and laziness and opportunism and DWIM... DAMN IT... The FOUR cardinal
virtues of Perl are... etc."

Ted
 
xhoster@gmail.com
      06-23-2008
Peng Yu <(E-Mail Removed)> wrote:
> On Jun 21, 9:39 pm, Gunnar Hjalmarsson <(E-Mail Removed)> wrote:
> > Peng Yu wrote:
> > > If I used the uncommented if-statement, I would get no match.
> >
> > Not true. $1 is defined, so the regex does match.
> >
> > > $string="a namespace a { namespace b { namespace c { ";
> > >
> > > #if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)+)/) {
> > > if ($string =~ /\s*((namespace\s+\w(\w|\d)*\s*\{\s*)*)/) {
> > > print "$1\$\n";
> > > }
> >
> > With the * quantifier, the regex seems to behave non-greedy, though.
>
> According to the manual, *? is non-greedy.
> Why * is also non-greedy?


It depends on what you mean. "Greedy" in CS generally means you make
locally optimal decisions, rather than looking for globally optimal ones.
But what is considered "optimal" in the local matching of a regex?

In this sense, it is greedy either way, in that it still optimizes locally
rather than globally. It is just that what we consider optimal changes
with the addition of ?.

At this point, perhaps they revert from a CS meaning to a moral/political
meaning--greedy no longer means local vs. global, now it means as much as
possible vs. as little as possible.
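
One way to see the "locally optimal, not globally optimal" reading in practice is the toy case below. It is an illustration added here under my own assumptions (the string 'aab' and the pattern are not from the thread): the greedy a* takes as much as it can, and because the overall match still succeeds, Perl never goes back to look for a longer total match.

```perl
use strict;
use warnings;

my $text = 'aab';

# a* grabs "aa" first; (ab)? then finds no "ab" at the remaining "b" and
# settles for zero occurrences. The overall match already succeeds, so the
# engine does not backtrack to try "a" followed by "ab".
my ( $whole, $tail ) = $text =~ /(a*(ab)?)/;

printf "matched '%s', optional group %s\n",
    $whole, defined $tail ? "captured '$tail'" : 'did not participate';
```

A matcher that looked for the longest possible match at that position (POSIX-style leftmost-longest) would be expected to return 'aab' instead, which is the global optimum the local choice misses.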

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 