Velocity Reviews - Computer Hardware Reviews



Question about Quantifiers in java Regular expression

 
 
NeoGeoSNK
 
      03-02-2008
Hello,
I have been studying Java regular expressions for a long time, but I am still
confused about quantifiers:

import java.util.regex.*;

public class NRGRegex {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("a??");
        String a = "aaa";
        Matcher m = p.matcher(a);
        while (m.find()) {
            System.out.println("found char = " + m.group() + " at " + m.start()
                    + " and " + m.end());
        }
    }
}

The output is:
found char = at 0 and 0
found char = at 1 and 1
found char = at 2 and 2
found char = at 3 and 3
Here "a??" is a reluctant quantifier, but why does none of the 'a'
characters match?

When I use the greedy quantifier instead (Pattern p = Pattern.compile("a?");),
the output is:
found char = a at 0 and 1
found char = a at 1 and 2
found char = a at 2 and 3
found char = at 3 and 3

I thought a greedy quantifier first eats the whole string "aaa" at once.
And why are the empty matches at (0,0), (1,1) and (2,2) missing here,
when the reluctant quantifier produces them?

Thanks!
 
 
 
 
 
Joshua Cranmer
 
      03-02-2008
NeoGeoSNK wrote:
> Hello,
> I have been studying Java regular expressions for a long time, but I am still
> confused about quantifiers:
>
> import java.util.regex.*;
>
> public class NRGRegex {
>     public static void main(String[] args) {
>         Pattern p = Pattern.compile("a??");
>         String a = "aaa";
>         Matcher m = p.matcher(a);
>         while (m.find()) {
>             System.out.println("found char = " + m.group() + " at " + m.start()
>                     + " and " + m.end());
>         }
>     }
> }
>
> The output is:
> found char = at 0 and 0
> found char = at 1 and 1
> found char = at 2 and 2
> found char = at 3 and 3
> Here "a??" is a reluctant quantifier, but why does none of the 'a'
> characters match?


The pattern "a?" means that either the a is matched or it isn't.
Without the extra `?' (the plain greedy form), it attempts to match the a
first and only omits it when the rest of the match fails. However, you
specified the reluctant quantifier, which makes the `?' operator try not
matching first.

Pseudocode for "a?":
    try to match `a' and then the rest of the regex
    if that match fails:
        try to match nothing and then the rest of the regex
        return the result of that match
    else:
        return true

For "a??":
    try to match nothing and then the rest of the regex
    if that match fails:
        try to match `a' and then the rest of the regex
        return the result of that match
    else:
        return true
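The two branches differ only in the order in which the alternatives are tried. As a rough sketch (a hand-written toy, not how java.util.regex is implemented internally), the hypothetical method matchOptional below handles a pattern of the form x? or x?? followed by a literal string rest, and returns the end index of the match, or -1:

```java
public class OptionalOrder {
    // Hypothetical toy matcher for "x?rest" (greedy) or "x??rest" (reluctant),
    // where x is one character and rest is a literal string.
    // Returns the end index of a match starting at pos, or -1 if there is none.
    static int matchOptional(String s, int pos, char x, boolean reluctant, String rest) {
        boolean canTakeX = pos < s.length() && s.charAt(pos) == x;
        // Greedy tries "take x" (1) first; reluctant tries "take nothing" (0) first.
        int[] order = reluctant ? new int[] {0, 1} : new int[] {1, 0};
        for (int take : order) {
            if (take == 1 && !canTakeX) {
                continue; // x is not at pos, so this alternative is impossible
            }
            int end = matchLiteral(s, pos + take, rest);
            if (end >= 0) {
                return end; // the first alternative that lets the rest match wins
            }
        }
        return -1;
    }

    // Matches the literal lit at pos; returns the end index or -1.
    static int matchLiteral(String s, int pos, String lit) {
        return s.startsWith(lit, pos) ? pos + lit.length() : -1;
    }

    public static void main(String[] args) {
        System.out.println(matchOptional("aaa", 0, 'a', false, "")); // 1: greedy takes the a
        System.out.println(matchOptional("aaa", 0, 'a', true, ""));  // 0: the empty attempt wins
        System.out.println(matchOptional("ab", 0, 'a', true, "b"));  // 2: falls back to taking a
    }
}
```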

Since "a??" is the entire regex, the first attempt (matching nothing)
succeeds at every position, so the fallback of matching `a' never happens.
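The fallback does happen once something follows the reluctant part. A small sketch using the real java.util.regex API (the helper name findAll is just for illustration) compares "a??" on its own with "a??b":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantFallback {
    // Collects every successive find() result as "group@start-end".
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return results;
    }

    public static void main(String[] args) {
        // "a??" is the whole regex, so the empty attempt always succeeds.
        System.out.println(findAll("a??", "aaa"));  // [@0-0, @1-1, @2-2, @3-3]
        // Here the empty attempt fails unless a 'b' comes next,
        // so the engine falls back to consuming the 'a'.
        System.out.println(findAll("a??b", "aab")); // [ab@1-3]
    }
}
```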

> When I use the greedy quantifier instead (Pattern p = Pattern.compile("a?");),
> the output is:
> found char = a at 0 and 1
> found char = a at 1 and 2
> found char = a at 2 and 3
> found char = at 3 and 3
>
> I thought a greedy quantifier first eats the whole string "aaa" at once.
> And why are the empty matches at (0,0), (1,1) and (2,2) missing here,
> when the reluctant quantifier produces them?


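Greedy means, essentially, to consume as much as possible first and to give
characters back only if the rest of the regex then fails to match. For "a?",
"as much as possible" is a single `a', so each greedy match is one character;
the only empty match is at the end, where no `a' is left. Reluctant quantifiers
consume as little as possible, attempt to match the rest of the regex, and only
match more if they have to.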

A typical example is finding a parenthesized group in an arithmetic
expression (this can't handle nesting):
"(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
string, whereas "\\(.*?\\)" will match only "(1+4)".

If you want to match "aaa", the regex "a*" or "a+" will do so.
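A quick sketch with java.util.regex (findAll is an illustrative helper, not a library method) shows one difference between the two: "a*" allows zero repetitions, so after consuming "aaa" it still reports an empty match at the end, while "a+" needs at least one `a':

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StarVersusPlus {
    // Collects every successive find() result as "group@start-end".
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a*", "aaa")); // [aaa@0-3, @3-3]
        System.out.println(findAll("a+", "aaa")); // [aaa@0-3]
    }
}
```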

Finally, there is the possessive quantifier, which never gives back what it
has consumed, even when the rest of the match then fails. I can imagine that
there are times when this would be helpful, but none that I can think of off
the top of my head...
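A minimal sketch of that difference, using java.util.regex (the helper name found is illustrative): greedy "a*a" can give one `a' back so the last `a' matches, while possessive "a*+a" cannot:

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // True if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Greedy a* backtracks by one 'a' so the final 'a' in the regex can match.
        System.out.println(found("a*a", "aaa"));  // true
        // Possessive a*+ keeps every 'a' it consumed, so the final 'a'
        // has nothing left to match, whatever the start position.
        System.out.println(found("a*+a", "aaa")); // false
        // A commonly cited use: [^"] can never match the closing quote, so giving
        // characters back could never help; *+ just skips that useless backtracking.
        System.out.println(found("\"[^\"]*+\"", "say \"hi\"")); // true
    }
}
```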

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
 
 
 
 
NeoGeoSNK
 
      03-03-2008
On Mar 3, 2:40 am, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:


Thanks, that's very clear:
> The pattern "a?" means that either the a is matched or it isn't.
> Without the extra `?' (the plain greedy form), it attempts to match the a
> first and only omits it when the rest of the match fails. However, you
> specified the reluctant quantifier, which makes the `?' operator try not
> matching first.

So do you mean:
X? means: X once, or not at all
but
X?? means: not at all, or X once

One more question:
> "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
> string, whereas "\\(.*?\\)" will match only "(1+4)".
>

I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do you
say it only matches (1+4)?

Thanks again for your reply.

 
 
Lars Enderin
 
      03-03-2008
NeoGeoSNK wrote:
> On Mar 3, 2:40 am, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> One more question:
>> "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
>> string, whereas "\\(.*?\\)" will match only "(1+4)".
>>

> I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do you
> say it only matches (1+4)?
>

That regexp matches first (1+4), then (1+9). The other regexp matches
from the first ( up to and including the last ), once.
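Both behaviours can be checked with a short sketch using java.util.regex (findAll is an illustrative helper name): each call to find() continues after the previous match, so the reluctant pattern reports two matches and the greedy one reports a single long one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenMatches {
    // Collects the text of every successive find() match.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String expr = "(1+4)*5-6/(1+9)";
        // Greedy: .* runs to the end, then backs off to the last ')'.
        System.out.println(findAll("\\(.*\\)", expr));  // [(1+4)*5-6/(1+9)]
        // Reluctant: .*? stops at the first ')' each time.
        System.out.println(findAll("\\(.*?\\)", expr)); // [(1+4), (1+9)]
    }
}
```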
 
 
Joshua Cranmer
 
      03-03-2008
NeoGeoSNK wrote:
> On Mar 3, 2:40 am, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
>> The pattern "a?" means that either the a is matched or it isn't.
>> Without the extra `?' (the plain greedy form), it attempts to match the a
>> first and only omits it when the rest of the match fails. However, you
>> specified the reluctant quantifier, which makes the `?' operator try not
>> matching first.

> So do you mean:
> X? means: X once, or not at all
> but
> X?? means: not at all, or X once


Right.
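The preference order shows up clearly when both choices lead to a successful match. In this sketch (java.util.regex; the helper name split is illustrative), "(a?)(a*)" and "(a??)(a*)" both match "aaa", but they divide it differently:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreferenceOrder {
    // Matches the whole input and returns "group1|group2", or null if no match.
    static String split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : null;
    }

    public static void main(String[] args) {
        // X? tries "once" first, so group 1 takes an 'a'.
        System.out.println(split("(a?)(a*)", "aaa"));  // a|aa
        // X?? tries "not at all" first, and that already succeeds.
        System.out.println(split("(a??)(a*)", "aaa")); // |aaa
    }
}
```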

>> "(1+4)*5-6/(1+9)": the obvious regex "\\(.*\\)" will match the entire
>> string, whereas "\\(.*?\\)" will match only "(1+4)".
>>

> I have tested it, and "\\(.*?\\)" matches both (1+4) and (1+9). Why do you
> say it only matches (1+4)?


Oops, I should have been clearer. "(1+9)" will be matched as well. What
I had intended to say was that the first match would not match the whole
string but merely the indicated substring.

 
 
 
 