Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Java > Parens Matching

Reply
Thread Tools

Parens Matching

 
 
Daniel Pitts
Guest
Posts: n/a
 
      02-17-2007
On Feb 16, 5:14 pm, "Daniel Pitts" <(E-Mail Removed)>
wrote:
> On Feb 15, 7:34 pm, "Ken Kast" <(E-Mail Removed)> wrote:
>
> > Does anyone have a regular expression pattern that would include a test for
> > balanced parens of arbitrary nestedness?

>
> Two ways:
>
> public class NestedParens {
> public static void main(String[] args) {
> String[] strings = {
> "Test case 1",
> "(Test case 2)",
> "((Test) case ) 3",
> ")test case 4 (",
> };
> for (String string: strings) {
> System.out.println(string +
> " is " + (isBalancedRe(string)
> ?
> "Balanced" : "Mis-matched"));
> }
>
> for (String string: strings) {
> System.out.println(string +
> " is " + (isBalancedCount(string)
> ?
> "Balanced" : "Mis-matched"));
> }
> }
>
> public static boolean isBalancedRe(String string) {
> String last;
> do {
> last = string;
> string = string.replaceAll("\\([^()]*\\)", "");
> } while (!last.equals(string));
> return !string.matches(".*[()].*");
> }
>
> private static boolean isBalancedCount(String string) {
> int parens = 0;
> for (char c : string.toCharArray()) {
> switch (c) {
> case ')':
> if (--parens < 0) {
> return false;
> }
> break;
> case '(':
> ++parens;
> }
> }
> return parens == 0;
> }
>
> }



Or, a little more concisely:
public static boolean isBalancedRe(String string) {
while (string.matches(".*\\([^()]*\\).*")) {
string = string.replaceAll("\\([^()]*\\)", "");
}
return !string.matches(".*[()].*");
}

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if (c == '(') {
++parens;
}
if (c == ')' && --parens < 0) {
return false;
}
}
return parens == 0;
}
....

Or, to get down right esoteric:
private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
{
return false;
}
}
return parens == 0;
}

CafeBabe points to those who know why that works the way it does.

 
Reply With Quote
 
 
 
 
Daniel Pitts
Guest
Posts: n/a
 
      02-17-2007
On Feb 16, 5:26 pm, Alex Hunsley <(E-Mail Removed)> wrote:
> Michael wrote:
> > On Feb 15, 7:34 pm, "Ken Kast" <(E-Mail Removed)> wrote:
> >> Does anyone have a regular expression pattern that would include a test for
> >> balanced parens of arbitrary nestedness?

>
> > No. No one does.

>
> > There's a well known proof in Computer Science theory that it is not
> > possible to create a regular expression for balanced parentheses.

>
> > Look up "the pumping lemma" for details.

>
> That thang always made me think of either pumping lemons or pumping
> lemmings.
>
> The matched parenthesis problem can be done with a non-deterministic
> state machine, but not with a deterministic one.... hence no regular
> expression can do it.


Hmm, I don't think your definition is correct.
Regex isn't always implementedwith a DSM, it might use NDA, which
still doesn't solve it.

In either case, the following is a deterministic state machine solves
the problem.

private static boolean isBalancedCount(String string) {
int parens = 0;
for (char c : string.toCharArray()) {
if (c == '(') {
++parens;
}
if (c == ')' && --parens < 0) {
return false;
}
}
return parens == 0;
}

 
Reply With Quote
 
 
 
 
Ken Kast
Guest
Posts: n/a
 
      02-17-2007
Absolutely never thought I would generate so much conversation. I need to
sit down with a glass of wine and digest this.

I probably should have been more transparent in my original post. I'm not
trying to test a string for balanced parens. I need it to be a match
termination condition. In other words, I have a fairly complex pattern. So
it tells when a match is done. I need to say the match is done if the
character pattern is played out or the match will play out before I balance
parens.

I first wrote this in C#. .Net has extended RegExp that lets you swap named
groups on condition. This effectively lets you model a stack. So you push
left parens on the stack and pop them with right ones. Now I want to
implement the same app in Java. I can certainly assume a low level nesting,
on the order of 3 or 4, so running out of memory or running out of time
aren't concerns. I haven't thought it through, but I could probably do it
brute force by iterating thru a set of patterns. That would solve 6 sigma's
worth of real world situations. I was hoping for something more elegant.

Thanks to everyone for your thoughts. I certainly learned how lemmings have
influenced computer science.

Ken


"Ken Kast" <(E-Mail Removed)> wrote in message
news:45d525e7$0$28082$(E-Mail Removed)...
> Does anyone have a regular expression pattern that would include a test
> for balanced parens of arbitrary nestedness?
>



 
Reply With Quote
 
Lew
Guest
Posts: n/a
 
      02-17-2007
Alex Hunsley wrote:
> Although even this isn't correct, in theory... due to the finite memory
> of the computer, it is possible to give it a sequence that should pass
> (the brackets match), but won't pass, because the computer runs out of
> memory. Ok, a little silly, but...


Not so silly, really. It points to a valuable principle of workaday
programming, nay, engineering overall. When the theory says there is no
perfect algorithm, the practice suggests to use an imperfect one. You code to
handle the realistic cases and discontinuously reject inputs that exceed the
algorithm's ability, like the memory-hog example.

This way of thinking also leads one to jump out of "What regular expression?"
into "What algorithm?"

- Lew
 
Reply With Quote
 
Alex Hunsley
Guest
Posts: n/a
 
      02-17-2007
Daniel Pitts wrote:
> On Feb 16, 5:26 pm, Alex Hunsley <(E-Mail Removed)> wrote:
>> Michael wrote:
>>> On Feb 15, 7:34 pm, "Ken Kast" <(E-Mail Removed)> wrote:
>>>> Does anyone have a regular expression pattern that would include a test for
>>>> balanced parens of arbitrary nestedness?
>>> No. No one does.
>>> There's a well known proof in Computer Science theory that it is not
>>> possible to create a regular expression for balanced parentheses.
>>> Look up "the pumping lemma" for details.

>> That thang always made me think of either pumping lemons or pumping
>> lemmings.
>>
>> The matched parenthesis problem can be done with a non-deterministic
>> state machine, but not with a deterministic one.... hence no regular
>> expression can do it.

>
> Hmm, I don't think your definition is correct.
> Regex isn't always implementedwith a DSM, it might use NDA, which
> still doesn't solve it.


What do you mean by NDA? Non-deterministic automaton?

>
> In either case, the following is a deterministic state machine solves
> the problem.
>
> private static boolean isBalancedCount(String string) {
> int parens = 0;
> for (char c : string.toCharArray()) {
> if (c == '(') {
> ++parens;
> }
> if (c == ')' && --parens < 0) {
> return false;
> }
> }
> return parens == 0;
> }


Nope, it doesn't solve the problem. The int type in Java has finite
capacity - see Integer.MAX_VALUE - so more accurately this a finite
deterministic state machine. I'm actually trying to remember if
'deterministic state machine' usually implies the 'finite' part in
common usage. A look at wikipedia seems to suggest that, but I wasn't
looking at it for long...

So, anyway, I can think of an input that causes the above code to fail -
and hence problem not solved.

To give a simplified example of this, suppose we have a Java where
Integer.MAX_VALUE = 1, and .MIN_VALUE = -2. So an int can have 4
distinct values for int. Assuming this, would would your code save about
the following inputs?

(((( - this would be a false positive
(((((((( - another false positive
((())) - this would be a false negative


Similarly, your code above in the world of real Java would give a false
positive for the input:

'(' * [(Integer.MAX_VALUE - Integer.MIN_VALUE + 1) * n]

where n is any positive integer.

Basically, true parens matching is scuppered, because of the need for
infinite memory!

lex


 
Reply With Quote
 
Alex Hunsley
Guest
Posts: n/a
 
      02-17-2007
Lew wrote:
> Alex Hunsley wrote:
>> Although even this isn't correct, in theory... due to the finite memory
>> of the computer, it is possible to give it a sequence that should pass
>> (the brackets match), but won't pass, because the computer runs out of
>> memory. Ok, a little silly, but...

>
> Not so silly, really. It points to a valuable principle of workaday
> programming, nay, engineering overall. When the theory says there is no
> perfect algorithm, the practice suggests to use an imperfect one. You
> code to handle the realistic cases and discontinuously reject inputs
> that exceed the algorithm's ability, like the memory-hog example.
>
> This way of thinking also leads one to jump out of "What regular
> expression?" into "What algorithm?"


Good points. I have also touched on this subject with a reply I just
wrote to a post from Daniel. In Comp Sci. proper, when you're talking
about bracket matching, you want a machine that solves it for *any*
input. So in that sense, it can't be done by a normal real world machine...

lex


>
> - Lew

 
Reply With Quote
 
Alex Hunsley
Guest
Posts: n/a
 
      02-17-2007
Mark Jeffcoat wrote:
> "Michael" <(E-Mail Removed)> writes:
>
>> On Feb 15, 7:34 pm, "Ken Kast" <(E-Mail Removed)> wrote:
>>> Does anyone have a regular expression pattern that would include a test for
>>> balanced parens of arbitrary nestedness?

>> No. No one does.
>>
>> There's a well known proof in Computer Science theory that it is not
>> possible to create a regular expression for balanced parentheses.
>>

>
> Michael is quite right. Of course, that proof is silent
> on the subject of creating a regular expression that accepts
> balanced strings with at most Long.MAX_VALUE open parens.


Although in theory you could actually count this many with a single long:

Long.MAX_VALUE - Long.MIN_VALUE + 1

I have a feeling this expression is equal to:

- (2 * Long.MIN_VALUE)

 
Reply With Quote
 
Chris Uppal
Guest
Posts: n/a
 
      02-17-2007
Alex Hunsley wrote:

> Although even this isn't correct, in theory... due to the finite memory
> of the computer, it is possible to give it a sequence that should pass
> (the brackets match), but won't pass, because the computer runs out of
> memory. Ok, a little silly, but...


You can do it in constant space for arbitrary length inputs /if/ you are
allowed to use the input as working space too -- just replace '()' by '' until
there's nothing left.

A more elegant method would be to keep the unclosed-bracket count in unbounded
integer representation (BigInteger-style), stored in (overwriting) the portion
of the input sequence which has already been scanned.

-- chris


 
Reply With Quote
 
Chris Uppal
Guest
Posts: n/a
 
      02-17-2007
Alex Hunsley wrote:

> The matched parenthesis problem can be done with a non-deterministic
> state machine, but not with a deterministic one.... hence no regular
> expression can do it.


I would understand "non-deterministic state machine" to mean the
non-deterministic variant of finite state automaton). And the deterministic
and non-deterministic versions of FSAs are equivalent in power. Do you use the
term to mean something other form of automaton ?

-- chris




 
Reply With Quote
 
Chris Uppal
Guest
Posts: n/a
 
      02-17-2007
Daniel Pitts wrote:

> private static boolean isBalancedCount(String string) {
> int parens = 0;
> for (char c : string.toCharArray()) {
> if ((c == '(' && ++ parens < 0) || c==')' && --parens < 0)
> {
> return false;
> }
> }
> return parens == 0;
> }


I have rarely seen code I liked less. Well done

-- chris


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
regex extension to handle matching parens? ivo welch Ruby 5 01-24-2009 04:35 PM
How to avoid "f.close" (no parens) bug? Stephen Ferg Python 12 02-13-2004 06:40 AM
RE: How to avoid "f.close" (no parens) bug? Batista, Facundo Python 2 02-11-2004 09:28 PM
RE: How to avoid "f.close" (no parens) bug? Batista, Facundo Python 2 02-11-2004 07:54 PM
matching balanced parens ivo welch Perl Misc 3 12-27-2003 06:32 PM



Advertisments