Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Perl Misc (http://www.velocityreviews.com/forums/f67-perl-misc.html)
-   -   Regular expression to match surrounding parenthesis (http://www.velocityreviews.com/forums/t887185-regular-expression-to-match-surrounding-parenthesis.html)

Bob 07-07-2004 06:23 PM

Regular expression to match surrounding parenthesis
 
Hi,

I am trying to create a regular expression to verify that user entered
data is surrounded by the same number of open and closed parenthesis.

For example: if 'a' was the expression I was trying to match then a,
(a), ((a)), (((a)))... (((((((a))))))) would all be valid.

I am not new to regular expressions but I am also not an expert. I
have spent hours searching for a solution but no luck.

Is this possible and if so any help would be appreciated?

Thanks
Bob

Gunnar Hjalmarsson 07-07-2004 06:47 PM

Re: Regular expression to match surrounding parenthesis
 
Bob wrote:
> I am trying to create a regular expression to verify that user
> entered data is surrounded by the same number of open and closed
> parenthesis.
>
> For example: if 'a' was the expression I was trying to match then
> a, (a), ((a)), (((a)))... (((((((a))))))) would all be valid.


What about nesting?

The CPAN module Text::Balanced might be helpful.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Peter J. Acklam 07-07-2004 07:19 PM

Re: Regular expression to match surrounding parenthesis
 
bob_nf@hotmail.com (Bob) wrote:

> I am trying to create a regular expression to verify that user entered
> data is surrounded by the same number of open and closed parenthesis.
>
> For example: if 'a' was the expression I was trying to match then a,
> (a), ((a)), (((a)))... (((((((a))))))) would all be valid.
>
> I am not new to regular expressions but I am also not an expert. I
> have spent hours searching for a solution but no luck.


In stead of verifying directly that the input is correct, it it
probably simpler to remove everything you know is correct and see
if there is anything left:

1 while $input =~ s/\([^()]*\)//g;
print $input =~ /[()]/ ? "bad" : "ok";

The first line removes all matching, possibly nested, parantheses
and the second line checks to see if there are any parentheses
left.

Peter

--
#!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
# matlab comment stripper (strips comments from Matlab m-files)
s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;

Peter J. Acklam 07-07-2004 11:05 PM

Re: Regular expression to match surrounding parenthesis
 
Abigail <abigail@abigail.nl> wrote:

> Peter J. Acklam (pjacklam@online.no) wrote:
> ``
> `` In stead of verifying directly that the input is correct, it it
> `` probably simpler to remove everything you know is correct and see
> `` if there is anything left:
> ``
> `` 1 while $input =~ s/\([^()]*\)//g;
> `` print $input =~ /[()]/ ? "bad" : "ok";
> ``
> `` The first line removes all matching, possibly nested, parantheses
> `` and the second line checks to see if there are any parentheses
> `` left.
>
> The algorithm may be 'simpler' by some standard, it's not
> efficient. Worst case, the algorithm takes time quadratic in
> the length of the input.


True, but the context here was validating user input, suggesting
interactive use, which in turn means that this is probably going
to be run relatively rarely and not thousands of time inside a
loop.

Of course, a better regex or a module could be used, or...

sub isvalid {
local $_ = shift;
my $level = 0;
while (length) {
s/^[^()]+//; # non-parantheses
if (s/^\(//) { # opening parenthesis
++$level;
next;
}
if (s/^\)//) { # closing parenthesis
return '' if --$level < 0;
next;
}
}
return $level == 0;
}

Peter

--
#!/local/bin/perl5 -wp -*- mode: cperl; coding: iso-8859-1; -*-
# matlab comment stripper (strips comments from Matlab m-files)
s/^((?:(?:[])}\w.]'+|[^'%])+|'[^'\n]*(?:''[^'\n]*)*')*).*/$1/x;

Josef Moellers 07-08-2004 06:35 AM

Re: Regular expression to match surrounding parenthesis
 
Bob wrote:
> Hi,
>
> I am trying to create a regular expression to verify that user entered
> data is surrounded by the same number of open and closed parenthesis.
>
> For example: if 'a' was the expression I was trying to match then a,
> (a), ((a)), (((a)))... (((((((a))))))) would all be valid.
>
> I am not new to regular expressions but I am also not an expert. I
> have spent hours searching for a solution but no luck.
>
> Is this possible and if so any help would be appreciated?


In a nutshell: no, this is impossible. (<- that is a definite period!)

Regular expressions cannot parse this kind of things since finite state
automata, which implement the matching of strings to regular
expressions, cannot count and have no storage (that's why they are
called "finite state") and you need to count the number of opening
parens before you can validate that there are as many closing parens.

This is from the theory of automata. Practical implementations may allow
this but the question was whether regular expressions can do this.
--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett


Anno Siegel 07-08-2004 09:48 AM

Re: Regular expression to match surrounding parenthesis
 
Peter J. Acklam <pjacklam@online.no> wrote in comp.lang.perl.misc:

[checking paren balance]

> Of course, a better regex or a module could be used, or...
>
> sub isvalid {
> local $_ = shift;
> my $level = 0;
> while (length) {
> s/^[^()]+//; # non-parantheses
> if (s/^\(//) { # opening parenthesis
> ++$level;
> next;
> }
> if (s/^\)//) { # closing parenthesis
> return '' if --$level < 0;
> next;
> }
> }
> return $level == 0;
> }


Instead of consuming the string bit by bit you could extract just the
parens and loop over them. That simplifies things a bit:

sub isvalid {
my $level = 0;
( $level += $_ eq '(' ? 1 : -1 ) >= 0 or return !!0 for
shift =~ /([()])/g;
$level == 0;
}

Anno

Anno Siegel 07-08-2004 10:13 AM

Re: Regular expression to match surrounding parenthesis
 
Abigail <abigail@abigail.nl> wrote in comp.lang.perl.misc:
> Josef Moellers (josef.moellers@fujitsu-siemens.com) wrote on MMMCMLXIV
> September MCMXCIII in <URL:news:ccipqq$2gp$1@nntp.fujitsu-siemens.com>:
> -- Bob wrote:


> -- > I am trying to create a regular expression to verify that user entered
> -- > data is surrounded by the same number of open and closed parenthesis.


[...]

> -- In a nutshell: no, this is impossible. (<- that is a definite period!)
> --
> -- Regular expressions cannot parse this kind of things since finite state
> -- automata, which implement the matching of strings to regular
> -- expressions, cannot count and have no storage (that's why they are
> -- called "finite state") and you need to count the number of opening
> -- parens before you can validate that there are as many closing parens.
> --
> -- This is from the theory of automata. Practical implementations may allow
> -- this but the question was whether regular expressions can do this.
>
>
> It all depends on your definition of 'regular expressions'. In the context
> of Perl, a 'regular expression' is something that can be grokked by Perls
> regular expression engine. And with those regular expressions, you *can*
> matched strings with balanced parenthesis.
>
> It would be very inpractical if discussion on the forum would refer to
> "expressions that can be grokked by Perl's regular expression engine"
> instead of what's understood by everyone but a few pedants, "regular
> expressions".


The mathematicians who work with regular expressions are just a club of
pedants?

"Regular expression" has two significantly different definitions in
mathematics and in CS. Of course, the CS meaning is the primary one
on clpm, but pointing out the difference is not merely pedantry.

Anno

Sam Holden 07-08-2004 10:32 PM

Re: Regular expression to match surrounding parenthesis
 
On 8 Jul 2004 10:13:02 GMT,
Anno Siegel <anno4001@lublin.zrz.tu-berlin.de> wrote:
> Abigail <abigail@abigail.nl> wrote in comp.lang.perl.misc:
>> Josef Moellers (josef.moellers@fujitsu-siemens.com) wrote on MMMCMLXIV
>> September MCMXCIII in <URL:news:ccipqq$2gp$1@nntp.fujitsu-siemens.com>:
>> -- Bob wrote:

>
>> -- > I am trying to create a regular expression to verify that user entered
>> -- > data is surrounded by the same number of open and closed parenthesis.

>
> [...]
>
>> -- In a nutshell: no, this is impossible. (<- that is a definite period!)

[snip FSA explanation]
>> -- This is from the theory of automata. Practical implementations may allow
>> -- this but the question was whether regular expressions can do this.
>>
>>
>> It all depends on your definition of 'regular expressions'. In the context
>> of Perl, a 'regular expression' is something that can be grokked by Perls
>> regular expression engine. And with those regular expressions, you *can*
>> matched strings with balanced parenthesis.
>>
>> It would be very inpractical if discussion on the forum would refer to
>> "expressions that can be grokked by Perl's regular expression engine"
>> instead of what's understood by everyone but a few pedants, "regular
>> expressions".

>
> The mathematicians who work with regular expressions are just a club of
> pedants?


I'm sure those mathematicians who happen to read clpm know the
difference between the two regular expression definitions and never
think for even a moment that a post on clpm that mentions "regular
expression" means anything but "expressions that can be grokked by
Perl's regular expression engine", hence they aren't in the "few
pedants" set above.

>
> "Regular expression" has two significantly different definitions in
> mathematics and in CS. Of course, the CS meaning is the primary one
> on clpm, but pointing out the difference is not merely pedantry.


But answering "No" to can a regular expression match X, where X can be
done by perl's non-regular regular expressions but not by regular
regular expressions seems pedantic to me.

Sure, it's fun to show off that you know all about FSAs and PDAs and TMs
but since this is clpm they are all pretty much irrelevant to the
question. Hence the showing off should be accompanied by an answer in
perl, after all someone else will know an answer and can do the showing
themselves.

The OP clearly didn't mean "can a mathematical regular expression do
this?", they meant "can a perl regular expression do this?", hence the
answer of "No" is deceptive. I seriously doubt anyone would ask in clpm
about the properties of a real honest to goodness regular expressions.

In fact I have seen people criticised for asking about less powerful
regular expressions here, asking for a regular expression to match X and
then saying that the answer won't work because grep doesn't understand
all that. Which seems to be the other side of the coin.

--
Sam Holden

Ilya Zakharevich 07-09-2004 05:35 PM

Re: Regular expression to match surrounding parenthesis
 
[A complimentary Cc of this posting was sent to
Anno Siegel
<anno4000@lublin.zrz.tu-berlin.de>], who wrote in article <ccj6ne$f4b$1@mamenchi.zrz.TU-Berlin.DE>:
> The mathematicians who work with regular expressions are just a club of
> pedants?


Can't answer this question; never saw a mathematician who works with
regular expressions. 1/5 ;-)

But in general, [with a few exceptions] mathematicians do not mind the
same word having different meanings in different contexts. Too few
words, too many things to work over....

[For an extreme example, 'less than' may have a different meaning if
written by a French.]

Hope this helps,
Ilya

Anno Siegel 07-09-2004 07:21 PM

Re: Regular expression to match surrounding parenthesis
 
Ilya Zakharevich <nospam-abuse@ilyaz.org> wrote in comp.lang.perl.misc:
> [A complimentary Cc of this posting was sent to
> Anno Siegel
> <anno4000@lublin.zrz.tu-berlin.de>], who wrote in article
> <ccj6ne$f4b$1@mamenchi.zrz.TU-Berlin.DE>:
> > The mathematicians who work with regular expressions are just a club of
> > pedants?

>
> Can't answer this question; never saw a mathematician who works with
> regular expressions. 1/5 ;-)


Okay, but some work with grammars and like to distinguish regular grammars,
which are defined using regular expressions.

This is, BTW, the reason why the mathematical notion of regular expressions
was never amended with backreferences, the way computer implementations are.
In the theory of grammars and languages, it is the distinguishing property
of regularity that some constructs (like nested parentheses, but even
simpler ones) cannot be parsed. Extending the capabilities of regular
expressions would spoil the game.

> But in general, [with a few exceptions] mathematicians do not mind the
> same word having different meanings in different contexts. Too few
> words, too many things to work over....


Not at all. Mathematics has practiced operator overloading long before
it was named thus.

Nor do I have a problem with "regular expression" meaning two different
things in different disciplines. But the case is not quite like "index"
meaning two essentially unrelated things to a mathematician and a librarian.
The computer notion of "regex" was derived from the earlier mathematical
model and is essentially (down to the notation) the same thing.

Being practitioners, computer people soon invented shortcuts and extended
notations in their hackish ways. Many of these (like character classes,
or {m,n} quantifiers) are inessential and don't change the power of
what a regex can do, only the ease of expressing it. The introduction
of backreferences *does* change the expressive power of regexes, in a
way that was, and still is, deliberately excluded from the mathematical
definition. That can lead to contradictory statements about "regular
expressions", which, depending on who is speaking, are both correct.

This particular relationship deserves an explanation when it comes up.

Anno


All times are GMT. The time now is 11:20 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.