Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > balanced paren regex's

Reply
Thread Tools

balanced paren regex's

 
 
ivowel@gmail.com
Guest
Posts: n/a
 
      06-17-2006

[posted earlier in perl.modules, but no answer.]

dear perl users: I want to write a function that extracts "ordinary"
subroutines from perl code. (an equivalent task is extracting all
macros from a latex file.) I am not trying to be too clever. let's
presume I can recognize subs because subs and only subs always start at
the first character of a line and are not anonymous. a sub is followed
by a name and can contain nested expressions.

I can do plain pattern matching to find the first occurance of the
first sub: '^sub \w+'. but now I am stuck. I need to continue
on with a Text::Balanced expression right after, and after the
text::balanced is done, continue on with my regex search (\G).

my $text=
"
{ text }
expressions
sub a {
if (1==1) { print "true"; } # if string or comment could contain
unbalanced paren, even better
}
more expressions
sub b {
if (0) { print "false" }
}
";

and I want to call a subroutine getnextsub() that first will return
"sub a {\n if (1==1) { print \"true\"; } # if string or
comment could contain unbalanced paren, even better\n }"
and on the next call will return
"sub b {\n if (0) { print \"false\" }\n }";

this must be a fairly common problem (e.g., extracting tex macro
arguments, etc.), but short of mimicking C in writing very low-level
paren counter subroutines on individual characters, I cannot see how to
solve this. I do understand the issue of how to treat nested subs, but
this right now is only a secondary concern.

help appreciated.

sincerely, /iaw

 
Reply With Quote
 
 
 
 
Dr.Ruud
Guest
Posts: n/a
 
      06-17-2006
schreef:

> I want to write a function that extracts "ordinary"
> subroutines from perl code.


Search CPAN on 'balanced' or on 'parse'.

See also
http://search.cpan.org/search?module=PPI

--
Affijn, Ruud

"Gewoon is een tijger."


 
Reply With Quote
 
 
 
 
Xicheng Jia
Guest
Posts: n/a
 
      06-17-2006
wrote:
[snip]
> and I want to call a subroutine getnextsub() that first will return
> "sub a {\n if (1==1) { print \"true\"; } # if string or
> comment could contain unbalanced paren, even better\n }"
> and on the next call will return
> "sub b {\n if (0) { print \"false\" }\n }";
>
> this must be a fairly common problem (e.g., extracting tex macro
> arguments, etc.), but short of mimicking C in writing very low-level
> paren counter subroutines on individual characters, I cannot see how to
> solve this. I do understand the issue of how to treat nested subs, but
> this right now is only a secondary concern.


Here is one way you may use(the iterator way from the book HOP):
#########################
use strict;
use warnings;

my $text= <<'END_TEST';
{ text }
expressions
sub a {
if (1==1) { print "true"; } # if string or comment could contain
unbalanced paren, even better
}
more expressions
sub b {
if (0) { print "false" }
}
safdsf
END_TEST

local our $n;
# pattern to track embedded braces
my $pattern = qr/
(?> (?{$n = 0})
(?:
[^{}]
|
\{ (?{$n++})
|
\} (?(?{$n != 0}) (?{$n--}) | (?!) )
)*
)(?(?{$n != 0})(?!))
/x;

# set the iterator
my $it = getnextsub($text);
my $count = 0;

# loop through the text and print out all functions
while (my $next_sub = $it->()) {
print "Subroutine-", ++$count, " is:\n${next_sub}\n\n";
}

# subroutine to set the iteratior
sub getnextsub {
my $text = shift;
return sub {
my $sub_def;
if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
$sub_def = $1;
}
$sub_def;
}
}
####################################

 
Reply With Quote
 
Ben Morrow
Guest
Posts: n/a
 
      06-17-2006

Quoth :
>
> [posted earlier in perl.modules, but no answer.]
>
> dear perl users: I want to write a function that extracts "ordinary"
> subroutines from perl code. (an equivalent task is extracting all
> macros from a latex file.) I am not trying to be too clever. let's
> presume I can recognize subs because subs and only subs always start at
> the first character of a line and are not anonymous. a sub is followed
> by a name and can contain nested expressions.
>
> I can do plain pattern matching to find the first occurance of the
> first sub: '^sub \w+'. but now I am stuck. I need to continue
> on with a Text::Balanced expression right after, and after the
> text::balanced is done, continue on with my regex search (\G).


You mentioned Text::Balanced; how does extract_codeblock not do what you
want?

Ben

--
The cosmos, at best, is like a rubbish heap scattered at random.
Heraclitus

 
Reply With Quote
 
Xicheng Jia
Guest
Posts: n/a
 
      06-17-2006
Xicheng Jia wrote:
> wrote:
> [snip]
> > and I want to call a subroutine getnextsub() that first will return
> > "sub a {\n if (1==1) { print \"true\"; } # if string or
> > comment could contain unbalanced paren, even better\n }"
> > and on the next call will return
> > "sub b {\n if (0) { print \"false\" }\n }";
> >
> > this must be a fairly common problem (e.g., extracting tex macro
> > arguments, etc.), but short of mimicking C in writing very low-level
> > paren counter subroutines on individual characters, I cannot see how to
> > solve this. I do understand the issue of how to treat nested subs, but
> > this right now is only a secondary concern.

>
> Here is one way you may use(the iterator way from the book HOP):
> #########################
> use strict;
> use warnings;
>
> my $text= <<'END_TEST';
> { text }
> expressions
> sub a {
> if (1==1) { print "true"; } # if string or comment could contain
> unbalanced paren, even better
> }
> more expressions
> sub b {
> if (0) { print "false" }
> }
> safdsf
> END_TEST
>
> local our $n;
> # pattern to track embedded braces
> my $pattern = qr/
> (?> (?{$n = 0})
> (?:
> [^{}]
> |
> \{ (?{$n++})
> |
> \} (?(?{$n != 0}) (?{$n--}) | (?!) )
> )*
> )(?(?{$n != 0})(?!))
> /x;
>
> # set the iterator
> my $it = getnextsub($text);
> my $count = 0;
>
> # loop through the text and print out all functions
> while (my $next_sub = $it->()) {
> print "Subroutine-", ++$count, " is:\n${next_sub}\n\n";
> }
>
> # subroutine to set the iteratior
> sub getnextsub {
> my $text = shift;
> return sub {
> my $sub_def;
> if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
> $sub_def = $1;
> }
> $sub_def;
> }
> }
> ####################################


BTW. you can make the subroutine "getnextsub" to skip any number of
function definitions.
#################
sub getnextsub {
my $text = shift;
return sub {
my $num_subs = shift || 1;
my ($sub_def, $cnt);
while ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
if (++$cnt == $num_subs) {
$sub_def = $1;
last;
}
}
return $sub_def || "undefined\n";
}
}
################
# if you have subroutine definitions a, b, c, d, e, f
# and in that order, then

$it = getnextsub($text); # set the iterator
$next_sub = $it->(); #get sub a {...}
$next_sub = $it->(3); #get sub d {...}
$next_sub = $it->(); #get sub e {...}
$next_sub = $it->(2); #return "undefined"
$it = getnextsub($text); # reset the iterator

Xicheng

 
Reply With Quote
 
Xicheng Jia
Guest
Posts: n/a
 
      06-18-2006
Xicheng Jia wrote:
> Xicheng Jia wrote:
> > wrote:
> > [snip]
> > > and I want to call a subroutine getnextsub() that first will return
> > > "sub a {\n if (1==1) { print \"true\"; } # if string or
> > > comment could contain unbalanced paren, even better\n }"
> > > and on the next call will return
> > > "sub b {\n if (0) { print \"false\" }\n }";
> > >
> > > this must be a fairly common problem (e.g., extracting tex macro
> > > arguments, etc.), but short of mimicking C in writing very low-level
> > > paren counter subroutines on individual characters, I cannot see how to
> > > solve this. I do understand the issue of how to treat nested subs, but
> > > this right now is only a secondary concern.

> >
> > Here is one way you may use(the iterator way from the book HOP):
> > #########################
> > use strict;
> > use warnings;
> >
> > my $text= <<'END_TEST';
> > { text }
> > expressions
> > sub a {
> > if (1==1) { print "true"; } # if string or comment could contain
> > unbalanced paren, even better
> > }
> > more expressions
> > sub b {
> > if (0) { print "false" }
> > }
> > safdsf
> > END_TEST
> >
> > local our $n;
> > # pattern to track embedded braces
> > my $pattern = qr/
> > (?> (?{$n = 0})
> > (?:
> > [^{}]
> > |
> > \{ (?{$n++})
> > |
> > \} (?(?{$n != 0}) (?{$n--}) | (?!) )
> > )*
> > )(?(?{$n != 0})(?!))
> > /x;
> >
> > # set the iterator
> > my $it = getnextsub($text);
> > my $count = 0;
> >
> > # loop through the text and print out all functions
> > while (my $next_sub = $it->()) {
> > print "Subroutine-", ++$count, " is:\n${next_sub}\n\n";
> > }
> >
> > # subroutine to set the iteratior
> > sub getnextsub {
> > my $text = shift;
> > return sub {
> > my $sub_def;


=> > if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {

should change from \s* to \s+, and \w+ is enough to replace \S+

if ($text =~ s/(sub\s+\w+\s*{$pattern})//) {

Xicheng

 
Reply With Quote
 
Xicheng Jia
Guest
Posts: n/a
 
      06-18-2006
Xicheng Jia wrote:
> Xicheng Jia wrote:

[snip]
> => > if ($text =~ s/(sub\s*\S+\s*{$pattern})//) {
>
> should change from \s* to \s+, and \w+ is enough to replace \S+
>
> if ($text =~ s/(sub\s+\w+\s*{$pattern})//) {
>

one more modification:

if ($text =~ s/.*?(sub\s+\w+\s*{$pattern})//) {

Xicheng

 
Reply With Quote
 
ivowel@gmail.com
Guest
Posts: n/a
 
      06-18-2006

thank you very much. regards, /iaw

 
Reply With Quote
 
ivowel@gmail.com
Guest
Posts: n/a
 
      06-18-2006

I am truly becoming greedy now. is there a good/clever way to keep
track on which lineno the match was made on (i.e., how many \n occurred
before)? [something similar to $., but in connection with a text
match.]

regards,

/iaw

 
Reply With Quote
 
Xicheng Jia
Guest
Posts: n/a
 
      06-18-2006
wrote:
> I am truly becoming greedy now. is there a good/clever way to keep
> track on which lineno the match was made on (i.e., how many \n occurred
> before)? [something similar to $., but in connection with a text
> match.]


sure you can. the key for this method (you might want to read [1] for
more introduction about iteration) is how to use "closure" in Perl
subroutines. I revised the previous subroutine again and fixed some
bugs. :

1) 's' modifier is added in the s/// expression, otherwise .*? can not
match multiple lines;
2) capture two parts: $1, and $2, and use something like $1 =~ tr/\n//;
to count the number of newlines in a substring.
3) "return" statement is revised so that you can use the iterator in a
while loop;
4) two variables added: $line_num to count newlines in the whole
matched text block. $lineno is the line_number containing the keyword
'sub' of your function declaration...

###################################
sub getnextsub {
my $text = shift;
my $line_num = 0;
return sub {
my $num_subs = shift || 1;
my ($sub_def, $cnt) = ("", 0);
while ($text =~ s/(.*?(sub\s*\S+\s*{$pattern}))//s) {
$line_num += ($1 =~ tr/\n//);
if (++$cnt == $num_subs) {
$sub_def = $2;
my $lineno = $line_num + 1 - ($sub_def =~ tr/\n//);
print "line_number is $lineno\n";
last;
}
}
print "undefined\n" if not $sub_def;
return $sub_def;
}
} # end of getnextsub #
###################################

don't know where you want the line numbers to go, so just print them
out.

Good luck
Xicheng

[1] "Higher-Order Perl: Transforming Programs with Programs", by M.J.
Dominus.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Whitespace before opening paren in function call? C. J. Clegg C Programming 2 03-07-2009 06:29 AM
Declaring varibale name in paren Josuan C++ 0 06-02-2008 08:56 PM
Balanced Scorecard Install =?Utf-8?B?Z3dlbnQ=?= Microsoft Certification 1 04-20-2005 05:55 AM
Problem in NLB - 2 server is not balanced =?Utf-8?B?U0s=?= Microsoft Certification 0 09-14-2004 07:55 AM
Method not recognizing PAREN when looping through character array Curts Java 1 08-21-2003 10:47 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57