"my" variables and recursive regexp strangeness

 
 
Ian
      05-13-2004
I have something strange happening with a recursive regexp compiled
with qr//x. It is a regular expression to match individual
single-quoted, double-quoted and unquoted strings, i.e. "string",
'string' and string.

It works fine when the sub parts of it are global variables, or
"local" variables, but if I change them to "my" variables, suddenly
they stop matching correctly (or at least start matching differently).

Anybody have any ideas why changing to "my" variables would affect it
this way?

I get the same behaviour using active perl 5.6.1, and perl 5.81 on
knoppix.

Other things I'd like to know, if anybody has any idea:
Is there a simpler way to regexp this kind of thing?
Why does perl crash with some recursive regexps?
Is there any particular reason for the warning generated when this
script is run using perl -W?

I use test input something like:
aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff

Here's the program. If you change the first two vars to "my" variables,
it stops working. Changing the others doesn't seem to affect it.


#!perl

# Double-quoted-string data regexp
$dStringData = qr/
([^"\\]|\\.)+ (??{$dStringData})
|
"
/x;

# Single-quoted-string data regexp
$sStringData = qr/
([^'\\]|\\.)+ (??{$sStringData})
|
'
/x;

# Characters that are allowed in unquoted strings
$token = qr/([^\s\\'"]|\\.)/x;

# Unquoted-strings broken up by spaces regexp
$uStringData = qr/
(??{$token})+ (??{$uStringData})
|
\B|\b
/x;

# Matches double-quoted, single-quoted or unquoted strings
$string = qr/
(
(??{$token}) (??{$uStringData})
|
" (??{$dStringData})
|
' (??{$sStringData})
)
/x;

# Test program to identify "STRING"s or 'STRING's or STRINGs in the input

while (<>) {
    my @strings;

    # remove them all one by one
    while (/$string/) {
        push @strings, $1;
        s/$string//;
    }

    # print all of them out one by one
    my $counter = 0;
    foreach (@strings) {
        print "$counter = [$_]\n";
        $counter++;
    }
}
 
Anno Siegel
      05-13-2004
Ian wrote in comp.lang.perl.misc:
> I have something strange happening with a recursive regexp compiled
> with qr//x; It is a regular expression to match individual single
> double and un-quoted strings, i.e. "string", 'string' and string.
>
> It works fine when the sub parts of it are global variables, or
> "local" variables, but if I change them to "my" variables, suddenly
> they stop matching correctly (or at least start matching differently).
>
> Anybody have any ideas why changing to "my" variables would affect it
> this way?


It isn't the fact that they're lexical, but you apparently tried
to declare the variables in the same statement that uses them,
as in

my $dStringData = qr/
([^"\\]|\\.)+ (??{$dStringData})
|
"
/x;

You can't use a lexical in the same statement that declares it.
Use an extra "my" statement, and it works.
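
The scoping rule behind this can be seen without any regexp at all. The following is only a minimal sketch: the names $value and demo() are made up for illustration and are not from the original program. On the right-hand side of a "my" statement, the name still refers to whatever was visible before, here a package variable; the new lexical only becomes visible in the next statement. In the same way, a (??{$dStringData}) written inside "my $dStringData = qr/.../" presumably ends up looking at the (undefined) global of the same name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A package (global) variable; the name is hypothetical.
our $value = 'package variable';

sub demo {
    # The $value on the right is still the package variable:
    # the new lexical is not visible until the next statement.
    my $value = "lexical built from [$value]";
    return $value;
}

print demo(), "\n";   # lexical built from [package variable]
```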

It would have been better to post the erroneous code, instead of
saying "if I change this, it doesn't work anymore". That way
we wouldn't have to guess your error.

Anno
 
Gunnar Hjalmarsson
      05-13-2004
Ian wrote:
> I have something strange happening with a recursive regexp compiled
> with qr//x; It is a regular expression to match individual single
> double and un-quoted strings, i.e. "string", 'string' and string.
>
> It works fine when the sub parts of it are global variables, or
> "local" variables, but if I change them to "my" variables, suddenly
> they stop matching correctly (or at least start matching
> differently).


You'd better my() declare those variables before they are used:

my $dStringData;
$dStringData = qr/

etc. (Otherwise it's too late.)

> Other things I'd like to know if anybody has any idea are: Is there
> a simpler way to regexp this kind of thing?


This would do something similar:

my $token = qr/[^\s\\'"]|\\./;
while (<>) {
    my @strings;
    push @strings, $+ while /('[^']*')|("[^"]*")|($token+)/g;
    print "$_ = [$strings[$_]]\n" for 0..$#strings;
}
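
To try that approach on the sample input from the original post without typing it on STDIN, the loop body can be wrapped in a small sub. This is a sketch only: split_strings() and $sample are hypothetical names added for illustration. Note that the quoted alternatives here do not treat backslash escapes inside quotes specially, which is one way this differs from the original regexp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Characters allowed in unquoted strings (same pattern as above).
my $token = qr/[^\s\\'"]|\\./;

# Hypothetical helper: returns every quoted or unquoted string in $line.
sub split_strings {
    my ($line) = @_;
    my @strings;
    # $+ holds whichever of the three capture groups matched.
    push @strings, $+ while $line =~ /('[^']*')|("[^"]*")|($token+)/g;
    return @strings;
}

# The test input from the original post.
my $sample = q{aaa bbb "ccc"'ddd"ddd'"eee'eee" f\ \ \ ff};

my @found = split_strings($sample);
print "$_ = [$found[$_]]\n" for 0 .. $#found;
```

This should report six strings: aaa, bbb, "ccc", 'ddd"ddd', "eee'eee" and f\ \ \ ff.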

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Jeff 'japhy' Pinyan
      05-13-2004
[posted & mailed]

On 13 May 2004, Ian wrote:

>Anybody have any ideas why changing to "my" variables would affect it
>this way?


Someone has already answered this. You can't declare and use the lexical
variable in the same statement.

my $rx;
$rx = qr/...(??{ $rx }).../;
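
For a filled-in instance of that declare-first pattern, here is a minimal sketch using a case where recursion actually earns its keep: matching balanced parentheses. The pattern and the test string are illustrative assumptions and are not taken from Ian's program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declare first, so the (??{ $rx }) block below sees this lexical.
my $rx;

# A parenthesised group: runs of non-parens, or nested groups (recursion).
$rx = qr/ \( (?: [^()]+ | (??{ $rx }) )* \) /x;

if ("f(a(b)c)" =~ $rx) {
    print "matched: $&\n";   # matched: (a(b)c)
}
```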

But there's another issue here.

># Double-quoted-string data regexp
>$dStringData = qr/
> ([^"\\]|\\.)+ (??{$dStringData})
> |
> "
> /x;


># Matches single or double, single or unquoted strings
>$string = qr/
> (
> " (??{$dStringData})
> )
> /x;


I've stripped out everything but the double-quoted regexes. WHY are these
recursive? I don't see the value of that at all. Why not just

$dStringData = qr{ (?: [^"\\] | \\. )+ }xs;
$string = qr{ " $dStringData " }x;

$dStringData is not gaining anything by being recursive, since once the
non-closing-quote stuff matches, the next thing that will match *is* the
closing quote. So it "recurses" once. Unless, of course, you never match
a closing quote, in which case your regex tries a whole bunch of
permutations before failing.
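
As a quick check that the non-recursive version copes with escaped quotes, here is a small sketch. The capture group around $dStringData and the $input string are additions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Non-recursive data pattern: ordinary characters or backslash escapes.
my $dStringData = qr{ (?: [^"\\] | \\. )+ }xs;

# Capture group added here (an assumption) so the contents can be printed.
my $string = qr{ " ($dStringData) " }x;

# Hypothetical input containing escaped double quotes.
my $input = q{say "hello \"world\"" now};

if ($input =~ $string) {
    print "inside quotes: $1\n";   # inside quotes: hello \"world\"
}
```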

Run this code:

print "slow\n";
$rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
print "done\n\n";

print "fast\n";
$rx = qr{ (?: [^\\"] | \\. )+ }x;
q{"this thing is too slow} =~ m{ " $rx " }x;
print "done\n\n";

You'll see the bottom one is MUCH MUCH faster. The top one is slow
because, after it fails the first time, the (?:...)+ part backtracks a
bit, the (??{ $rx }) then matches the part that was given back, and the
regex again tries to match a " and fails. It repeats this for every way
of splitting up the string, so every character you add makes the wait
exponentially longer. I took out the "!" at the end of the string
because I got impatient!

And you needn't put $rx inside (??{ ... }) in the outermost regex; it
works fine by itself.

--
Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)

 
 
Jeff 'japhy' Pinyan
      05-13-2004
On Thu, 13 May 2004, Jeff 'japhy' Pinyan wrote:

>Run this code:
>
> print "slow\n";
> $rx = qr{ (?: [^\\"] | \\. )+ (??{ $rx }) | " }x;
> q{"this thing is too slow} =~ m{ " (??{ $rx }) }x;
> print "done\n\n";


This becomes MUCH MUCH faster if you change $rx to

$rx = qr{ (?: [^\\"] | \\. ) (??{ $rx }) | " }x;

Note that there is no + quantifier on the (?:...) group.


 