Strange behavior of 'Alternative capture group numbering'

 
 
Raymundo
      01-01-2012
Hello,

First of all, I'm sorry that my English is not very good.

I'm reading "perlretut" (Perl Regular Expression Tutorial) of version
5.14 now:
http://perldoc.perl.org/perlretut.html

While I was reading "Alternative capture group numbering" section,
I wrote a simple test program to practice it myself.

I'm using Strawberry Perl 5.12.3 on Windows XP.

Here is my code:
-----
#!perl
use strict;
use warnings;

# Stop at end of input instead of looping forever on an undefined line.
while ( defined( my $input = <STDIN> ) ) {
    chomp $input;
    if ( $input =~ /(?|(a)(b)|(c))(d)/ ) {
        print "1[$1] 2[$2] 3[$3]\n";
    }
}
-----

Here is the result:
-----
abd
1[a] 2[b] 3[d]
cd
Use of uninitialized value $2 in concatenation (.) or string at d:\Temp\test.pl line 13, <STDIN> line 2.
1[c] 2[] 3[d]
----

Okay. This is what I expected and what the document said. 'd' is
assigned to $3 because the maximum number in the alternative numbering
group is 2.

Then I modified the pattern, changing only the order of the two
alternatives inside the alternative numbering group:
-----
if ( $input =~ /(?|(c)|(a)(b))(d)/ ) {
-----
This is the result:
-----
abd
Use of uninitialized value $3 in concatenation (.) or string at d:\Temp\test.pl line 13, <STDIN> line 1.
1[a] 2[d] 3[]
cd
Use of uninitialized value $3 in concatenation (.) or string at d:\Temp\test.pl line 13, <STDIN> line 2.
1[c] 2[d] 3[]
----

I have no idea why the result differs from the first one.
Why is 'd' in $2, not $3? Where did 'b' of 'abd' go after matching?

Is this a bug? Or is there something that I have misunderstood?

Any help would be appreciated.
Thank you.

 
sln@netherlands.com
      01-01-2012
On Sun, 1 Jan 2012 08:59:44 -0800 (PST), Raymundo <(E-Mail Removed)> wrote:

>[...]
>
>I have no idea why the result differs from the first one.
>Why is 'd' in $2, not $3? Where did 'b' of 'abd' go after matching?
>
>Is this a bug? Or is there something that I misunderstand?
>


It's probably not a bug if you had to program branch reset code,
because the whole thing is buggy and tends to crash at the drop of
a hat.

Using the regex debug mechanism (use re 'debug'), some observations can be made.
The last branch-reset alternation is labeled BRANCH (FAIL).
Apparently, the number of capture buffers in this branch is
NOT counted when calculating the largest number of buffers.
Therefore, the number given to the first capture buffer after the
branch reset is based on the largest of the branches BEFORE the last branch.

Example:

(?|
(x) ()
|
(c)
|
(a) (b) (r)
)
(d)

Produces this code:

1: BRANCH (13)
2: OPEN1 (4)
4: EXACT <x> (6)
6: CLOSE1 (8)
8: OPEN2 (11)
10: NOTHING (11)
11: CLOSE2 (40)
13: BRANCH (20)
14: OPEN1 (16)
16: EXACT <c> (18)
18: CLOSE1 (40)
20: BRANCH (FAIL)
21: OPEN1 (23)
23: EXACT <a> (25)
25: CLOSE1 (27)
27: OPEN2 (29)
29: EXACT <b> (31)
31: CLOSE2 (33)
33: OPEN3 (35)
35: EXACT <r> (37)
37: CLOSE3 (40)
39: TAIL (40)
40: OPEN3 (42)
42: EXACT <d> (44)
44: CLOSE3 (46)
46: END (0)

You can see that (d) is capture buffer 3, but it should be 4.
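
For anyone who wants to check this on their own perl, here is a minimal sketch. It assumes the pattern above is compiled with /x so the whitespace is ignored, and the helper name group_count is made up for this example. After a successful match, $#+ gives the number of capture groups in the pattern. Uncommenting the use re 'debug' line should print a dump like the one above to STDERR.

```perl
use strict;
use warnings;

# Uncomment to print the compiled regex program to STDERR.
# use re 'debug';

# Same pattern as the example above; /x allows the whitespace.
my $re = qr/
    (?|
        (x) ()
      |
        (c)
      |
        (a) (b) (r)
    )
    (d)
/x;

# Hypothetical helper: number of capture groups perl allocated for $re,
# or undef if the sample string does not match.
sub group_count {
    my ($string) = @_;
    return unless $string =~ $re;
    return $#+;    # $#+ = number of groups in the last successful match
}

my $count = group_count('abrd');
print "perl $] allocates $count capture groups\n";
```

With correct numbering the count should be 4 (three slots for the branch reset plus one for (d)). A smaller number would suggest the perl in use is affected by the behavior described here.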

So the simple solution is to make sure the branch with the largest
number of capture buffers is not the last branch.

There are a couple of ways around this.

1 - Pad a different branch with an empty (NOTHING) capture group.
(?|
(c) ()
| (a)(b)
)
(d)

or,

2 - Move the largest number of captures into another branch.
(?|
(a)(b)
| (c)
)
(d)
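
Both workarounds can be compared against the inputs from the original question with a short script. This is only a sketch; the captures helper is a name invented for this example, and the patterns are written with /x so the spacing matches the layout above.

```perl
use strict;
use warnings;

# Workaround 1: pad the shorter branch with an empty capture group.
my $padded = qr/ (?| (c) () | (a) (b) ) (d) /x;

# Workaround 2: put the branch with the most captures first.
my $reordered = qr/ (?| (a) (b) | (c) ) (d) /x;

# Hypothetical helper: return $1..$3 as a list, with unset groups shown as
# the string 'undef' so that printing them does not trigger a warning.
sub captures {
    my ( $re, $string ) = @_;
    return unless $string =~ $re;
    return map { defined $_ ? $_ : 'undef' } ( $1, $2, $3 );
}

for my $input (qw(abd cd)) {
    print "$input padded:    @{[ captures( $padded,    $input ) ]}\n";
    print "$input reordered: @{[ captures( $reordered, $input ) ]}\n";
}
```

In both cases 'd' should land in $3. The visible difference is in $2 for the input 'cd': with padding it is an empty string, with reordering it is undefined.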

This is just an observation that seems to hold true.
In my mind, branch-reset in Perl or any PCRE engine is just
one big bug, and should be avoided.

-sln
 
Raymundo
      01-01-2012
On Jan 2, 7:16, Ben Morrow <(E-Mail Removed)> wrote:
> Quoth (E-Mail Removed):
>
>
> It looks to me like a bug in perl, and it appears to have been fixed in
> 5.14.
>
> If you have any other instances of (?|) causing problems (that persist
> in 5.14), and certainly if you have any examples of crashes, you should
> report them with perlbug.
>
> Ben



Thank you, sln and Ben.

I've posted the same question on Twitter and received replies saying
that 5.14 shows correct results. One of my followers sent me this link:
http://perl5.git.perl.org/perl.git/c...c5d73550e0248c
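
To see which behavior a given perl has, a small self-check along these lines may help. It is a sketch that reuses the second pattern from the question; the helper name branch_reset_ok is made up for illustration.

```perl
use strict;
use warnings;

# Pattern from the original question: the longest alternative is last.
my $re = qr/(?|(c)|(a)(b))(d)/;

# Hypothetical helper: returns 1 if this perl puts 'd' in $3 for 'abd',
# 0 otherwise.
sub branch_reset_ok {
    return 0 unless 'abd' =~ $re;
    return ( defined $3 && $3 eq 'd' ) ? 1 : 0;
}

printf "perl %s: %s\n", $],
    branch_reset_ok()
    ? "'d' is in \$3 as documented"
    : "'d' is not in \$3 (affected by the numbering problem)";
```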


Happy New Year~

G.Y.Park from South Korea
 