On 12 July, 04:18, Xicheng Jia <xich...@gmail.com> wrote:
> On Jul 11, 11:09 pm, "attn.steven....@gmail.com"
>
> > The latter won't work for a target string like:
> > $_ = '123,456,789';
> > (i.e., anything with an odd number of comma delimited substrings).
> > You can try global match (//g) in scalar context:
> > $_ = "1,234,567,890";
>
> > my $n = 0;
> > while (/\G(\d{1,3})(?:,|$)/g)
>
> this should be the same as:
>
> while (/\G(\d{1,3}),?/g)
>
Ohh, now I'm beginning to see the logic...
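
For the record, here is how I read the quoted \G loop once it is filled in. The loop body is my own guess (the quote cuts off before it), and $s is just a name I picked for the sample string:

```perl
use strict;
use warnings;

my $s = '1,234,567,890';
my $n = 0;

# \G anchors each attempt where the previous match ended, so in scalar
# context the loop walks the string one comma-delimited group at a time.
while ($s =~ /\G(\d{1,3})(?:,|$)/g) {
    $n = $n * 1000 + $1;    # assumed body: accumulate the groups
}
print "$n\n";
```

Each pass through the loop sees exactly one capture in $1, which is why nothing gets overwritten.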

The /(\d{1,3})(?:,(\d{3}))*/g regexp captured repeated productions, not repeated groups.
So, to sum up: I can't use /(\d{1,3})(?:,(\d\d\d))*/ because the RE
engine only saves the capture from the last iteration of a repeated
group. The fix is to use the g modifier to capture repeated
productions... the subject of this thread should really have been
"capturing repeated productions", right?
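
A minimal sketch of the difference, using the sample string from earlier in the thread (the variable names are mine, and the second pattern is a simplified stand-in, not anyone's posted solution):

```perl
use strict;
use warnings;

my $s = '1,234,567,890';

# One match with a repeated group: $2 is overwritten on every
# iteration of (...)*, so only the last triple survives.
my @once = $s =~ /^(\d{1,3})(?:,(\d{3}))*$/;
print "repeated group:   @once\n";

# Repeated matches with /g in list context: every match contributes
# its own capture to the returned list.
my @all = $s =~ /(\d{1,3})(?:,|$)/g;
print "repeated matches: @all\n";
```

The first list has two elements (1 and 890); the second has all four groups.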
Ideally, /(\d{1,3})|(?<=\d{1,3}),(\d\d\d)/g would work, but the
variable-length lookbehind (?<=\d{1,3}) is not implemented yet, so I
ended up writing:
@parts = ();
(@parts = grep { defined $_ }
    m((\d{1,3})
      # (?<=\d{1,3}) not implemented, use three cases
      | (?<=\d),(\d\d\d)
      | (?<=\d\d),(\d\d\d)
      | (?<=\d\d\d),(\d\d\d)
     )xg) && do {
    my $number = 0;
    $number = $number * 1000 + $_ foreach (@parts);
    print "$number\n";
};
It uses grep to filter out the undef captures, which come from the
alternation: every match returns all four groups, and the groups in
the branches that did not match are undef.
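
To see where the undefs come from, here is a cut-down sketch with only two alternatives and a shorter sample string (both are my simplifications for illustration):

```perl
use strict;
use warnings;

my $s = '1,234';

# Two alternatives, two groups: each match fills one group and
# leaves the other one undef.
my @raw = $s =~ /(\d{1,3})|(?<=\d),(\d\d\d)/g;

# Make the holes visible before filtering them out.
my @shown = map { defined $_ ? $_ : 'undef' } @raw;
print scalar(@raw), " slots: @shown\n";

my @parts = grep { defined } @raw;
print "kept: @parts\n";
```

Two matches times two groups gives four slots, of which only two are defined.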