regex @a = m / | /g and captures?

 
 
Bill
      10-17-2003
Hello, I've got a regex question.

In the following, the use of () in an 'or' type regex causes @a to
hold both captures, so for each pass through the regex, one capture
and one undef is stored.

Can this be prevented and still use () captures and '|' in the regex?


>>>>>>>>>>>


my $s = '1 2 {3, 3, 3} 4';

my @a = $s =~ m/\{[^\}]+\}|\d/g;

print "\nWithout captures:\n", join "\n", @a;

@a = $s =~ m/(\{[^\}]+\})|(\d)/g;
foreach (@a) { $_ = 'undef' unless defined $_; }
print "\n\nNow with captures:\n", join "\n", @a;

<<<<<<<<<<<
 
Steve Grazzini
      10-17-2003
Bill <(E-Mail Removed)> wrote:
> In the following, the use of () in an 'or' type regex causes @a to
> hold both captures, so for each pass through the regex, one capture
> and one undef is stored.
>
> Can this be prevented and still use () captures and '|' in the regex?


Put the parentheses around the entire expression.

> @a = $s =~ m/(\{[^\}]+\})|(\d)/g;


/( { [^}]+ } | \d )/xg;

But (as you already know) you don't need the parens at all in
this case.
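
A minimal sketch of the single-group version, run against the string from the original post. The array name @single is only illustrative, and the braces are escaped here because some newer perls warn about or reject an unescaped literal left brace:

```perl
use strict;
use warnings;

my $s = '1 2 {3, 3, 3} 4';

# One capture group wraps the whole alternation, so every match
# contributes exactly one element to the returned list.
my @single = $s =~ m/( \{ [^}]+ \} | \d )/xg;

print join("\n", @single), "\n";
```

With one group there is nothing left unmatched on any pass, so no undef shows up in the list.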

--
Steve
 
Tad McClellan
      10-17-2003
Bill <(E-Mail Removed)> wrote:

> In the following, the use of () in an 'or' type regex causes @a to
> hold both captures, so for each pass through the regex, one capture
> and one undef is stored.
>
> Can this be prevented and still use () captures and '|' in the regex?



> @a = $s =~ m/(\{[^\}]+\})|(\d)/g;


grep() is handy when you need to filter a list:

my @a = grep defined, $s =~ m/(\{[^\}]+\})|(\d)/g;
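
As a rough illustration of what the filter does, the sketch below counts the elements before and after. The names @raw and @kept are made up for this example:

```perl
use strict;
use warnings;

my $s = '1 2 {3, 3, 3} 4';

# Two capture groups: each match returns a ($1, $2) pair,
# and the group that did not take part in the match is undef.
my @raw = $s =~ m/(\{[^}]+\})|(\d)/g;

# Keep only the defined elements, i.e. the group that matched each time.
my @kept = grep { defined $_ } @raw;

printf "%d raw elements, %d after filtering\n", scalar @raw, scalar @kept;
```

Testing with defined rather than truth matters here: a captured '0' is false but still a real match.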


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
Bill
      10-18-2003
Steve Grazzini <(E-Mail Removed)> wrote:
>
> Put the parentheses around the entire expression.
>
> > @a = $s =~ m/(\{[^\}]+\})|(\d)/g;

>
> /( { [^}]+ } | \d )/xg;
>


Oh yes, of course! Cool.
But I think that I simplified the code I was revising too far.

What about this (we want the numbers not the separators):
>>>>>>>>>>>


my $s = '1; 2; {3, 3, 3}; 4;';

my @a = $s =~ m/\{[^\}]+\};|\d;/g;

print "\nWithout captures:\n", join "\n", @a;

@a = $s =~ m/(\{[^\}]+\});|(\d);/g;
foreach (@a) { $_ = 'undef' unless defined $_; }
print "\n\nNow with captures:\n", join "\n", @a;

<<<<<<<<<<<

It seems that either I have to chop the answers here or filter undefs,
as Tad suggests?
 
Quantum Mechanic
      10-18-2003
Bill <(E-Mail Removed)> wrote:
> > /( { [^}]+ } | \d )/xg;
> >

>
> Oh yes, of course! Cool.
> But I think that I simplified the code I was revising too far.
>
> What about this (we want the numbers not the separators):
> >>>>>>>>>>>

>
> my $s = '1; 2; {3, 3, 3}; 4;';
>
> my @a = $s =~ m/\{[^\}]+\};|\d;/g;


Then move the common elements (semi-colon) out of the alternation. In
this case, they can be moved out of the capture as well:

/( { [^}]+ } | \d );/xg;

But you haven't stated whether the semi-colons are always there, or
meaningful. If they have no meaning, you can go with the previous
version:

> > /( { [^}]+ } | \d )/xg;
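
A small sketch of the semicolon-outside-the-capture form, assuming every item really is followed by a semicolon as in the sample string. The name @items is illustrative and the braces are escaped for the benefit of newer perls:

```perl
use strict;
use warnings;

my $s = '1; 2; {3, 3, 3}; 4;';

# The semicolon is required for a match but sits outside the
# parentheses, so it is consumed without being captured.
my @items = $s =~ m/( \{ [^}]+ \} | \d ) ;/xg;

print join("\n", @items), "\n";
```

Each element comes back without its trailing separator, so nothing needs to be chopped or filtered afterwards.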


-QM
 
Bill
      10-18-2003
Quantum Mechanic <(E-Mail Removed)> wrote:

> > >>>>>>>>>>>

> >
> > my $s = '1; 2; {3, 3, 3}; 4;';
> >
> > my @a = $s =~ m/\{[^\}]+\};|\d;/g;

>
> Then move the common elements (semi-colon) out of the alternation. In
> this case, they can be moved out of the capture as well:
>
> /( { [^}]+ } | \d );/xg;
>
> But you haven't stated whether the semi-colons are always there, or
> meaningful. If they have no meaning, you can go with the previous
> version:
>
> > > /( { [^}]+ } | \d )/xg;

>
> -QM


So, I guess the answer in general is just to find a way to rewrite the
regex so that there is only one capture. It's good that regexes are so
flexible. Thanks
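
For cases where the alternatives cannot easily be folded into one group, there is one more option that nobody in this thread raised and that postdates it: the branch-reset group (?|...), available from Perl 5.10. The sketch below assumes such a perl; @items is an illustrative name:

```perl
use strict;
use warnings;
use 5.010;    # branch reset needs Perl 5.10 or later

my $s = '1; 2; {3, 3, 3}; 4;';

# Inside (?| ... ) capture numbering restarts in each alternative,
# so both sets of parentheses are group 1 and the pattern has only
# one capture group overall.
my @items = $s =~ m/(?| (\{[^}]+\}); | (\d); )/xg;

print join("\n", @items), "\n";
```

This keeps a separate capture in each branch of the '|' while still returning one element per match.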
 