Regex with a varying number of captures

 
 
Joe Gottman
 
      06-18-2005

I am parsing a file with several lines of the form
keyword : value1 value2 ... valueN

What is the easiest way for me to write a regex that will capture all of the
values? My first pass was
/^ \s* keyword \s* : (?: \s* (\w+) \b)+/x

but this only captures the last value.
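
A minimal sketch that reproduces the symptom, assuming a made-up sample line (the values are not from the original file):

```perl
use strict;
use warnings;

# Hypothetical sample line; the literal word "keyword" stands in for the real key.
my $line = 'keyword : alpha beta gamma';

# List-context match without /g returns one element per capture group.
my @captured = $line =~ /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x;

# The single group is overwritten on every iteration of the enclosing (?: ... )+,
# so only the text from the final iteration survives.
print scalar(@captured), " capture(s): @captured\n";
```

The pattern has one pair of capturing parentheses, so it can only ever fill $1, however many times the group repeats.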

Joe Gottman


 
Jürgen Exner
 
      06-18-2005
Joe Gottman wrote:
> I am parsing a file with several lines of the form
> keyword : value1 value2 ... valueN
>
> What is the easiest way for me to write a regex that will capture all
> of the values?


Well, why do you want to use a regexp? A simple
my ($keyword, undef, @values) = split / /,$line;
should do the job much more easily and quickly.
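
A sketch of the same idea that tolerates irregular spacing, assuming a made-up sample line. Note that split / / breaks on every single space, so leading or doubled spaces yield empty fields; the special form split ' ' skips leading whitespace and splits on runs of whitespace instead. Both forms assume the colon is surrounded by whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading and doubled spaces.
my $line = '  keyword :  alpha beta   gamma';

# split ' ' (a string holding one space) is the awk-style mode:
# leading whitespace is ignored and any run of whitespace separates fields.
# The undef placeholder discards the ":" field.
my ($keyword, undef, @values) = split ' ', $line;

print "$keyword => @values\n";
```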

jue


 
Brian McCauley
 
      06-18-2005


Joe Gottman wrote:

> I am parsing a file with several lines of the form
> keyword : value1 value2 ... valueN
>
> What is the easiest way for me to write a regex that will capture all of the
> values? my first pass was
> /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x


The easiest way is not to try. Do it in two steps.

It can be done in one step using (?{}) but that's way more complex.
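
For the curious, a rough sketch of what the (?{}) route might look like, assuming a made-up sample line. It relies on $^N, which holds the text of the most recently closed capture group. Treat it as an illustration only: code blocks run as the engine goes, so a pattern that backtracks over the group could push values that are not part of the final match.

```perl
use strict;
use warnings;

my $line = 'keyword : alpha beta gamma';   # hypothetical sample line
my @values;

# Each time the inner group (\w+) closes, the embedded code block
# appends the text it just matched ($^N) to @values.
my $ok = $line =~ /^ \s* keyword \s* :
                   (?: \s* (\w+) \b (?{ push @values, $^N }) )+ /x;

# Values pushed during a failed attempt are not undone automatically.
@values = () unless $ok;

print "@values\n";
```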

In this specific case there are alternative ways if you are willing to
presume (or have already verified) that the input conforms.

E.g.

/(\w+)/g; # Then discard the first
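
Spelled out as a runnable sketch, assuming a made-up sample line that is already known to be well formed:

```perl
use strict;
use warnings;

my $line = 'keyword : alpha beta gamma';   # hypothetical sample line

# In list context, //g returns every match of the capture group.
# The first word is the keyword itself, so it is peeled off separately.
my ($keyword, @values) = $line =~ /(\w+)/g;

print "@values\n";
```

This does no validation at all: any run of word characters on the line ends up in the list.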

 
 
John W. Krahn
 
      06-18-2005
Jürgen Exner wrote:
> Joe Gottman wrote:
>
>>I am parsing a file with several lines of the form
>> keyword : value1 value2 ... valueN
>>
>>What is the easiest way for me to write a regex that will capture all
>>of the values?

>
> Well, why do you want to use a regexp? A simple
> my ($keyword, undef, @values) = split / /,$line;
> should do the job much easier and faster.


That *does* use a regexp.


John
--
use Perl;
program
fulfillment
 
 
Jürgen Exner
 
      06-18-2005
John W. Krahn wrote:
> Jürgen Exner wrote:
>> Joe Gottman wrote:
>>
>>> I am parsing a file with several lines of the form
>>> keyword : value1 value2 ... valueN
>>>
>>> What is the easiest way for me to write a regex that will capture
>>> all of the values?

>>
>> Well, why do you want to use a regexp? A simple
>> my ($keyword, undef, @values) = split / /,$line;
>> should do the job much easier and faster.

>
> That *does* use a regexp.


Hmmm, guilty as charged.
But at least not for capturing the desired values.

jue


 
 
Bart Lateur
 
      06-19-2005
Joe Gottman wrote:

>I am parsing a file with several lines of the form
> keyword : value1 value2 ... valueN
>
>What is the easiest way for me to write a regex that will capture all of the
>values? my first pass was
> /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x
>
>but this only captures the last value.


That's indeed an annoying feature (IMO) of Perl regular expressions: when
a capture group sits under a repeat quantifier, you either capture the
whole repeated run, or you capture only the last iteration.

The only solution that I think works reasonably well is a two-step
approach: first match the whole list, and second split up the match into
its parts. For example, like this (though there are other approaches, for
example using split):

if (/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    @parts = $1 =~ /\w+/g;
}

Yes, that is indeed making perl do the same match twice. Double work,
but I know of no one-step method.
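
The two-step approach as a self-contained sketch, assuming a made-up sample line held in a hypothetical $line variable rather than $_:

```perl
use strict;
use warnings;

my $line = 'keyword : alpha beta gamma';   # hypothetical sample line
my @parts;

# Step 1: verify the line and capture the whole value list as one string.
if ($line =~ /^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
    # Step 2: pull the individual words out of the captured string.
    @parts = $1 =~ /\w+/g;
}

print "@parts\n";
```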

--
Bart.
 
 
Brian McCauley
 
      06-19-2005


Bart Lateur wrote:

> Joe Gottman wrote:
>
>
>>I am parsing a file with several lines of the form
>> keyword : value1 value2 ... valueN
>>
>>What is the easiest way for me to write a regex that will capture all of the
>>values? my first pass was
>> /^ \s* keyword \s* : (?: \s* (\w+) \b)+/x
>>
>>but this only captures the last value.

>
> The only solution that I think works reasonably well is a two-step
> approach: first match the whole list, and second split up the match into
> its parts. For example, like this (though there are other approaches, for
> example using split):
>
> if(/^ \s* keyword \s* : ((?: \s* \w+ \b)+)/x) {
> @parts = $1 =~ /\w+/g;
> }


It is worth mentioning that rather than capturing and reprocessing $1
you can take advantage of the behaviour of //g in a scalar context.

if (/^ \s* keyword \s* :/gx) {
    # The /x flag is needed here too, since the pattern contains spaces.
    @parts = /\G \s* (\w+)/gx;
}
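
A runnable sketch of that technique, assuming a made-up sample line in a hypothetical $line variable. The second pattern carries /x because it is written with spaces between its tokens; without /x those spaces would have to match literally.

```perl
use strict;
use warnings;

my $line = 'keyword : alpha beta gamma';   # hypothetical sample line
my @parts;

# Scalar-context //g: on success, pos($line) is left just past the colon.
if ($line =~ /^ \s* keyword \s* :/gx) {
    # List-context //g resumes from pos($line). \G pins every match to the
    # point where the previous one ended, so the list stops at the first
    # thing that is not whitespace followed by a word.
    @parts = $line =~ /\G \s* (\w+)/gx;
}

print "@parts\n";
```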

Note - although I say this technique is worth mentioning, I probably
wouldn't use it here because, although it's equivalent to Bart's solution,
I would actually prefer to see an end-of-line anchor in Bart's solution.

if (/^ \s* keyword \s* : ([\s\w]*)$/x) {
    @parts = $1 =~ /\w+/g;
}

> Yes, that is indeed making perl do the same match twice.


Of course. But as I show above the first match can actually be somewhat
simpler.

If you are feeling particularly obscure you can combine the two
techniques by using lookahead to set pos() to the middle of a pattern match.

if (/^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
    @parts = /\w+/g;
}

This saves the expense of performing the string copy at the expense of
being rather harder to comprehend.
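
A sketch of the lookahead variant wrapped in a helper so it can be tried on a good and a bad line. The sub name values_of and both sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the values on a well-formed line,
# or an empty list if the line does not conform.
sub values_of {
    my ($line) = @_;

    # The lookahead checks the rest of the line without consuming it,
    # so the scalar-context //g leaves pos($line) just past the colon.
    if ($line =~ /^ \s* keyword \s* : (?=[\s\w]*$)/gx) {
        # List-context //g picks up from pos($line), skipping the keyword.
        return $line =~ /\w+/g;
    }
    return;
}

my @good = values_of('keyword : alpha beta gamma');
my @bad  = values_of('keyword : alpha, beta');   # comma fails the lookahead

print "good: @good\n";
print "bad:  ", scalar(@bad), " values\n";
```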

 