Velocity Reviews - Computer Hardware Reviews

regex, number of matches

 
 
Dr.Ruud
      09-23-2005
#!/usr/local/bin/perl -wC

use strict;

my @text;
$text[1] = "xxx xx xxx xxx";
$text[2] = "yyy yyyy yyy yyy yyy";

my ($chars, $words, $lines) = wc(@text);

print "Chars: $chars\n";
print "Words: $words\n";
print "Lines: $lines\n";

sub wc {
    my @ret = (0, 0, 0);             # chars, words, lines
    for (@_) {
        if (defined) {               # $text[0] is never assigned, so skip undef
            $ret[0] += length;       # chars
            $ret[1] += () = /\S+/g;  # words
            $ret[2] += 1;            # lines
        }
    }
    return @ret;
}


What I found hard to get, is the role of the '()' in the wc-words-line:

$ret[1] += () = /\S+/g; # words

After a while, I understood it as an anonymous array that is filled with
the matches, after which its length is used to increase the words-count.


The creating and filling of () seemed like a waste of cpu-cycles, so I
tried to find another way of counting the number of matches.

Destructive variant:

$ret[1] += s/\S+//g; # words

I settled for:

$ret[1] += 1 while /\S+/g; # words
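
For comparison, here is a minimal sketch of the three variants side by side. The variable names are illustrative only, and the substitution variant works on a copy because it empties its target.

```perl
use strict;
use warnings;

my $text = "yyy yyyy yyy yyy yyy";

# 1. List assignment in scalar context yields the number of matches.
my $by_list = () = $text =~ /\S+/g;

# 2. s///g returns the number of substitutions made. It is destructive,
#    so it is applied to a copy here.
my $copy = $text;
my $by_subst = ($copy =~ s/\S+//g);

# 3. In scalar context, //g finds one match per iteration until it fails.
my $by_while = 0;
$by_while++ while $text =~ /\S+/g;

print "$by_list $by_subst $by_while\n";   # prints "5 5 5"
```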

Is there a better/nicer/smarter/directer way to return the number of
matches from a regex?

See also http://dev.perl.org/perl6/rfc/110.html

--
Affijn, Ruud

"Gewoon is een tijger."


 
Brian Wakem
      09-23-2005
Dr.Ruud wrote:


> $ret[1] += () = /\S+/g; # words
>
>
> Destructive variant:
>
> $ret[1] += s/\S+//g; # words
>
> I settled for:
>
> $ret[1] += 1 while /\S+/g; # words
>
> Is there a better/nicer/smarter/directer way to return the number of
> matches from a regex?



This was covered a few weeks ago in a thread titled 'Space (\s) count
problem'.

The fastest way is to substitute a match with itself.
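
A minimal sketch of that idea, assuming a literal keyword is being counted ($keyword is a hypothetical name, not taken from the earlier thread). s///g returns the number of substitutions, and replacing each match with itself leaves the string unchanged. Whether it is really the fastest depends on the Perl version and the data.

```perl
use strict;
use warnings;

my $text    = "xxx xx xxx xxx";
my $keyword = "xxx";              # hypothetical keyword to count

# Replace every occurrence with itself; the return value is the
# number of replacements. \Q...\E quotes regex metacharacters.
my $count = ($text =~ s/\Q$keyword\E/$keyword/g);

print "$count\n";                 # prints "3"
```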

I fired off an email to (E-Mail Removed) with this suggestion,
but it bounced.



--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png
 
John Bokma
      09-23-2005
Dr.Ruud wrote:

> What I found hard to get, is the role of the '()' in the wc-words-line:
>
> $ret[1] += () = /\S+/g; # words


it forces list context
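
A short sketch of what that means in practice: the match runs in list context because it is assigned to a list, and a list assignment evaluated in scalar context yields the number of elements on its right-hand side, even when the left-hand list is empty. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "xxx xx xxx xxx";

# In list context, /\S+/g returns every match.
my @matches = $line =~ /\S+/g;

# The inner assignment "() = ..." is a list assignment; the outer
# scalar assignment sees its value in scalar context, which is the
# count of right-hand-side elements.
my $count = () = $line =~ /\S+/g;

print scalar(@matches), " $count\n";   # prints "4 4"
```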

> After a while, I understood it as an anonymous array that is filled with
> the matches, after which its length is used to increase the words-count.
>
> The creating and filling of () seemed like a waste of cpu-cycles,


Maybe it's optimized away?

> so I
> tried to find another way of counting the number of matches.
>
> Destructive variant:
>
> $ret[1] += s/\S+//g; # words
>
> I settled for:
>
> $ret[1] += 1 while /\S+/g; # words



Did you benchmark those?
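
One way to do that is the core Benchmark module. This is only a sketch: the test data and the sub names are made up for illustration, and the substitution variant copies each string first (because it is destructive), which adds some overhead of its own.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 200 short lines, 900 words in total.
my @data = ("xxx xx xxx xxx", "yyy yyyy yyy yyy yyy") x 100;

sub count_list {
    my $n = 0;
    $n += () = /\S+/g for @data;
    return $n;
}

sub count_while {
    my $n = 0;
    for (@data) { $n++ while /\S+/g }
    return $n;
}

sub count_subst {
    my $n = 0;
    for (@data) { my $copy = $_; $n += ($copy =~ s/\S+//g) }
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'list'       => \&count_list,
    'while_loop' => \&count_while,
    'subst'      => \&count_subst,
});
```

The numbers will differ between Perl versions and machines, so no output is shown here.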

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
John W. Krahn
      09-23-2005
Dr.Ruud wrote:
>
> What I found hard to get, is the role of the '()' in the wc-words-line:
>
> $ret[1] += () = /\S+/g; # words
>
> After a while, I understood it as an anonymous array that is filled with
> the matches, after which its length is used to increase the words-count.


If it had been an anonymous array it would have been:

$ret[1] += @{[ /\S+/g ]}; # words
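
Spelled out as a small sketch (variable names are illustrative): the square brackets build an anonymous array from the matches, @{ ... } dereferences it, and the numeric context of += turns the array into its element count.

```perl
use strict;
use warnings;

my $line  = "xxx xx xxx xxx";
my $count = 0;

# [ ... ] creates an anonymous array holding all matches;
# @{ ... } in scalar (numeric) context gives its length.
$count += @{[ $line =~ /\S+/g ]};

print "$count\n";   # prints "4"
```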



John
--
use Perl;
program
fulfillment
 
Dr.Ruud
      09-23-2005
John Bokma wrote:
> Dr.Ruud:


>> What I found hard to get, is the role of the '()' in
>> $ret[1] += () = /\S+/g; # words

>
> it forces list context


Yes, I'm starting to get that.


>> The creating and filling of () seemed like a waste of cpu-cycles,

>
> Maybe it's optimized away?


Well, maybe compare it to:

$ret[1] += (@tmp = /\S+/g); # words

but if @tmp is not used afterwards, that can also be optimized away.


>> Destructive variant:
>>
>> $ret[1] += s/\S+//g; # words
>>
>> I settled for:
>>
>> $ret[1] += 1 while /\S+/g; # words

>
>
> Did you benchmark those?


No, I'm not in a hurry (yet).


A new option for scalar mode would be the cleanest:

$ret[1] += /\S+/t; # words

--
Affijn, Ruud

"Gewoon is een tijger."


 
Dr.Ruud
      09-23-2005
Brian Wakem:


> The fastest way is to substitute a match with itself.


OK. I had tested that, but I hated the looks, because it doesn't explain
itself enough.

The notion that the 'fake substitution' operates 'at the C level' is
very convincing.


Did you also benchmark "s!\Q$kw\E!$&!g" ?

(You pay a price for using $&; see man perlre, which notes that $& is
not as costly as $` and $'.)

--
Affijn, Ruud

"Gewoon is een tijger."


 
John Bokma
      09-23-2005
Dr.Ruud wrote:

> John Bokma wrote:


>> Did you benchmark those?

>
> No, I'm not in a hurry (yet).


The problem with benchmarking is that what seems to be a bad choice today
can be a better choice tomorrow. An example: map in void context was, some
time ago, an expensive operation. IIRC it has since been optimized (note
that I am not saying that "we" should all use map in void context now).

> A new option for scalar mode would be the cleanest:
>
> $ret[1] += /\S+/t; # words


And t means? ("Tellen", Dutch for "to count"?) The questions are: how often
is this going to be used, and is a new option really needed, or can we get
away with what is available now plus some documentation?

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
Dr.Ruud
      09-23-2005
Abigail:

> I was surprised the while variant was the clear winner -
> assigning to an empty list is hardly faster than assigning to an
> array.


That is what I expected.


> And the code:


Thanks for that too.

How about:

sub2 => '$sub2 = 0; $sub2 += s/\S+/$&/g for @data;'

And how about the o-option (pre-compile), or doesn't that go with g?

--
Affijn, Ruud

"Gewoon is een tijger."


 
Dr.Ruud
      09-23-2005
Brian Wakem:
> Dr.Ruud:
>
>
>> $ret[1] += () = /\S+/g; # words
>>
>>
>> Destructive variant:
>>
>> $ret[1] += s/\S+//g; # words
>>
>> I settled for:
>>
>> $ret[1] += 1 while /\S+/g; # words
>>
>> Is there a better/nicer/smarter/directer way to return the number of
>> matches from a regex?

>
>
> This was covered a few week ago in a thread titled 'Space (\s) count
> problem'.
>
> The fastest way is to substitute a match with itself.


As Abigail showed, there will be a difference between

(1) s/$kw/$kw/g (add \Q and \E where needed)

and

(2) s/\S+/$&/g

and

(3) s/\S+//g


The loops of both (1) and (3) are more 'constant', so they will need fewer
cycles than (2), or so I guess.

--
Affijn, Ruud

"Gewoon is een tijger."


 
John Bokma
      09-23-2005
Dr.Ruud wrote:

> sub2 => '$sub2 = 0; $sub2 += s/\S+/$&/g for @data;'
>
> And how about the o-option (pre-compile), or doesn't that go with g?


IIRC, o is only useful in some rare cases when you use a variable in the
regexp. Since your s/// is constant, I think it is compiled, optimized, etc.
at the compile stage of your script, but again, IIRC.
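
For the variable case, a commonly used alternative to the o option is to compile the pattern once with qr// and reuse it. The sketch below swaps in that technique; $kw and @lines are hypothetical names for illustration, and it also shows that the pattern combines with g without trouble.

```perl
use strict;
use warnings;

my @lines = ("xxx xx xxx xxx", "yyy xxx");   # hypothetical input
my $kw    = "xxx";                           # hypothetical keyword

# qr// compiles the interpolated pattern once; \Q...\E quotes
# any regex metacharacters in the keyword.
my $re = qr/\Q$kw\E/;

my $total = 0;
for my $line (@lines) {
    # Count all matches of the precompiled pattern in this line.
    $total += () = $line =~ /$re/g;
}

print "$total\n";   # prints "4"
```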

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 