Match on x instances of a character

 
 
John Burgess
      02-04-2006
Hi,
I am having some trouble with regexps and hope someone can help.

Problem: Iterating through a list of newsgroups and matching only those
with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
comp.lang.perl.misc would not.

(Broken) Solution: I have got something like this

$test = "comp.lang.perl";
if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
{print STDERR "$test is not 2\n";}

Clearly this doesn't work. I can't see what I'm doing wrong. Tips
appreciated.

John
 
Brian Wakem
      02-04-2006
John Burgess wrote:

> Hi,
> I am having some trouble with regexps and hope someone can help.
>
> Problem: Iterating through a list of newsgroups and matching only those
> with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
> comp.lang.perl.misc would not.
>
> (Broken) Solution: I have got something like this
>
> $test = "comp.lang.perl";
> if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else
> {print STDERR "$test is not 2\n";}
>
> Clearly this doesn't work. I can't see what I'm doing wrong. Tips
> appreciated.
>
> John



#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    my $dots = tr/.//;    # count the dots in $_ without changing it
    print "$_ has $dots dots\n";
}


__DATA__
comp.lang
comp.lang.perl
comp.lang.perl.misc

###########

$ perl scripts/tmp/tmp72.pl
comp.lang has 1 dots
comp.lang.perl has 2 dots
comp.lang.perl.misc has 3 dots


See perldoc -q count


--
Brian Wakem
Email: http://homepage.ntlworld.com/b.wakem/myemail.png
 
Anno Siegel
      02-04-2006
John Burgess <> wrote in comp.lang.perl.misc:
> Hi,
> I am having some trouble with regexps and hope someone can help.
>
> Problem: Iterating through a list of newsgroups and matching only those
> with 2 .'s in the name. So comp.lang.perl would match but comp.lang or
> comp.lang.perl.misc would not.
>
> (Broken) Solution: I have got something like this
>
> $test = "comp.lang.perl";
> if ($test =~ m/([^\.]\.[^\.]){2}/g) {print STDERR "$test is 2\n";} else


What is the /g for? It makes no sense, you're not looking for multiple
occurrences of anything. Further, in a character class a dot is not
special, so the "\" is not needed. Third, you forgot an asterisk after
each character class that matches non-dots, so it can never match more
than one non-dot in a row. Fourth, you are using capturing parentheses
for grouping, where non-capturing (?:...) would do. Fifth, you didn't
anchor your match to the beginning and the end of the string, so, even
with the other corrections it would match anything with two or more
dots in it.

> {print STDERR "$test is not 2\n";}


Applying all of this to your regex, it becomes

/^(?:[^.]*\.[^.]*){2}$/

which does indeed match what you want.

However, the easiest (and fastest) way of counting characters is the
tr/// operator:

if ( tr/.// == 2 ) { #...
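
One way to check the points above is to run the pattern from the question, the corrected pattern, and the tr/// count against the three example names. This is only a sketch: the names $broken, $fixed, @names and %result are made up for illustration, and the question's pattern is used without its /g.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative names, not from the thread: $broken is the pattern from the
# question (minus /g), $fixed is the corrected pattern given above.
my $broken = qr/([^\.]\.[^\.]){2}/;
my $fixed  = qr/^(?:[^.]*\.[^.]*){2}$/;

my @names = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

my %result;
for my $name (@names) {
    # tr/// with an empty replacement list only counts, it changes nothing
    my $dots = ($name =~ tr/.//);
    $result{$name} = {
        broken => ($name =~ $broken ? 1 : 0),
        fixed  => ($name =~ $fixed  ? 1 : 0),
        count  => ($dots == 2       ? 1 : 0),
    };
    printf "%-20s broken=%d fixed=%d count=%d\n",
        $name, @{ $result{$name} }{qw(broken fixed count)};
}
```

The original pattern fails on all three names because each group can only consume one non-dot on either side of a dot, while the corrected pattern and the tr/// count should agree with each other on every name.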

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
 
John Burgess
      02-04-2006
Thanks Brian, I was aware the tr function would do it. However, I was
planning to use the match in a grep, and so I don't think the tr is so
economical. I am also testing these options for speed, and that's part of
the reason for finding the match function: to see which is fastest.
Thanks very much for your input though!

Regards,
John

 
John Burgess
      02-04-2006
Seems I really was off the track a bit. I am no regexp pro. I'm trying
though. Your example does indeed work. Your comment about speed is
interesting. Part of the reason for finding the correct match regexp was
to test for speed, which I will still test. The other thing is I want to
use this in a grep and I'm not sure the tr can be used economically in
this context? Thanks for your help. I'll be sure and go over where you
say I've got it wrong. Your comments make a lot of sense.

Regards,
John

 
MikeGee
      02-04-2006

John Burgess wrote:
> Seems I really was off the track a bit. I am no regexp pro. I'm trying
> though. Your example does indeed work. Your comment about speed is
> interesting. Part of the reason for finding the correct match regexp was
> to test for speed, which I will still test. The other thing is I want to
> use this in a grep and I'm not sure the tr can be used economically in
> this context? Thanks for your help. I'll be sure and go over where you
> say I've got it wrong. Your comments make a lot of sense.
>
> Regards,
> John

Why don't you think you can use tr/// in a grep?

@two_dotted = grep { tr/.// == 2 } @newsgroups;

 
Uri Guttman
      02-04-2006
>>>>> "JB" == John Burgess <> writes:

JB> Seems I really was off the track a bit. I am no regexp pro. I'm
JB> trying though. Your example does indeed work. Your comment about
JB> speed is interesting. Part of the reason for finding the correct
JB> match regexp was to test for speed, which I will still test. The
JB> other thing is I want to use this in a grep and I'm not sure the
JB> tr can be used economically in this context? Thanks for your
JB> help. I'll be sure and go over where you say I've got it
JB> wrong. Your comments make a lot of sense.

please stop top posting. read the frequently posted group guidelines for
more about that.

what does 'used economically in this context' mean? what context? why
are you so speed conscious about this? have you found it to be a major
bottleneck and you need more speed? and tr/// isn't a regex so don't
confuse it with them. and tr/// *IS* the fastest way to count chars in a
string. there is no way a regex can beat it for something as simple as
that. tr/// is designed for character oriented operations.
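
For anyone who would rather measure than take this on faith, here is a rough sketch using the core Benchmark module. The list contents and the labels tr_count and regex are made up for illustration, and the timings will vary with the machine, the perl version, and the data, so treat any numbers as indicative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative data; a real test should use a realistic list of names.
my @newsgroups = qw(comp.lang comp.lang.perl comp.lang.perl.misc);

my @by_tr    = grep { tr/.// == 2 } @newsgroups;
my @by_regex = grep { /^(?:[^.]*\.[^.]*){2}$/ } @newsgroups;

# Make sure both approaches select the same names before timing them.
die "results differ\n" unless "@by_tr" eq "@by_regex";

# A negative count means: run each sub for at least that many CPU seconds.
cmpthese(-1, {
    tr_count => sub { my @m = grep { tr/.// == 2 } @newsgroups },
    regex    => sub { my @m = grep { /^(?:[^.]*\.[^.]*){2}$/ } @newsgroups },
});
```

cmpthese prints a small comparison table of rates, which makes it easy to see the relative difference on your own data.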

uri

--
Uri Guttman ------ -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
Tad McClellan
      02-05-2006

[ Please do not top-post.
Text rearranged into a more sensible order.
]


John Burgess <> wrote:
> Anno Siegel wrote:
>> John Burgess <> wrote in comp.lang.perl.misc:


>>>Problem: Iterating through a list of newsgroups and matching only those
>>>with 2 .'s in the name.


>> However, the easiest (and fastest) way of counting characters is the
>> tr/// operator:
>>
>> if ( tr/.// == 2 ) { #...



Note that there *are no* regular expressions used in Anno's suggestion.


> Part of the reason for finding the correct match regexp was
> to test for speed, which I will still test.



Sounds like premature optimization to me...


> The other thing is I want to
> use this in a grep and I'm not sure the tr can be used economically in
> this context?



The docs for grep() say that it can take any EXPRession.

tr/// is an expression.

my @two_dot_groups = grep tr/.// == 2, @newsgroups;
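
A short, self-contained sketch of the two grep forms side by side, in case the difference is unclear. The sample list (including alt.test.one) and the variable names @by_expr and @by_block are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative list; alt.test.one is a made-up name with two dots.
my @newsgroups = qw(comp.lang comp.lang.perl comp.lang.perl.misc alt.test.one);

# EXPR form: a comma separates the expression from the list.
my @by_expr = grep tr/.// == 2, @newsgroups;

# BLOCK form: no comma after the closing brace.
my @by_block = grep { tr/.// == 2 } @newsgroups;

print "EXPR:  @by_expr\n";
print "BLOCK: @by_block\n";
```

In both forms grep aliases $_ to each element in turn, and tr/.// with an empty replacement list only counts the dots, so the original array is left unchanged.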


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 