Velocity Reviews - Computer Hardware Reviews

Using split to count matches, but exclude certain patterns

 
 
surfitupdotcom@gmail.com
      08-01-2007
I have a script that recursively greps for a term and counts the
occurrences of it in each file. It works fine, but now I want to
exclude matches where the term has an underscore immediately before or
after it. I have tried to keep using split on (not underscore)
$search_term (not underscore), as in the examples below, but my results
are not right yet. The input is a string in $grep_out, and I want to
count any number of occurrences. I cannot break the string up into
words, since a correct match may not have spaces or any particular
character around it. Let me know if I have not provided enough info or
should post the whole script. Thanks in advance for any assistance, John

Attempts so far:
# @surewords = split(/(?<!\_)${search_term}(?!\_)/im, $grep_out);
# @surewords = split(/\_{0}${search_term}\_{0}/im, $grep_out);
@surewords = split(/[^\_]${search_term}[^\_]/im, $grep_out);
# @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im, $grep_out);

 
Paul Lalli
      08-01-2007
On Aug 1, 3:22 pm, (E-Mail Removed) wrote:
> I have script that recursively greps for a term and counts the
> occurrences of it in each file. It works fine but now I want to
> exclude matches where the term has an underscore in front or after
> it. I have tried to continue using split on (not underscore)
> $search_term(not underscore) in below examples but my results are not
> right yet. Input is a string in $grep_out and I want to count any
> number of occurrences. I can not break string up into words since a
> correct match may not have spaces or any certain character around it.
> Let me know if I have not provided enough info, or should post whole
> script.... Thanks in advance for any assist, John
>
> Attempts so far:
> # @surewords = split(/(?<!\_)${search_term}(?!\_)/im,
> $grep_out);


_ is not special in a regex, so there is no need to backslash it. This
code says to split on any $search_term that is neither *immediately*
preceded by nor *immediately* followed by an underscore. Is that what
you meant?

> # @surewords = split(/\_{0}${search_term}\_{0}/im,
> $grep_out);


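A quantifier of {0} is a no-op: _{0} matches zero underscores, i.e. the
empty string, so this is the same as splitting on plain $search_term.
Frankly, I think that should be a syntax error, or at least a warning.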

> @surewords = split(/[^\_]${search_term}[^\_]/im,
> $grep_out);


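This says to include the not-underscore character on each side in the
split delimiter, so those characters are dropped from the results, and
a $search_term at the very start or end of the string cannot match.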

> # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im,
> $grep_out);


That's a modification of the above, allowing $search_term to come at
the beginning or end of the string as well.


Please provide some sample input and sample output, so people have a
chance to know what it is you're trying to achieve. This and other
good advice can be found in the Posting Guidelines, which are posted
here twice a week.

Paul Lalli
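
To make the differences between the four attempts concrete, here is a minimal sketch that runs each pattern through split on the same string. The sample string and the array names are assumptions made up for illustration, and \Q...\E is added around the term so that any regex metacharacters in it are treated literally; none of this comes from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $search_term = 'super';
my $grep_out    = 'a super b _super c';

# Lookarounds assert "no underscore here" without consuming any character.
my @lookaround = split /(?<!_)\Q$search_term\E(?!_)/i, $grep_out;

# _{0} matches the empty string, so this behaves like a plain /super/i.
my @zero_quant = split /_{0}\Q$search_term\E_{0}/i, $grep_out;

# [^_] has to match a real character, which is swallowed by the delimiter.
my @char_class = split /[^_]\Q$search_term\E[^_]/i, $grep_out;

# Capturing groups make split return the captured text as extra fields.
my @captures = split /(^|[^_])\Q$search_term\E($|[^_])/i, $grep_out;

for my $result (
    [ lookaround => \@lookaround ],
    [ zero_quant => \@zero_quant ],
    [ char_class => \@char_class ],
    [ captures   => \@captures ],
) {
    my ($name, $fields) = @$result;
    printf "%-10s %d fields: %s\n", $name, scalar @$fields,
        join ', ', map { "'$_'" } @$fields;
}
```

On this input the four variants produce 2, 3, 2 and 4 fields respectively. The character-class version loses the spaces around the term, and the last version gains two extra fields from its capturing groups.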

 
Paul Lalli
      08-01-2007
On Aug 1, 3:22 pm, (E-Mail Removed) wrote:
> I have script that recursively greps for a term and counts the
> occurrences of it in each file. It works fine but now I want to
> exclude matches where the term has an underscore in front or after
> it. I have tried to continue using split


As a side note to my other response, split() is a very bad way to
attempt to count occurrences of a string:

$ perl -le'
print scalar(@foo = split /foo/, "barfoobazfoobiff");
print scalar(@foo = split /foo/, "barfoobazbifffoo");
print scalar(@foo = split /foo/, "barbazbifffoofoo");
'
3
2
1


I rather strongly suggest you read:
$ perldoc -q count
Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
How can I count the number of occurrences of a substring
within a string?

Paul Lalli
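
For reference, a common way to count occurrences without split is to run a /g match in list context and count the results. The sketch below applies that idiom to the three strings from the example above; the helper name count_occurrences is hypothetical and is introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal term.
sub count_occurrences {
    my ($string, $term) = @_;

    # A /g match in list context returns every match. Assigning that list
    # to the empty list "()" and then to a scalar yields the match count.
    my $count = () = $string =~ /\Q$term\E/g;
    return $count;
}

for my $string (qw(barfoobazfoobiff barfoobazbifffoo barbazbifffoofoo)) {
    printf "%s: %d\n", $string, count_occurrences($string, 'foo');
}
```

Each of the three strings contains "foo" twice, so this prints a count of 2 for every line, regardless of where in the string the matches fall.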

 
surfitupdotcom@gmail.com
      08-01-2007
On Aug 1, 2:41 pm, Paul Lalli <(E-Mail Removed)> wrote:
> On Aug 1, 3:22 pm, (E-Mail Removed) wrote:
>
> > I have script that recursively greps for a term and counts the
> > occurrences of it in each file. It works fine but now I want to
> > exclude matches where the term has an underscore in front or after
> > it. I have tried to continue using split

>
> As a side note to my other response, split() is a very bad way to
> attempt to count occurrences of a string:
>
> $ perl -le'
> print scalar(@foo = split /foo/, "barfoobazfoobiff");
> print scalar(@foo = split /foo/, "barfoobazbifffoo");
> print scalar(@foo = split /foo/, "barbazbifffoofoo");
> '
> 3
> 2
> 1
>
> I rather strongly suggest you read:
> $ perldoc -q count
> Found in /opt2/Perl5_8_4/lib/perl5/5.8.4/pod/perlfaq4.pod
> How can I count the number of occurrences of a substring
> within a string?
>
> Paul Lalli


You read me correctly: the idea was to split on any occurrence of my
search term that does not have an underscore before or after it.
Counting matches using split worked fine until I tried to exclude
certain patterns. I will look at the perldoc you suggested, but here
is more info for the thread. Thanks, John

Sample input: super _super_ _super super SUPER SUPER_ blahsuper
Desired output: super super SUPER super

Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
Array contents- _super_ _super SUPER_ blah
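
One way to get the desired output from this sample input is to collect the matches themselves rather than the text between them. The following is a sketch only: it reuses the lookaround pattern and the variable names from the post, adds \Q...\E on the assumption that the term should be matched literally, and is not necessarily the approach the poster ended up using.

```perl
use strict;
use warnings;

my $search_term = 'super';
my $grep_out    = 'super _super_ _super super SUPER SUPER_ blahsuper';

# With /g in list context, a pattern without capturing groups returns
# the full text of every match. The lookarounds reject any occurrence
# that touches an underscore on either side.
my @surewords = $grep_out =~ /(?<!_)\Q$search_term\E(?!_)/gi;

print scalar(@surewords), " matches: @surewords\n";
# prints "4 matches: super super SUPER super"
```

The number of elements in @surewords is the count, so no split is needed, and leading or trailing matches are not lost the way empty trailing fields are with split.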


 
 
Tad McClellan
      08-01-2007
(E-Mail Removed) wrote:

> I have script that recursively greps



> Attempts so far:
> # @surewords = split(/(?<!\_)${search_term}(?!\_)/im,
> $grep_out);
> # @surewords = split(/\_{0}${search_term}\_{0}/im,
> $grep_out);
> @surewords = split(/[^\_]${search_term}[^\_]/im,
> $grep_out);
> # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im,
> $grep_out);



There is no recursion anywhere in that code.

Perhaps you meant "repeatedly" instead?


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
 
surfitupdotcom@gmail.com
      08-02-2007
On Aug 1, 4:42 pm, Tad McClellan <(E-Mail Removed)> wrote:
> (E-Mail Removed) <(E-Mail Removed)> wrote:
> > I have script that recursively greps
> > Attempts so far:
> > # @surewords = split(/(?<!\_)${search_term}(?!\_)/im,
> > $grep_out);
> > # @surewords = split(/\_{0}${search_term}\_{0}/im,
> > $grep_out);
> > @surewords = split(/[^\_]${search_term}[^\_]/im,
> > $grep_out);
> > # @surewords = split(/(^|[^\_])${search_term}($|[^\_])/im,
> > $grep_out);

>
> There is no recursion anywhere in that code.
>
> Perhaps you meant "repeatedly" instead?
>
> --
> Tad McClellan
> email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"


The recursion is elsewhere in the script. By the time it gets to this
split each line of $grep_out has one or more hits of the search term.

 
surfitupdotcom@gmail.com
      08-03-2007
On Aug 1, 8:55 pm, "(E-Mail Removed)"
<(E-Mail Removed)> wrote:
> On Aug 1, 1:02 pm, (E-Mail Removed) wrote:
>
> (snipped)
>
>
>
> > You read me correctly, idea was to split on any occurrence of my
> > search term that does not have an underscore before or after it.
> > Counting matches using split worked fine until I tried to exclude
> > certain patterns. I will look at the perldoc you suggested but here
> > is more info for the thread. Thanks, John

>
> > Sample input: super _super_ _super super SUPER SUPER_ blahsuper
> > Desired output: super super SUPER super

>
> How did you plan on getting rid of the 'blah' substring by
> doing a split?
>
>
>
> > Current output using split(/(?<!_)${search_term}(?!_)/i, $grep_out);
> > Array contents- _super_ _super SUPER_ blah

>
> Your description said 'an underscore before ... OR
> an underscore after'; so you also need an "OR" in your
> regular expression. This is known as "Alternation"
> (see perldoc perlre).
>
> use Data::Dumper;
>
> my $term = 'super';
>
> my $string = 'super _super_ _super super SUPER SUPER_ blahsuper';
>
> my @fragments = split(
> /_\Q$term\E_? # exclude term with underscore in front
> # (optional trailing _)
> | # OR
> _?\Q$term\E_/xi # exclude term with underscore afterward
> # (optional leading _)
> , $string);
>
> print Dumper \@fragments;
>
> __END__
>
> I get:
>
> $VAR1 = [
> 'super ',
> ' ',
> ' super SUPER ',
> ' blahsuper'
> ];
>
> Is that what you wanted? As Paul said, there's
> probably a better way to "count" things than
> using split.
>
> --
> Hope this helps,
> Steven



Thanks all for the assist. After further experimentation I did switch
to using an option other than split for this task. I also sharpened my
regexp along the way, so everything worked out. Take care, John

 