Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > counting number of (overlapping) occurances

Reply
Thread Tools

counting number of (overlapping) occurances

 
 
John
Guest
Posts: n/a
 
      03-10-2006
I have two strings S1 and S2. I want to know how many times
S2 occurs inside S1.

For instance

if S1 = "AAAA"
and S2 = "AA"

then the count is 3. Is there an easy way to do this in python?
I was trying to use the "count" function but it does not do
overlapping counts it seems.

Thanks,
--j

 
Reply With Quote
 
 
 
 
Paul Rubin
Guest
Posts: n/a
 
      03-10-2006
"John" <(E-Mail Removed)> writes:
> if S1 = "AAAA"
> and S2 = "AA"
>
> then the count is 3. Is there an easy way to do this in python?
> I was trying to use the "count" function but it does not do
> overlapping counts it seems.


len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])
 
Reply With Quote
 
 
 
 
John
Guest
Posts: n/a
 
      03-10-2006
Thanks a lot,

This works but is a bit slow, I guess I'll have to live with it.
Any chance this could be sped up in python?

Thanks once again,
--j

 
Reply With Quote
 
Paul Rubin
Guest
Posts: n/a
 
      03-10-2006
"John" <(E-Mail Removed)> writes:
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?


Whoops, I meant to say:

len([1 for i in xrange(len(s1)) if s1.startswith(s2,i)])

That avoids creating a lot of small strings.

If s1 is large you may want to use fancy algorithms like Boyer-Moore
string search. If s2 is also large, you could have some kind of
finite state machine scan through s1 so it recognizes overlapping
occurrences of s2 in parallel. There might be some way to combine
that with Boyer-Moore.
 
Reply With Quote
 
Ben Cartwright
Guest
Posts: n/a
 
      03-10-2006
John wrote:
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?


Sure, to a point. Instead of:

def countoverlap(s1, s2):
return len([1 for i in xrange(len(s1)) if s1[i:].startswith(s2)])

Try this version, which takes smaller slices (resulting in 2x-5x speed
increase when dealing with a large s1 and a small s2):

def countoverlap(s1, s2):
L = len(s2)
return len([1 for i in xrange(len(s1)-L+1) if s1[i:i+L] == s2])

And for a minor extra boost, this version eliminates the list
comprehension:

def countoverlap(s1, s2):
L = len(s2)
cnt = 0
for i in xrange(len(s1)-L+1):
if s1[i:i+L] == s2:
cnt += 1
return cnt

Finally, if the execution speed of this function is vital to your
application, create a C extension. String functions like this one are
generally excellent candidates for extensionizing.

--Ben

 
Reply With Quote
 
Alex Martelli
Guest
Posts: n/a
 
      03-10-2006
John <(E-Mail Removed)> wrote:

> Thanks a lot,
>
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?


Sure (untested code):

def count_with_overlaps(needle, haystack):
count = 0
pos = 0
while True:
where = haystack.index(needle, pos)
if where == -1: return count
count += 1
pos = where + 1


Alex
 
Reply With Quote
 
Alex Martelli
Guest
Posts: n/a
 
      03-10-2006
Alex Martelli <(E-Mail Removed)> wrote:

> John <(E-Mail Removed)> wrote:
>
> > Thanks a lot,
> >
> > This works but is a bit slow, I guess I'll have to live with it.
> > Any chance this could be sped up in python?

>
> Sure (untested code):
>
> def count_with_overlaps(needle, haystack):
> count = 0
> pos = 0
> while True:
> where = haystack.index(needle, pos)


Oops -- 'find', not 'index' -- sorry (it WAS untested.

> if where == -1: return count
> count += 1
> pos = where + 1



Alex
 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      03-11-2006
On Thu, 09 Mar 2006 22:01:17 -0800, John wrote:

> Thanks a lot,
>
> This works but is a bit slow, I guess I'll have to live with it.
> Any chance this could be sped up in python?


I don't know. What is it?

You should always quote enough of the previous poster's message to give
context. Messages sometimes go missing in Usenet, and people won't have
the foggiest idea what you are talking about.

--
Steven.

 
Reply With Quote
 
Fredrik Lundh
Guest
Posts: n/a
 
      03-11-2006
Steven D'Aprano wrote

> You should always quote enough of the previous poster's message to give
> context. Messages sometimes go missing in Usenet, and people won't have
> the foggiest idea what you are talking about.


one would think that given how many Pythoneers we now have working
for google, at least one of them could go fix the horridly broken posting
interface in googlegroups...

:::

for those who use googlegroups and prefer to make a least a little sense
to people read this newsgroup/mailing list using other tools, please click
"show options" and use the "reply" link at the top of the message instead
of the link with the same name at the bottom.

</F>



 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
counting up instead of counting down edwardfredriks Javascript 6 09-07-2005 03:30 PM
counting word occurances Rodrick Brown Perl Misc 22 06-06-2005 08:48 PM
counting word occurances Rodrick Brown Perl 2 06-02-2005 02:49 AM
Counting occurances of string A in string B, and adding it to string B Sandman Perl Misc 7 08-03-2004 08:46 PM
Multiple Occurances Of Value In String =?Utf-8?B?SmltIEhlYXZleQ==?= ASP .Net 4 06-29-2004 02:55 PM



Advertisments