Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Count substrings from a string

Reply
Thread Tools

Count substrings from a string

 
 
Zangief Ief
Guest
Posts: n/a
 
      11-18-2007
Hi,

I am able to know if there is a substring in a string with:

if string =~ /#{word}/:
puts "match"
else
puts "do not match"
end

But I would like to count all the substring from a string...

Thanks for help
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
 
 
 
Wolfgang Nádasi-Donner
Guest
Posts: n/a
 
      11-18-2007
Zangief Ief wrote:
> Hi,
>
> I am able to know if there is a substring in a string with:
>
> if string =~ /#{word}/:
> puts "match"
> else
> puts "do not match"
> end
>
> But I would like to count all the substring from a string...
>
> Thanks for help


irb(main):001:0> "one word is one word, never be three
words".scan(/word/).length
=> 3

, Wolfgang Nádasi-Donner
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
 
 
 
Robert Dober
Guest
Posts: n/a
 
      11-18-2007
On Nov 18, 2007 3:56 PM, Wolfgang N=E1dasi-Donner <> wro=
te:
> Zangief Ief wrote:
> > Hi,
> >
> > I am able to know if there is a substring in a string with:
> >
> > if string =3D~ /#{word}/:
> > puts "match"
> > else
> > puts "do not match"
> > end
> >
> > But I would like to count all the substring from a string...
> >
> > Thanks for help

>
> irb(main):001:0> "one word is one word, never be three
> words".scan(/word/).length
> =3D> 3
>
> , Wolfgang N=E1dasi-Donner
>
> --
> Posted via http://www.ruby-forum.com/.
>
>

It is not clear what you want, I use an idiom like the following to
allow for overlap

class String
def all_subs substr, overlap =3D false
return scan(substr) unless overlap
(0...size).inject([]){ |r, i|
r <<
( substr.match(self[i..-1])[0] rescue nil )
}.compact
end
end


p "aaaa".all_subs( /aaa/ )
p "aaaa".all_subs( /aaa/, true )

too bad scan does not have an overlap parameter

HTH
Robert


--=20
what do I think about Ruby?
http://ruby-smalltalk.blogspot.com/

 
Reply With Quote
 
Wolfgang Nádasi-Donner
Guest
Posts: n/a
 
      11-18-2007
Robert Dober wrote:
> too bad scan does not have an overlap parameter


It has the possibility


irb(main):001:0> "aaaaaa".scan(/(?=aa)/) do
irb(main):002:1* puts "#{$~.pre_match}-#{$~[0]}-#{$~.post_match}"
irb(main):003:1> end
--aaaaaa
a--aaaaa
aa--aaaa
aaa--aaa
aaaa--aa
=> "aaaaaa"
irb(main):004:0> "aaaaaa".scan(/(?=aa)/).length
=> 5

With the look ahead features severals things can be done.

Wolfgang Nádasi-Donner
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Robert Dober
Guest
Posts: n/a
 
      11-18-2007
On Nov 18, 2007 4:16 PM, Wolfgang N=E1dasi-Donner <> wro=
te:
> Robert Dober wrote:
> > too bad scan does not have an overlap parameter

>
> It has the possibility

I am not willing to pay that price
but let me see, in case of sparse overlaps ( and that is often the
case ) the lookahead might be less costly than my
brute force approach, hmm, not sure, I will try

38/38 > cat substr.rb && ruby substr.rb
# vim: sw=3D2 ts=3D2 ft=3Druby expandtab tw=3D0 nu syn:
#
class String
def bf_sub substr
(0...size).inject([]){ |r, i|
r <<
( substr.match(self[i..-1])[0] rescue nil )
}.compact
end

def la_sub substr
scan /(?=3D#{substr})/
end
end

require 'benchmark'

N =3D 1_000
A =3D "a" * 100
S =3D /aa/
B =3D "b " * 20
T =3D /b\s/
Benchmark::bmbm do |bm|
bm.report "look ahead on many occurrances" do
N.times do
A.la_sub S
end
end
bm.report "brute force on many occurrances" do
N.times do
A.bf_sub S
end
end

bm.report "look ahead on sparse occurrances" do
N.times do
B.la_sub T
end
end
bm.report "brute force on sparse occurrances" do
N.times do
B.bf_sub T
end
end
end

Rehearsal -----------------------------------------------------------------=
----
look ahead on many occurrances 0.150000 0.000000 0.150000 ( 0.240=
576)
brute force on many occurrances 1.020000 0.030000 1.050000 ( 1.327=
992)
look ahead on sparse occurrances 0.050000 0.000000 0.050000 ( 0.125=
577)
brute force on sparse occurrances 2.720000 0.090000 2.810000 ( 4.508=
353)
------------------------------------------------------------ total: 4.06000=
0sec

user system total r=
eal
look ahead on many occurrances 0.150000 0.000000 0.150000 ( 0.184=
002)
brute force on many occurrances 0.890000 0.050000 0.940000 ( 1.243=
384)
look ahead on sparse occurrances 0.050000 0.000000 0.050000 ( 0.051=
307)
brute force on sparse occurrances 2.620000 0.090000 2.710000 ( 3.080=
540)
---------------------------------------------------------------------------=
--------------------------------
$stdout.sub("urranc","urrenc")

Turns out lookahead is quite ok, brute force hopeless even in the case
suiting it best , wonder how this would scale if there
were a Regexp#match allowing for an offset.

Cheers
Robert


--=20
what do I think about Ruby?
http://ruby-smalltalk.blogspot.com/

 
Reply With Quote
 
Wolfgang Nádasi-Donner
Guest
Posts: n/a
 
      11-18-2007
Robert Dober wrote:
> ...wonder how this would scale if there were a Regexp#match allowing for an offset.



...which will come in Ruby 1.9:

C:\Dokumente und Einstellungen\wolfgang>irb19
irb(main):001:0> "abcdef".match(/./,3)
=> #<MatchData "d">

Wolfgang Nádasi-Donner
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Zangief Ief
Guest
Posts: n/a
 
      11-18-2007
Thanks a lot.
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Robert Dober
Guest
Posts: n/a
 
      11-18-2007
On Nov 18, 2007 6:29 PM, Wolfgang N=E1dasi-Donner <> wro=
te:
> Robert Dober wrote:
> > ...wonder how this would scale if there were a Regexp#match allowing fo=

r an offset.
>
>
> ...which will come in Ruby 1.9:
>
> C:\Dokumente und Einstellungen\wolfgang>irb19
> irb(main):001:0> "abcdef".match(/./,3)
> =3D> #<MatchData "d">
>

Great

I have written a 1.9 test and still lookahead is much better, I am
surpprised as I got rid of all fancy stuff in the
bf_sub method going back to my Regexp stuff happily again.

def bf_sub substr
r =3D []
size.times do |i|
s =3D
substr.match(self,i)
r << ( s && s[0] )
end
r.compact
end

512/12 > ruby1.9 substr-1.9.rb
Rehearsal -----------------------------------------------------------------=
----
look ahead on many occurrances 0.200000 0.000000 0.200000 ( 0.308=
822)
brute force on many occurrances 0.530000 0.000000 0.530000 ( 0.649=
79
look ahead on sparse occurrances 0.070000 0.000000 0.070000 ( 0.171=
571)
brute force on sparse occurrances 1.260000 0.000000 1.260000 ( 1.517=
635)
------------------------------------------------------------ total: 2.06000=
0sec

user system total r=
eal
look ahead on many occurrances 0.200000 0.000000 0.200000 ( 0.531=
080)
brute force on many occurrances 0.530000 0.000000 0.530000 ( 0.641=
892)
look ahead on sparse occurrances 0.070000 0.000000 0.070000 ( 0.158=
519)
brute force on sparse occurrances 1.240000 0.000000 1.240000 ( 1.389=
619)



Original test
507/7 > ruby substr.rb
Rehearsal -----------------------------------------------------------------=
----
look ahead on many occurrances 0.140000 0.000000 0.140000 ( 0.148=
194)
brute force on many occurrances 0.770000 0.020000 0.790000 ( 1.049=
865)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 ( 0.059=
984)
brute force on sparse occurrances 2.130000 0.050000 2.180000 ( 2.316=
25
------------------------------------------------------------ total: 3.15000=
0sec

user system total r=
eal
look ahead on many occurrances 0.120000 0.000000 0.120000 ( 0.147=
802)
brute force on many occurrances 0.770000 0.030000 0.800000 ( 0.838=
135)
look ahead on sparse occurrances 0.040000 0.000000 0.040000 ( 0.057=
320)
brute force on sparse occurrances 2.070000 0.070000 2.140000 ( 2.398=
06

>
> Wolfgang N=E1dasi-Donner
> --
> Posted via http://www.ruby-forum.com/.
>
>



R.
--=20
what do I think about Ruby?
http://ruby-smalltalk.blogspot.com/

 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      11-18-2007
On 18.11.2007 19:57, Robert Dober wrote:
> On Nov 18, 2007 6:29 PM, Wolfgang Nádasi-Donner <>wrote:
>> Robert Dober wrote:
>>> ...wonder how this would scale if there were a Regexp#match allowing for an offset.

>>
>> ...which will come in Ruby 1.9:
>>
>> C:\Dokumente und Einstellungen\wolfgang>irb19
>> irb(main):001:0> "abcdef".match(/./,3)
>> => #<MatchData "d">
>>

> Great
>
> I have written a 1.9 test and still lookahead is much better, I am
> surpprised as I got rid of all fancy stuff in the
> bf_sub method going back to my Regexp stuff happily again.


IMHO it is not surprising because the lookahead method uses one method
(scan) which is implemented in C while you create a lot of temporary
objects and also start scanning at *every* position in the string.
Usually regexp engines are implemented much smarter than we can do in
short method. I am sure you have heard of Boyer Moore and Knuth Morris
Pratt algorithms...

> def bf_sub substr
> r = []
> size.times do |i|
> s =
> substr.match(self,i)
> r << ( s && s[0] )
> end
> r.compact
> end


Kind regards

robert
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Count substrings in string, scan too slow Danny Challis Ruby 17 06-30-2010 02:04 AM
Counting the repetition count of repeated substrings Michele Dondi Perl Misc 0 11-15-2007 03:38 PM
C program to count occurences of substrings in strings JD C Programming 1 11-12-2005 11:56 AM
Replacing palindrome substrings of an input string with a given string Tung Chau C Programming 1 08-06-2004 07:27 PM
Replacing palindrome substrings of an input string with a given string. Any effecient algorithm? Tung Chau C Programming 0 08-06-2004 10:18 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57