Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Searching through a sorted array

Reply
Thread Tools

Searching through a sorted array

 
 
FireAphis
Guest
Posts: n/a
 
      10-03-2007
Hello,

I have a very big array of objects sorted by one of its numeric data
members. During the flow of my application I need occasionally to get
all the elements in a specific range. I could use find_all

partition = my_array.find_all {|x| (x > start) && (x < end)}

My problem is that, as far as I know, find_all just iterates through
all the elements of the array and that's very inefficient, especially
in my case, in which I have a sorted array. Is there any standard way
to search efficiently through a sorted array? A binary search for
example?

Thanks
FireAphis

 
Reply With Quote
 
 
 
 
Peter Szinek
Guest
Posts: n/a
 
      10-03-2007
FireAphis wrote:
> Hello,
>
> I have a very big array of objects sorted by one of its numeric data
> members. During the flow of my application I need occasionally to get
> all the elements in a specific range.


Then why not

my_array[start..end] ?

Cheers,
Peter
__
http://www.rubyrailways.com
http://scrubyt.org

 
Reply With Quote
 
 
 
 
FireAphis
Guest
Posts: n/a
 
      10-03-2007
On Oct 3, 9:30 am, Peter Szinek <(E-Mail Removed)> wrote:
> FireAphis wrote:
> > Hello,

>
> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.

>
> Then why not
>
> my_array[start..end] ?
>
> Cheers,
> Peter
> __http://www.rubyrailways.comhttp://scrubyt.org


Sorry if my explanation has been misleading. The array isn't
necessarily comprised of continuous or consecutive numbers. Moreover
they don't necessarily start from zero. For example:

my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]

now I want to get all the elements bigger than 1 and smaller than
1000. AFAIK my_array[1..1000] doesn't help in this case.

 
Reply With Quote
 
Axel Etzold
Guest
Posts: n/a
 
      10-03-2007

-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 16:30:03 +0900
> Von: Peter Szinek <(E-Mail Removed)>
> An: http://www.velocityreviews.com/forums/(E-Mail Removed)
> Betreff: Re: Searching through a sorted array


> FireAphis wrote:
> > Hello,
> >
> > I have a very big array of objects sorted by one of its numeric data
> > members. During the flow of my application I need occasionally to get
> > all the elements in a specific range.

>
> Then why not
>
> my_array[start..end] ?


Dear FireAphis,

to find the values 'start' and 'end', you could use something like
the twenty questions game to find a number between 1 and a million
(roughly 2^20, hence 20 questions), starting by :

1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?

2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
If the answer was "no", is the index of 'start' ('end') bigger or smaller than 2^18 (roughly 250000)?

etc..

The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
values.

I am actually not aware whether that's implemented in the Array#index
method.

Best regards,

Axel

--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer

 
Reply With Quote
 
Axel Etzold
Guest
Posts: n/a
 
      10-03-2007

-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 16:50:03 +0900
> Von: FireAphis <(E-Mail Removed)>
> An: (E-Mail Removed)
> Betreff: Re: Searching through a sorted array


> On Oct 3, 9:30 am, Peter Szinek <(E-Mail Removed)> wrote:
> > FireAphis wrote:
> > > Hello,

> >
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to get
> > > all the elements in a specific range.

> >
> > Then why not
> >
> > my_array[start..end] ?
> >
> > Cheers,
> > Peter
> > __http://www.rubyrailways.comhttp://scrubyt.org

>
> Sorry if my explanation has been misleading. The array isn't
> necessarily comprised of continuous or consecutive numbers. Moreover
> they don't necessarily start from zero. For example:
>
> my_array = [-100, -99, -2, 0, 3, 125, 1742, 1234568763]
>
> now I want to get all the elements bigger than 1 and smaller than
> 1000. AFAIK my_array[1..1000] doesn't help in this case.
>

In that case, you can say:

interesting_array=my_array.delete_if{|entry| entry>1000 or entry<1}

Best regards,

Axel
--
Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer

 
Reply With Quote
 
FireAphis
Guest
Posts: n/a
 
      10-03-2007
On Oct 3, 9:56 am, "Axel Etzold" <(E-Mail Removed)> wrote:
> -------- Original-Nachricht --------
>
> > Datum: Wed, 3 Oct 2007 16:30:03 +0900
> > Von: Peter Szinek <(E-Mail Removed)>
> > An: (E-Mail Removed)
> > Betreff: Re: Searching through a sorted array
> > FireAphis wrote:
> > > Hello,

>
> > > I have a very big array of objects sorted by one of its numeric data
> > > members. During the flow of my application I need occasionally to get
> > > all the elements in a specific range.

>
> > Then why not

>
> > my_array[start..end] ?

>
> Dear FireAphis,
>
> to find the values 'start' and 'end', you could use something like
> the twenty questions game to find a number between 1 and a million
> (roughly 2^20, hence 20 questions), starting by :
>
> 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly 500000)?
>
> 2.) If the answer was "yes", is the index of 'start' ('end') bigger or smaller than 2^19+2^18 (roughly 750000)?
> If the answer was "no", is the index of 'start' ('end') bigger or smallerthan 2^18 (roughly 250000)?
>
> etc..
>
> The range in which "start" lies reduces its size by half with each question, and you need 2*log(2) n questions to find the 'start' and 'end'
> values.
>
> I am actually not aware whether that's implemented in the Array#index
> method.
>
> Best regards,
>
> Axel
>
> --
> Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
> Ideal für Modem und ISDN:http://www.gmx.net/de/go/smartsurfer


Of course that could be an excellent solution. Do you know if there is
already such a facility in Ruby that implements the algorithm? I'm
quite sure Array#index doesn't do that since in order to apply a
binary search the array has to be sorted.

 
Reply With Quote
 
Axel Etzold
Guest
Posts: n/a
 
      10-03-2007

-------- Original-Nachricht --------
> Datum: Wed, 3 Oct 2007 17:15:12 +0900
> Von: FireAphis <(E-Mail Removed)>
> An: (E-Mail Removed)
> Betreff: Re: Searching through a sorted array


> On Oct 3, 9:56 am, "Axel Etzold" <(E-Mail Removed)> wrote:
> > -------- Original-Nachricht --------
> >
> > > Datum: Wed, 3 Oct 2007 16:30:03 +0900
> > > Von: Peter Szinek <(E-Mail Removed)>
> > > An: (E-Mail Removed)
> > > Betreff: Re: Searching through a sorted array
> > > FireAphis wrote:
> > > > Hello,

> >
> > > > I have a very big array of objects sorted by one of its numeric data
> > > > members. During the flow of my application I need occasionally to

> get
> > > > all the elements in a specific range.

> >
> > > Then why not

> >
> > > my_array[start..end] ?

> >
> > Dear FireAphis,
> >
> > to find the values 'start' and 'end', you could use something like
> > the twenty questions game to find a number between 1 and a million
> > (roughly 2^20, hence 20 questions), starting by :
> >
> > 1.) Is the index of 'start' ('end') bigger or smaller than 2^19 (roughly

> 500000)?
> >
> > 2.) If the answer was "yes", is the index of 'start' ('end') bigger or

> smaller than 2^19+2^18 (roughly 750000)?
> > If the answer was "no", is the index of 'start' ('end') bigger or

> smaller than 2^18 (roughly 250000)?
> >
> > etc..
> >
> > The range in which "start" lies reduces its size by half with each

> question, and you need 2*log(2) n questions to find the 'start' and 'end'
> > values.
> >
> > I am actually not aware whether that's implemented in the Array#index
> > method.
> >
> > Best regards,
> >
> > Axel
> >
> > --
> > Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen!
> > Ideal für Modem und ISDN:http://www.gmx.net/de/go/smartsurfer

>
> Of course that could be an excellent solution. Do you know if there is
> already such a facility in Ruby that implements the algorithm? I'm
> quite sure Array#index doesn't do that since in order to apply a
> binary search the array has to be sorted.
>


Dear FireAphis,

I don't know of an implemented solution, but I've implemented my own ... use at your own risk!

Best regards,

Axel


class Array
def find_lower_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
lower_index=0
pow=1
while eval(cond)==false
lower_index=2**pow
pow=pow+1
end
pow=pow-2
lower=2**pow
(pow-1).downto(0) do |x|
old_test=lower_index
lower_index=lower+2**x
if eval(cond)==false
else
lower_index=old_test
end
end
return lower_index+1
end

def find_upper_index(cond)
len=self.length
upper=len
max_nr=(Math.log(len)/Math.log(2)).floor
upper_index=0
pow=1
while eval(cond)==true
upper_index=2**pow
pow=pow+1
end
pow=pow-2
upper=2**pow
(pow-1).downto(0) do |x|
old_test=upper_index
upper_index=upper+2**x
if eval(cond)
else
upper_index=old_test
end
upper=upper_index
end
return upper_index-1
end


end

# some example
len=10**8
a=(1..len).to_a
b=5*len/2
cond='lower_index>10'
r=a.find_lower_index(cond)
cond='upper_index<100'
s=a.find_upper_index(cond)
p r,s
p a[r]
p a[s]

--
Psssst! Schon vom neuen GMX MultiMessenger gehört?
Der kanns mit allen: http://www.gmx.net/de/go/multimessenger

 
Reply With Quote
 
7stud --
Guest
Posts: n/a
 
      10-03-2007
FireAphis wrote:
> My problem is that, as far as I know, find_all just iterates through
> all the elements of the array and that's very inefficient, especially
> in my case, in which I have a sorted array. Is there any standard way
> to search efficiently through a sorted array? A binary search for
> example?
>


You can also turn your array into a hash, which will work just as well
for unsorted arrays:


class Dog
attr_reader :breed, :id

def initialize(breed, id)
@breed = breed
@id = id
end

end

#Create sorted array of Dog's:
#-----------------------------
dogs = []
(1..2_000).each do |r|
dogs << Dog.new("Akita", r*2)
end

=begin
dogs = (1..2000).inject([]) do |arr, r|
arr << Dog.new("Akita", r*2)
arr
end
=end
#----------------------------+


#Convert array to hash.
#key: dog.id
#val: Dog object
#---------------------
my_id_dog_hash = {}

class << my_id_dog_hash
attr_accessor :highest_id
end

dogs.each do |dog|
my_id_dog_hash[dog.id] = dog
end

my_id_dog_hash.highest_id = dogs[-1].id
----------------------+


#Find Dog's in range:
#-------------------
def get_dogs_in_range(id_dog_hash, range)

high_id = id_dog_hash.highest_id
if range.end > high_id
range = range.begin..high_id
end


dogs_in_range = []

range.each do |r|
dog = id_dog_hash[r]

if dog
dogs_in_range << id_dog_hash[r]
end

end
=begin
dogs_in_range = range.inject([]) do |arr1, r|
if dog = id_dog_hash[r]
arr1 << dog
end

arr1
end
=end

dogs_in_range
end
------------------+


#Test it out:
------------
results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
results.each {|dog| p dog}
-----------+

--output:--
#<Dog:0x1a70c @id=1990, @breed="Akita">
#<Dog:0x1a6e4 @id=1992, @breed="Akita">
#<Dog:0x1a6bc @id=1994, @breed="Akita">
#<Dog:0x1a694 @id=1996, @breed="Akita">
#<Dog:0x1a66c @id=1998, @breed="Akita">
#<Dog:0x1a644 @id=2000, @breed="Akita">



--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
James Edward Gray II
Guest
Posts: n/a
 
      10-03-2007
On Oct 3, 2007, at 3:15 AM, FireAphis wrote:

> Of course that could be an excellent solution. Do you know if there is
> already such a facility in Ruby that implements the algorithm?


A recent Ruby Quiz centered around binary search algorithms, so you
can find several examples reading those solutions:

http://www.rubyquiz.com/quiz139.html

James Edward Gray II


 
Reply With Quote
 
FireAphis
Guest
Posts: n/a
 
      10-03-2007
On Oct 3, 1:45 pm, 7stud -- <(E-Mail Removed)> wrote:
> FireAphis wrote:
> > My problem is that, as far as I know, find_all just iterates through
> > all the elements of the array and that's very inefficient, especially
> > in my case, in which I have a sorted array. Is there any standard way
> > to search efficiently through a sorted array? A binary search for
> > example?

>
> You can also turn your array into a hash, which will work just as well
> for unsorted arrays:
>
> class Dog
> attr_reader :breed, :id
>
> def initialize(breed, id)
> @breed = breed
> @id = id
> end
>
> end
>
> #Create sorted array of Dog's:
> #-----------------------------
> dogs = []
> (1..2_000).each do |r|
> dogs << Dog.new("Akita", r*2)
> end
>
> =begin
> dogs = (1..2000).inject([]) do |arr, r|
> arr << Dog.new("Akita", r*2)
> arr
> end
> =end
> #----------------------------+
>
> #Convert array to hash.
> #key: dog.id
> #val: Dog object
> #---------------------
> my_id_dog_hash = {}
>
> class << my_id_dog_hash
> attr_accessor :highest_id
> end
>
> dogs.each do |dog|
> my_id_dog_hash[dog.id] = dog
> end
>
> my_id_dog_hash.highest_id = dogs[-1].id
> ----------------------+
>
> #Find Dog's in range:
> #-------------------
> def get_dogs_in_range(id_dog_hash, range)
>
> high_id = id_dog_hash.highest_id
> if range.end > high_id
> range = range.begin..high_id
> end
>
> dogs_in_range = []
>
> range.each do |r|
> dog = id_dog_hash[r]
>
> if dog
> dogs_in_range << id_dog_hash[r]
> end
>
> end
> =begin
> dogs_in_range = range.inject([]) do |arr1, r|
> if dog = id_dog_hash[r]
> arr1 << dog
> end
>
> arr1
> end
> =end
>
> dogs_in_range
> end
> ------------------+
>
> #Test it out:
> ------------
> results = get_dogs_in_range(my_id_dog_hash, 1_990..2000)
> results.each {|dog| p dog}
> -----------+
>
> --output:--
> #<Dog:0x1a70c @id=1990, @breed="Akita">
> #<Dog:0x1a6e4 @id=1992, @breed="Akita">
> #<Dog:0x1a6bc @id=1994, @breed="Akita">
> #<Dog:0x1a694 @id=1996, @breed="Akita">
> #<Dog:0x1a66c @id=1998, @breed="Akita">
> #<Dog:0x1a644 @id=2000, @breed="Akita">
>
> --
> Posted viahttp://www.ruby-forum.com/.


Neat. Not space efficient but undoubtfully more quick. But if the IDs
have large gaps ([10000, 20000, 30000]) do you think it still will be
more efficient compared to a simple iteration? Consider the fact that
a simple loop will have three iterations whereas your implementation
will have 30000 iteration.

Thanks,
FireAphis

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
searching a sorted list kasthurirangan.balaji@gmail.com C++ 5 08-23-2008 03:02 PM
Google search result to be URL-limited when searching site, but notwhen searching Web stumblng.tumblr Javascript 1 02-04-2008 09:01 AM
Algorithm for searching sorted strings in binary file Hunk C++ 4 04-03-2007 07:12 AM
Searching a sorted array of strings one man army Perl Misc 29 03-01-2006 05:00 AM
searching elements of an array within another array diffused Java 9 08-01-2004 10:09 AM



Advertisments