Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Premature Optimization

Reply
Thread Tools

Premature Optimization

 
 
poopdeville@gmail.com
Guest
Posts: n/a
 
      10-25-2006
Hi everybody,

I'm writing a fairly open-ended question. I'm hoping for suggestions,
opinions, advice. Suppose I have n arrays, each of which has m
entries. m is a fairly large integer, on the order of 10,000. Each
entry is either 1 or 0.

The first task I need to accomplish is figuring out how many times a 1
occurs in the ith entry in an array. So for concreteness, if I had
arrays:

first = [1,0,0,0,0]
second = [1,1,0,0,0]
third = [0,0,0,1,0]

I would end up with
count = [2,1,0,1,0]

I'm just trying to give the general flavor of what I'm working on. I
know I can use some simple each_with_index loops to increment
count[index] (something along the lines of

count = Array.new(m,0)
[first, second, third].each do |array|
array.each_with_index do |item, index|
count[index] += item
end
end

There are going to be m * 3n * (two constant multipliers for the
looping) object allocations and method calls. m is fairly large, and I
have other, similar, tasks to accomplish with this data. The faster I
can process this data, the more data I can process in a given amount of
time, and the more accurate the analysis will be.

Is there a slick way to do this with unpacking and packing? Or some
other way to do this with strings? Any modules or libraries I should
look into? I'm fairly new to Ruby, though not to scientific
computation. I realize I won't be able to do this any faster than with
m * n * (a constant multiplier), but I need that constant multiplier to
be as small as possible. Any advice would be appreciated.

Thanks!
'cid 'ooh

 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      10-25-2006
wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.


> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.


I would look out for an implementation of a BitSet as this data
structure seems crucial for your application. You could even create one
your own - it's not too hard.

Kind regards

robert

 
Reply With Quote
 
 
 
 
ara.t.howard@noaa.gov
Guest
Posts: n/a
 
      10-25-2006
On Thu, 26 Oct 2006 wrote:

> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m entries. m
> is a fairly large integer, on the order of 10,000. Each entry is either 1
> or 0.
>
> The first task I need to accomplish is figuring out how many times a 1
> occurs in the ith entry in an array. So for concreteness, if I had arrays:
>
> first = [1,0,0,0,0]
> second = [1,1,0,0,0]
> third = [0,0,0,1,0]
>
> I would end up with
> count = [2,1,0,1,0]


harp:~ > cat a.rb
require 'narray'

first = NArray.to_na [1,0,0,0,0]
second = NArray.to_na [1,1,0,0,0]
third = NArray.to_na [0,0,0,1,0]

count = first.eq(1) + second.eq(1) + third.eq(1)

p count


harp:~ > ruby a.rb
NArray.byte(5):
[ 2, 1, 0, 1, 0 ]


> I'm just trying to give the general flavor of what I'm working on. I know I
> can use some simple each_with_index loops to increment count[index]
> (something along the lines of
>
> count = Array.new(m,0)
> [first, second, third].each do |array|
> array.each_with_index do |item, index|
> count[index] += item
> end
> end
>
> There are going to be m * 3n * (two constant multipliers for the
> looping) object allocations and method calls. m is fairly large, and I
> have other, similar, tasks to accomplish with this data. The faster I
> can process this data, the more data I can process in a given amount of
> time, and the more accurate the analysis will be.


i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.


regards.

-a
--
my religion is very simple. my religion is kindness. -- the dalai lama

 
Reply With Quote
 
Wilson Bilkovich
Guest
Posts: n/a
 
      10-25-2006
On 10/25/06, <> wrote:
> On Thu, 26 Oct 2006 wrote:
>
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m entries. m
> > is a fairly large integer, on the order of 10,000. Each entry is either 1
> > or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]

>
> harp:~ > cat a.rb
> require 'narray'
>
> first = NArray.to_na [1,0,0,0,0]
> second = NArray.to_na [1,1,0,0,0]
> third = NArray.to_na [0,0,0,1,0]
>
> count = first.eq(1) + second.eq(1) + third.eq(1)
>
> p count
>
>
> harp:~ > ruby a.rb
> NArray.byte(5):
> [ 2, 1, 0, 1, 0 ]
>
>
> > I'm just trying to give the general flavor of what I'm working on. I know I
> > can use some simple each_with_index loops to increment count[index]
> > (something along the lines of
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.

>
> i use narray on huge (> 1gb) datasets all the time. it's blindingly fast.
>
>


What's it going to take to get this into the stdlib? It is awesome.

 
Reply With Quote
 
ara.t.howard@noaa.gov
Guest
Posts: n/a
 
      10-26-2006
On Thu, 26 Oct 2006, Wilson Bilkovich wrote:

> What's it going to take to get this into the stdlib? It is awesome.


it sure it. i've be campaigning for years to no avail... maybe it's time to
bug matz again? it even compiles on windows: both it and the gsl can be found
precompiled here:

http://codeforpeople.com/lib/ruby/rb-gsl-win/

regards.

-a
--
my religion is very simple. my religion is kindness. -- the dalai lama

 
Reply With Quote
 
dga
Guest
Posts: n/a
 
      10-26-2006
Someone already mentioned the NArray way; if you treat the entire
thing as a set of matrix operations, you can make it insanely fast in
either NArray or GSL. But don't treat it as "N" arrays with "M"
entries -- treat it as a matrix, and then all of your operations can be
done in the external (very fast) code:

m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])

You can then either cheat, and "flatten" the matrix into a vector and
find the sum:

puts m.to_v.sum

(which works since they're ones)

Or pretend you're in Matlab and do it as a set of matrix operations:

colones = GSL::Matrix.new(1, m.size[1])
colones[0].set_all(1)

rowones = GSL::Matrix.new(1, m.size[0])
rowones[0].set_all(1)

puts ((colones * m.transpose) * rowones.transpose)[0,0]

-Dave


wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice. Suppose I have n arrays, each of which has m
> entries. m is a fairly large integer, on the order of 10,000. Each
> entry is either 1 or 0.
>
> The first task I need to accomplish is figuring out how many times a 1
> occurs in the ith entry in an array. So for concreteness, if I had
> arrays:
>
> first = [1,0,0,0,0]
> second = [1,1,0,0,0]
> third = [0,0,0,1,0]
>
> I would end up with
> count = [2,1,0,1,0]
>
> I'm just trying to give the general flavor of what I'm working on. I
> know I can use some simple each_with_index loops to increment
> count[index] (something along the lines of
>
> count = Array.new(m,0)
> [first, second, third].each do |array|
> array.each_with_index do |item, index|
> count[index] += item
> end
> end
>
> There are going to be m * 3n * (two constant multipliers for the
> looping) object allocations and method calls. m is fairly large, and I
> have other, similar, tasks to accomplish with this data. The faster I
> can process this data, the more data I can process in a given amount of
> time, and the more accurate the analysis will be.
>
> Is there a slick way to do this with unpacking and packing? Or some
> other way to do this with strings? Any modules or libraries I should
> look into? I'm fairly new to Ruby, though not to scientific
> computation. I realize I won't be able to do this any faster than with
> m * n * (a constant multiplier), but I need that constant multiplier to
> be as small as possible. Any advice would be appreciated.
>
> Thanks!
> 'cid 'ooh


 
Reply With Quote
 
dga
Guest
Posts: n/a
 
      10-26-2006
Tweaking a bit and benchmarking with 100 rows of 10,000 elements each

user system total real
Create Matrix 3.940000 0.090000 4.030000 ( 4.037665)
Add Up Matrix 0.710000 0.000000 0.710000 ( 0.711897)
Create GSL::Matrix 3.400000 0.050000 3.450000 ( 3.496827)
Add GSL::Matrix 0.080000 0.030000 0.110000 ( 0.111982)
Matrix Add GSL::Matrix 0.020000 0.000000 0.020000 ( 0.017661)

With the "Matrix Add" done as:

(@m * colones.transpose).to_v.sum

(to avoid @m.transpose, which creates a copy of all of @m).

Have fun. Note that there are faster ways to initialize a GSL array
from an external source, if you end up concerned about that source of
overhead. (Or an NArray).

-Dave

dga wrote:
> Someone already mentioned the NArray way; if you treat the entire
> thing as a set of matrix operations, you can make it insanely fast in
> either NArray or GSL. But don't treat it as "N" arrays with "M"
> entries -- treat it as a matrix, and then all of your operations can be
> done in the external (very fast) code:
>
> m = GSL::Matrix.new([1, 0, 0, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 1, 0])
>
> You can then either cheat, and "flatten" the matrix into a vector and
> find the sum:
>
> puts m.to_v.sum
>
> (which works since they're ones)
>
> Or pretend you're in Matlab and do it as a set of matrix operations:
>
> colones = GSL::Matrix.new(1, m.size[1])
> colones[0].set_all(1)
>
> rowones = GSL::Matrix.new(1, m.size[0])
> rowones[0].set_all(1)
>
> puts ((colones * m.transpose) * rowones.transpose)[0,0]
>
> -Dave
>
>
> wrote:
> > Hi everybody,
> >
> > I'm writing a fairly open-ended question. I'm hoping for suggestions,
> > opinions, advice. Suppose I have n arrays, each of which has m
> > entries. m is a fairly large integer, on the order of 10,000. Each
> > entry is either 1 or 0.
> >
> > The first task I need to accomplish is figuring out how many times a 1
> > occurs in the ith entry in an array. So for concreteness, if I had
> > arrays:
> >
> > first = [1,0,0,0,0]
> > second = [1,1,0,0,0]
> > third = [0,0,0,1,0]
> >
> > I would end up with
> > count = [2,1,0,1,0]
> >
> > I'm just trying to give the general flavor of what I'm working on. I
> > know I can use some simple each_with_index loops to increment
> > count[index] (something along the lines of
> >
> > count = Array.new(m,0)
> > [first, second, third].each do |array|
> > array.each_with_index do |item, index|
> > count[index] += item
> > end
> > end
> >
> > There are going to be m * 3n * (two constant multipliers for the
> > looping) object allocations and method calls. m is fairly large, and I
> > have other, similar, tasks to accomplish with this data. The faster I
> > can process this data, the more data I can process in a given amount of
> > time, and the more accurate the analysis will be.
> >
> > Is there a slick way to do this with unpacking and packing? Or some
> > other way to do this with strings? Any modules or libraries I should
> > look into? I'm fairly new to Ruby, though not to scientific
> > computation. I realize I won't be able to do this any faster than with
> > m * n * (a constant multiplier), but I need that constant multiplier to
> > be as small as possible. Any advice would be appreciated.
> >
> > Thanks!
> > 'cid 'ooh


 
Reply With Quote
 
poopdeville@gmail.com
Guest
Posts: n/a
 
      10-30-2006

wrote:
> Hi everybody,
>
> I'm writing a fairly open-ended question. I'm hoping for suggestions,
> opinions, advice.

<snip>

Thanks for all your replies. I've worked with the Gnu Scientific
Library (and ATLAS, too) before, but I didn't realize there were Ruby
binding for it. I settled on a Ruby/GSL with a custom compiled ATLAS
backend (instead of just BLAS like GSL uses normally). It's screaming
fast.

'cid 'ooh

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Any surveys about the premature optimization status nowadays? jimxoch C++ 40 10-05-2009 09:28 AM
Zero Optimization and Sign Optimization??? Ravikiran C Programming 22 11-24-2008 03:19 AM
What is Premature Optimization? tenxian Java 6 04-30-2008 11:31 PM
Premature end of script headers Wayne Deleersnyder Perl 1 11-21-2003 01:00 PM
premature end of headers, code and output looks fine nomadx Perl 1 09-19-2003 04:01 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57