> I don't know about speed/memory use, but perhaps Struct (in stdlib)
> or Arrayfields:
>
> http://raa.ruby-lang.org/project/arrayfields/
>
> does what you want?
>
> Alex Gutteridge
>
> Bioinformatics Center
> Kyoto University
Thanks for the suggestions. I looked at both options and took the
time to do some profiling (speed and memory). [ See the SUMMARY
at the end if you want the short answer ]

It's really only pseudo-scientific, but it's better than where I was
before.
Memory profiling turns out to be kind of a pain to get going. I
created a little plugin to do this based on
http://t-a-w.blogspot.com/2007/02/ruby-and-judy.html
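    module MemoryUsage
      # Baseline totals (in kB) recorded by mem_start.
      MY_START_MEMORY = {}

      # Record the current Size and Rss totals as the baseline.
      def mem_start
        file = IO.read("/proc/#{Process.pid}/smaps")
        MY_START_MEMORY['size'] = _size(file)
        MY_START_MEMORY['rss'] = _rss(file)
      end

      # Report how much Size and Rss have grown since mem_start.
      def mem_stop
        file = IO.read("/proc/#{Process.pid}/smaps")
        size_dif = _size(file) - MY_START_MEMORY['size']
        rss_dif = _rss(file) - MY_START_MEMORY['rss']
        "SIZE: #{size_dif} kB, RSS: #{rss_dif} kB"
      end

      # Sum every "Size:" entry; inject(0) returns 0 rather than nil if none match.
      def _size(file_string)
        file_string.scan(/^Size:\s+(\d+) kB/).map { |x| x[0].to_i }.inject(0) { |a, b| a + b }
      end

      # Sum every "Rss:" entry.
      def _rss(file_string)
        file_string.scan(/^Rss:\s+(\d+) kB/).map { |x| x[0].to_i }.inject(0) { |a, b| a + b }
      end
    end

    # Make mem_start / mem_stop callable from anywhere.
    class Object
      include MemoryUsage
    end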
## then you simply require 'memory_usage'
## and put 'mem_start' in your code at the beginning
## and 'puts mem_stop' in your code at the end
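
To make the summing step concrete, here is a minimal sketch that runs the same kind of regex over a hand-written excerpt in smaps format, so it works without /proc. The sample text, its numbers, and the helper name sum_smaps_field are made up for illustration; real smaps output has many more fields per mapping.

```ruby
# Hypothetical two-mapping excerpt in /proc/<pid>/smaps format (values made up).
SAMPLE_SMAPS = <<~TEXT
  08048000-080a0000 r-xp 00000000 08:01 12345 /usr/bin/ruby
  Size:                352 kB
  Rss:                 200 kB
  b7e00000-b7f00000 rw-p 00000000 00:00 0
  Size:               1024 kB
  Rss:                  64 kB
TEXT

# Sum every "<field>: N kB" line in the text; returns 0 when nothing matches.
def sum_smaps_field(text, field)
  pattern = /^#{Regexp.escape(field)}:\s+(\d+) kB/
  text.scan(pattern).inject(0) { |sum, match| sum + match[0].to_i }
end

size_kb = sum_smaps_field(SAMPLE_SMAPS, 'Size')
rss_kb = sum_smaps_field(SAMPLE_SMAPS, 'Rss')
puts "SIZE: #{size_kb} kB, RSS: #{rss_kb} kB"  # prints "SIZE: 1376 kB, RSS: 264 kB"
```

Anchoring the pattern at the start of the line keeps "Rss:" from also matching other fields whose names end in the same letters.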
I will summarize the results. A struct is really nice and easy to
use:
    MyClass = Struct.new(:att1, :att2, :att3)
    myobject = MyClass.new(value1, value2, value3)
    myobject.att1 = a_different_value  # can get/set by method
    puts myobject[0]  # prints the current att1 value (can get/set by array index)
I made an array of about 500,000 arrays (6 values each). I then tested
each style of container for speed in object creation, object access by
keyword, and object access by index (where applicable).
    ## $fields is set elsewhere in the test script (not shown here);
    ## it holds the list of attribute names.

    ## MyObject (this is my object)
    class MyObject < Array
      def a ; self[0] ; end
      def b ; self[1] ; end
      def c ; self[2] ; end
      def d ; self[3] ; end
      def e ; self[4] ; end
      def f ; self[5] ; end
      def g ; self[6] ; end
    end

    ## STRUCT CLASS
    MyStructClass = Struct.new( *$fields )
    # MyStructClass.new( *atts )  ## -> to create object

    ## NORMAL OBJECT
    class NormalObject
      attr_accessor( *$fields )
      def initialize(*args)
        @a, @b, @c, @d, @e, @f, @g = args
      end
    end
Access by index:
MyObject, Structs, and Arrays all have equivalent access times by
index.
Access by keywords:
Structs are essentially as fast to access by keyword as by index
(blazing fast), and that's about the same as for a normal object.
MyObject takes about twice as long to access by keyword.
Initialization:
Arrays initialize fastest, followed by MyObject. Struct takes about
twice as long to initialize as MyObject (I think because you have to
pass in the fields as a splatted *list). Normal objects take the
longest to initialize (even though I'm setting all the properties in
one assignment).
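
For anyone who wants to repeat the timing comparison, here is a rough sketch using Benchmark from the stdlib. It is not the script used for the numbers above: the class names (BenchArray, BenchStruct, BenchObject), the field list a..g, and the row count are all assumptions chosen for illustration, and results will differ between Ruby versions and machines.

```ruby
require 'benchmark'

# Assumption: seven fields named a..g, mirroring the accessors in the post.
FIELDS = [:a, :b, :c, :d, :e, :f, :g]
N = 100_000  # fewer rows than the ~500,000 in the post, to keep the run short

BenchStruct = Struct.new(*FIELDS)

# Array subclass with one named reader per index.
class BenchArray < Array
  FIELDS.each_with_index do |name, i|
    define_method(name) { self[i] }
  end
end

# Plain object with instance variables and attr_accessor.
class BenchObject
  attr_accessor(*FIELDS)
  def initialize(*args)
    @a, @b, @c, @d, @e, @f, @g = args
  end
end

rows = Array.new(N) { |i| Array.new(FIELDS.size) { |j| i + j } }

# Each builder turns one plain row into the container under test.
builders = {
  'Array'       => lambda { |row| row.dup },
  'BenchArray'  => lambda { |row| BenchArray.new(row) },
  'BenchStruct' => lambda { |row| BenchStruct.new(*row) },
  'BenchObject' => lambda { |row| BenchObject.new(*row) }
}

builders.each do |label, build|
  objects = nil
  create = Benchmark.realtime { objects = rows.map { |row| build.call(row) } }
  line = format('%-12s create %.3fs', label, create)
  if objects.first.respond_to?(:[])   # index access, where applicable
    by_index = Benchmark.realtime { objects.each { |o| o[2] } }
    line << format('  index %.3fs', by_index)
  end
  if objects.first.respond_to?(:c)    # keyword access, where applicable
    by_name = Benchmark.realtime { objects.each { |o| o.c } }
    line << format('  name %.3fs', by_name)
  end
  puts line
end
```

Timing a single pass like this is noisy, so it is worth running it several times before trusting any ratio.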
MEMORY:
Arrays take roughly 1/20th the memory of structs or MyObject.
MyObject uses a little less memory than a struct, but it depends on
how the initialization is implemented. Normal objects took a little
more than twice as much memory as Structs or MyObject (in this
experiment; I think mileage will vary).
I didn't test ArrayFields (even though it looks like a very useful
class) based on my understanding that each array that is created must
then be set with the fields that it will use (which seems like it
would be too much overhead for what I'm doing, although I could be
wrong).
*** SUMMARY *** (for situations with gazillions of set-length
objects):
1. Don't use objects if you can use arrays. Arrays are faster and way leaner.
2. If you have to use an object, struct seems like a good
compromise. Simple to use, fast, extendable, fast access by keywords,
a little slow on object creation. Significantly more memory than an
array, but much less than a normal object.
3. The ugly MyObject from above is probably best if you want faster
initialization and plan to access mostly by index. However, it isn't
easy to create as written. This is where someone with some
meta-programming magic might be able to really clean things up...
4. Avoid normal objects. They take a long time to initialize and
take up a lot of memory.
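
On the meta-programming point in item 3, one possible cleanup is to generate the Array subclass from a list of field names. This is only a sketch: array_class is a made-up helper (not from any library), and Point3 is an example name.

```ruby
# Hypothetical helper: builds an Array subclass with one reader and one
# writer per field, each mapped to a fixed index.
def array_class(*fields)
  Class.new(Array) do
    fields.each_with_index do |name, i|
      define_method(name) { self[i] }
      define_method("#{name}=") { |value| self[i] = value }
    end
  end
end

Point3 = array_class(:x, :y, :z)
pt = Point3.new([1, 2, 3])
pt.y = 20
puts pt.y   # read by name
puts pt[1]  # same slot, read by index
```

One caveat: accessors built with define_method are closures and may be slower to call than hand-written def methods, so keyword access should be re-timed before relying on this.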