Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Performance question

Reply
Thread Tools

Performance question

 
 
Pito Salas
Guest
Posts: n/a
 
      07-13-2009
I have a scenario where I will be storing thousands of objects in an
array. Each one will have, say, two attributes, name and age. Some of
them may have other attributes in addition.

Note: I don't need to have any methods, inheritance or any other
'class-ish' behavior. Just the data.

Here are three representations. Which one will be faster? Which one will
be smaller?

class O
def initialize (n, a)
@name = n
@age = a
end

O.new("jack", 11)

OR

{:name => "Jack", :age => 11}

OR

Struct.new("O", :name, :age)
O.new("Jack", 11)
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
 
 
 
Joel VanderWerf
Guest
Posts: n/a
 
      07-13-2009
Pito Salas wrote:
> I have a scenario where I will be storing thousands of objects in an
> array. Each one will have, say, two attributes, name and age. Some of
> them may have other attributes in addition.
>
> Note: I don't need to have any methods, inheritance or any other
> 'class-ish' behavior. Just the data.
>
> Here are three representations. Which one will be faster? Which one will
> be smaller?


Probably the hash representation, but what about an array of

[ [name, age], ...]

Can't beat that for fast access, and storing a fixnum (age) in an array
is particularly space-efficient (only 4 bytes for this field, per
entry). Another option is the arrayfields gem, if you want array storage
but keyword interface.

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

 
Reply With Quote
 
 
 
 
Ben Giddings
Guest
Posts: n/a
 
      07-13-2009
On Monday 13 July 2009 15:15:16 Pito Salas wrote:
> Here are three representations. Which one will be faster? Which one will
> be smaller?


Thousands of records, each with two fields is a very small amount of data.

With "Name" and "Age" as an example, if you have a name like "Apu
Nahasapeemapetilon" and set aside 4 bytes for the age, 100,000 records would
take up roughly: 21+4 * 100,000 => 25 * 100,000 bytes, or 2.5 megabytes.
There will be some overhead as well, but even if you double that to 5
megabytes, it's tiny by the standards of modern memory.

Speed will probably also not be a major issue. All you're doing is looking up
a value stored in memory.

Rather than worry too much about whether to use structures, objects or hashes
for speed, why not choose the data type that makes the rest of the program
easiest to write and to read. If later you decide it needs to be faster or
smaller, you can adjust it, but unless you're running on an embedded system,
it's unlikely you'll have to.

Ben


 
Reply With Quote
 
Pito Salas
Guest
Posts: n/a
 
      07-14-2009
Ben Giddings wrote:

> Rather than worry too much about whether to use structures, objects or
> hashes for speed, why not choose the data type that makes the rest of the
> program easiest to write and to read. If later you decide it needs to be
> faster or smaller, you can adjust it, but unless you're running on an
> embedded system, it's unlikely you'll have to.


The example I gave was purposely oversimplified to make it easy to
explain and understand. In reality the records will be far more complex
and the numbers perhaps in the hundreds of thousands.

But still I do agree with you. I was just trying to see if one of the
three choices was clearly brain dead or clearly the best one. Would
using a hash repeat over and over the text of the keys (I assume 'no')
or have far slower access? Would using a class that never would have a
method incur a major performance overhead because accessing each value
required a method call anyway?

- Pito





--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Roger Pack
Guest
Posts: n/a
 
      07-14-2009

> Here are three representations. Which one will be faster? Which one will
> be smaller?


My answer is try them each and find out
I remember open structs using up a lot of memory, for some reason
though.
=r
--
Posted via http://www.ruby-forum.com/.

 
Reply With Quote
 
Ben Giddings
Guest
Posts: n/a
 
      07-14-2009
On Jul 13, 2009, at 21:23, Pito Salas wrote:
> The example I gave was purposely oversimplified to make it easy to
> explain and understand. In reality the records will be far more
> complex
> and the numbers perhaps in the hundreds of thousands.


Ok, well if you have an average number of bytes per record, you can
probably guesstimate how much space each record will type.

> But still I do agree with you. I was just trying to see if one of the
> three choices was clearly brain dead or clearly the best one. Would
> using a hash repeat over and over the text of the keys (I assume 'no')


When you use a string (or anything else) as a hash key, Ruby
uses .hash to figure out a hash key for it. The access time for
people["matz"].age should only be slightly slower than matz.age,
because people["matz"] has to find the hash value of "matz", look up
the value in the hash, then look up the data, whereas matz.age only
has to look up the data.

I don't know the internals of Ruby's struct vs. class implementations,
but they should be pretty similar.

> or have far slower access? Would using a class that never would have a
> method incur a major performance overhead because accessing each value
> required a method call anyway?


I'm not sure quite what you mean -- a class that never would have a
method? Do you mean a class that has no associated instance methods,
other than the attribute accessors? If you mean the attribute
accessor methods, it does add a tiny bit of overhead, but I'm pretty
sure most of that is implemented in C. Whether the classes have
additional methods associated (but never called) should not slow them
down or add additional space per instance. If you never call the
methods they're just additional memory used once in the object
definition. In memory, each unique instance of the class should just
have the data associated with each instance of a class.

If you happen to tack an extra method onto one particular instance of
a class, you'll end up with unique method data for that one instance.

In other words, somewhere in memory you should have something like:

bob = Employee.new("bob smith", 42)
ang = Employee.new("angela carter", 35)
jj = Employee.new("jason james", 73)
def jj.retired?
true
end

Employees:
initialize: (name, age); @name = var, @age = age
to_s: "Name: #{name}, Age: #{age}"
name: name
name=: name = var
age: age
age=: age = var
...

Data:
{employee, bob smith, 42}
{employee, angela carter, 35}
{employee, jason james, 73, retired?: true}
...

(I'm not completely sure how Ruby does the internals, esp instance
methods on objects, but it should be something like that).

Overall, all three implementations you talked about should be
relatively similar in speed and relatively similar in size. I think
it's easiest just to throw some sample data at them if you're
interested in how they perform. You can even use ObjectSpace to get
an idea of the count of objects in memory at various points in your
program.

Ben


 
Reply With Quote
 
David Masover
Guest
Posts: n/a
 
      07-14-2009
On Monday 13 July 2009 08:23:26 pm Pito Salas wrote:
> Would
> using a hash repeat over and over the text of the keys (I assume 'no')


Definitely no, because it looks like you're using symbols, not strings. In 1.8,
at least, it should be roughly as efficient as using integers.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Performance Tutorials Services - Boosting Performance by DisablingUnnecessary Services on Windows XP Home Edition Software Engineer Javascript 0 06-10-2011 02:18 AM
Newbie Question: python mysqldb performance question cjl Python 3 05-21-2007 02:49 AM
Performance related Question..... Cris Rock ASP .Net 1 02-12-2004 10:47 AM
Web Form Performance Versus Single File Performance jm ASP .Net 1 12-12-2003 11:14 PM
.NET Performance Question, Please Reply Don Beal ASP .Net 13 09-29-2003 03:46 PM



Advertisments