Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Array and hash iteration questions

Array and hash iteration questions

 
 
Ben Giddings
09-30-2003
I have a CSV file and I'm trying to do a few things with it. Essentially
what it boils down to is: count the number of times a certain value is
seen, then count the number of times another value is seen in conjunction
with the first one.

I'm iterating over the lines of the file and splitting each one into an array
with arr = line.split(/,/). That part works well, but I have a few questions
about how to do the rest efficiently.
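Splitting on every comma only works while no field contains a quoted comma. Below is a minimal sketch of the difference. It assumes the file follows ordinary CSV quoting and uses Ruby's standard `csv` library. The sample `line` is made up:

```ruby
require 'csv'

# Hypothetical CSV line whose second field contains a quoted comma.
line = 'alpha,"beta, gamma",Error: timeout'

# Naive split: the quoted field is cut in two and keeps its quote marks.
naive = line.split(/,/)

# CSV-aware parse: the quoted comma stays inside its field.
parsed = CSV.parse_line(line)

p naive
p parsed
```

For a whole file, `CSV.foreach(path) { |arr| ... }` yields one parsed row at a time, so it could replace the manual line loop. Here `path` is whatever file is being read.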

To count the number of times something is seen, I took this approach:

cases = Hash.new(0)
...
cases[arr[324]] += 1
...
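`Hash.new(0)` makes 0 the value returned for any missing key. `cases[k] += 1` expands to `cases[k] = cases[k] + 1`, so the read gets the default and the write stores the sum. Here is a small sketch with made-up values standing in for `arr[324]`:

```ruby
# Hypothetical values standing in for arr[324] from successive CSV rows.
values = %w[disk net disk cpu disk net]

cases = Hash.new(0)                 # missing keys read as 0
values.each { |v| cases[v] += 1 }   # read default, add 1, store

puts cases["disk"]    # 3
puts cases["gpu"]     # 0, returned by the default
p cases.key?("gpu")   # reading did not add "gpu" as a key
```

Only the assignment stores a key. Merely reading a missing key returns the default without adding it.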

But now I want to save the number of cases where another value occurs with
the first one. (Essentially errors indexed by case)

The approach I have now is:

cases = Hash.new(0)
errors = Hash.new(0)
...
ca = arr[324]   # `case` is a reserved word in Ruby, so use another name
cases[ca] += 1
if arr[532] =~ /Error/
  errors[ca] += 1
end
...

That works, but it seems to me that I really should be doing this with one
hash, not two. Any suggestions?

Next, I want to print out the values. It is easy to do this with
cases.each, but I'd like to print them out, sorted by case. The best
solution I have so far uses cases.keys.sort.each, then inside the block
uses cases[key] (and errors[key]).

Any ideas would be appreciated.

Ben


 
Robert Klemme
10-01-2003

"Ben Giddings" <bg-> wrote in message
news:...
> I have a CSV file and I'm trying to do a few things with it. Essentially
> what it boils down to is: count the number of times a certain value is
> seen, then count the number of times another value is seen in conjunction
> with the first one.
>
> I'm iterating over the lines of the file, and splitting them into an array
> with arr = line.split(/,/). That part works well, but there are a few
> questions about how to do something efficiently.
>
> In order to count the number of times something is seen, I took the approach:
>
> cases = Hash.new(0)
> ...
> cases[arr[324]] += 1
> ...
>
> But now I want to save the number of cases where another value occurs with
> the first one. (Essentially errors indexed by case)
>
> The approach I have now is:
>
> cases = Hash.new(0)
> errors = Hash.new(0)
> ...
> ca = arr[324]
> cases[ca] += 1
> if arr[532] =~ /Error/
>   errors[ca] += 1
> end
> ...
>
> That works, but it seems to me that I really should be doing this with one
> hash, not two. Any suggestions?


cases = Hash.new {|h,k| h[k] = [0, 0]}   # one [count, errors] pair per case
...
ca = arr[324]
counter = cases[ca]
counter[0] += 1                          # occurrences of this case
counter[1] += 1 if /Error/ =~ arr[532]   # errors seen with this case
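A runnable sketch of the single-hash approach follows. The two-column `rows` are hypothetical stand-ins for the CSV lines: column 0 plays `arr[324]` and column 1 plays `arr[532]`. The block form matters here. `Hash.new([0, 0])` would hand every missing key the same shared array and never store it. The block instead creates and stores a fresh pair for each case.

```ruby
# Hypothetical rows: [case, field that may mention an error]
rows = [
  ["disk", "ok"],
  ["disk", "Error: read"],
  ["net",  "ok"],
  ["disk", "Error: write"],
]

# The block runs only for a missing key and stores a new [count, errors] array.
cases = Hash.new { |h, k| h[k] = [0, 0] }

rows.each do |arr|
  counter = cases[arr[0]]               # one hash lookup per row
  counter[0] += 1
  counter[1] += 1 if /Error/ =~ arr[1]
end

p cases["disk"]   # [3, 2]
p cases["net"]    # [1, 0]
```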

> Next, I want to print out the values. It is easy to do this with
> cases.each, but I'd like to print them out, sorted by case. The best
> solution I have so far uses cases.keys.sort.each, then inside the block
> uses cases[key] (and errors[key]).


cases.sort.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end
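`Hash#sort` comes from Enumerable. It turns the hash into `[key, value]` pairs and sorts those arrays, which compares keys first. Keys are unique, so the values never need to be compared. Here is a sketch with hypothetical counts in the `[occurrences, errors]` layout of Robert's reply:

```ruby
# Hypothetical counts: case => [occurrences, errors]
cases = { "net" => [1, 0], "disk" => [3, 2], "cpu" => [2, 1] }

# An array of [key, value] pairs, ordered by key.
sorted = cases.sort

sorted.each do |ca, counter|
  printf "%10s: %4d", ca, counter[0]
  printf " %4d", counter[1] if counter[1] > 0
  print "\n"
end
```

To order by count instead of by case, `cases.sort_by { |ca, counter| -counter[0] }` returns the pairs with the most frequent case first.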

Regards

robert

 
Ben Giddings
10-01-2003
Robert Klemme wrote:
> cases = Hash.new {|h,k| h[k] = [0, 0]}


Ah. I couldn't remember how to use the block form properly. I'm actually
going to use:

cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}

Because it will make some of the later code clearer, like:

cases[ca]['Number'] += 1
cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
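Here is a sketch of the nested-hash version. It assumes hypothetical `rows` and `OFFSET = 1` for the error column. `case` is a reserved word in Ruby, so the case variable is called `ca` here.

```ruby
OFFSET = 1   # hypothetical column holding the error text

# Hypothetical rows: [case, field that may mention an error]
rows = [
  ["disk", "ok"],
  ["disk", "Error: read"],
  ["net",  "ok"],
]

# Outer hash: each new case gets its own counting hash.
# Inner hash: counter names that were never incremented read as 0.
cases = Hash.new { |hash, key| hash[key] = Hash.new(0) }

rows.each do |arr|
  ca = arr[0]
  cases[ca]['Number'] += 1
  cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/
end
```

Reading `cases[x]` for a case that never appeared stores an empty inner hash as a side effect. `cases.key?(x)` or `cases.fetch(x, nil)` checks without creating one.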

> cases.sort.each do |ca, counter|
> printf "%10s: %4d", ca, counter[0]
> printf " %4d", counter[1] if counter[1] > 0
> print "\n"
> end


Aha, I had assumed Hash didn't have a sort method, because the concept of
a "sorted hash" seemed meaningless, but since sort actually returns an array
of [key, value] pairs, that's perfect!

Thanks Robert

Ben


 
Robert Klemme
10-02-2003

"Ben Giddings" <bg-> wrote in message
news:...
> Robert Klemme wrote:
> > cases = Hash.new {|h,k| h[k] = [0, 0]}

>
> Ah. I couldn't remember how to use the block form properly. I'm actually
> going to use:
>
> cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
>
> Because it will make some of the later stuff more clear like
>
> cases[ca]['Number'] += 1
> cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/


No need to use a Hash for this...

Number = 0
Errors = 1

cases[ca][Number] += 1
cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/

I might be a bit picky, but storing the array reference saves one hash lookup.
It *can* affect performance if you have a large number of cases... (see
below; although the timing is dominated by the iteration here, you can see
that the array is faster)

counters = cases[ca]
counters[Number] += 1
counters[Errors] += 1 if arr[OFFSET] =~ /Error/
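A runnable sketch of the array-with-named-slots variant follows. It combines the `Number`/`Errors` constants with the `Hash.new { |h, k| h[k] = [0, 0] }` outer hash from the first reply. `rows` and `OFFSET = 1` are hypothetical stand-ins for the CSV data.

```ruby
Number = 0   # slot for the occurrence count
Errors = 1   # slot for the error count
OFFSET = 1   # hypothetical column holding the error text

# Hypothetical rows: [case, field that may mention an error]
rows = [["disk", "ok"], ["disk", "Error: read"], ["net", "ok"]]

# Each new case gets its own two-slot array.
cases = Hash.new { |h, k| h[k] = [0, 0] }

rows.each do |arr|
  counters = cases[arr[0]]   # one hash lookup, then plain array indexing
  counters[Number] += 1
  counters[Errors] += 1 if arr[OFFSET] =~ /Error/
end
```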

You could also do

cases[ca].instance_eval do
  self[Number] += 1
  self[Errors] += 1 if arr[OFFSET] =~ /Error/
end
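Inside `instance_eval` the receiver becomes `self`. Local variables from the enclosing scope stay visible through the block's closure. Here is a tiny sketch on a single counter array. `line_has_error` is a hypothetical stand-in for the `arr[OFFSET] =~ /Error/` test.

```ruby
Number = 0
Errors = 1

counters = [0, 0]
line_has_error = true   # hypothetical stand-in for arr[OFFSET] =~ /Error/

counters.instance_eval do
  # self is the array here; line_has_error is still reachable as a local.
  self[Number] += 1
  self[Errors] += 1 if line_has_error
end

p counters
```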

I'm getting carried away...

> > cases.sort.each do |ca, counter|
> > printf "%10s: %4d", ca, counter[0]
> > printf " %4d", counter[1] if counter[1] > 0
> > print "\n"
> > end

>
> Aha, I just assumed hash didn't have a sort method, because the concept of
> a "sorted hash" seemed meaningless, but since it actually returns an array
> containing [key, value] pairs, that's perfect!


It is! Thanks to Matz's wisdom.

> Thanks Robert


You're welcome.

Kind regards

robert


10:17:02 [ruby]: ruby -rprofile lookups.rb
     % cumulative      self               self     total
  time    seconds   seconds    calls   ms/call   ms/call  name
 62.50      13.93     13.93        2   6962.50  11140.50  Integer#upto
 26.22      19.77      5.84   100001      0.06      0.06  Hash#[]
 11.28      22.28      2.51   100001      0.03      0.03  Array#[]
  0.07      22.30      0.01        1     15.00     15.00  Profiler__.start_profile
  0.00      22.30      0.00        2      0.00  11140.50  Object#test
  0.00      22.30      0.00        3      0.00      0.00  Module#method_added
  0.00      22.30      0.00        1      0.00  11171.00  Object#testArray
  0.00      22.30      0.00        1      0.00  22281.00  #toplevel
  0.00      22.30      0.00        1      0.00  11110.00  Object#testHash
10:17:25 [ruby]: cat lookups.rb


def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end

testHash
testArray

10:18:15 [ruby]:
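`-rprofile` adds overhead to every method call it records, and the profile above includes that overhead. A rough alternative is to time just the two lookup loops with Ruby's standard `benchmark` library, sketched below. Timings depend on the machine and the Ruby version, so no numbers are shown here. The block call made by `N.times` is still part of both measurements.

```ruby
require 'benchmark'

N = 100_000
hash  = { 0 => 0, 1 => 1, 2 => 2 }
array = [0, 1, 2]

# Each report times N lookups of key/index 2, mirroring lookups.rb.
results = Benchmark.bm(6) do |x|
  x.report("Hash:")  { N.times { hash[2] } }
  x.report("Array:") { N.times { array[2] } }
end
```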

 
Robert Klemme
10-02-2003

"Robert Klemme" <> wrote in message
news:blgp2a$bvnb8$...
>
> "Ben Giddings" <bg-> schrieb im Newsbeitrag
> news:...
> > Robert Klemme wrote:
> > > cases = Hash.new {|h,k| h[k] = [0, 0]}

> >
> > Ah. I couldn't remember how to use the block form properly. I'm actually
> > going to use:
> >
> > cases = Hash.new {|hash, key| hash[key] = Hash.new(0)}
> >
> > Because it will make some of the later stuff more clear like
> >
> > cases[ca]['Number'] += 1
> > cases[ca]['Errors'] += 1 if arr[OFFSET] =~ /Error/

>
> No need to use a Hash for this...
>
> Number = 0
> Errors = 1
>
> cases[ca][Number] += 1
> cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
>
> I might be a bit picky, but storing the array reference saves one hash lookup.
> It *can* affect performance if you have a large number of cases... (see
> below; although the timing is dominated by the iteration here, you can see
> that the array is faster)


This sentence should really have appeared several lines above: it's the
argument in favour of using arrays instead of hashes for the counters.

Regards

robert

 
Alan Chen
10-02-2003
"Robert Klemme" <> wrote in message news:<blgp2a$bvnb8$>...
> No need to use a Hash for this...
>
> Number = 0
> Errors = 1
>
> cases[ca][Number] += 1
> cases[ca][Errors] += 1 if arr[OFFSET] =~ /Error/
>
> I might be a bit picky, but storing the array reference saves one hash lookup.
> It *can* affect performance if you have a large number of cases... (see
> below; although the timing is dominated by the iteration here, you can see
> that the array is faster)


I'm not sure if my testing method is quite consistent, but making a specific
record object looks like it could speed things up even more...

>ruby -rprofile lookups.rb

     % cumulative      self               self     total
  time    seconds   seconds    calls   ms/call   ms/call  name
 73.74      13.08     13.08        3   4359.00   5911.67  Integer#upto
 14.47      15.64      2.57   100001      0.03      0.03  Hash#[]
 11.79      17.73      2.09   100001      0.02      0.02  Array#[]
  0.08      17.75      0.01        1     15.00     15.00  Profiler__.start_profile
  0.00      17.75      0.00        1      0.00  17735.00  #toplevel
  0.00      17.75      0.00        1      0.00      0.00  Class#inherited
  0.00      17.75      0.00        1      0.00   1329.00  Object#testObj
  0.00      17.75      0.00        2      0.00   8203.00  Object#test
  0.00      17.75      0.00        1      0.00      0.00  TestObj#initialize
  0.00      17.75      0.00        1      0.00   8203.00  Object#testArray
  0.00      17.75      0.00        9      0.00      0.00  Module#method_added
  0.00      17.75      0.00        1      0.00   8203.00  Object#testHash
  0.00      17.75      0.00        1      0.00      0.00  Module#attr_accessor
  0.00      17.75      0.00        1      0.00      0.00  Class#new
>type lookups.rb

def test(coll)
  0.upto( 100000 ) do
    coll[2]
  end
end

def testHash
  test( { 0 => 0, 1 => 1, 2 => 2 } )
end

def testArray
  test( [0, 1, 2] )
end


# a simple record class...
class TestObj
  attr_accessor :num, :err
  def initialize
    @num = 0
    @err = 0
  end
end

def testObj
  to = TestObj.new
  0.upto( 100000 ) do
    to.err
  end
end

testHash
testArray
testObj
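Ruby's `Struct` can generate a record class like `TestObj` in one line, with accessors and a positional constructor. Below is a sketch of the counters using one. The `Counter` name, the `rows` and `OFFSET = 1` are all hypothetical. It makes no claim about speed, which would need its own measurement.

```ruby
# Record with num/err accessors, generated by Struct.
Counter = Struct.new(:num, :err)

OFFSET = 1   # hypothetical column holding the error text

# Hypothetical rows: [case, field that may mention an error]
rows = [["disk", "ok"], ["disk", "Error: read"], ["net", "ok"]]

# Each new case gets its own Counter starting at zero.
cases = Hash.new { |h, k| h[k] = Counter.new(0, 0) }

rows.each do |arr|
  c = cases[arr[0]]
  c.num += 1
  c.err += 1 if arr[OFFSET] =~ /Error/
end

cases.sort.each do |ca, c|
  printf "%10s: %4d %4d\n", ca, c.num, c.err
end
```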

> 10:17:02 [ruby]: ruby -rprofile lookups.rb
>      % cumulative      self               self     total
>   time    seconds   seconds    calls   ms/call   ms/call  name
>  62.50      13.93     13.93        2   6962.50  11140.50  Integer#upto
>  26.22      19.77      5.84   100001      0.06      0.06  Hash#[]
>  11.28      22.28      2.51   100001      0.03      0.03  Array#[]
>   0.07      22.30      0.01        1     15.00     15.00  Profiler__.start_profile
>   0.00      22.30      0.00        2      0.00  11140.50  Object#test
>   0.00      22.30      0.00        3      0.00      0.00  Module#method_added
>   0.00      22.30      0.00        1      0.00  11171.00  Object#testArray
>   0.00      22.30      0.00        1      0.00  22281.00  #toplevel
>   0.00      22.30      0.00        1      0.00  11110.00  Object#testHash

 