Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

Reply
Thread Tools

Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

 
 
Michal Suchanek
Guest
Posts: n/a
 
      06-12-2007
On 11/06/07, Robert Klemme <(E-Mail Removed)> wrote:
> On 11.06.2007 07:41, Anthony Martinez wrote:
> > So, I was tinkering with ways to build a hash out of transforming an
> > array, knowing the standard/idiomatic
> >
> > id_list = [:test1, :test2, :test3]
> > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
> >
> > I also decided to try something like this:
> >
> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
> >
> > and further (attempt to) optimize it via
> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
> > and
> > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
> >
> > Running this via Benchmark#bmbm gives pretty interesting, and to me,
> > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
> >
> > require 'benchmark'
> > id_list = (1..1_000_000).to_a
> > Benchmark::bmbm do |x|
> > x.report("inject") { id_list.inject({}) { |a,e| a[e] = e.object_id ; a} }
> > x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten] }
> > x.report("bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!] }
> > x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!] }
> > end
> >
> > Rehearsal --------------------------------------------
> > inject 16.083333 0.033333 16.116667 ( 9.670747)
> > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
> > bang 1593.716667 0.016667 1593.733333 (956.334565)
> > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
> > -------------------------------- total: 4874.866667sec
> >
> > user system total real
> > inject 5.183333 0.000000 5.183333 ( 3.102379)
> > non-bang zsh: segmentation fault ruby
> >
> > Ow?
> >
> > Also, I just thought of a similar way to accomplish the same thing:
> >
> > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e| e.object_id})] }
> >
> > Array#collect! won't work right with this, of course, but it seems to
> > have equally-bad performance. Is Array#inject just optimized for this,
> > or something?

>
> The reason why you are seeing this (performance as well as timing) is
> most likely caused by the different approach. When you use #inject you
> just create one copy of the Array (the Hash). When you use #collect you
> create at least one additional copy of the large array plus a ton of two
> element arrays. That's way less efficient considering memory usage and
> GC. You'll probably see much different results if your input array is
> much shorter (try with 10 or 100 elements).
>


I also get a segfault (with 1.8.5 on Gentoo):

ruby -w test.rb
Rehearsal --------------------------------------------
inject 7.210000 0.870000 8.080000 ( 13.011414)
non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
bang 4586.140000 200.530000 4786.670000 (5970.267190)
two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
------------------------------- total: 14462.910000sec

user system total real
inject 10.740000 1.990000 12.730000 ( 15.500874)
non-bang Segmentation fault
hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]

Of course, it would be better to test with 1.8.6 but it takes quite
some time to retest

Thanks

Michal

 
Reply With Quote
 
 
 
 
Robert Klemme
Guest
Posts: n/a
 
      06-12-2007
On 12.06.2007 11:52, Michal Suchanek wrote:
> On 11/06/07, Robert Klemme <(E-Mail Removed)> wrote:
>> On 11.06.2007 07:41, Anthony Martinez wrote:
>> > So, I was tinkering with ways to build a hash out of transforming an
>> > array, knowing the standard/idiomatic
>> >
>> > id_list = [:test1, :test2, :test3]
>> > id_list.inject({}) { |a,e| a[e]=e.object_id ; a }
>> >
>> > I also decided to try something like this:
>> >
>> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]
>> >
>> > and further (attempt to) optimize it via
>> > Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
>> > and
>> > Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]
>> >
>> > Running this via Benchmark#bmbm gives pretty interesting, and to me,
>> > unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)
>> >
>> > require 'benchmark'
>> > id_list = (1..1_000_000).to_a
>> > Benchmark::bmbm do |x|
>> > x.report("inject") { id_list.inject({}) { |a,e| a[e] =

>> e.object_id ; a} }
>> > x.report("non-bang") { Hash[ *id_list.collect { |e|

>> [e,e.object_id]}.flatten] }
>> > x.report("bang") { Hash[ *id_list.collect { |e|

>> [e,e.object_id]}.flatten!] }
>> > x.report("two-bang") { Hash[ *id_list.collect! { |e|

>> [e,e.object_id]}.flatten!] }
>> > end
>> >
>> > Rehearsal --------------------------------------------
>> > inject 16.083333 0.033333 16.116667 ( 9.670747)
>> > non-bang 1657.050000 1.800000 1658.850000 (995.425642)
>> > bang 1593.716667 0.016667 1593.733333 (956.334565)
>> > two-bang 1604.816667 1.350000 1606.166667 (963.803356)
>> > -------------------------------- total: 4874.866667sec
>> >
>> > user system total real
>> > inject 5.183333 0.000000 5.183333 ( 3.102379)
>> > non-bang zsh: segmentation fault ruby
>> >
>> > Ow?
>> >
>> > Also, I just thought of a similar way to accomplish the same thing:
>> >
>> > x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e|

>> e.object_id})] }
>> >
>> > Array#collect! won't work right with this, of course, but it seems to
>> > have equally-bad performance. Is Array#inject just optimized for this,
>> > or something?

>>
>> The reason why you are seeing this (performance as well as timing) is
>> most likely caused by the different approach. When you use #inject you
>> just create one copy of the Array (the Hash). When you use #collect you
>> create at least one additional copy of the large array plus a ton of two
>> element arrays. That's way less efficient considering memory usage and
>> GC. You'll probably see much different results if your input array is
>> much shorter (try with 10 or 100 elements).
>>

>
> I also get a segfault (with 1.8.5 on Gentoo):
>
> ruby -w test.rb
> Rehearsal --------------------------------------------
> inject 7.210000 0.870000 8.080000 ( 13.011414)
> non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
> bang 4586.140000 200.530000 4786.670000 (5970.267190)
> two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
> ------------------------------- total: 14462.910000sec
>
> user system total real
> inject 10.740000 1.990000 12.730000 ( 15.500874)
> non-bang Segmentation fault
> hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
> ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]
>
> Of course, it would be better to test with 1.8.6 but it takes quite
> some time to retest
>
> Thanks
>
> Michal


I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
1000000
13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:11:14 [Temp]: uname -a
CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
Cygwin
13:11:36 [Temp]: ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

Kind regards

robert
 
Reply With Quote
 
 
 
 
Florian Frank
Guest
Posts: n/a
 
      06-12-2007
Robert Klemme wrote:
> I believe it's the star operator:
>
> 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
> 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
> -e:1: [BUG] Segmentation fault
> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>
> Aborted (core dumped)
> 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
> 1000000
> 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
> 10000000
> -e:1: [BUG] Segmentation fault
> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>
> Aborted (core dumped)
> 13:11:14 [Temp]: uname -a
> CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
> Cygwin
> 13:11:36 [Temp]: ruby --version
> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]


The reason for this problem is, that the stack is used to pass A LOT of
arguments to the Hash.[] method. If the stack size of the executed process isn't
big enough (check your resource limits), this can lead to crashes.

--
Florian Frank

 
Reply With Quote
 
Robert Klemme
Guest
Posts: n/a
 
      06-12-2007
On 12.06.2007 16:16, Florian Frank wrote:
> Robert Klemme wrote:
>> I believe it's the star operator:
>>
>> 13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
>> 13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
>> -e:1: [BUG] Segmentation fault
>> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>>
>> Aborted (core dumped)
>> 13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
>> 1000000
>> 13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
>> 10000000
>> -e:1: [BUG] Segmentation fault
>> ruby 1.8.6 (2007-03-13) [i386-cygwin]
>>
>> Aborted (core dumped)
>> 13:11:14 [Temp]: uname -a
>> CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
>> Cygwin
>> 13:11:36 [Temp]: ruby --version
>> ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

>
> The reason for this problem is, that the stack is used to pass A LOT of
> arguments to the Hash.[] method. If the stack size of the executed process isn't
> big enough (check your resource limits), this can lead to crashes.


That was what I was guessing. However, I haven't enough insight to
verify. There is at least one point that causes me doubt: the array can
be handled and stored like any other array, so it should be on the heap
(or it is implicitly moved there).

Kind regards

robert
 
Reply With Quote
 
Mauricio Fernandez
Guest
Posts: n/a
 
      06-12-2007
On Tue, Jun 12, 2007 at 11:25:04PM +0900, Robert Klemme wrote:
> On 12.06.2007 16:16, Florian Frank wrote:
> >Robert Klemme wrote:
> >>I believe it's the star operator:
> >>
> >>13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
> >>13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
> >>-e:1: [BUG] Segmentation fault
> >>ruby 1.8.6 (2007-03-13) [i386-cygwin]

[...]
> >The reason for this problem is, that the stack is used to pass A LOT of
> >arguments to the Hash.[] method. If the stack size of the executed process
> >isn't
> >big enough (check your resource limits), this can lead to crashes.

>
> That was what I was guessing. However, I haven't enough insight to
> verify. There is at least one point that causes me doubt: the array can
> be handled and stored like any other array, so it should be on the heap
> (or it is implicitly moved there).


$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2091000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
Segmentation fault
$ ulimit -s
8192
$ ulimit -s 16384
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4184000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4190000
Segmentation fault

--
Mauricio Fernandez - http://eigenclass.org - singular Ruby

 
Reply With Quote
 
Florian Frank
Guest
Posts: n/a
 
      06-12-2007
Robert Klemme wrote:
> That was what I was guessing. However, I haven't enough insight to
> verify. There is at least one point that causes me doubt: the array can
> be handled and stored like any other array, so it should be on the heap
> (or it is implicitly moved there).


Yes, the implementation of this is in eval.c's rb_call0, after that the
arguments are passed along via the argc/argv function arguments. Ruby also tries
to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
guaranteed to work in all cases.

--
Florian Frank

 
Reply With Quote
 
Nobuyoshi Nakada
Guest
Posts: n/a
 
      06-13-2007
Hi,

At Wed, 13 Jun 2007 00:50:30 +0900,
Florian Frank wrote in [ruby-talk:255348]:
> Yes, the implementation of this is in eval.c's rb_call0, after that the
> arguments are passed along via the argc/argv function arguments. Ruby also tries
> to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
> guaranteed to work in all cases.


Actually, the definition of TMP_ALLOC. Try replacing #ifdef
C_ALLOCA with #if 1.

--
Nobu Nakada

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
hash of hash of hash of hash in c++ rp C++ 1 11-10-2011 04:45 PM
invoking a segfault within a segfault handler - is this undefinedbehavior? Andrey Vul C Programming 8 07-30-2010 02:14 PM
Re: Mozilla versus IE versus Opera versus Safari Peter Potamus the Purple Hippo Firefox 0 05-08-2008 12:56 PM
equal? versus eql? versus == versus === verus <=> Paul Butcher Ruby 12 11-28-2007 06:06 AM
Array#inject to create a hash versus Hash[*array.collect{}.flatten] -- Speed, segfault Anthony Martinez Ruby 4 06-11-2007 08:16 AM



Advertisments