Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

Michal Suchanek · Jun 12, 2007

So, I was tinkering with ways to build a hash out of transforming an
array, knowing the standard/idiomatic

id_list = [:test1, :test2, :test3]
id_list.inject({}) { |a,e| a[e]=e.object_id ; a }

I also decided to try something like this:

Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]

and further (attempt to) optimize it via
Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
and
Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]

Running this via Benchmark#bmbm gives pretty interesting, and to me,
unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)

require 'benchmark'
id_list = (1..1_000_000).to_a
Benchmark::bmbm do |x|
x.report("inject") { id_list.inject({}) { |a,e| a[e] = e.object_id ; a} }
x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten] }
x.report("bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!] }
x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!] }
end

Rehearsal --------------------------------------------
inject 16.083333 0.033333 16.116667 ( 9.670747)
non-bang 1657.050000 1.800000 1658.850000 (995.425642)
bang 1593.716667 0.016667 1593.733333 (956.334565)
two-bang 1604.816667 1.350000 1606.166667 (963.803356)
-------------------------------- total: 4874.866667sec

user system total real
inject 5.183333 0.000000 5.183333 ( 3.102379)
non-bang zsh: segmentation fault ruby

Ow?

Also, I just thought of a similar way to accomplish the same thing:

x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e| e.object_id})] }

Array#collect! won't work right with this, of course, but it seems to
have equally-bad performance. Is Array#inject just optimized for this,
or something?

Click to expand...

The reason why you are seeing this (performance as well as timing) is
most likely caused by the different approach. When you use #inject you
just create one copy of the Array (the Hash). When you use #collect you
create at least one additional copy of the large array plus a ton of two
element arrays. That's way less efficient considering memory usage and
GC. You'll probably see much different results if your input array is
much shorter (try with 10 or 100 elements).

I also get a segfault (with 1.8.5 on Gentoo):

ruby -w test.rb
Rehearsal --------------------------------------------
inject 7.210000 0.870000 8.080000 ( 13.011414)
non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
bang 4586.140000 200.530000 4786.670000 (5970.267190)
two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
------------------------------- total: 14462.910000sec

user system total real
inject 10.740000 1.990000 12.730000 ( 15.500874)
non-bang Segmentation fault
hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]

Of course, it would be better to test with 1.8.6 but it takes quite
some time to retest ;-)

Thanks

Michal

Robert Klemme · Jun 12, 2007

So, I was tinkering with ways to build a hash out of transforming an
array, knowing the standard/idiomatic

id_list = [:test1, :test2, :test3]
id_list.inject({}) { |a,e| a[e]=e.object_id ; a }

I also decided to try something like this:

Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten]

and further (attempt to) optimize it via
Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!]
and
Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!]

Running this via Benchmark#bmbm gives pretty interesting, and to me,
unexpected, results (on a 3.2 GHz P4, 1GB of RAM, FC5 with ruby 1.8.4)

require 'benchmark'
id_list = (1..1_000_000).to_a
Benchmark::bmbm do |x|
x.report("inject") { id_list.inject({}) { |a,e| a[e] = e.object_id ; a} }
x.report("non-bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten] }
x.report("bang") { Hash[ *id_list.collect { |e| [e,e.object_id]}.flatten!] }
x.report("two-bang") { Hash[ *id_list.collect! { |e| [e,e.object_id]}.flatten!] }
end

Rehearsal --------------------------------------------
inject 16.083333 0.033333 16.116667 ( 9.670747)
non-bang 1657.050000 1.800000 1658.850000 (995.425642)
bang 1593.716667 0.016667 1593.733333 (956.334565)
two-bang 1604.816667 1.350000 1606.166667 (963.803356)
-------------------------------- total: 4874.866667sec

user system total real
inject 5.183333 0.000000 5.183333 ( 3.102379)
non-bang zsh: segmentation fault ruby

Ow?

Also, I just thought of a similar way to accomplish the same thing:

x.report("zip") { Hash[ *id_list.zip(id_list.collect {|e| e.object_id})] }

Array#collect! won't work right with this, of course, but it seems to
have equally-bad performance. Is Array#inject just optimized for this,
or something?

Click to expand...

The reason why you are seeing this (performance as well as timing) is
most likely caused by the different approach. When you use #inject you
just create one copy of the Array (the Hash). When you use #collect you
create at least one additional copy of the large array plus a ton of two
element arrays. That's way less efficient considering memory usage and
GC. You'll probably see much different results if your input array is
much shorter (try with 10 or 100 elements).

Click to expand...

I also get a segfault (with 1.8.5 on Gentoo):

ruby -w test.rb
Rehearsal --------------------------------------------
inject 7.210000 0.870000 8.080000 ( 13.011414)
non-bang 4620.790000 179.730000 4800.520000 (6029.976497)
bang 4586.140000 200.530000 4786.670000 (5970.267190)
two-bang 4599.560000 268.080000 4867.640000 (6035.687979)
------------------------------- total: 14462.910000sec

user system total real
inject 10.740000 1.990000 12.730000 ( 15.500874)
non-bang Segmentation fault
hramrach@hp-tc2110:11(0) 06120053 16]~ $ ruby -v
ruby 1.8.5 (2006-12-04 patchlevel 2) [i686-linux]

Of course, it would be better to test with 1.8.6 but it takes quite
some time to retest ;-)

Thanks

Michal

I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
1000000
13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:11:14 [Temp]: uname -a
CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
Cygwin
13:11:36 [Temp]: ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

Kind regards

robert

Florian Frank · Jun 12, 2007

Robert said:
I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
1000000
13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:11:14 [Temp]: uname -a
CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
Cygwin
13:11:36 [Temp]: ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

The reason for this problem is, that the stack is used to pass A LOT of
arguments to the Hash.[] method. If the stack size of the executed process isn't
big enough (check your resource limits), this can lead to crashes.

Robert Klemme · Jun 12, 2007

Robert said:
Robert said:

I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:10:39 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
1000000
13:11:09 [Temp]: ruby -e 'def f(*a) end; f(*Array.new(ARGV.shift.to_i))'
10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Aborted (core dumped)
13:11:14 [Temp]: uname -a
CYGWIN_NT-5.2-WOW64 PDBXPWSRK38 1.5.24(0.156/4/2) 2007-01-31 10:57 i686
Cygwin
13:11:36 [Temp]: ruby --version
ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-cygwin]

Click to expand...

The reason for this problem is, that the stack is used to pass A LOT of
arguments to the Hash.[] method. If the stack size of the executed process isn't
big enough (check your resource limits), this can lead to crashes.

That was what I was guessing. However, I haven't enough insight to
verify. There is at least one point that causes me doubt: the array can
be handled and stored like any other array, so it should be on the heap
(or it is implicitly moved there).

Kind regards

robert

Mauricio Fernandez · Jun 12, 2007

Robert said:
Robert said:

I believe it's the star operator:

13:10:22 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 1000000
13:10:35 [Temp]: ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 10000000
-e:1: [BUG] Segmentation fault
ruby 1.8.6 (2007-03-13) [i386-cygwin]

Click to expand...

[...]
The reason for this problem is, that the stack is used to pass A LOT of
arguments to the Hash.[] method. If the stack size of the executed process
isn't
big enough (check your resource limits), this can lead to crashes.

Click to expand...

That was what I was guessing. However, I haven't enough insight to
verify. There is at least one point that causes me doubt: the array can
be handled and stored like any other array, so it should be on the heap
(or it is implicitly moved there).

$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2091000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
Segmentation fault
$ ulimit -s
8192
$ ulimit -s 16384
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 2092000
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4184000
-e:1:in `[]': stack level too deep (SystemStackError)
from -e:1
$ ruby -e 'Hash[*Array.new(ARGV.shift.to_i) ]' 4190000
Segmentation fault

Florian Frank · Jun 12, 2007

Robert said:
That was what I was guessing. However, I haven't enough insight to
verify. There is at least one point that causes me doubt: the array can
be handled and stored like any other array, so it should be on the heap
(or it is implicitly moved there).

Yes, the implementation of this is in eval.c's rb_call0, after that the
arguments are passed along via the argc/argv function arguments. Ruby also tries
to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
guaranteed to work in all cases.

Nobuyoshi Nakada · Jun 13, 2007

Hi,

At Wed, 13 Jun 2007 00:50:30 +0900,
Florian Frank wrote in [ruby-talk:255348]:

Yes, the implementation of this is in eval.c's rb_call0, after that the
arguments are passed along via the argc/argv function arguments. Ruby also tries
to throw a SystemStackError in rb_call0 like Mauricio has shown, but this isn't
guaranteed to work in all cases.

Actually, the definition of TMP_ALLOC. Try replacing #ifdef
C_ALLOCA with #if 1.

Array#inject to create a hash versus Hash[*array.collect{}.flatten] -- Speed, segfault	4	Jun 11, 2007
Ruby Weekly News 6th - 12th June 2005	0	Jun 14, 2005
comp.lang.c Answers to Frequently Asked Questions (FAQ List)	15	Apr 1, 2006

Benchmark segfault [Was: Array#inject to create a hash versus Hash[*array.collect{}.flatten] ]

Michal Suchanek

Robert Klemme

Florian Frank

Robert Klemme

Mauricio Fernandez

Florian Frank

Nobuyoshi Nakada

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads