map and join or inject?

A

ako...

hello,

given an array of strings A, i need to map a given function F to each
element of A and concatenate the results.

which way is more memory efficient:

1. A.inject("") { |t,x| t + F(x) }
2. (A.map { |x| F(x) }).join

thanks
konstantin
 
E

Erik Veenstra

----------------------------------------------------------------

$ vi test.rb ; cat test.rb
GC.disable

1000.times do
(1..1000).map{|s| s.to_s}.join("")
#(1..1000).inject(""){|r, s| s.to_s; r}
end

count = 0
ObjectSpace.each_object do
count += 1
end
p count

$ ruby test.rb
1003392

----------------------------------------------------------------

$ vi test.rb ; cat test.rb
GC.disable

1000.times do
#(1..1000).map{|s| s.to_s}.join("")
(1..1000).inject(""){|r, s| s.to_s; r}
end

count = 0
ObjectSpace.each_object do
count += 1
end
p count

$ ruby test.rb
2001392
 
A

ako...

thanks, i did not know that one can count objects using ObjectSpace.

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

konstantin
 
B

brent.rowland

ako... said:
thanks, i did not know that one can count objects using ObjectSpace.

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

konstantin

It's true that map appears to create fewer objects, which may have some
advantages, but the situation isn't entirely one-sided. Now consider
the case where the result of each function consumes a whole megabyte of
memory. Using map and join, you get 1000 megabyte-long strings in an
array, so none of them can be freed. You're already a gig into your
memory before you join them, resulting in another gig used.

If, as in the example, you have garbage collection turned off, you'll
be 2 gigs (plus overhead) into your memory+swap in either scenario.
However, with the garbage collector enabled and using inject, the
results of each iteration could be freed, keeping your maximum memory
demand lower.

Of course, the complete answer would have to factor in the
implementations of the string concatenations and of the join method,
the number of memory reallocations needed by each, and the degree of
heap fragmentation incurred.

Personally, I would use some kind of memory or file stream with
inject--so that not only can the intermediate result objects be freed
but also so memory reallocations can be minimized.

In the end, Erik's approach, using metrics rather than speculation,
will be most effective.

Brent Rowland
 
J

James Edward Gray II

it looks like map+join use fewer objects than inject. even though map
creates a whole new array. i would expect inject to use fewer objects.
strange...

It has to do with the block you gave inject() and the definition of
String.+(). Watch:
1669064: "a"
1668834: "ab"
1668624: "abc"
1668424: "abcd"
1668304: "abcde"
1668144: "abcdef"
1668064: "abcdefg"
1667854: "abcdefgh"
1667764: "abcdefghi"
1667534: "abcdefghij"
=> "abcdefghij"

Each of those intermediate Strings is a new object, which is right.
That's what String.+() does:

$ ri -T String#+
--------------------------------------------------------------- String#+
str + other_str => new_str
------------------------------------------------------------------------
Concatenation---Returns a new String containing other_str
concatenated to str.

"Hello from " + self.to_s #=> "Hello from main"

The keywords there being "new String." String does have an append
method though:

$ ri -T 'String#<<'
-------------------------------------------------------------- String#<<
str << fixnum => str
str.concat(fixnum) => str
str << obj => str
str.concat(obj) => str
------------------------------------------------------------------------
Append---Concatenates the given object to str. If the object is a
Fixnum between 0 and 255, it is converted to a character before
concatenation.

a = "hello "
a << "world" #=> "hello world"
a.concat(33) #=> "hello world!"

And if we use that, we can get down to less objects than map(),
because we don't need the Array:
938158: "a"
938158: "ab"
938158: "abc"
938158: "abcd"
938158: "abcde"
938158: "abcdef"
938158: "abcdefg"
938158: "abcdefgh"
938158: "abcdefghi"
938158: "abcdefghij"
=> "abcdefghij"

Hope that helps.

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,211
Messages
2,571,092
Members
47,693
Latest member
david4523

Latest Threads

Top