Chuck Remes
I need to remove duplicates from an array of arrays. I can't use
Array#uniq because some fields are different and not part of the
"key." Here's an example where the first 3 elements of each sub array
are the "key" and determine uniqueness. I want to keep only the first
one I get.
>> a = [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
=> [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
The return value of deduplicating this array should be: [[1, 2, 3, 4,
5]]
Here is my first attempt at solving the problem:
>> def dedup(ary)
>>   ary.map do |line|
?>     dupes = ary.select { |row| row[0..2] == line[0..2] }
?>     dupes.first
>>   end.uniq
>> end
>> dedup a
=> [[1, 2, 3, 4, 5]]
This works. However, it is *super slow* when operating on my dataset.
My arrays contain hundreds of thousands of sub arrays. The unique key
for each sub array is the first 12 (of 18) elements. It is taking many
seconds to produce each intermediate array ("dupes" in the example
above), so deduping the entire thing would likely take days.
Anyone have a superior and faster solution?
cr
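
One possible approach, offered as a sketch rather than a definitive answer: the slowness described above comes from rescanning the whole array once per row, so the work grows with the square of the row count. A single pass that remembers which keys have already been seen avoids that. The helper name `dedup_by_key` and its `key_size` parameter are hypothetical, made up for this illustration. It assumes the key is always the leading elements of each sub array.

```ruby
require 'set'

# Hypothetical helper: the name and the key_size parameter are illustrative,
# not taken from the original post.
# Keeps the first row seen for each key, where the key is the first
# key_size elements of the row.
def dedup_by_key(rows, key_size)
  seen = Set.new
  # Set#add? returns nil when the key is already in the set, so select
  # keeps only the first row for each key and preserves input order.
  rows.select { |row| seen.add?(row.first(key_size)) }
end

a = [[1, 2, 3, 4, 5], [1, 2, 3, 9, 4], [1, 2, 3, 4, 4]]
p dedup_by_key(a, 3)              # prints [[1, 2, 3, 4, 5]]

# On Ruby 1.9.2 or later, Array#uniq accepts a block and does the same job.
p a.uniq { |row| row.first(3) }   # prints [[1, 2, 3, 4, 5]]
```

For the dataset described in the post, `key_size` would be 12. Each row is hashed once, so the cost should grow roughly linearly with the number of sub arrays instead of quadratically.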