How to get non-unique elements from an array?

P

Park Heesob

Hi,
On second thought, Mark's may have been the best overall. Either way,
both Mark's and Martin's are good algorithms. Many of the others were
good as well.

Mine isn't the shortest, but it's the fastest I've seen thus far! Twice
as fast as the next closest! Why does everyone leave me out of benchmark,
;)

---my code
t=[]; a.delete_if{ |e| r=(not t.include? e); t.push(e); r }
---end my code

Zach
Here is my code

t = Hash.new(0); a.each{|e|t[e]+=1};t.delete_if{|k,v|v==1}.keys

It is reasonably fast for both small and large arrays
small array
zach 0.328000 0.000000 0.328000 ( 0.328000)
markvh 0.390000 0.000000 0.390000 ( 0.390000)
martin 0.422000 0.000000 0.422000 ( 0.422000)
xxx 0.344000 0.000000 0.344000 ( 0.344000) <== MINE

large array
zach 2.204000 0.000000 2.204000 ( 2.203000)
markvh 0.015000 0.000000 0.015000 ( 0.016000)
martin 0.000000 0.000000 0.000000 ( 0.000000)
xxx 0.000000 0.000000 0.000000 ( 0.000000) <== MINE

Park Heesob
 
S

Sam Kong

I must confess that I love this group...:)
Very enthusiastic!!!

Ryan said:
Overall Martin's was the best. As he suspected, David's was quite
slow. Sam's as well.

Ryan, thanks for the benchmarking.
Please, add my original solution to your benchmark.
It seems to be very fast.
The reason I asked the question was not performance but briefness and
rubish style.
The slow one I came up later looks very intuitive and brief but suffers
from performance.
Briefness and performance don't always go together.

Regards,
Sam
 
R

Ryan Leavengood

Ryan, thanks for the benchmarking.
Please, add my original solution to your benchmark.
It seems to be very fast.

Yes it is pretty fast, see below.
The reason I asked the question was not performance but briefness and
rubish style.
The slow one I came up later looks very intuitive and brief but suffers
from performance.
Briefness and performance don't always go together.

Absolutely. Many "cute" solutions to problems like this are quite
slow. But keep in mind the benchmarking cases are pretty extreme, and
Ruby is quite fast in all cases except the particularly slow
algorithms.

Here are the latest results, with Park's solution and Sam's first
solution added to the mix:

Small array, many iterations:
user system total real
david 6.063000 0.015000 6.078000 ( 6.094000)
jeff 1.719000 0.000000 1.719000 ( 1.719000)
jegII 1.844000 0.000000 1.844000 ( 1.860000)
markvh 1.437000 0.000000 1.437000 ( 1.438000)
martin 1.750000 0.000000 1.750000 ( 1.766000)
paolo 3.469000 0.016000 3.485000 ( 3.499000)
park 2.141000 0.000000 2.141000 ( 2.157000)
paul 4.016000 0.000000 4.016000 ( 4.188000)
robert 3.500000 0.015000 3.515000 ( 3.766000)
samk1 2.094000 0.016000 2.110000 ( 2.266000)
samk2 5.391000 0.000000 5.391000 ( 5.546000)
simonk 1.609000 0.000000 1.609000 ( 1.610000)
simons 2.547000 0.000000 2.547000 ( 2.578000)
zach 0.344000 0.000000 0.344000 ( 0.344000)

Big array, one iteration:
user system total real
david 35.093000 0.000000 35.093000 ( 35.906000)
jeff 0.110000 0.000000 0.110000 ( 0.109000)
jegII 0.000000 0.000000 0.000000 ( 0.000000)
markvh 2.172000 0.000000 2.172000 ( 2.172000)
martin 0.016000 0.000000 0.016000 ( 0.016000)
paolo 0.015000 0.000000 0.015000 ( 0.015000)
park 0.015000 0.000000 0.015000 ( 0.015000)
paul 0.063000 0.000000 0.063000 ( 0.063000)
robert 0.047000 0.000000 0.047000 ( 0.047000)
samk1 0.031000 0.000000 0.031000 ( 0.031000)
samk2 17.750000 0.000000 17.750000 ( 17.812000)
simonk 4.203000 0.000000 4.203000 ( 4.297000)
simons 2.344000 0.000000 2.344000 ( 2.500000)
zach 2.156000 0.000000 2.156000 ( 2.297000)

Here is the benchmark code, which I've tweaked a bit so that hopefully
each test won't affect another (again, beware of line-wrapping):

require 'benchmark'

def run_benchmark(bx, name, iterations, &block)
GC.start
bx.report(name) { iterations.times &block }
end

[[[0,1,2,3,4,5,2,3,2,3], 50000],[(1..5000).to_a + (2000..2500).to_a,
1]].each do |a, iterations|
Benchmark.bm(10) do |bx|
run_benchmark(bx, 'david', iterations) { a.uniq.find_all {|x|
a.find_all {|y| y =3D=3D x }.size > 1 } }
run_benchmark(bx, 'jeff', iterations) { a.uniq.inject(a.dup) {
|b,i| b.delete_at(b.index(i)) ; b } }
run_benchmark(bx, 'jegII', iterations) { seen =3D
Hash.new(0);a.select { |e| (seen[e] +=3D 1) > 1 }.uniq }
run_benchmark(bx, 'markvh', iterations) { t=3D[]; a.select{ |e|
r=3Dt.include?e; t<<e; r }.uniq }
run_benchmark(bx, 'martin', iterations) { h1 =3D {};h2 =3D {};a.each
{|i| h2 =3D true if h1;h1 =3D true };h2.keys }
run_benchmark(bx, 'paolo', iterations) { h =3D {};u =3D a.inject([])
{|res, x| h[x] ? res + [x] : h[x] =3D res}.uniq }
run_benchmark(bx, 'park', iterations) { t =3D Hash.new(0);
a.each{|e|t[e]+=3D1};t.delete_if{|k,v|v=3D=3D1}.keys }
run_benchmark(bx, 'paul', iterations) { a.sort.join('
').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq }
run_benchmark(bx, 'robert', iterations) { a.inject(Hash.new(0))
{|h,e| h[e]+=3D1;h}.select {|k,v|v>1}.map {|x,y|x} }
run_benchmark(bx, 'samk1', iterations) { h =3D {};a.each do |i| if
h then h +=3D 1 else h =3D 1 end;end;u =3D []; h.each do |k, v| if
v > 1 then u << k end; end; u }
run_benchmark(bx, 'samk2', iterations) { a.select{|e|
a.grep(e).length > 1}.uniq }
run_benchmark(bx, 'simonk', iterations) { a.select{|e| a.index(e)
!=3D a.rindex(e)}.uniq }
run_benchmark(bx, 'simons', iterations) { uary =3D a.uniq;
a.map.delete_if{|i| x=3Duary.member?(i); uary.delete(i); x} }
run_benchmark(bx, 'zach', iterations) { t=3D[]; a.delete_if{ |e|
r=3D(not t.include? e); t.push(e); r }.uniq }
end
end

Regards,
Ryan
 
Z

Zach Dennis

Could someone explain to my dense brain....how some of these solutions
can be slower on little arrays, but they have "0.0000" time on a big
array? Even if it was "0.0001"....I would buy it, but '0.0000' is
wigging me out. =)


Zach
 
A

Ara.T.Howard

Here are the latest results, with Park's solution and Sam's first solution
added to the mix:

a couple of observations: zach's method destroys the array at each iteration
and is unique in this respect. to be fair a 'dup' must be added to his
approach. you arrays are composed in a quite regular fashion which give the
algoritm's that favour linear search a huge advantage - a random distribution
of dups makes the task much harder. i implemented changes for the above and
now:

harp:~ > ruby a.rb
user system total real
david 5.330000 0.000000 5.330000 ( 5.350960)
jeff 0.370000 0.000000 0.370000 ( 0.373027)
jegII 0.320000 0.010000 0.330000 ( 0.324357)
markvh 0.610000 0.000000 0.610000 ( 0.606635)
martin 0.300000 0.000000 0.300000 ( 0.308640)
paolo 0.540000 0.000000 0.540000 ( 0.537689)
park 0.400000 0.000000 0.400000 ( 0.403531)
paul 0.660000 0.000000 0.660000 ( 0.662007)
robert 0.640000 0.000000 0.640000 ( 0.642736)
samk1 0.410000 0.000000 0.410000 ( 0.418733)
samk2 4.730000 0.000000 4.730000 ( 5.043174)
simonk 1.060000 0.000000 1.060000 ( 1.052031)
simons 1.150000 0.000000 1.150000 ( 1.160381)
zach 0.640000 0.000000 0.640000 ( 0.641282)
user system total real
david 12.600000 0.000000 12.600000 ( 12.637510)
jeff 0.200000 0.000000 0.200000 ( 0.203639)
jegII 0.010000 0.000000 0.010000 ( 0.007662)
markvh 0.880000 0.000000 0.880000 ( 0.872928)
martin 0.010000 0.000000 0.010000 ( 0.007404)
paolo 0.030000 0.000000 0.030000 ( 0.028461)
park 0.010000 0.000000 0.010000 ( 0.009939)
paul 0.020000 0.000000 0.020000 ( 0.019959)
robert 0.010000 0.010000 0.020000 ( 0.015718)
samk1 0.010000 0.000000 0.010000 ( 0.009769)
samk2 10.050000 0.000000 10.050000 ( 10.088609)
simonk 1.950000 0.000000 1.950000 ( 1.949996)
simons 1.420000 0.000000 1.420000 ( 1.415801)
zach 0.890000 0.000000 0.890000 ( 0.890465)

jegII, martin, and samk1 are two orders of magnitude faster than the delete_if
(destroys 'a') approach!

also, your concept of 'big' doesn't seem that, er, big - running with larger
numbers is also revealing - i started a process with

harp:~ > ruby a.rb 424 42424 (factor of 10 increase in size)

1 hour ago and it's still not complete so many of the algorithms obviously
scale horribly. i'll post the results when (if) it finishes.


here's the code:

harp:~ > cat a.rb
require 'benchmark'

def run_benchmark(bx, name, iterations, &block)
GC.start
fork{ bx.report(name) { iterations.times &block }; exit! }
Process::wait
end

def dupd_array size, min_percent_dups
a = Array::new(size).map{ rand size }
n = ((size * min_percent_dups) / 2).ceil
n.times{ i = rand(size); a = a[[i + 1, size - 1].min]}
a
end

min = Integer(ARGV.shift || 42)
max = Integer(ARGV.shift || 4242)
min_percent_dups = Float(ARGV.shift || 0.15)

small = dupd_array min, min_percent_dups
big = dupd_array max, min_percent_dups

[
[small, 4242],
[big, 1],
].each do |a, iterations|
Benchmark.bm(10) do |bx|
run_benchmark(bx, 'david', iterations) { a.uniq.find_all {|x| a.find_all {|y| y == x }.size > 1 } }
run_benchmark(bx, 'jeff', iterations) { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } }
run_benchmark(bx, 'jegII', iterations) { seen = Hash.new(0);a.select { |e| (seen[e] += 1) > 1 }.uniq }
run_benchmark(bx, 'markvh', iterations) { t=[]; a.select{ |e| r=t.include?e; t<<e; r }.uniq }
run_benchmark(bx, 'martin', iterations) { h1 = {};h2 = {};a.each {|i| h2 = true if h1;h1 = true };h2.keys }
run_benchmark(bx, 'paolo', iterations) { h = {};u = a.inject([]) {|res, x| h[x] ? res + [x] : h[x] = res}.uniq }
run_benchmark(bx, 'park', iterations) { t = Hash.new(0); a.each{|e|t[e]+=1};t.delete_if{|k,v|v==1}.keys }
run_benchmark(bx, 'paul', iterations) { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq }
run_benchmark(bx, 'robert', iterations) { a.inject(Hash.new(0)) {|h,e| h[e]+=1;h}.select {|k,v|v>1}.map {|x,y|x} }
run_benchmark(bx, 'samk1', iterations) { h = {};a.each do |i| if h then h += 1 else h = 1 end;end;u = []; h.each do |k, v| if v > 1 then u << k end; end; u }
run_benchmark(bx, 'samk2', iterations) { a.select{|e| a.grep(e).length > 1}.uniq }
run_benchmark(bx, 'simonk', iterations) { a.select{|e| a.index(e) != a.rindex(e)}.uniq }
run_benchmark(bx, 'simons', iterations) { uary = a.uniq; a.map.delete_if{|i| x=uary.member?(i); uary.delete(i); x} }
run_benchmark(bx, 'zach', iterations) { t=[]; a.dup.delete_if{ |e| r=(not t.include? e); t.push(e); r }.uniq }
end
end


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
 
A

Ara.T.Howard

also, your concept of 'big' doesn't seem that, er, big - running with larger
numbers is also revealing - i started a process with

harp:~ > ruby a.rb 424 42424 (factor of 10 increase in size)

1 hour ago and it's still not complete so many of the algorithms obviously
scale horribly. i'll post the results when (if) it finishes.

results so far:

harp:~ > ruby a.rb 424 42424
user system total real
david 616.620000 0.000000 616.620000 (626.621697)
jeff 10.610000 0.000000 10.610000 ( 10.635370)
jegII 3.380000 0.000000 3.380000 ( 3.371940)
markvh 43.970000 0.000000 43.970000 ( 44.051656)
martin 3.120000 0.000000 3.120000 ( 3.122285)
paolo 7.210000 0.000000 7.210000 ( 7.238185)
park 4.270000 0.000000 4.270000 ( 4.291279)
paul 6.700000 0.000000 6.700000 ( 6.729307)
robert 7.780000 0.000000 7.780000 ( 7.779309)
samk1 4.510000 0.010000 4.520000 ( 4.520682)
samk2 432.290000 0.000000 432.290000 (433.597829)
simonk 76.810000 0.040000 76.850000 ( 77.561346)
simons 56.480000 0.000000 56.480000 ( 61.661304)
zach 42.590000 0.010000 42.600000 ( 42.661407)
user system total real
david
...
...


i expect another hour to complete...

it's a dead heat between jegII and martin! bets? ;-)


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
 
P

pauldacus

Just a quick Excel job to show who won this horse race:

name real
martin 0.316044 <==Winner
jegII 0.332019
park 0.41347
samk1 0.428502
paolo 0.56615
jeff 0.576666
robert 0.658454
paul 0.681966
markvh 1.479563
zach 1.531747
simons 2.576182
simonk 3.002027
samk2 15.131783
david 17.98847


Probably as instructive of how to do it fast, was how to do it slow.
The brute-force one-pass method seemed to blow all else away. Creating
a hash or array isn't nearly as expensive as the dup-N-sort, dup-N-uniq
or multi-pass approaches which require a search through the array/hash
on every iteration. My mediocre 'scan' method was expensive in
start-up terms (dup & uniq & sort... yuk!), and just OK on a big array,
where start up was minimal.

Lesson learned for me: When dealing with big or small arrays, look for
a one pass method, putting results into a newly created array/hash.
Avoid methods that create new dup's, sort, uniq, or scan/grep or find
or find_all.

Good job Martin!

Ara.T.Howard said:
Here are the latest results, with Park's solution and Sam's first solution
added to the mix:

a couple of observations: zach's method destroys the array at each iteration
and is unique in this respect. to be fair a 'dup' must be added to his
approach. you arrays are composed in a quite regular fashion which give the
algoritm's that favour linear search a huge advantage - a random distribution
of dups makes the task much harder. i implemented changes for the above and
now:

harp:~ > ruby a.rb
user system total real
david 5.330000 0.000000 5.330000 ( 5.350960)
jeff 0.370000 0.000000 0.370000 ( 0.373027)
jegII 0.320000 0.010000 0.330000 ( 0.324357)
markvh 0.610000 0.000000 0.610000 ( 0.606635)
martin 0.300000 0.000000 0.300000 ( 0.308640)
paolo 0.540000 0.000000 0.540000 ( 0.537689)
park 0.400000 0.000000 0.400000 ( 0.403531)
paul 0.660000 0.000000 0.660000 ( 0.662007)
robert 0.640000 0.000000 0.640000 ( 0.642736)
samk1 0.410000 0.000000 0.410000 ( 0.418733)
samk2 4.730000 0.000000 4.730000 ( 5.043174)
simonk 1.060000 0.000000 1.060000 ( 1.052031)
simons 1.150000 0.000000 1.150000 ( 1.160381)
zach 0.640000 0.000000 0.640000 ( 0.641282)
user system total real
david 12.600000 0.000000 12.600000 ( 12.637510)
jeff 0.200000 0.000000 0.200000 ( 0.203639)
jegII 0.010000 0.000000 0.010000 ( 0.007662)
markvh 0.880000 0.000000 0.880000 ( 0.872928)
martin 0.010000 0.000000 0.010000 ( 0.007404)
paolo 0.030000 0.000000 0.030000 ( 0.028461)
park 0.010000 0.000000 0.010000 ( 0.009939)
paul 0.020000 0.000000 0.020000 ( 0.019959)
robert 0.010000 0.010000 0.020000 ( 0.015718)
samk1 0.010000 0.000000 0.010000 ( 0.009769)
samk2 10.050000 0.000000 10.050000 ( 10.088609)
simonk 1.950000 0.000000 1.950000 ( 1.949996)
simons 1.420000 0.000000 1.420000 ( 1.415801)
zach 0.890000 0.000000 0.890000 ( 0.890465)

jegII, martin, and samk1 are two orders of magnitude faster than the delete_if
(destroys 'a') approach!

also, your concept of 'big' doesn't seem that, er, big - running with larger
numbers is also revealing - i started a process with

harp:~ > ruby a.rb 424 42424 (factor of 10 increase in size)

1 hour ago and it's still not complete so many of the algorithms obviously
scale horribly. i'll post the results when (if) it finishes.


here's the code:

harp:~ > cat a.rb
require 'benchmark'

def run_benchmark(bx, name, iterations, &block)
GC.start
fork{ bx.report(name) { iterations.times &block }; exit! }
Process::wait
end

def dupd_array size, min_percent_dups
a = Array::new(size).map{ rand size }
n = ((size * min_percent_dups) / 2).ceil
n.times{ i = rand(size); a = a[[i + 1, size - 1].min]}
a
end

min = Integer(ARGV.shift || 42)
max = Integer(ARGV.shift || 4242)
min_percent_dups = Float(ARGV.shift || 0.15)

small = dupd_array min, min_percent_dups
big = dupd_array max, min_percent_dups

[
[small, 4242],
[big, 1],
].each do |a, iterations|
Benchmark.bm(10) do |bx|
run_benchmark(bx, 'david', iterations) { a.uniq.find_all {|x| a.find_all {|y| y == x }.size > 1 } }
run_benchmark(bx, 'jeff', iterations) { a.uniq.inject(a.dup) { |b,i| b.delete_at(b.index(i)) ; b } }
run_benchmark(bx, 'jegII', iterations) { seen = Hash.new(0);a.select { |e| (seen[e] += 1) > 1 }.uniq }
run_benchmark(bx, 'markvh', iterations) { t=[]; a.select{ |e| r=t.include?e; t<<e; r }.uniq }
run_benchmark(bx, 'martin', iterations) { h1 = {};h2 = {};a.each {|i| h2 = true if h1;h1 = true };h2.keys }
run_benchmark(bx, 'paolo', iterations) { h = {};u = a.inject([]) {|res, x| h[x] ? res + [x] : h[x] = res}.uniq }
run_benchmark(bx, 'park', iterations) { t = Hash.new(0); a.each{|e|t[e]+=1};t.delete_if{|k,v|v==1}.keys }
run_benchmark(bx, 'paul', iterations) { a.sort.join(' ').to_s.scan(/(\d+)\s(\1)+/).flatten.uniq }
run_benchmark(bx, 'robert', iterations) { a.inject(Hash.new(0)) {|h,e| h[e]+=1;h}.select {|k,v|v>1}.map {|x,y|x} }
run_benchmark(bx, 'samk1', iterations) { h = {};a.each do |i| if h then h += 1 else h = 1 end;end;u = []; h.each do |k, v| if v > 1 then u << k end; end; u }
run_benchmark(bx, 'samk2', iterations) { a.select{|e| a.grep(e).length > 1}.uniq }
run_benchmark(bx, 'simonk', iterations) { a.select{|e| a.index(e) != a.rindex(e)}.uniq }
run_benchmark(bx, 'simons', iterations) { uary = a.uniq; a.map.delete_if{|i| x=uary.member?(i); uary.delete(i); x} }
run_benchmark(bx, 'zach', iterations) { t=[]; a.dup.delete_if{ |e| r=(not t.include? e); t.push(e); r }.uniq }
end
end


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
 
R

Ryan Leavengood

a couple of observations: zach's method destroys the array at each iterat= ion
and is unique in this respect. to be fair a 'dup' must be added to his
approach. you arrays are composed in a quite regular fashion which give = the
algoritm's that favour linear search a huge advantage - a random distribu= tion
of dups makes the task much harder. i implemented changes for the above = and
now:

Thanks a lot for this Ara, you make some excellent points. I did think
about how the linear nature of the test array might skew the results,
but I never got around to randomizing them. I also assumed that all
the algorithms worked correctly and non-destructively. I considered
adding some tests to ensure they all returned the same result, but
didn't bother.
also, your concept of 'big' doesn't seem that, er, big - running with lar= ger
numbers is also revealing - i started a process with

I know it wasn't very big, but I figured it would be big enough to
show how the algorithms scaled. Plus I'm using my work laptop and have
to get some work done, so I can't wait all day while Ruby eats up all
my CPU :)

Like Paul, I've found this thread very enlightening, and it will make
me think twice before using a "cute" solution (though you don't always
need the highest performer all the time.) Though James' solution is
pretty cute and fast, so good job.

Either way this thread was a very good indication of Ruby's TIMTOWTDI natur=
e.

Ryan
 
A

Ara.T.Howard

Thanks a lot for this Ara, you make some excellent points. I did think about
how the linear nature of the test array might skew the results, but I never
got around to randomizing them. I also assumed that all the algorithms
worked correctly and non-destructively. I considered adding some tests to
ensure they all returned the same result, but didn't bother.


I know it wasn't very big, but I figured it would be big enough to show how
the algorithms scaled. Plus I'm using my work laptop and have to get some
work done, so I can't wait all day while Ruby eats up all my CPU :)

don't worry -- your tax dollars funding my testing ;-)
Like Paul, I've found this thread very enlightening, and it will make me
think twice before using a "cute" solution (though you don't always need the
highest performer all the time.) Though James' solution is pretty cute and
fast, so good job.

Either way this thread was a very good indication of Ruby's TIMTOWTDI
nature.

agreed all the way around. the final (for me) results:

small:

harp:~ > ruby a.rb
user system total real
david 5.330000 0.000000 5.330000 ( 5.350960)
jeff 0.370000 0.000000 0.370000 ( 0.373027)
jegII 0.320000 0.010000 0.330000 ( 0.324357)
markvh 0.610000 0.000000 0.610000 ( 0.606635)
martin 0.300000 0.000000 0.300000 ( 0.308640)
paolo 0.540000 0.000000 0.540000 ( 0.537689)
park 0.400000 0.000000 0.400000 ( 0.403531)
paul 0.660000 0.000000 0.660000 ( 0.662007)
robert 0.640000 0.000000 0.640000 ( 0.642736)
samk1 0.410000 0.000000 0.410000 ( 0.418733)
samk2 4.730000 0.000000 4.730000 ( 5.043174)
simonk 1.060000 0.000000 1.060000 ( 1.052031)
simons 1.150000 0.000000 1.150000 ( 1.160381)
zach 0.640000 0.000000 0.640000 ( 0.641282)
user system total real
david 12.600000 0.000000 12.600000 ( 12.637510)
jeff 0.200000 0.000000 0.200000 ( 0.203639)
jegII 0.010000 0.000000 0.010000 ( 0.007662)
markvh 0.880000 0.000000 0.880000 ( 0.872928)
martin 0.010000 0.000000 0.010000 ( 0.007404)
paolo 0.030000 0.000000 0.030000 ( 0.028461)
park 0.010000 0.000000 0.010000 ( 0.009939)
paul 0.020000 0.000000 0.020000 ( 0.019959)
robert 0.010000 0.010000 0.020000 ( 0.015718)
samk1 0.010000 0.000000 0.010000 ( 0.009769)
samk2 10.050000 0.000000 10.050000 ( 10.088609)
simonk 1.950000 0.000000 1.950000 ( 1.949996)
simons 1.420000 0.000000 1.420000 ( 1.415801)
zach 0.890000 0.000000 0.890000 ( 0.890465)


big:

harp:~ > ruby a.rb 424 42424
user system total real
david 616.620000 0.000000 616.620000 (626.621697)
jeff 10.610000 0.000000 10.610000 ( 10.635370)
jegII 3.380000 0.000000 3.380000 ( 3.371940)
markvh 43.970000 0.000000 43.970000 ( 44.051656)
martin 3.120000 0.000000 3.120000 ( 3.122285)
paolo 7.210000 0.000000 7.210000 ( 7.238185)
park 4.270000 0.000000 4.270000 ( 4.291279)
paul 6.700000 0.000000 6.700000 ( 6.729307)
robert 7.780000 0.000000 7.780000 ( 7.779309)
samk1 4.510000 0.010000 4.520000 ( 4.520682)
samk2 432.290000 0.000000 432.290000 (433.597829)
simonk 76.810000 0.040000 76.850000 ( 77.561346)
simons 56.480000 0.000000 56.480000 ( 61.661304)
zach 42.590000 0.010000 42.600000 ( 42.661407)
user system total real
david 1525.140000 0.010000 1525.150000 (1540.030511)
jeff 19.840000 0.000000 19.840000 ( 19.871017)
jegII 0.100000 0.000000 0.100000 ( 0.096985)
markvh 101.610000 0.000000 101.610000 (102.777395)
martin 0.090000 0.000000 0.090000 ( 0.091271)
paolo 20.650000 0.610000 21.260000 ( 22.851724)
park 0.130000 0.000000 0.130000 ( 0.140809)
paul 0.330000 0.000000 0.330000 ( 0.333704)
robert 0.210000 0.000000 0.210000 ( 0.211370)
samk1 0.130000 0.000000 0.130000 ( 0.125668)
samk2 1010.770000 0.000000 1010.770000 (1037.171224)
simonk 178.140000 0.010000 178.150000 (178.632616)
simons 123.920000 0.000000 123.920000 (124.054895)
zach 97.430000 0.000000 97.430000 ( 97.559515)


so __martin__ and __jegII__ effectively tie -- the diff is not significant
considering my cpu did some other, but not many, things during that time

it's interesting to note their solutions are an order of magnitude better that
the next fastest and __5__ orders of magnitude better than the worst.
defintitely a strong argument to always consider re-working algoritms before
using the c compiler and extconf.rb in anger.

regards.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| anything that contradicts experience and logic should be abandoned.
| -- h.h. the 14th dalai lama
===============================================================================
 
S

Simon Kröger

Ara.T.Howard said:
]
it's interesting to note their solutions are an order of magnitude
better that
the next fastest and __5__ orders of magnitude better than the worst.
defintitely a strong argument to always consider re-working algoritms
before
using the c compiler and extconf.rb in anger.

regards.

-a

Hmm, it doesn't effect the winners, but i would still like to mention
that for a random testset i get these results:

david [11, 34, 36, 37, 0, 33, 23, 9, 16]
jeff [34, 34, 36, 37, 36, 33, 36, 34, 0, 37, 37, 23, 9, 16, 11, 11]
jegII [34, 36, 37, 33, 0, 23, 9, 16, 11]
markvh [34, 36, 37, 33, 0, 23, 9, 16, 11]
martin [16, 11, 0, 33, 23, 34, 36, 9, 37]
paolo [34, 36, 37, 33, 0, 23, 9, 16, 11]
park [16, 33, 0, 11, 23, 34, 36, 9, 37]
paul ["0", "9", "11", "1", "16", "23", "33", "34", "36", "37"]
robert [16, 33, 0, 11, 23, 34, 36, 9, 37]
samk1 [16, 33, 0, 11, 23, 34, 36, 9, 37]
samk2 [11, 34, 36, 37, 0, 33, 23, 9, 16]
simonk [11, 34, 36, 37, 0, 33, 23, 9, 16]
simons [34, 34, 36, 37, 36, 33, 36, 34, 0, 37, 37, 23, 9, 16, 11, 11]
zach [34, 36, 37, 33, 0, 23, 9, 16, 11]

which are at least 3 different 'solutions' (simons & jeff 16, paul 10,
all others 9)

ok, simons and jeff can be fixed by a simple 'uniq', pauls seems to be
broken (or made for other data?)

cheers

Simon
 
J

James Edward Gray II

so __martin__ and __jegII__ effectively tie -- the diff is not
significant
considering my cpu did some other, but not many, things during that
time

It's good to know I sometimes do the right thing, completely by
accident, of course. ;)

Seriously, thanks for the enlightening comparison. I learned
something as payment for my entry. Always nice.

James Edward Gray II
 
P

pauldacus

Yes... my solution seems to suck in that it doesn't work & is slow. I
believe this makes me Management Material!

But hopefully I can redeem myself with this humble solution (does seem
to work), which borrows heavily from my predecessors:

one = {}; two = {}; a.each{|i| one ? two=1 : one=1}; two.keys

It uses the much acclaimed one-pass method, but skips the 'uniq'
method, which seems like it's got to be expensive on a big array. No
sorting or counting, just a straight up "If this is not the first time
I've seen this thing, then it's the second, otherwise... it's the
first". I'd benchmark, but a copy/paste on previous code blows up. I
blame Windows.
 
S

Sylvain Joyeux

=A0 =A0 =A0 =A0run_benchmark(bx, 'jegII', iterations) { seen =3D
Hash.new(0);a.select { |e| (seen[e] +=3D 1) > 1 }.uniq }

I think you could improve this one a bit by simply doing
{ seen =3D Hash.new(0); a.select { |e| (seen[e] +=3D 1) =3D=3D 2 } }

Regards
Sylvain
 
P

pauldacus

For those still the slightest bit interested... I found two slightly
faster approaches. I stood on the shoulders of giants.... :)

paul2: one={};two={};a.each{|i| one ? two=1 : one=1;two.keys
paul2a: one={};two={};a.each{|i| (one && two=1) ||
one=1;two.keys

These two scripts are exactly the same concept, the second just uses &&
and || shortcut evaluation... which proved infintesimally faster....
actually scored a 'zero' on the large array.

user system total real
david 35.150000 0.000000 35.150000 ( 35.150000)
jeff 9.720000 0.000000 9.720000 ( 9.720000)
jegII 9.830000 0.000000 9.830000 ( 9.830000)
markvh 8.130000 0.000000 8.130000 ( 8.130000)
martin 8.680000 0.000000 8.680000 ( 8.680000)
paolo 18.350000 0.000000 18.350000 ( 18.350000)
park 11.310000 0.000000 11.310000 ( 11.310000)
paul 19.390000 0.000000 19.390000 ( 19.390000)
robert 22.570000 0.000000 22.570000 ( 22.570000)
samk1 12.740000 0.000000 12.740000 ( 12.740000)
samk2 29.000000 0.000000 29.000000 ( 29.000000)
simonk 8.190000 0.000000 8.190000 ( 8.190000)
simons 13.670000 0.000000 13.670000 ( 13.670000)
zach 1.540000 0.000000 1.540000 ( 1.540000)
paul2 1.760000 0.000000 1.760000 ( 1.760000)
paul2a 1.760000 0.000000 1.760000 ( 1.760000)
user system total real
david 206.850000 0.000000 206.850000 (206.850000)
jeff 0.770000 0.000000 0.770000 ( 0.770000)
jegII 0.110000 0.000000 0.110000 ( 0.110000)
markvh 10.820000 0.000000 10.820000 ( 10.820000)
martin 0.050000 0.000000 0.050000 ( 0.050000)
paolo 0.170000 0.000000 0.170000 ( 0.170000)
park 0.110000 0.000000 0.110000 ( 0.110000)
paul 0.220000 0.000000 0.220000 ( 0.220000)
robert 0.220000 0.000000 0.220000 ( 0.220000)
samk1 0.110000 0.000000 0.110000 ( 0.110000)
samk2 79.210000 0.000000 79.210000 ( 79.210000)
simonk 22.850000 0.000000 22.850000 ( 22.850000)
simons 11.370000 0.000000 11.370000 ( 11.370000)
zach 11.750000 0.000000 11.750000 ( 11.750000)
paul2 0.050000 0.000000 0.050000 ( 0.050000)
paul2a 0.000000 0.000000 0.000000 ( 0.000000)


A combined score shows them to be prettty good, shortcut evaluation
winning by a nose:

1.76 paul2a
1.81 paul2
8.73 martin
9.94 jegII
10.49 jeff
11.42 park
12.85 samk1
13.29 zach
18.52 paolo
18.95 markvh
19.61 paul
22.79 robert
25.04 simons
31.04 simonk
108.21 samk2
242.00 david
 
L

Lyndon Samson

------=_Part_21427_18098516.1129678659880
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Yes... my solution seems to suck in that it doesn't work & is slow. I
believe this makes me Management Material!

But hopefully I can redeem myself with this humble solution (does seem
to work), which borrows heavily from my predecessors:

one =3D {}; two =3D {}; a.each{|i| one ? two=3D1 : one=3D1}; two=

keys


Basically my ( fast :) )solution using Hashs rather than a Bit fields...

bn =3D bn2 =3D 0
a.each {|el|
bn2 |=3D (1<<el) if bn[el] =3D=3D 1
bn |=3D (1<<el)
}


It uses the much acclaimed one-pass method, but skips the 'uniq'
 
P

pauldacus

Well... it was a short run. At least I keep my streak alive... Still,
the only thing I've won in my life was a can of soup when I was 8 years
old!

user system total real
david 34.320000 0.000000 34.320000 ( 34.320000)
jeff 9.560000 0.000000 9.560000 ( 9.560000)
jegII 9.390000 0.000000 9.390000 ( 9.390000)
markvh 7.860000 0.000000 7.860000 ( 7.860000)
martin 8.790000 0.000000 8.790000 ( 8.790000)
paolo 18.340000 0.000000 18.340000 ( 18.340000)
park 11.320000 0.000000 11.320000 ( 11.320000)
paul 20.150000 0.000000 20.150000 ( 20.150000)
robert 22.910000 0.000000 22.910000 ( 22.910000)
samk1 11.920000 0.000000 11.920000 ( 11.920000)
samk2 28.340000 0.000000 28.340000 ( 28.340000)
simonk 8.070000 0.000000 8.070000 ( 8.070000)
simons 13.790000 0.000000 13.790000 ( 13.790000)
zach 1.540000 0.000000 1.540000 ( 1.540000)
paul2 1.810000 0.000000 1.810000 ( 1.810000)
paul2a 1.760000 0.000000 1.760000 ( 1.760000)
Sylv 1.370000 0.000000 1.370000 ( 1.370000)
user system total real
david 207.840000 0.000000 207.840000 (207.840000)
jeff 0.770000 0.000000 0.770000 ( 0.770000)
jegII 0.060000 0.000000 0.060000 ( 0.060000)
markvh 10.440000 0.000000 10.440000 ( 10.440000)
martin 0.060000 0.000000 0.060000 ( 0.060000)
paolo 0.160000 0.000000 0.160000 ( 0.160000)
park 0.110000 0.000000 0.110000 ( 0.110000)
paul 0.330000 0.000000 0.330000 ( 0.330000)
robert 0.210000 0.000000 0.210000 ( 0.210000)
samk1 0.110000 0.000000 0.110000 ( 0.110000)
samk2 78.160000 0.000000 78.160000 ( 78.160000)
simonk 21.640000 0.000000 21.640000 ( 21.640000)
simons 16.310000 0.000000 16.310000 ( 16.310000)
zach 10.650000 0.000000 10.650000 ( 10.650000)
paul2 0.000000 0.000000 0.000000 ( 0.000000)
paul2a 0.000000 0.000000 0.000000 ( 0.000000)
Sylv 0.000000 0.000000 0.000000 ( 0.000000)

Well, it was a good ride while it lasted:

Sylv 1.37
paul2a 1.76
paul2 1.81
martin 8.85
jegII 9.45
jeff 10.33
park 11.43
samk1 12.03
zach 12.19
markvh 18.30
paolo 18.50
paul 20.48
robert 23.12
simonk 29.71
simons 30.10
samk2 106.50
david 242.16
 
P

pauldacus

Welp, can't leave well enough alone: I fixed Sylv's code to count true
duplicates. I really tried! Used ( >1 : != 1 : >=2) in place of (
==2 ). Here's full results:

paul2a 1.27
paul2 1.31
Sylv (>=2) 1.37
Sylv (>1) 1.38
Sylv (!=1) 1.43
martin 9.28
jegII 9.40
jeff 10.21
samk1 11.65
zach 12.53
park 12.80
markvh 18.95
paul 19.38
robert 19.84
paolo 21.91
simons 24.44
simonk 29.88
samk2 110.73
david 241.78

Strange that the '==' construct was measurably faster. But at this
point, I have to agree with the person who said that at this level, the
speed differences between the top 5 are immaterial.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,181
Messages
2,570,970
Members
47,537
Latest member
BellCorone

Latest Threads

Top