Counting how many times the same element occurs in an array?

Thomas Greenwood

There's probably a fairly simple way to do this.

Basically, I'm reading data from an XML file and need to figure out how
many times identical data occurs in certain attributes. So far I've got
the data into two identical arrays, with the intention of nesting
iterators: checking whether each element of the first was equal to each
element of the second and incrementing a counter every time a match was
found. That obviously didn't work out the way I initially thought.

This seems to be the gist of what I want, but it prints a count on every
iteration, whereas I only want the final tally.

xml_events.each{|x|
  puts "#{x} occurs #{xml_events.count(x)} times"
}

Any ideas?
 
John Feminella

You didn't mention what a particular xml_event object looks like, but
you'll probably want something like this:

xml_events.group_by(&:name).each do |name, events|
  puts "there were #{events.size} events of type #{name}"
end
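
A self-contained sketch of the group_by idea follows. The Event struct and its fields are hypothetical, since the thread never shows what an xml_event looks like, and event_counts is just an illustrative helper name:

```ruby
# Hypothetical event type; stands in for whatever the XML parser yields.
Event = Struct.new(:name, :timestamp)

# Group events by name, then reduce each group to its size.
def event_counts(events)
  events.group_by(&:name).transform_values(&:size)
end

xml_events = [
  Event.new("login", 1),
  Event.new("logout", 2),
  Event.new("login", 3)
]

event_counts(xml_events).each do |name, count|
  puts "there were #{count} events of type #{name}"
end
```

Hash#transform_values needs Ruby 2.4 or later; on older versions, map over the pairs as in the group_by examples elsewhere in this thread.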

~ jf
 
Robert Klemme

Two possible approaches:

irb(main):002:0> a = Array.new(10) { rand(4) }
=> [3, 2, 2, 1, 3, 3, 2, 3, 3, 3]

irb(main):003:0> a.inject(Hash.new(0)) {|sums,x| sums[x] += 1; sums}
=> {3=>6, 2=>3, 1=>1}

irb(main):004:0> a.group_by {|x| x}
=> {3=>[3, 3, 3, 3, 3, 3], 2=>[2, 2, 2], 1=>[1]}
irb(main):005:0> a.group_by {|x| x}.map {|k,v| [k, v.size]}
=> [[3, 6], [2, 3], [1, 1]]

Instead of #inject you can of course also use a more traditional approach:

irb(main):012:0> counts = Hash.new 0
=> {}
irb(main):013:0> a.each {|x| counts[x] += 1}
=> [3, 2, 2, 1, 3, 3, 2, 3, 3, 3]
irb(main):014:0> counts
=> {3=>6, 2=>3, 1=>1}
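
For readers on newer Rubies, the same counting can probably be written more compactly. This is a sketch, not part of the original replies: the method names are made up for illustration, and Enumerable#tally only exists in Ruby 2.7 and later.

```ruby
# Single pass with a default-0 hash, like the #inject version but without
# having to return the accumulator from the block.
def count_occurrences(items)
  items.each_with_object(Hash.new(0)) { |item, counts| counts[item] += 1 }
end

# Ruby 2.7+ only: Enumerable#tally returns the counts directly.
def count_with_tally(items)
  items.tally
end

sample = [3, 2, 2, 1, 3, 3, 2, 3, 3, 3]
count_occurrences(sample).each { |value, n| puts "#{value} occurs #{n} times" }
# prints:
# 3 occurs 6 times
# 2 occurs 3 times
# 1 occurs 1 times
```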

Kind regards

robert
 
Thomas Greenwood

I'm sure your solutions are better than mine, but here is what I ended up doing:

xml_events = Array.new
temp_array = Array.new

[...]
#extract xml data and assign it to the events array.
[...]

xml_events.each{|x|
  if temp_array.include?(x) == false
    temp_array << x
    puts "#{x} occurs #{xml_events.count(x)} times"
  end
}

A kludge but it does the job.

Thanks for your help.
 
Robert Klemme

Thomas Greenwood wrote:
xml_events.each{|x|
if temp_array.include?(x) == false

This is dangerous: in Ruby both false and nil are treated as false in a
condition, so a comparison with == false silently misses a method that
returns nil. It's better not to compare with boolean constants but rather
to use boolean operators and logic. In your case you could write either

if !temp_array.include?(x)

or

unless temp_array.include?(x)

and leave the rest of the block unchanged.
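
To make the false/nil point concrete, here is a small sketch. The find_flag helper and the flags hash are hypothetical, chosen only because Hash#[] returns nil for a missing key; Array#include? itself always returns true or false, so the risk lies in the habit rather than in this particular call.

```ruby
# Hypothetical lookup that returns nil, rather than false, when nothing is found.
def find_flag(flags, key)
  flags[key]
end

flags = { "seen" => true }
result = find_flag(flags, "missing") # nil, because the key is absent

puts "== false matched" if result == false # not printed: nil == false is false
puts "negation matched" if !result         # printed: nil is falsy
puts "unless matched" unless result        # printed: same test, reads better
```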

Thomas Greenwood wrote:
A kludge but it does the job.

Your code takes O(n*n) time, if I am not mistaken, while the approach
that stores counters in a Hash takes only O(n). That might not really
make a difference in your case, but the fact that you are iterating over
xml_events again and again (same for temp_array, btw.) shows that it is
"ugly" in a way.
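
The difference can be sketched as follows. Method names, the timing helper and the sample data are illustrative assumptions, and the timings will vary by machine, so no figures are shown.

```ruby
# Rescanning: Array#include? walks the list of values seen so far and
# Array#count walks the whole input, so the cost approaches O(n*n) when
# most elements are distinct.
def counts_by_rescanning(items)
  seen = []
  result = {}
  items.each do |item|
    next if seen.include?(item)
    seen << item
    result[item] = items.count(item)
  end
  result
end

# Hash counters: one pass and one hash update per element, O(n) overall.
def counts_by_hash(items)
  counts = Hash.new(0)
  items.each { |item| counts[item] += 1 }
  counts
end

# Small timing helper based on the monotonic clock.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# 3,000 elements, 1,500 distinct values, each occurring twice.
sample = Array.new(3_000) { |i| i % 1_500 }

puts format("rescanning: %.4fs", seconds { counts_by_rescanning(sample) })
puts format("hash:       %.4fs", seconds { counts_by_hash(sample) })
puts counts_by_rescanning(sample) == counts_by_hash(sample) # same result either way
```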

Thomas Greenwood wrote:
Thanks for your help.

You're welcome.

Kind regards

robert
 
7stud --

Thomas Greenwood wrote in post #998795:
A kludge but it does the job.

After asking for advice on a computer programming forum, the chosen
solution should never be a kludge. Rather, the solution should be
elegant and inspiring, and you should learn something.
 
Adam Prescott

If it isn't touted as the Perfect Solution to the problem, then it's much
better to have input from people than to have no input because of a very
high bar. People submitting their ideas means we at least get to see how
they might approach the problem, even if the approach is potentially
mistaken. Having almost-correct code on record gives others something to
learn from.
 
David Jacobs

Agreed, I'm not a huge fan of the solution. John's and Robert's are much more straightforward, reusable, and elegant.

You shouldn't have to use a variable called temp_array ... almost ever.
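
As one possible illustration of dropping temp_array, Array#uniq can supply the distinct elements directly. This is only a sketch: report_counts is a made-up name, the sample strings are invented, and the repeated Array#count calls keep it quadratic in the worst case, unlike the Hash-based versions earlier in the thread.

```ruby
# Build one report line per distinct element, in order of first appearance.
def report_counts(items)
  items.uniq.map { |item| "#{item} occurs #{items.count(item)} times" }
end

puts report_counts(%w[login logout login error login])
# prints:
# login occurs 3 times
# logout occurs 1 times
# error occurs 1 times
```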
 
