histogram of histograms

Charles L. Snyder · Feb 7, 2007

Hi

I have several text files that look like this:

Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25

I need to iterate through the above file(s) and get the data
summarized in the form:

Canada, 252
China, 91
Chile, 4
Brazil, 32
Bulgaria, 1

I know how to go from a single column list with multiple repeated
values to a 'histogram' type list, ie:

my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key] += 1; counts }
my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }

but I'm unable to figure out how to get the 2-column csv values into a
total by country as shown above.
(I do have another file "countries.txt" which is a unique list of
countries.)

Thanks in advance!

CLS

Martin DeMello · Feb 7, 2007

I need to iterate through the above file(s) and get the data
summarized in the form:

Canada, 252
China, 91
Chile, 4
Brazil, 32
Bulgaria, 1

#------------------------------------------------------------------
countries = <<HERE
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25
HERE

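totals = Hash.new {|h, k| h[k] = 0}

# Add each line's value to the running total for its country.
countries.each_line {|line|
  country, n = line.split(/,\s*/)
  totals[country] += n.to_i
}

# Print the countries in descending order of total.
totals.keys.sort_by {|i| -totals[i]}.each {|c|
  puts "#{c}, #{totals[c]}"
}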

#------------------------------------------------------------------

martin

Robert Klemme · Feb 7, 2007

I have several text files that look like this:

Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25

I need to iterate through the above file(s) and get the data
summarized in the form:

Canada, 252
China, 91
Chile, 4
Brazil, 32
Bulgaria, 1

I would do that in stream mode, i.e. summarize directly while reading
instead of reading everything first and summarizing afterwards (see the
code at the end of this post). That is more efficient, especially since
these files look like they could be large.
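A minimal sketch of that streaming idea across several inputs. The helper name `add_totals` is hypothetical, and `StringIO` objects stand in for the real files so the example runs on its own:

```ruby
require 'stringio'

# Hypothetical helper: adds the per-country values from any IO-like
# object (File, StringIO, $stdin) into totals, one line at a time.
def add_totals(io, totals)
  io.each_line do |line|
    country, value = line.chomp.split(/,\s*/)
    next unless country && value   # skip blank or malformed lines
    totals[country] += value.to_i
  end
  totals
end

totals = Hash.new(0)

# With real files this could be, for example:
#   ARGV.each { |path| File.open(path) { |f| add_totals(f, totals) } }
# Two StringIO objects play the role of two input files here.
inputs = [
  StringIO.new("Brazil, 10\nCanada, 48\n"),
  StringIO.new("Brazil, 13\nChile, 1\n")
]
inputs.each { |io| add_totals(io, totals) }

# Report in descending order of total.
totals.sort_by { |_country, total| -total }.each do |country, total|
  puts "#{country}, #{total}"
end
```

Only the running totals are held in memory, so the size of the input files should not matter much.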

I know how to go from a single column list with multiple repeated
values to a 'histogram' type list, ie:

my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key] += 1; counts }

I don't know why you do this. Do you also need the number of occurrences?

my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }

but I'm unable to figure out how to get the 2-column csv values into a
total by country as shown above.
(I do have another file "countries.txt" which is a unique list of
countries.)

You don't need the second file unless you want to report zero counts for
countries not present.
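If zero counts are wanted, one possible approach is to seed the totals from the country list before summing. In this sketch the list and the data are inline strings rather than files, and "Denmark" is a made-up entry used only to show a country with no data:

```ruby
# Assumed shape of countries.txt: one country name per line.
country_list = "Brazil\nBulgaria\nCanada\nDenmark\n"
data = "Brazil, 10\nBrazil, 13\nCanada, 48\n"

totals = Hash.new(0)

# Give every known country an explicit zero so it appears in the report.
country_list.each_line { |name| totals[name.chomp] = 0 }

# Sum the values per country as before.
data.each_line do |line|
  country, value = line.chomp.split(/,\s*/)
  totals[country] += value.to_i if country && value
end

totals.sort_by { |_country, total| -total }.each do |country, total|
  puts "#{country}, #{total}"
end
```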

Kind regards

robert

counts = Hash.new 0
DATA.each do |line|
  line.chomp!
  country, val = line.split(/,\s*/)
  counts[country] += val.to_i if country && val
end
counts.sort_by {|cn,co| -co}.each do |country, count|
  print country, " ", count, "\n"
end
__END__
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25

dblack · Feb 7, 2007

Hi --

Hi

I have several text files that look like this:

Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25

I need to iterate through the above file(s) and get the data
summarized in the form:

Canada, 252
China, 91
Chile, 4
Brazil, 32
Bulgaria, 1

I know how to go from a single column list with multiple repeated
values to a 'histogram' type list, ie:

my_hash = countries.inject(Hash.new { 0 }) { |counts, key| counts[key] += 1; counts }
my_hash = my_hash.sort { |a,b| a[1] <=> b[1] }

my_hash will actually become an array at that point
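A quick way to see this, using a small made-up hash in place of the real counts:

```ruby
counts = { "Chile" => 3, "Brazil" => 3, "Canada" => 5 }

# Hash#sort comes from Enumerable and returns an array of [key, value] pairs.
sorted = counts.sort { |a, b| a[1] <=> b[1] }

puts sorted.class    # the result is an Array, not a Hash
p sorted.last        # the pair with the largest count

# Pairs can still be unpacked into key and value when iterating.
sorted.each { |country, count| puts "#{country}, #{count}" }
```

So after the reassignment, lookups such as my_hash["Canada"] no longer work; keeping the sorted result in a separate variable avoids the surprise.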

but I'm unable to figure out how to get the 2-column csv values into a
total by country as shown above.
(I do have another file "countries.txt" which is a unique list of
countries.)

Here's one way:

require 'scanf'

hash = Hash.new {0}
DATA.scanf("%s%d") {|key,count| hash[key] += count }

hash.sort.each {|k,v| puts "#{k} #{v}" }

__END__
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
etc.

That has the slight ugliness of including the comma in the key. You
could do:

hash[key.chomp(",")] += count

to avoid that, and then add the comma to the printout if you want it
back.
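The same idea can be sketched without scanf, which matters on newer Rubies where scanf is, as far as I know, no longer bundled and has to be installed as a gem. The data is an inline string here instead of DATA, and plain String#split does the tokenizing:

```ruby
data = "Brazil, 10\nBrazil, 13\nBulgaria, 1\n"

hash = Hash.new(0)
data.each_line do |line|
  key, count = line.split            # whitespace split, e.g. ["Brazil,", "10"]
  next unless key && count           # skip blank lines
  hash[key.chomp(",")] += count.to_i # drop the trailing comma from the key
end

# Put the comma back only when printing.
hash.sort.each { |k, v| puts "#{k}, #{v}" }
```

Like %s in scanf, the whitespace split assumes single-word country names; a name containing a space would need a split on the comma instead.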

David


William James · Feb 7, 2007


hash = Hash.new(0)
"\
Brazil, 10
Brazil, 13
Brazil, 9
Bulgaria, 1
Canada, 48
Canada, 52
Canada, 38
Canada, 55
Canada, 59
Chile, 1
Chile, 1
Chile, 2
China, 7
China, 18
China, 19
China, 22
China, 25".each_line{|s| s.split(',').inject{|k,v| hash[k] += v.to_i }}
p hash
