Partitioning with Set.divide

P

Peter Szinek

Hello all,

I have been playing with partitioning a set recently and I am stuck with
an issue. The whole story is here:

http://www.rubyrailways.com/partitioning-sets-in-ruby/

A quick version for those who would not like to read the article:

Consider this input:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

the goal is to create a partition based on the character in the first
column, i.e.:

<Set: <Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>,
<Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>,
<Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>

Which is exactly what Set.divide does. However, there is one problem: I
would like to know if there are duplicate lines. I.e. divide returns the
same result, no matter that the input is this:

c 5 16 2 34
c 5 16 2 34
c 5 16 2 34

or this:

c 5 16 2 34

What I would need is a modified divide which returns also the count of
the elements in the input set (at least for those elements which are
more than once in the set). Is this doable or do I have to roll some
code to do this for me additionally?

Cheers,
Peter

__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
 
S

SonOfLilit

Well, you could first count how many repetitions there are of each
line and then partition the set of pairs [line, count].

e.g.

h = Hash.new {0} # is this how I set a default value?

STDIN.each_line {|l| h[l] += 1}

partitioning = Set.new(h.to_a).divide{|a| a[0][0]}

Aur
 
S

SonOfLilit

Well, you could first count how many repetitions there are of each
line and then partition the set of pairs [line, count].

e.g.

h = Hash.new {0} # is this how I set a default value?

STDIN.each_line {|l| h[l] += 1}

partitioning = Set.new(h.to_a).divide{|a| a[0][0]}

Aur

Hello all,

I have been playing with partitioning a set recently and I am stuck with
an issue. The whole story is here:

http://www.rubyrailways.com/partitioning-sets-in-ruby/

A quick version for those who would not like to read the article:

Consider this input:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

the goal is to create a partition based on the character in the first
column, i.e.:

<Set: <Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>,
<Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>,
<Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>

Which is exactly what Set.divide does. However, there is one problem: I
would like to know if there are duplicate lines. I.e. divide returns the
same result, no matter that the input is this:

c 5 16 2 34
c 5 16 2 34
c 5 16 2 34

or this:

c 5 16 2 34

What I would need is a modified divide which returns also the count of
the elements in the input set (at least for those elements which are
more than once in the set). Is this doable or do I have to roll some
code to do this for me additionally?

Cheers,
Peter

__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby

Even better:
h = Hash.new {0} # is this how I set a default value?
STDIN.each_line {|l| h[l] += 1}
partitioning = h.to_set.divide{|a| a[0][0]} # changed to .to_set,
which is great. might need to be .to_set{|k,v| k,v}
 
S

SonOfLilit

Sorry for the triple post, but I've read the article and propose a
completely different approach to the problem you present (also in a
comment there with typos):

h = {}
open('input.txt').each_line{|l| h[l[0..0]] += l[2..-1].split('
').inject(0) {|c,x| c+=x.to_i; c}}}
p h.map{|k,v| {k => v}} # to turn {"a" => 80, "b" => 60} into [{"a" =>
80}, {b => 60}]
# which is a pretty weird data structure IMHO, but I'll play by your rules
Let's look at your code:
#############
input = Set.new open('input.txt').readlines.map{|e| e.chomp}
groups = input.divide {|x,y| x.map[0][0] == y.map[0][0] } # what's map for?
#build the array of hashes
p groups.map.inject([]) {|a,g| # what's map for?
#build the hashes for the number sequences with same letters
a << g.map.inject(Hash.new(0)) {|h,v|
#for every sequence, sum the numbers it contains
h[v[0..0]] += v[2..-1].split(' ').inject(0) {|c,x|
c+=x.to_i; c}; h
}; a
}
#############
divide() seems redundant in your code. You both 1) divide(); and 2)
implement divide on your own; in the same code. You do four passes on
the lines (and maybe more in the map() calls, I can't figure those
out), in one of them passing twice on each line's content, forcing the
data to stay all in memory the whole time.

My code, which isn't debugged but shows my idea, passes once on the
lines, and in that pass twice on each line's contents. It doesn't keep
the data in memory.

BTW I couldn't understand the use of map() without a block. Mind explaining?

Aur Saraf
 
S

SonOfLilit

Ah, sorry. tryruby.hobix.com helped me find that hash.map() is
identical to hash.to_a
 
P

Peter Szinek

SonOfLilit said:
Ah, sorry. tryruby.hobix.com helped me find that hash.map() is
identical to hash.to_a

Well the problem was that I just came across Set.divide and wanted to
demonstrate it on an example - so I dug out an older problem of mine
where I thought I can show it off - you have seen the result.

I think this was a typical manifestation of the 'If you have a hammer,
everything looks like a nail' problem.

I guess if my goal would have been solving the task at hand, rather than
putting 'divide' into action, I would come up with a solution similar to
yours...

Thanks for the suggestion!

Cheers,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
 
R

Robert Klemme

Hello all,

I have been playing with partitioning a set recently and I am stuck with
an issue. The whole story is here:

http://www.rubyrailways.com/partitioning-sets-in-ruby/

A quick version for those who would not like to read the article:

Consider this input:

a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34

the goal is to create a partition based on the character in the first
column, i.e.:

<Set: <Set: {"a 9 0 31", "a 7 83 1 23 3", "a 53 2 3", "a 1 3 22 "}>,
<Set: {"b 1 23 ", "b 1 14 4 15 16 2", "b 8 62 1 23", "b 4 45 4 16 7"}>,
<Set: {"c 5 16 2 34", "c 3 42 2 31 4 6"}>}>

Which is exactly what Set.divide does. However, there is one problem: I
would like to know if there are duplicate lines. I.e. divide returns the
same result, no matter that the input is this:

c 5 16 2 34
c 5 16 2 34
c 5 16 2 34

or this:

c 5 16 2 34

What I would need is a modified divide which returns also the count of
the elements in the input set (at least for those elements which are
more than once in the set). Is this doable or do I have to roll some
code to do this for me additionally?

Basically you need bags. Since a quick check does not reveal any, you
can roll your own pretty easily with a Hash with default value 0. This
is what I'd do: (see script at end). Of course you could save another
line by inlining "key".

Kind regards

robert

8<-----------------

#!/usr/bin/ruby

require 'pp'

parts = Hash.new {|h,k| h[k] = Hash.new(0)}

DATA.each do |line|
line.chomp!
key = line[/^\w+/]
parts[key][line] += 1
end

pp parts

__END__
a 53 2 3
b 8 62 1 23
a 9 0 31
b 4 45 4 16 7
b 1 23
c 3 42 2 31 4 6
a 1 3 22
a 7 83 1 23 3
b 1 14 4 15 16 2
c 5 16 2 34
c 5 16 2 34
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,708
Latest member
SherleneF1

Latest Threads

Top