thoughts on a more generic Array#partition function

R

Rudi Cilibrasi

An experiment in a more generic partition function. The current
Array#partition function takes a code block and interprets the result
in a boolean way to create two lists. What if the user wants to create
more than two lists?

A common idea is to define a sort of indicator function that maps each
element of the array to a value that indicates which list to assign the
element to. Then, all elements that resulted in the same value would be
assigned the same partition list. This is a good system, but has one
small problem: It cannot return the results in a definitely ordered
Array because there is no way to know what order the return values are unless
they are Sortable. Since TrueClass and FalseClass are not sortable, it
means that it is hopeless to try to preserve compatibility with the old
partition function without extending the TrueClass and FalseClass to at
least allow <=> comparisons between true,true and true,false, etc.

Therefore, in this demonstration I want to show four points:

1) It is possible to write a generic partition function that can sort into
any number of lists and not just two. This generic partition function,
called npartition, can use the boolean types true and false as
well as any integer, string, symbol, or other code block return types instead.
In the program below, I demonstrate using the boolean and integer modes to
partition a small array of integers.

2) A generic partition function can be perfectly compatible with
the pre-existing two-way split partition function as long as users are
willing to apply a boolean cast in the cases where booleans are not
explicitly returned: return i must become return i ? true : false to be safe
in those cases that previously relied on a Boolean context cast.

3) It is impossible to have a non-array-data-dependent (canonical) ordering
for the returned partition lists unless we require the possible sorting
key (returned from code block) values to be Sortable. If we do not require
this, then we would be forced to use the input key appearance sequence as
the output ordering and this would shift depending on which element is
listed first in the Array. It is desirable to have a standard order that
occurs regardless of the order of appearance in the input array. In fact,
the current partition function effectively imposes an ordering on the true
and false implicitly by its return ordering of true first then false.

4) It is convenient, therefore, to allow a (any) standard ordering to be
imposed on TrueClass and FalseClass. By not imposing a standard ordering,
we make it difficult for others to do so in a safe way that won't conflict.
Yet, there are often cases where code could make great use of any particular
ordering for true,false as is implied by mathematical lattice theory. It
would also be convenient to include nil in this ordering: true nil false
I cannot tell if this is possible in Ruby without breaking other things.
---------------------------------------------------------------------

class Array
def npartition(&cb)
labels = inject( { } ) { |n, a|
k = cb.call(a)
n[k] = [ ] unless n.has_key?(k)
n[k] << a
n
}
labels.keys.sort.map { |a| labels[a] }
end
end

# returns a boolean sorting key given an integer
def bfunc(q)
q % 2 == 0
end

# returns an integer sorting key given an integer
def ifunc(q)
q % 2
end

# Reverse Alphabetical by class name for singleton classes only
def ofunc(sa, sb)
sb.class.to_s <=> sa.class.to_s
end

def testpart(a)
r1 = r2 = "failed"
begin
r1 = a.partition() { |i| yield i }
rescue
puts $!
end
puts "From partition: #{r1.inspect}"
begin
r2 = a.npartition() { |i| yield i }
rescue
puts $!
end
puts "From npartition: #{r2.inspect}"
end

a = [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
puts "Without TrueClass FalseClass ordering extension:"
puts "Boolean case:"
testpart(a) { |i| bfunc(i) }
puts "Integer case:"
testpart(a) { |i| ifunc(i) }
class TrueClass
def <=>(b) ofunc(self,b) end
end
class FalseClass
def <=>(b) ofunc(self,b) end
end
puts "\nWith TrueClass FalseClass ordering extension:"
puts "Boolean case:"
testpart(a) { |i| bfunc(i) }
puts "Integer case:"
testpart(a) { |i| ifunc(i) }

puts "\nSpecial (new extension) three-way list split"
testpart(a) { |i| i % 3 }
exit 0
 
P

Peña, Botp

From: Rudi Cilibrasi [mailto:[email protected]]=20
# Subject: thoughts on a more generic Array#partition function
# ....
# puts "\nSpecial (new extension) three-way list split"
# testpart(a) { |i| i % 3 }

how does it compare to #group_by
RUBY_VERSION =3D> "1.9.0"
a =3D [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=3D> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

RUBY_VERSION =3D> "1.8.7"
a =3D [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=3D> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=3D> {0=3D>[3, 9, 12], 1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14]}

kind regards -botp
 
R

Rudi Cilibrasi

Hi Botp,

From: Rudi Cilibrasi [mailto:[email protected]]
# Subject: thoughts on a more generic Array#partition function
# ....
# puts "\nSpecial (new extension) three-way list split"
# testpart(a) { |i| i % 3 }

how does it compare to #group_by

It is very similar. Thank you Botp for calling it to my attention.
And thank you Matz for reading my email.
The differences I can understand are as follows:

1) group_by requires eql? and hash methods defined for return types.
npartition requires Sortable .
2) group_by returns a hash, npartition returns an array of arrays

Now that I see group_by I can see that it is not hard to implement the
general partition function using it as just:
h =3D a.group_by() { |i| func(i) }
h.keys.sort.map { |k| h[k] }

This is not so bad, but perhaps has a small disadvantage that it
requires eql? and hash methods defined unlike npartition. But I
cannot figure out a simpler version to get array of array's output.
Best regards,

Rudi

--=20
Git, Hg (Mercurial), and Subversion (svn) hosting over SSH
http://sshcontrol.com/
 
R

Rudi Cilibrasi

Ooops, I just realized npartition requires eql? and hash methods
defined too of course.
So there is not much difference left. I am wondering if it is worth
it for the language
otherwise though to consider each of the small adjustments.

Best regards,

Rudi
 
P

Peña, Botp

From: Rudi Cilibrasi [mailto:[email protected]]=20
# h =3D a.group_by() { |i| func(i) }
# h.keys.sort.map { |k| h[k] }
# This is not so bad, but perhaps has a small disadvantage that it
# requires eql? and hash methods defined unlike npartition. But I
# cannot figure out a simpler version to get array of array's output.

just apply #values to it,=20
h=3Da.group_by { |i| i % 3 }
=3D> {0=3D>[3, 9, 12], 1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14]}
=3D> [[3, 9, 12], [1, 4, 7, 13], [2, 11, 14]]

kind regards -botp
 
R

Rick DeNatale

From: Rudi Cilibrasi [mailto:[email protected]]
# h =3D a.group_by() { |i| func(i) }
# h.keys.sort.map { |k| h[k] }
# This is not so bad, but perhaps has a small disadvantage that it
# requires eql? and hash methods defined unlike npartition. But I
# cannot figure out a simpler version to get array of array's output.

just apply #values to it,

Not exactly the same, values doesn't guarantee the desired ordering by key.
Ruby 1.9 orders hashes by insertion history, Ruby < 1.8.7 doesn't order the=
m
at all, not sure about 1.8.7

--=20
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
 
R

Robert Klemme

2008/7/3 Rick DeNatale said:
From: Rudi Cilibrasi [mailto:[email protected]]
# h =3D a.group_by() { |i| func(i) }
# h.keys.sort.map { |k| h[k] }
# This is not so bad, but perhaps has a small disadvantage that it
# requires eql? and hash methods defined unlike npartition. But I
# cannot figure out a simpler version to get array of array's output.

just apply #values to it,

Not exactly the same, values doesn't guarantee the desired ordering by ke=
y.

That's easily fixed:

arbitrary_hash.sort_by {|k,| k}.map {|k,v| v}

Having said that the #group_by implementation seems the more general
solution (because of less requirements, i.e. keys do not need to
provide <=3D>).

Kind regards

robert

--=20
use.inject do |as, often| as.you_can - without end
 
R

Rudi Cilibrasi

2008/7/3 Rick DeNatale said:
From: Rudi Cilibrasi [mailto:[email protected]]
Not exactly the same, values doesn't guarantee the desired ordering by k=
ey.

That's easily fixed:

arbitrary_hash.sort_by {|k,| k}.map {|k,v| v}

Having said that the #group_by implementation seems the more general
solution (because of less requirements, i.e. keys do not need to
provide <=3D>).

Hi Robert,

I have come to agree with your analysis. I think this one-line
expression you made is very good: As far as source-code brevity,
clarity, and correctness this expression you invented gets high marks
because there are no dangling outer variable etc and that is great.
But my small problem now is the extra code-block calls during runtime.
I guess there is one more code-block call per element in the
expression as written compared to npartition. So it is now down to a

1) small efficiency difference (probably npartition is slightly faster
but untested)
2) small typing difference (Arrays versus hashses -- can go either way here=
)
3) less requirements for group_by

Thanks for the great input so far. Best regards,

Rudi
 
C

Calamitas

I have come to agree with your analysis. I think this one-line
expression you made is very good: As far as source-code brevity,
clarity, and correctness this expression you invented gets high marks
because there are no dangling outer variable etc and that is great.
But my small problem now is the extra code-block calls during runtime.
I guess there is one more code-block call per element in the
expression as written compared to npartition. So it is now down to a

1) small efficiency difference (probably npartition is slightly faster
but untested)
2) small typing difference (Arrays versus hashses -- can go either way here)
3) less requirements for group_by

Thanks for the great input so far. Best regards,

Maybe I'm being dense, but isn't the expression you assign to labels
in your definition of npartition an implementation of group_by? If so,
I would expect npartition to be slower than Ruby's group_by unless
your implementation of group_by is more efficient than Ruby's.

Peter
 
R

Robert Klemme

2008/7/3 Rudi Cilibrasi said:
2008/7/3 Rick DeNatale said:
From: Rudi Cilibrasi [mailto:[email protected]]
Not exactly the same, values doesn't guarantee the desired ordering by =
key.

That's easily fixed:

arbitrary_hash.sort_by {|k,| k}.map {|k,v| v}

Having said that the #group_by implementation seems the more general
solution (because of less requirements, i.e. keys do not need to
provide <=3D>).

I have come to agree with your analysis. I think this one-line
expression you made is very good: As far as source-code brevity,
clarity, and correctness this expression you invented gets high marks
because there are no dangling outer variable etc and that is great.

Oh, thank you!
But my small problem now is the extra code-block calls during runtime.
I guess there is one more code-block call per element in the
expression as written compared to npartition. So it is now down to a

1) small efficiency difference (probably npartition is slightly faster
but untested)

I'd be guessing that #group_by is faster - because it's implemented in
C. But you should benchmark for a definitive answer. I have been
surprise more often than not - following big O calculus can be quite
misleading at times. :)

Cheers

robert

--=20
use.inject do |as, often| as.you_can - without end
 
P

Peña, Botp

From: Rick DeNatale [mailto:[email protected]]=20
# Not exactly the same, values doesn't guarantee the desired=20
# ordering by key. Ruby 1.9 orders hashes by insertion history,=20
# Ruby < 1.8.7 doesn't order them
# at all, not sure about 1.8.7

current test:

1.9 not sorted
1.8.6 not sorted
1.8.7 sorted

i prefer 1.8.7 default behaviour

kind regards -botp
 
P

Peña, Botp

From: Yukihiro Matsumoto [mailto:[email protected]]=20
# |current test:
# |
# |1.9 not sorted
# |1.8.6 not sorted
# |1.8.7 sorted
# |
# |i prefer 1.8.7 default behaviour
# Hmm, from my understanding, 1.9 preserves the order, prior version
# does not. Could you show us the test you got the above conclusion?

sir Matz, as requested,
[RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_REVISION]
=3D> ["1.9.0", "2008-06-20", 17482]
RUBY_PLATFORM
=3D> "i386-mswin32"
a =3D [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=3D> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at this point, 1.9 and 1.8.7 differs
h=3Da.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]} =20
=3D> [[1, 4, 7, 13], [2, 11, 14], [3, 9, 12]]

kind regards -botp
 
R

Robert Klemme

2008/7/4 Pe=F1a said:
From: Yukihiro Matsumoto [mailto:[email protected]]
# |current test:
# |
# |1.9 not sorted
# |1.8.6 not sorted
# |1.8.7 sorted
# |
# |i prefer 1.8.7 default behaviour
# Hmm, from my understanding, 1.9 preserves the order, prior version
# does not. Could you show us the test you got the above conclusion?

sir Matz, as requested,
[RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_REVISION]
=3D> ["1.9.0", "2008-06-20", 17482]
RUBY_PLATFORM
=3D> "i386-mswin32"
a =3D [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=3D> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at this point, 1.9 and 1.8.7 differs

Where exactly? I cannot seem to see it. Above and below look
identical. What am I missing?
h=3Da.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

I get

09:29:30 bas$ ruby --version
ruby 1.8.7 (2008-06-20 patchlevel 22) [i386-cygwin]
09:29:40 bas$ ruby -e 'p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ].group_by
{ |i| i % 3 }'
{0=3D>[3, 9, 12], 1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14]}

09:29:43 bas$ ruby19 --version
ruby 1.9.0 (2008-03-01 revision 15664) [i386-cygwin]
09:29:47 bas$ ruby19 -e 'p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14
].group_by { |i| i % 3 }'
{1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

And there *is* a difference.

Kind regards

robert

--=20
use.inject do |as, often| as.you_can - without end
 
R

Rick DeNatale

2008/7/4 Pe=F1a said:
From: Yukihiro Matsumoto [mailto:[email protected]]
# |current test:
# |
# |1.9 not sorted
# |1.8.6 not sorted
# |1.8.7 sorted
# |
# |i prefer 1.8.7 default behaviour
# Hmm, from my understanding, 1.9 preserves the order, prior version
# does not. Could you show us the test you got the above conclusion?
[RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_REVISION]
=3D> ["1.9.0", "2008-06-20", 17482]
RUBY_PLATFORM
=3D> "i386-mswin32"
a =3D [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ]
=3D> [1, 2, 3, 4, 7, 9, 11, 12, 13, 14]
a.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
at this point, 1.9 and 1.8.7 differs

Where exactly? I cannot seem to see it. Above and below look
identical. What am I missing?
h=3Da.group_by { |i| i % 3 }
=3D> {1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

I get

09:29:30 bas$ ruby --version
ruby 1.8.7 (2008-06-20 patchlevel 22) [i386-cygwin]
09:29:40 bas$ ruby -e 'p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14 ].group_by
{ |i| i % 3 }'
{0=3D>[3, 9, 12], 1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14]}

09:29:43 bas$ ruby19 --version
ruby 1.9.0 (2008-03-01 revision 15664) [i386-cygwin]
09:29:47 bas$ ruby19 -e 'p [ 1, 2, 3, 4, 7, 9, 11, 12, 13, 14
].group_by { |i| i % 3 }'
{1=3D>[1, 4, 7, 13], 2=3D>[2, 11, 14], 0=3D>[3, 9, 12]}

And there *is* a difference.

Yes there is. And I suspect that some following this thread are confused b=
y
the difference between sorted and ordered.

Ruby 1.8 (at least before 1.8.7) gives no specification of the order in
which a hash will yield it's keys, values or key-value pairs for methods in
the each family.

Ruby 1.9 keeps track of the INSERTION order so what we are seeing in the 1.=
9
output is a result of the fact that group_by inserts a key value pair each
time it encounters a NEW value from the block., so with the array [1, 2, 3,
...] the block (which returns element % 3) will first return 1, then 2, the=
n
0, after which all of the possible return values have been exhausted. So
the hash is ordered by key as 1 =3D> ... , 2=3D> ..., 3=3D>...

In the case of Ruby 1.8, the enumeration order is an accident of the way th=
e
key values hash, since the hash value of a fixnum n in MRI Ruby 1.8 is
n*2+1, in this case the enumeration order is somewhat predictable.

But in general hashes are not enumerated in key-sort order in any version
of Ruby as far as I know.

--=20
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,740
Latest member
JudsonFrie

Latest Threads

Top