Collections of structured-data objects: what approach?

G

Graham Wideman

Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around
for some of the mechanisms I'm used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this
functionality can be found, or failing that suggest the most advantageous
starting point to create it based on more primitive classes.

Main features:

1. The contained objects are user-defined types having multiple fields (data
members). This in itself seems no problem for Ruby.

2. Order/Retrieve by index. The collection should stay ordered by insertion
order and support retrieval by integer index. (Sorted collection is a
separate issue, of course.)

3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or
inserting at arbitrary location, and removal or deletion of arbitrary
members.

4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects
by supplying a value to match against one of the object's fields. Often this
is simple a Name field on each object, which functions as a key, but at
other times one might want lookup based on some other field or fields.

So far I've read plenty that seems related: arrays, hashes, Enumerable, and
several related chapters in PickAxe and Ruby for Rails. Although
"collections" are mentioned in many of these sources, I've not yet hit on a
treatment of the ruby way for the Collection basics described above (eg: R
on R's Collection chapter's description of "insert" is simply replacing the
Nth element of an array).

Needless to say, I'd much appreciate comments illuminating this area!

Thanks

Graham
 
A

Alex Young

Graham said:
Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around
for some of the mechanisms I'm used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this
functionality can be found, or failing that suggest the most advantageous
starting point to create it based on more primitive classes.
You can get away with just using an Array for almost all of this.
Main features:

1. The contained objects are user-defined types having multiple fields (data
members). This in itself seems no problem for Ruby.
2. Order/Retrieve by index. The collection should stay ordered by insertion
order and support retrieval by integer index. (Sorted collection is a
separate issue, of course.)
3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or
inserting at arbitrary location, and removal or deletion of arbitrary
members.
4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects
by supplying a value to match against one of the object's fields. Often this
is simple a Name field on each object, which functions as a key, but at
other times one might want lookup based on some other field or fields.
None of these are a problem for Array:

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]


So far I've read plenty that seems related: arrays, hashes, Enumerable, and
several related chapters in PickAxe and Ruby for Rails. Although
"collections" are mentioned in many of these sources, I've not yet hit on a
treatment of the ruby way for the Collection basics described above (eg: R
on R's Collection chapter's description of "insert" is simply replacing the
Nth element of an array).
If so, that's simply wrong. Array#insert(n, element) makes the inserted
element the n'th and pushes everything after it along one.
 
R

Robert Klemme

Folks:

I'm new to Ruby, though not to programming in general. So I'm looking around
for some of the mechanisms I'm used to finding, and one of them is an
apparatus for Collections (as for example used in C++, OP, VB etc).

I'm hoping that someone can point me in the right direction as to where this
functionality can be found, or failing that suggest the most advantageous
starting point to create it based on more primitive classes.

Main features:

1. The contained objects are user-defined types having multiple fields (data
members). This in itself seems no problem for Ruby.

2. Order/Retrieve by index. The collection should stay ordered by insertion
order and support retrieval by integer index. (Sorted collection is a
separate issue, of course.)

3. Insert/Append/Remove/Delete. (List behavior) Allow appending (at end) or
inserting at arbitrary location, and removal or deletion of arbitrary
members.

4. Find by key: (Dictionary behavior etc) Ability to retrieve member objects
by supplying a value to match against one of the object's fields. Often this
is simple a Name field on each object, which functions as a key, but at
other times one might want lookup based on some other field or fields.

So far I've read plenty that seems related: arrays, hashes, Enumerable, and
several related chapters in PickAxe and Ruby for Rails. Although
"collections" are mentioned in many of these sources, I've not yet hit on a
treatment of the ruby way for the Collection basics described above (eg: R
on R's Collection chapter's description of "insert" is simply replacing the
Nth element of an array).

Needless to say, I'd much appreciate comments illuminating this area!

For the sake of discussion I present a different approach. As long as
you don't need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I'll attach a half grown quick hack to
demonstrate what I mean.

Kind regards

robert


# untested and incomplete sample
require 'set'

class IndexedCollection
include Enumerable

def initialize(enum=[])
@data = []
@indexes = {}
add_all enum
end

def add_index(field)
field = field.to_sym
idx = @indexes[field] = Hash.new {|h,k| h[k] = Set.new}

@data.each do |e|
idx[e.send(field)] << e
end

self
end

def lookup(field, value)
field = field.to_sym
idx = @indexes[field] or
raise ArgumentError, "No such index: #{field}"

idx[value].to_a
end

def add_all(enum)
enum.each {|e| self << e}
self
end

def <<(e)
@data << e

@indexes.each do |field, index|
index[e.send(field)] << e
end

self
end

def each(&b)
@data.each(&b)
self
end

def size() @data.size end

end
 
G

Graham Wideman

Alex:

Ah-ha! Thanks for getting me on the right track.

Couple of points (for others following along...)
You can get away with just using an Array for almost all of this.

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

Now THAT's the set of examples I needed, and that should be in the major
Ruby docs.

And I'd add Array#delete

(... though I'm not yet clear on the sematics. Does this simply remove the
object from the array, or also delete it's data? Ie: if there's another
reference to this object, is the data still there? To be tested...)
[...Ruby for Rail's Collection chapter's description of "insert"...]
If so, that's simply wrong.

Yep, so it seems!

PickAxe's "Containers" etc chapter manages to get us all the way to lambda
functions and closures, but its "SongList" example is too simple to be as
useful as your examples above.

Meanwhile, Ruby for Rails' "Collections, containers" etc chapter says the
following (emphasis mine):

-----------------------------
11.2.2 Inserting, retrieving and removing array elements
Because an array is an ordered collection, any object you add to the array
goes either at the beginning, at the end, or somewhere in the middle. The
most general technique for INSERTING one or more items into an array is the
setter method []= (square brackets...
-----------------------------

.... and goes on to give "insert" examples which do not insert, they simply
assign to an array slot. (And there's no mention of the insert method). So
I had written off array as being the complete answer, since its supposed
insert syntax "obviously" didn't really know how to insert.

Now that you've pointed out that there's an insert method called, ahem,
"insert", I see that it actually IS the array class that handles Collection
duty, and I can focus there for some more serious RTFM-ing.

Thanks again for the quick clarification!

Graham
 
A

Alex Young

Graham Wideman wrote:
And I'd add Array#delete

(... though I'm not yet clear on the sematics. Does this simply remove the
object from the array, or also delete it's data? Ie: if there's another
reference to this object, is the data still there? To be tested...)
a = "foo"
b = [a, "bar", "qux", a]
b.delete(a)
b # => ["bar", "qux"]
a # => "foo"

So no, Array#delete doesn't delete the data itself, only the reference
to it. You may also want:

b.delete_at(0)
b # => ["qux"]

Now that you've pointed out that there's an insert method called, ahem,
"insert", I see that it actually IS the array class that handles Collection
duty, and I can focus there for some more serious RTFM-ing.
Array is surprisingly powerful, especially when you take the inject
method into account. It's a real workhorse.
Thanks again for the quick clarification!
No worries :)
 
G

Graham Wideman

Robert:

Question and then a comment:

Question: What does the keyword "self" do when sitting on a line by itself,
as in several of your methods?

Comment:
For the sake of discussion I present a different approach. As long as
you don't need 4 (lookup by key) an Array is perfect. However, with the
lookup IMHO a new data structure is preferred because then consistency
can be handled internally. I'll attach a half grown quick hack to
demonstrate what I mean.

If I'm reading you correctly, you are saying a bit more than this:

1. For lookup (or for that matter if one want to implement a
SortedCollection), then it's desirable to have an index for the array on the
key field(s), which can be a ruby hash. (Ie: one *could* use Enumerable#find
or find_all, but on larger collections that's slow?)

2. But if you are going to implement an index using a helper hash, then you
want a way to keep the hash up-to-date as items are inserted or deleted from
the array.

3. So you advocate a class to wrap these together so that "consistency can
be handled internally".

Thanks for your reply,

Graham
 
G

Graham Wideman

Alex:
So no, Array#delete doesn't delete the data itself, only the reference to
it. You may also want:

b.delete_at(0)
b # => ["qux"] [...]
Array is surprisingly powerful, especially when you take the inject method
into account. It's a real workhorse.

Thanks,

Graham
 
S

Simen Edvardsen

Question: What does the keyword "self" do when sitting on a line by itself,
as in several of your methods?

Ruby returns the last value of a method. self on a line by itself returns self.

Graham said:
You can get away with just using an Array for almost all of this.

a = [{:a => 1, :b => 2}, {:c => 4, :d => 5}]
a[1] # => {:c=>4, :d=>5}
a << {:e => 6, :f => 7}
# => [{:b=>2, :a=>1}, {:c=>4, :d=>5}, {:e=>6, :f=>7}]
a.select {|x| x[:c] == 4} # => [{:c=>4, :d=>5}]

b = [1,2,3,4]
b.insert(2,5) # => [1, 2, 5, 3, 4]

Now THAT's the set of examples I needed, and that should be in the major
Ruby docs.

It is in the main Ruby docs. It's spread between the methods in use,
because we couldn't anticipate every single combination of methods you
might need to use and document them all in one place.
 
R

Robert Klemme

Graham said:
Robert:

Question and then a comment:

Question: What does the keyword "self" do when sitting on a line by itself,
as in several of your methods?

Simen answered that one. As an additional note, traditionally #each
returns self. Also, by doing this you can chain methods.
Comment:


If I'm reading you correctly, you are saying a bit more than this:

1. For lookup (or for that matter if one want to implement a
SortedCollection), then it's desirable to have an index for the array on the
key field(s), which can be a ruby hash. (Ie: one *could* use Enumerable#find
or find_all, but on larger collections that's slow?)

find is O(n) which can seriously hurt your applications performance if
the Array grows beyond a certain limit. Hash lookup on the other hand
is O(1).
2. But if you are going to implement an index using a helper hash, then you
want a way to keep the hash up-to-date as items are inserted or deleted from
the array.

3. So you advocate a class to wrap these together so that "consistency can
be handled internally".

Right. You got it. That's what OO is about.

Kind regards

robert
 
G

Graham Wideman

Simen:

Thanks for your reply....
It is in the main Ruby docs. It's spread between the methods in use,
because we couldn't anticipate every single combination of methods you
might need to use and document them all in one place.

I was more commenting on the major ruby docs like pickaxe, Ruby for Rails,
FAQs and so on. While some mention collections, this mostly just introduces
basic features of hashes and arrays. None of the discussion that I've run
across so far actually puts it all together into a comprehensive discussion
of how the collections functionality familiar from other languages
(Java, C#, C++, OP, VB etc) can be acquired from array and hash plus
whatever.

Yes, I agree that all the features and methods are documented in the
language guide (eg: in pickaxe), but one needs to know which set of those
features are to be assembled to create the familiar collection
functionality, in the absence of a big flashing sign on an actual Collection
class. (Especially bearing in mind that other languages/libraries go to some
pains to distinguish the concepts of Array and Collection -- not to say that
any particular other language is the most righteous).

Anyhow, for my part I'm a great deal further along than I was a couple of
days ago, thanks in part to helpful responses here and elsewhere.

Thanks all!

Graham
 
G

Graham Wideman

Robert:

Thanks, and...
Right. You got it. That's what OO is about.

Well, I already had the OO part, just needed the ruby version of it :) plus
connecting the dots might help others stumbling along this way, and you
never know it might turn out that your "consistency" comment was referring
to something else I didn't see.

Anyhow, this discussion has been most helpful to alert me to all the
capabilities that Array has, and calibrating my sense that a little bit of
assembly required to implement collections with various features, they're
not just in some library that "of course" everyone knows about and uses :)

Graham
 
R

Robert Klemme

Graham said:
Robert:

Thanks, and...

You're welcome!
Well, I already had the OO part, just needed the ruby version of it :) plus
connecting the dots might help others stumbling along this way, and you
never know it might turn out that your "consistency" comment was referring
to something else I didn't see.

I probably should have put "encapsulation" somewhere in that posting.
:) Anyway, I'm glad I could help sort these things out.
Anyhow, this discussion has been most helpful to alert me to all the
capabilities that Array has, and calibrating my sense that a little bit of
assembly required to implement collections with various features, they're
not just in some library that "of course" everyone knows about and uses :)

Yeah, the reason is probably that Array and Hash are so powerful on the
one hand and that requirements are so different on the other hand. Java
follows a tad different approach by providing collections for usual
situations.

Kind regards

robert
 
D

dblack

Hi --

[...Ruby for Rail's Collection chapter's description of "insert"...]
If so, that's simply wrong.

Yep, so it seems!

Actually I don't talk about Array#insert, so my description of it
can't be wrong :)
PickAxe's "Containers" etc chapter manages to get us all the way to lambda
functions and closures, but its "SongList" example is too simple to be as
useful as your examples above.

Meanwhile, Ruby for Rails' "Collections, containers" etc chapter says the
following (emphasis mine):

-----------------------------
11.2.2 Inserting, retrieving and removing array elements
Because an array is an ordered collection, any object you add to the array
goes either at the beginning, at the end, or somewhere in the middle. The
most general technique for INSERTING one or more items into an array is the
setter method []= (square brackets...
-----------------------------

... and goes on to give "insert" examples which do not insert, they simply
assign to an array slot. (And there's no mention of the insert method). So
I had written off array as being the complete answer, since its supposed
insert syntax "obviously" didn't really know how to insert.

Yeah, I tend to use the term "insert" sort of generically; and the
insert method is among the many methods that didn't make the "for
Rails" cut. (Keep in mind that R4R isn't a complete Ruby core method
reference.)

But I can see that using a method name as an informal way to describe
what another method does might be awkward.


David

--
David A. Black | (e-mail address removed)
Author of "Ruby for Rails" [1] | Ruby/Rails training & consultancy [3]
DABlog (DAB's Weblog) [2] | Co-director, Ruby Central, Inc. [4]
[1] http://www.manning.com/black | [3] http://www.rubypowerandlight.com
[2] http://dablog.rubypal.com | [4] http://www.rubycentral.org
 
G

Graham Wideman

David:
Actually I don't talk about Array#insert, so my description of it
can't be wrong :)

.... and for balance, I should say that I *am* enjoying other parts of the
book :)

Graham
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top