Hashes and ordering

K

Kristof Bastiaensen

Kristof Bastiaensen said:
Markus wrote:
Where are you going with this? There seem to be three unrelated
things here:

Well, I have been thinking along the lines of: What if we had a
built-in ordered indexable collection in Ruby?
What about an association list (Array)?

[["one", 1], ["two", 2], ["three", 3]].assoc("one") => ["one", 1]

You could wrap it in a class to have it behave more like a Hash.

Very inefficient if you have many lookups (O(n)). And it doesn't maintain
order automatically.

Yes, that's true. I mentioned it because it is already in the language.
The typical and most efficient implementation in the general case of an
ordered map is a tree AFAIK.

Yes! And a balanced tree. It is probably the best when you want
the map ordered by key.

If you wanted insertion order, what would you think of the combination
of a Hash and a linked list? The hash could provide key lookup, and the
linked list would keep the order. (It would need to be only a
single-linked list).
Kind regards

robert

Regards,
Kristof
 
M

Markus

Hal said:
I want insertion order.


Replaced/modified values -- a very good question to which I have
given no thought.

If (as you said elsewhere) you are wanting something "fairly fundamental
to the language" these are exactly the sorts of things you should give
thought to. I've been pondering it on and off today (while shopping
with my wife) and I can't see an unambiguous "least surprise" way to
implement something like this.

The main problem is that hashes, queues, maps, sets, arrays, etc. each
provide a one of the semantically clean ways of answering the questions
"what does it mean when I do..."; if none of these meet your needs, it's
fairly likely that either: 1) you are looking for a linked combination
of two or more well-defined collection classes, or 2) what you are
looking for isn't semantically clean.

The syntax question (=>, etc.) is secondary. It's not at all a good
idea to provide syntactic sugar to invoke cluttered semantics.

So:

Are you looking for a tagged list or queue? A multimap? A ordered list
of key-value pairs that supports fast sub-list extraction by key?
Or...?

If you had your class, what would you expect the output (and/or error
message) to be for....

p [1=>2, 3=>4]

p [1=>2, 3=>4, 1=>5]

x = [1=>2, 3=>4, 1=>5]
p x[1]

x = [1=>2, 3=>4]
x[3] = 6
p x

x = [1=>2, 3=>4]
x[3],x[1] = x[1],x[3]
p x

x = [1=>2, 3=>4]
x << [7=>8]
p x

x = [1=>2, 3=>4]
x += [7=>8]
p x

x = [1=>2, 3=>4]
x << [1=>8]
p x

x = [1=>2, 3=>4]
x += [1=>8]
p x

x = [1=>2, 3=>4]
x << [7=>8]
p x

x = [1=>2, 3=>4]
x |= [7=>8, 3=> 9]
p x

x = [1=>2, 3=>4]
x &= [1=>2, 3=>9, 6=>7]
p x

x = [1=>2, 3=>4]
x << [7=>8]
p x

x = [1=>2, 3=>4]
x[1] = 3
p x

x = [1=>2, 3=>4]
x[1] += 1
p x

x = [1=>'2', 3=>'4']
x[1].sub!(/2/,'3')
p x

x = [1=>2, 3=>4]
x[1] += 1
p x

x = [1=>2, 3=>4]
x[3] = 6
x[1] = 7
p x

x = [1=>2, 3=>4]
p x.delete(1)
x[3] = 6
x[1] = 7
p x

x = [1=>2, 3=>4, 1=>5]
p x.delete(1)
x[3] = 6
x[1] = 7
p x

...and so forth.

My guess is that either++ 1) you will find in answering these questions
that you are really wanting a simple combination of two standard
collections, or 2) you will get tangled in trying to come up with
something consistent, or 3) I will learn something interesting.

-- MarkusQ
 
H

Hal Fulton

Markus said:
If (as you said elsewhere) you are wanting something "fairly fundamental
to the language" these are exactly the sorts of things you should give
thought to. I've been pondering it on and off today (while shopping
with my wife) and I can't see an unambiguous "least surprise" way to
implement something like this.

Quite right.
The main problem is that hashes, queues, maps, sets, arrays, etc. each
provide a one of the semantically clean ways of answering the questions
"what does it mean when I do..."; if none of these meet your needs, it's
fairly likely that either: 1) you are looking for a linked combination
of two or more well-defined collection classes, or 2) what you are
looking for isn't semantically clean.

I'm not convinced yet that what I want is not semantically clean.
The syntax question (=>, etc.) is secondary. It's not at all a good
idea to provide syntactic sugar to invoke cluttered semantics.

True, but that is a relative term. :) We shall see.
If you had your class, what would you expect the output (and/or error
message) to be for....

[snip very useful list]

I won't address this today, but I will save this list.

Actually I hadn't any intention of an RCR for another few weeks. (Sorry,
but I am a very slow thinker sometimes.)

Of course, if my general idea is shown to my satisfaction to be nonsense,
I will not submit an RCR at all. But if I still believe in it, I will submit
later.
My guess is that either++ 1) you will find in answering these questions
that you are really wanting a simple combination of two standard
collections, or 2) you will get tangled in trying to come up with
something consistent, or 3) I will learn something interesting.

I fear 1 or 2, but I hope for 3. ;)

In any case, your posts have been extremely valuable to me.


Thanks,
Hal
 
T

T. Onoma

I won't address this today, but I will save this list.

I talked this nearly to death about two years ago. It's somewhere in the
ruby-talk archives. But to summarize:

I came up with the idea of an indexed hash, basically a combined version of an
array and a hash, that could do both's job, plain and simple. But others told
me it would cause lookup conflicts because the hash key could be an integer.
The fix is to have different lookup mechanisms.

ih = [ :a => 1, :b => 2 ]

ih[0] #=> 1
ih[1] #=> 2

ih{:a} #=> 1
ih{:b} #=> 2

ih.pair(0) #=> :a, 1
ih.pair(1) #=> :b, 2

etc.

At one point I even asked what others thought about association (=>) being an
operator/method. You could arbitrarily do it anywhere.

a = "bomb!"
a => 'hal' # returns an association object

# also links 'hal' into a
p "#{a} is the #{a.association}" #=> "Hal is the bomb!"

Obviously this could then be utilized by arrrays.

[ :a => 1, :b => 2 ]

An array of two associations.

Would be fun to play with this some more.

-T
 
A

Aredridel

Oooh, I like that!


I won't address this today, but I will save this list.

I talked this nearly to death about two years ago. It's somewhere in the
ruby-talk archives. But to summarize:

I came up with the idea of an indexed hash, basically a combined version of an
array and a hash, that could do both's job, plain and simple. But others told
me it would cause lookup conflicts because the hash key could be an integer.
The fix is to have different lookup mechanisms.

ih = [ :a => 1, :b => 2 ]

ih[0] #=> 1
ih[1] #=> 2

ih{:a} #=> 1
ih{:b} #=> 2

ih.pair(0) #=> :a, 1
ih.pair(1) #=> :b, 2

etc.

At one point I even asked what others thought about association (=>) being an
operator/method. You could arbitrarily do it anywhere.

a = "bomb!"
a => 'hal' # returns an association object

# also links 'hal' into a
p "#{a} is the #{a.association}" #=> "Hal is the bomb!"

Obviously this could then be utilized by arrrays.

[ :a => 1, :b => 2 ]

An array of two associations.

Would be fun to play with this some more.

-T
 
R

Robert Klemme

Kristof Bastiaensen said:
Kristof Bastiaensen said:
On Sat, 04 Sep 2004 12:39:31 +0900, Hal Fulton wrote:

Markus wrote:
Where are you going with this? There seem to be three unrelated
things here:

Well, I have been thinking along the lines of: What if we had a
built-in ordered indexable collection in Ruby?


What about an association list (Array)?

[["one", 1], ["two", 2], ["three", 3]].assoc("one") => ["one", 1]

You could wrap it in a class to have it behave more like a Hash.

Very inefficient if you have many lookups (O(n)). And it doesn't
maintain
order automatically.

Yes, that's true. I mentioned it because it is already in the language.
The typical and most efficient implementation in the general case of an
ordered map is a tree AFAIK.

Yes! And a balanced tree. It is probably the best when you want
the map ordered by key.
Si.

If you wanted insertion order, what would you think of the combination
of a Hash and a linked list? The hash could provide key lookup, and the
linked list would keep the order. (It would need to be only a
single-linked list).

Yeah, a bit similar to an LRU cache implementation - only that you change
order only on insertion and don't have a size limit.

Kind regards

robert
 
R

Robert Klemme

Hal Fulton said:
Oops... thought you sent that one just to me. Got the cc first.

I'll reply to the list also:




Hi, Ara... yes, I know about arrayfields and have played with it.
I like it.

However, I really like the "=>" notation, and I want something
fairly fundamental to the language.


It's clear, yes, but it's not convenient (to me).
I really prefer {x=>y} or the new {x:y}.

Well, at least you can do

def SortedMap(h)
# any transformation of a Hash that
# does not depend on insertion order
# for example
h.sort
end

foo = SortedMap( "foo" => "bar", "two" => 2 )
foo.each {|k,v| print "key=", k, " value=", v, "\n" }

def InsertionOrder(*words)
words = words.shift if words.size == 1 && Array === words[0]
map = []
until words.empty?
map << [words.shift, words.shift]
end
map
end

bar = InsertionOrder %w{foo bar 1 2}
bar.each {|k,v| print "key=", k, " value=", v, "\n" }

Those method names might not be the best, but you get the picture.

IMHO that covers already some cases.

robert
 
M

Markus

Markus said:
My guess is that either++ 1) you will find in answering these questions
that you are really wanting a simple combination of two standard
collections, or 2) you will get tangled in trying to come up with
something consistent, or 3) I will learn something interesting.


Hal wrote:

I fear 1 or 2, but I hope for 3. ;)

Here's a stab at an instance of case #1, the simplest way I could come
up with to model the semantics I think you are looking for (note: this
is _not_ claimed to be the best way to implement the semantics, just the
clearest way to express them):

class An_ordered_hash
include Enumerable
#
# A pair of hashes, indicating for each key both the value paired
# with it and the order in which it was added.
#
def initialize
@ordered = {}
@hash = {}
@stamp = 0
end
def []=(k,v)
@ordered[k] = @stamp += 1
@hash[k] = v
end
#
# Iterators proceed in order-of-insertion
#
def keys_in_insertion_order
keys.sort_by { |k| @ordered[k] }
end
def each(*args,&block)
keys_in_insertion_order.each { |k| block.call(k,@hash[k]) }
end
alias each_pair each
def each_key(*args,&block)
keys_in_insertion_order.each { |k| block.call(k) }
end
def each_value(*args,&block)
keys_in_insertion_order.each { |k| block.call(@hash[k]) }
end
def collect(*args,&block)
keys_in_insertion_order.collect { |k| block.call(k,@hash[k]) }
end
#
# Pass anything we don't understand through to the hash
#
def method_missing(method,*args,&block)
@ordered.send method,*args, &block
@hash.send method,*args, &block
end
#
# A few extensions suggested by the array-like nature of the class
#
def <<(kv_pairs)
kv_pairs.each_pair { |k,v| self[k] = v }
self
end
def +(kv_pairs)
An_ordered_hash.new << self << kv_pairs
end
def |(kv_pairs)
self+kv_pairs
end
def &(kv_pairs)
result = self+{}
each { |k,v| result.delete(k) unless result[k] == kv_pairs[k] }
result
end
end


Since we don't have the syntactic sugar we'd like, I put together
something that at least preserves the meaning & intent of it:

class An_ordered_hash
def to_s
"_(" +collect { |k,v| "#{k}=>#{v}" }.join(')+_(')+')'
end
def to_str
to_s
end
def inspect
to_s
end
end

def _(kv_pairs)
An_ordered_hash.new + kv_pairs
end


This setup gives the following answers to the questions I'd asked
previously (properly translated of course):

p _(1=>2)+_(3=>4) # _(1=>2)+_(3=>4)

p _(1=>2)+_(3=>4)+_(1=>5) # _(3=>4)+_(1=>5)

x = _(1=>2)+_(3=>4)+_(1=>5)
p x[1] # 5

x = _(1=>2) << _(3=>4)
x[3] = 6
p x # _(1=>2)+_(3=>6)

x = _(1=>2) << _(3=>4)
x[3],x[1] = x[1],x[3]
p x # _(3=>2)+_(1=>4)

x = _(1=>2) << _(3=>4)
x << _(7=>8)
p x # _(1=>2)+_(3=>4)+_(7=>8)

x = _(1=>2) << _(3=>4)
x += _(7=>8)
p x # _(1=>2)+_(3=>4)+_(7=>8)

x = _(1=>2) << _(3=>4)
x << _(1=>8)
p x # _(3=>4)+_(1=>8)

x = _(1=>2) << _(3=>4)
x += _(1=>8)
p x # _(3=>4)+_(1=>8)

x = _(1=>2) << _(3=>4)
x << _(7=>8)
p x # _(1=>2)+_(3=>4)+_(7=>8)

x = _(1=>2) << _(3=>4)
x |= _(7=>8) << _(3=> 9)
p x # _(1=>2)+_(7=>8)+_(3=>9)

x = _(1=>2) << _(3=>4)
x &= _(1=>2) << _(3=>9) << _(6=>7)
p x # _(1=>2)

x = _(1=>2) << _(3=>4)
x << _(7=>8)
p x # _(1=>2)+_(3=>4)+_(7=>8)

x = _(1=>2) << _(3=>4)
x[1] = 3
p x # _(3=>4)+_(1=>3)

x = _(1=>2) << _(3=>4)
x[1] += 1
p x # _(3=>4)+_(1=>3)

x = _(1=>'2') << _(3=>'4')
x[1].sub!(/2/,'3')
p x # _(1=>3)+_(3=>4)

x = _(1=>2) << _(3=>4)
x[1] += 1
p x # _(3=>4)+_(1=>3)

x = _(1=>2) << _(3=>4)
x[3] = 6
x[1] = 7
p x # _(3=>6)+_(1=>7)

x = _(1=>2) << _(3=>4)
p x.delete(1) # 2
x[3] = 6
x[1] = 7
p x # _(3=>6)+_(1=>7)

x = _(1=>2) << _(3=>4) << _(1=>5)
p x.delete(1) # 5
x[3] = 6
x[1] = 7
p x # _(3=>6)+_(1=>7)


Does this agree with your intuition/desires? Or am I still at sea?

-- MarkusQ
 
M

Markus

Well, at least you can do

def SortedMap(h)
# any transformation of a Hash that
# does not depend on insertion order
# for example
h.sort
end

foo = SortedMap( "foo" => "bar", "two" => 2 )

The problem (at least as I understand it) with this approach is that
he's wanting them in INSERTION order; there is no guarantee that the the
hash that comes into SortedMap will be in insertion order, and I don't
see how the h.sort will fix that.

-- MarkusQ
 
C

Charles Hixson

Robert said:
On Sat, 04 Sep 2004 12:39:31 +0900, Hal Fulton wrote:

Markus wrote:
Where are you going with this? There seem to be three unrelated
things here:

Well, I have been thinking along the lines of: What if we had a
built-in ordered indexable collection in Ruby?


What about an association list (Array)?

[["one", 1], ["two", 2], ["three", 3]].assoc("one") => ["one", 1]

You could wrap it in a class to have it behave more like a Hash.


Very inefficient if you have many lookups (O(n)). And it doesn't
maintain
order automatically.

Yes, that's true. I mentioned it because it is already in the language.
The typical and most efficient implementation in the general case of an
ordered map is a tree AFAIK.


Yes! And a balanced tree. It is probably the best when you want
the map ordered by key.

Si.

If you wanted insertion order, what would you think of the combination
of a Hash and a linked list? The hash could provide key lookup, and the
linked list would keep the order. (It would need to be only a
single-linked list).


Yeah, a bit similar to an LRU cache implementation - only that you
change order only on insertion and don't have a size limit.

Kind regards

robert

FWIW, there is an RBTree for Ruby included in the rpa list of packages.
I don't know what it's interface it, or whether it works properly, but
it's there. (It seems to be implemented in C, so it might well be as
fast as Hash.)
 
C

Charles Hixson

Markus said:
The problem (at least as I understand it) with this approach is that
he's wanting them in INSERTION order; there is no guarantee that the the
hash that comes into SortedMap will be in insertion order, and I don't
see how the h.sort will fix that.

-- MarkusQ
If you really want it in insertion order, just use an array. The only
problem comes if you want it in two different orders, in which case you
also need to build an index in the order that you desire (and deletions
become problematical...i.e., you've got to ensure that the index stays
in sync).
 
H

Hal Fulton

Markus said:
Here's a stab at an instance of case #1, the simplest way I could come
up with to model the semantics I think you are looking for (note: this
is _not_ claimed to be the best way to implement the semantics, just the
clearest way to express them):

[snip code]

Yes, I have implemented things similar to this before. The one remaining
problem for me was always a convenient syntax for literals. (It may not
be so important to most people, but it is to me.)
This setup gives the following answers to the questions I'd asked
previously (properly translated of course):

p _(1=>2)+_(3=>4) # _(1=>2)+_(3=>4)

p _(1=>2)+_(3=>4)+_(1=>5) # _(3=>4)+_(1=>5)

[snip rest of examples]

Your underscore notation is clever, but compared to traditional hash
notation it still hurts my eyes (and my fingers).

As for semantics, I think you have captured my intent, although I
may have missed one or two items.

I will certainly save this for my notes also.

However, Matz's latest reply indicates to me that if order is added
to the Hash class, then even literals will be ordered.

I had thought that ordered literals would imply that

{a=>b,c=>d} != {c=>d,a=>b}

but I see now that is not the case.



Thanks,
Hal
 
T

T. Onoma

Oooh, I like that!

:) Well, how bout a taste then!


class Association < Array
def initialize( key, value )
self.replace [ key, value ]
end
def key
self[0]
end
def value
self[1]
end
end

module Kernel
# note this clashes with Bignum, Fixnum and Date :(
def >>( associate )
Association.new( self, associate )
end
end

# of course we need to teach array about working with associations
class Array
alias_method('get','[]')
def [](i)
r = self.get(i)
return r.value if r.is_a? Association
return r
end
alias_method('gat','at')
def at(i)
r = self.gat(i)
return r.value if r.is_a? Association
return r
end
end

if $0 == __FILE__
omap = [ :a >> 1, :b >> 2 ]
p omap[0]
omap.each { |a| p a }
puts "---"
omap.each { |k,v| p k,v }
end


Just threw this together, but looks like it works pretty well already. When I
have more time I'll finish her up, or if you'd like then go for it. Just let
me know so I can add it my lib too!

Thanx.
 
G

gabriele renzi

T. Onoma ha scritto:
:) Well, how bout a taste then!


class Association < Array
<snip>

Wel,, why not just using Array#assoc ?
Also, in the yaml dir there is an OMap class that may fit..
 
R

Robert Klemme

Markus said:
The problem (at least as I understand it) with this approach is that
he's wanting them in INSERTION order; there is no guarantee that the the
hash that comes into SortedMap will be in insertion order, and I don't
see how the h.sort will fix that.

That's why I put in the other example. A SortedMap can only maintain one
ordering and in this case it's the natural ordering of keys.

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,156
Messages
2,570,878
Members
47,413
Latest member
KeiraLight

Latest Threads

Top