why are Hashes so unsorted? what's your solution?

R

Ruby Baby

I try to avoid questions like "Why doesn't Ruby do what ___ does?"

But I'm stumped by hashes and asking for help to find a new solution.

I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:

SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

In Ruby, it's an unordered jumble.

Why is that?

And what do you think is the Ruby way solution for what I need to do?


(The simple query I gave above was much simpler, but I have lots and
lots of MySQL queries where I want the results to come back in the
order that MySQL gave them, but with full access to the hash results.)



Thanks!
 
A

Ara.T.Howard

I try to avoid questions like "Why doesn't Ruby do what ___ does?"

But I'm stumped by hashes and asking for help to find a new solution.

I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:

SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

In Ruby, it's an unordered jumble.

Why is that?

And what do you think is the Ruby way solution for what I need to do?

http://raa.ruby-lang.org/list.rhtml?name=ruby-rbtree

it is ultra-fast and the keys come out in order:

~ > cat a.rb
#!/usr/bin/env ruby
require 'rbtree'
rb = RBTree.new
42.times{|i| rb = i}
rb.each{|k,v| puts "#{ k } => #{ v }"}


~ > a.rb
0 => 0
1 => 1
2 => 2
3 => 3
<snip>
39 => 39
40 => 40
41 => 41



-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL :: http://www.ngdc.noaa.gov/stp/
| TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================
 
D

daz

Ruby Baby said:
I try to avoid questions like "Why doesn't Ruby do what ___ does?"

Go on, force yourself ;)
SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

Sounds to me that you get an ordered hash - a special form of hash.
In Ruby, it's an unordered jumble.

That's a hash; across the universe.
Why is that?

The answer to that would probably irritate you more than is necessary;
but, in short, because it's a Hash.
And what do you think is the Ruby way solution for what I need to do?


(The simple query I gave above was much simpler, but I have lots and
lots of MySQL queries where I want the results to come back in the
order that MySQL gave them, but with full access to the hash results.)

You've ruled out one option; of omitting the ORDER BY clause then
using Enumerable#sort_by on the hash before iterating -- so I won't
bring that up.

I'm not done, yet.

Or you could use a container which maintains its order - the Array
which, in your case, might be an array of hashes.

[{:sku => 10, :artist=>'Global Communication', :albumtitle => 'Pentamerous Metamorphosis'},
{:sku => 54, :artist=>'Eat Static', :albumtitle => 'Abduction'},
{...}
]


There'll be other ways, I'm sure ;)


daz
 
A

Aidan Rogers

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

Just FYI, in PHP they aren't true hashes - they are associative arrays.
That's why they are in order.

Aidan
 
D

David Garamond

Ruby said:
I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:

SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

In Ruby, it's an unordered jumble.
Why is that?

Most language don't guarantee insert order for hash, in fact IIRC some
deliberately randomize it. It's faster and in some cases needed for
security reasons. So I guess PHP is the minority here (plus I still hate
the fact that they merge the concept of array and hash, WTF with that?)
And what do you think is the Ruby way solution for what I need to do?

records = {
"sku1" => ["artist1","title1",100],
"sku2" => ["artist2","title2",200],
"sku3" => ["artist3","title3",300],
"sku4" => ["artist4","title4",400],
"sku5" => ["artist5","title5",500],
}
records.keys.sort.each {|k|
record = records[k]
# do stuff with record...
}

Or you can make a Hash that maintains insert order. I think there is one
at RAA (http://www.ruby-lang.org/en/raa.html).
 
R

Robert Klemme

Ruby Baby said:
I try to avoid questions like "Why doesn't Ruby do what ___ does?"

But I'm stumped by hashes and asking for help to find a new solution.

I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:

SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

In PHP, this is no problem. I get the hash, and I can do the
"each" iteration on it, and get my top-selling albums in order.

In Ruby, it's an unordered jumble.

Why is that?

That's due to the nature of hashing. I suggest you get yourself a copy of
an "data structures and algorithms" book (or find some online). The
simplest explanation is that a hash is an array and entries are found
using the #hash value of a key item by calulating that hash modulo the
array size. That way keys don't necessarily appear in order. Assuming an
array size of 10:

irb(main):009:0> 1.step(100,7){|i| printf "hash=%3d index=%3d\n", i, i %
10}
hash= 1 index= 1
hash= 8 index= 8
hash= 15 index= 5
hash= 22 index= 2
hash= 29 index= 9
hash= 36 index= 6
hash= 43 index= 3
hash= 50 index= 0
hash= 57 index= 7
hash= 64 index= 4
hash= 71 index= 1
hash= 78 index= 8
hash= 85 index= 5
hash= 92 index= 2
hash= 99 index= 9
=> 1

Of course this is simplistic and real hash implementations are more
complex, i.e. you need resizing and deal with collisions (hash 22 and hash
92, as well as multiple keys having the same hash value) etc.
And what do you think is the Ruby way solution for what I need to do?

Don't use a method that returns the query result as hash. The simplest
thing would be to return an array of arrays. Using a tree or any other
sorted associative container is only appropriate if you really need key
based accesses. But if you just want to present what the db returns then
array is the way to go since it preserves insertion order.

Regards

robert
 
G

Gavin Sinclair

I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:
SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;
When I get the hash of results from the database, [...]

Surely you get an _array_ of results from the database?

[ { "sku" => "sku1", "artistname" => "Miles Davis", ... },
{ ... },
...
]

Could you show me the data type you get back?

Gavin
 
A

anon luker

Ruby Baby said:
When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key.

I hope I don't sound as anal as lothar, but a perfect hash key would
be an array index - it would obviate the need for hashing.
 
D

Daniel Berger

Ruby Baby said:
I try to avoid questions like "Why doesn't Ruby do what ___ does?"

But I'm stumped by hashes and asking for help to find a new solution.

I use MySQL databases a lot. Let's say I have a query to find the
top-selling albums in my store:

SELECT sku, artistname, albumtitle, sold FROM albums
ORDER BY sold DESC LIMIT 20;

When I get the hash of results from the database, I'm used to making
the "sku" the hash key since it's the only unique identifier.
A perfect hash key. Used everywhere by my iteration methods.

If you use DBI, it *is* the key.

require "dbi"
dbh = DBI.connect(dsn,user,passwd)
sth = dbh.prepare("select sku, artistname, albumtitle, sold...")
sth.execute

while rec = sth.fetch
puts "SKU: " + rec["SKU"]
puts "Artist Name: " + rec["ARTISTNAME"]
# etc, etc...
end

sth.finish
dbh.disconnect

Regards,

Dan
 
G

Gavin Sinclair

If you use DBI, it *is* the key.
require "dbi"
dbh = DBI.connect(dsn,user,passwd)
sth = dbh.prepare("select sku, artistname, albumtitle, sold...")
sth.execute
while rec = sth.fetch
puts "SKU: " + rec["SKU"]
puts "Artist Name: " + rec["ARTISTNAME"]
# etc, etc...
end
sth.finish
dbh.disconnect


I really must present the above code in a more Rubyish fashion for
newbies who are watching:

require "dbi"
DBI.connect(dsn,user,passwd) do |dbi|
dbh.select_all("select sku, artistname, albumtitle, sold...") do |row|
puts "SKU: " + row["SKU"]
puts "Artist Name: " + row["ARTISTNAME"]
# etc, etc...
end
end

:)

Gavin
 
J

John W. Kennedy

David said:
Most language don't guarantee insert order for hash, in fact IIRC some
deliberately randomize it.

They /all/ do, or else they're calling something a "hash" that ain't.

They're /called/ "hashes" (and have been since the days of vacuum-tube
mainframes) /because/ the process "makes a hash of" the ordering.

--
John W. Kennedy
"But now is a new thing which is very old--
that the rich make themselves richer and not poorer,
which is the true Gospel, for the poor's sake."
-- Charles Williams. "Judgement at Chelmsford"
 
H

Hal Fulton

John said:
They /all/ do, or else they're calling something a "hash" that ain't.

They're /called/ "hashes" (and have been since the days of vacuum-tube
mainframes) /because/ the process "makes a hash of" the ordering.

You're correct, of course, as is Lothar.

The reason for the new (mis)use of the term, I believe, is this.

The language feature is named for its implementation (an unfortunate
historical quirk). I am more interested in its interface.

Ideally the language feature should have a name that reflects how it
can be used rather than how it is implemented.

I'm as guilty as anyone else of saying "ordered hash." When I say it,
I mean a data structure whose elements can be addressed like an array,
but with a non-numeric key which is an arbitrary object. Those are
interface issues.

I care about how I put things in and how I get them out. I don't want
to know how it's implemented (except that the implementation "pokes
through" when I retrieve the list and it's not in the original order).

This reminds me a little of the people who say you can't talk about
the "screen" of a laptop. It's a "display"; there's no network of wires
crisscrossing in it, and hence no "screen."

The term "hash" is so much more fun and elegant than "associative
array." Let's come up with a term that is both elegant and accurate
for an ordered entity with a hash-like interface.

Hal
 
G

Gavin Sinclair

[good comments snipped]
The term "hash" is so much more fun and elegant than "associative
array." Let's come up with a term that is both elegant and accurate
for an ordered entity with a hash-like interface.

When I was at uni, we were taught that the interface was "dictionary"
(maps keys to values) and the common implementations are hash, tree,
...

There's only so much you can reasonably dissociate interface from
implementation here, though: hashes are unordered; binary trees have
three orders to choose from! And of course people care about
performance and choose different implementations for different data
sets and use cases.

As for coming up with a term, I wouldn't worry about it too much.
What "order" are you talking about? Key order or insert order? :)

Cheers,
Gavin
 
L

Lothar Scholz

Hello Hal,

Thursday, March 4, 2004, 12:07:15 AM, you wrote:

HF> I'm as guilty as anyone else of saying "ordered hash." When I say it,
HF> I mean a data structure whose elements can be addressed like an array,
HF> but with a non-numeric key which is an arbitrary object. Those are
HF> interface issues.

The offical names are unordered collection (short form: collection),
ordered collection, indexable collection or
ordered indexable collection (for example avl-trees, red-black-trees,
skip-lists and others).
There is also no reason to associate a array access with integer indexes.

HF> The term "hash" is so much more fun and elegant than "associative
HF> array." Let's come up with a term that is both elegant and accurate
HF> for an ordered entity with a hash-like interface.

An hash is an associative array, and also a tree is a associative
array, but a tree is never a hash. So there is no shorter name for
your datastructure then "ordered indexable collection".
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,143
Messages
2,570,822
Members
47,368
Latest member
michaelsmithh

Latest Threads

Top