Bill Kelly
Hi,
I, too, share the same wish...
(Side note: I agree about SDBM. It doesn't work properly
on Windows, and even on Linux, if you try to store anything
but really tiny key/value pairs, the data file bloats up
to gigabytes in no time, and pretty soon it just fails
to find a place to store the data...)
In a current project, I started out by using YAML::Store,
because the human-readable data format was especially
useful to me. But it didn't take long before we'd added
enough records that the YAML file size grew too large to
be loaded in a reasonable time by the CGI app that needed
it. I considered switching to PStore, but really wanted
to keep the human-readable data. Instead I made a small
wrapper for YAML::Store (it would work the same with PStore)
that, based on the key you're requesting, hashes out to
one of (currently) 256 files on disk.
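The key-to-file mapping described above can be sketched on its own. One hedge for readers on current Ruby: `String#hash` is seeded randomly per process on Ruby 1.9 and later, so a bucket computed from it may differ between two CGI requests. This sketch therefore assumes `Zlib.crc32` as a stable substitute; the method name `bucket_for` is hypothetical.

```ruby
require 'zlib'

# Map a key to one of 256 chunk names, "00" through "ff".
# Zlib.crc32 returns the same checksum in every Ruby process; String#hash
# does not on Ruby 1.9 and later, because its seed is randomized.
def bucket_for(key)
  format("%02x", Zlib.crc32(key.to_s) & 0xff)
end

# Rough check of how evenly some sample keys spread across the buckets.
keys   = (1..1000).map { |i| "user#{i}@example.com" }
counts = keys.group_by { |k| bucket_for(k) }.transform_values(&:size)

puts "buckets used:   #{counts.size} of 256"
puts "largest bucket: #{counts.values.max} keys"
```

CRC32 is not a cryptographic hash, but for spreading keys over a fixed set of files it should be adequate.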
It could certainly be more sophisticated; but right now
it's been meeting my specific needs quite well. On the
chance that it may be useful to anyone else, here it is:
db-store.rb
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
require 'yaml/store'
require 'zlib'

class DBStore
  def initialize(dbname)
    @dbname = dbname   # directory holding the chunk files
  end

  # Open the chunk file that must contain +key+ and yield its YAML::Store.
  def transaction(key)
    hashname = hashname_for_key(key)
    ystore = YAML::Store.new(hashname_to_store_filename(hashname))
    ystore.transaction do
      yield(ystore)
    end
  end

  # Yield each chunk that exists on disk, inside its own transaction.
  def each_ystore
    each_store_filename do |fname|
      next unless File.exist? fname
      ystore = YAML::Store.new(fname)
      ystore.transaction do
        yield ystore
      end
    end
  end

  # Yield every record stored in every chunk.
  def each
    each_ystore do |ystore|
      ystore.roots.each do |key|
        rec = ystore[key]
        yield rec
      end
    end
  end

  protected

  def each_store_filename
    each_hashname {|hn| yield hashname_to_store_filename(hn) }
  end

  def hashname_to_store_filename(hashname)
    "#@dbname/#{hashname}.ystore"
  end

  def each_hashname
    0.upto(255) {|n| yield idx_to_hashname(n) }
  end

  def idx_to_hashname(idx)
    sprintf("%02x", idx)
  end

  # String#hash is randomized per process on Ruby 1.9+, so use a stable
  # checksum to make sure a key always maps to the same chunk file.
  def hashname_for_key(key)
    sprintf("%02x", Zlib.crc32(key.to_s) & 255)
  end
end
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Usage example:
customer = Customer.new( *CGI_stuff )
dbstore = DBStore.new("path/to/datadir")
dbstore.transaction(customer.email) do |keystore|
  if keystore.root?(customer.email) # record for this user already in database?
    existing_cust = keystore[customer.email]
    existing_cust.update( customer )
    # ...
  else
    keystore[customer.email] = customer
    # ...
  end
end
So the main difference is that transaction() now needs
the database key you want to work with, so it can fetch
the appropriate YAML::Store (or PStore, etc.) database
chunk from the disk file it knows must contain that key.
What transaction() then yields to its block is just a
normal YAML::Store object (or PStore, etc.).
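A minimal, self-contained sketch of that per-key transaction follows, assuming a hypothetical trimmed-down class `KeyedStore` (not DBStore itself), a temporary data directory, and `Zlib.crc32` for bucketing. It stores a plain Hash rather than a Customer object, since plain data is something any YAML loader can read back; some newer YAML::Store versions may load files with YAML's safe loader.

```ruby
require 'yaml/store'
require 'zlib'
require 'tmpdir'
require 'fileutils'

# Hypothetical, trimmed-down stand-in for DBStore#transaction: the key picks
# the chunk file, and the block receives an ordinary YAML::Store.
class KeyedStore
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir) # the chunk files live in this directory
  end

  # Returns whatever the block returns, as PStore#transaction does.
  def transaction(key, read_only = false)
    store = YAML::Store.new(chunk_file(key))
    store.transaction(read_only) { yield store }
  end

  def chunk_file(key)
    File.join(@dir, format("%02x.ystore", Zlib.crc32(key.to_s) & 0xff))
  end
end

dir   = Dir.mktmpdir
db    = KeyedStore.new(dir)
email = "alice@example.com"

# First "request": store a plain Hash under the customer's e-mail address.
db.transaction(email) do |s|
  s[email] = { "name" => "Alice", "visits" => 1 }
end

# Second "request": a fresh YAML::Store re-reads the same chunk file.
db.transaction(email) do |s|
  s[email]["visits"] += 1 if s.root?(email)
end

record = db.transaction(email, true) { |s| s[email] }
puts record.inspect
```

Only one chunk file is touched per request, which is what keeps the CGI load time small regardless of how many records the other chunks hold.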
If you want to iterate over all the *Store database
chunks on disk, there's DBStore#each_ystore.
If you want to just iterate over all the records,
there's DBStore#each.
dbstore = DBStore.new("path/to/datadir")
dbstore.each do |customer|
  puts customer.to_s
end
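For whole-database iteration, here is a sketch of a hypothetical variant of DBStore#each (the helper name `each_record` is an assumption) that globs for the chunk files that actually exist instead of probing all 256 names, and opens each one read-only with `transaction(true)` so the loop cannot modify them. The chunk files are seeded by hand to keep the example self-contained.

```ruby
require 'yaml/store'
require 'tmpdir'

# Hypothetical helper: visit every record in every chunk file that exists
# under +dir+, opening each chunk read-only.
def each_record(dir)
  Dir.glob(File.join(dir, "*.ystore")).sort.each do |fname|
    YAML::Store.new(fname).transaction(true) do |store|
      store.roots.each { |key| yield key, store[key] }
    end
  end
end

dir = Dir.mktmpdir
# Seed two chunk files by hand, standing in for DBStore's hashed buckets.
YAML::Store.new(File.join(dir, "00.ystore")).transaction { |s| s["a@x.org"] = "Ann" }
YAML::Store.new(File.join(dir, "ff.ystore")).transaction do |s|
  s["b@x.org"] = "Bob"
  s["c@x.org"] = "Cy"
end

each_record(dir) { |key, rec| puts "#{key}: #{rec}" }
```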
* * *
Looking at DBStore, it seems it would be easy to change it
so that one could pass in the preferred "store" mechanism...
What if initialize() looked like this:
def initialize(dbname, storeclass=YAML::Store)
  @dbname = dbname
  @storeclass = storeclass
end
I believe it would then work with PStore as well, just by
changing the explicit occurrences of YAML::Store to
@storeclass.
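A sketch of that pluggable-backend idea, under stated assumptions: the class name `PluggableStore`, the extra `ext` argument (which keeps YAML and Marshal chunks from sharing file names), and `Zlib.crc32` bucketing are all hypothetical. It relies on YAML::Store being a subclass of PStore, so both accept `new(filename)` and `transaction`.

```ruby
require 'pstore'
require 'yaml/store'
require 'zlib'
require 'tmpdir'

# Hypothetical DBStore variant whose backend class is a constructor argument.
# YAML::Store subclasses PStore, so both respond to new(filename) and
# transaction; only the on-disk format differs. The +ext+ argument is an
# added assumption that keeps the two formats from sharing file names.
class PluggableStore
  def initialize(dbname, storeclass = YAML::Store, ext = "ystore")
    @dbname     = dbname
    @storeclass = storeclass
    @ext        = ext
  end

  def transaction(key, &block)
    @storeclass.new(filename_for(key)).transaction(&block)
  end

  def filename_for(key)
    File.join(@dbname, format("%02x.%s", Zlib.crc32(key.to_s) & 0xff, @ext))
  end
end

dir        = Dir.mktmpdir
yaml_db    = PluggableStore.new(dir)                   # human-readable chunks
marshal_db = PluggableStore.new(dir, PStore, "pstore") # Marshal-format chunks

[yaml_db, marshal_db].each do |db|
  db.transaction("k1") { |s| s["k1"] = [1, 2, 3] }
end

puts File.read(yaml_db.filename_for("k1"))
```

The calling code is identical for both backends; only the constructor arguments decide whether the chunks are readable YAML or binary Marshal data.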
Also, hashing out to 256 files on disk is of course
arbitrary. It would be trivial to make it hash to a
million files on disk instead, creating subdirectories
as necessary... Perhaps just
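def hashname_for_key(key)
  # Zlib.crc32 (require 'zlib') rather than key.hash, which is
  # randomized per process on Ruby 1.9+
  hv = Zlib.crc32(key.to_s) % 1000000
  sprintf("%03d/%03d", hv / 1000, hv % 1000)
end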
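then calling FileUtils.mkdir_p(File.dirname(hashname_to_store_filename(hashname)))
in transaction() (with require 'fileutils'), so the subdirectory is created inside
the data directory; unlike Dir.mkdir, mkdir_p doesn't raise if it already exists.
each_hashname and idx_to_hashname would need the matching change...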
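A sketch of the million-file layout, assuming `Zlib.crc32` in place of `key.hash` and a hypothetical helper `store_filename_for` that builds the chunk path under the data directory and creates its subdirectory on demand.

```ruby
require 'zlib'
require 'fileutils'
require 'tmpdir'

# Nested layout: 1000 subdirectories x 1000 chunk files each.
# Zlib.crc32 stands in for key.hash so the path is stable across runs.
def hashname_for_key(key)
  hv = Zlib.crc32(key.to_s) % 1_000_000
  format("%03d/%03d", hv / 1000, hv % 1000)
end

# Hypothetical helper: build the chunk path under +dbname+ and create its
# subdirectory if needed. FileUtils.mkdir_p does not fail when the
# directory already exists, unlike Dir.mkdir.
def store_filename_for(dbname, key)
  path = File.join(dbname, "#{hashname_for_key(key)}.ystore")
  FileUtils.mkdir_p(File.dirname(path))
  path
end

dbname = Dir.mktmpdir
path   = store_filename_for(dbname, "alice@example.com")
puts path
```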
Anyway, for what it's worth . . .
Regards,
Bill
From: "Hal Fulton" said:
Hi, all...
I sometimes wish for a very simple database with the
following features:
1. Distributed as part of Ruby
2. Need not store entire db in memory
3. No SQL requirement
4. No special efficiency requirement
5. Available cross-platform
6. Database files are readable cross-platform
Typically I use DBM in this case. But it doesn't
meet (5) and (6), since it's not there on Windows
and the files can't be moved even across Linux
systems.
A simple marshal would be fine, but it violates (2).
I believe PStore would also?
SDBM works by default on Windows, but it is severely
limited if not actually buggy. Probably violates (6)
and (5) also.
Does anyone have any recommendation? Or would a
"universal built-in database" make an interesting
addition to our world?