Store object on disk / mini database

Kristian Sørensen

Hi!

Is there some way of writing e.g. a hash table to the filesystem and
reading it back again, without having to parse the input and output
(and create the table all over)?

I need to store some information, which could be stored in a small
database (like sqlite - but I can't get ruby-sqlite installed properly
if sqlite is not installed in the default location). Is there an
interface to the Berkeley DB (www.sleepycat.com)?


Best regards,
Kristian Sørensen.
 
Ara.T.Howard

Hi!

Is there some way of writing e.g. a hash table to the filesystem, and read
it again? [...] Is there an interface to the Berkeley DB (www.sleepycat.com)?

yes. yes.


jib:~ > cat a.rb
require 'pstore'

db = PStore::new 'db'

this_time = Time::now
last_time = nil

db.transaction do
  if db.root? 'time'
    last_time = db['time']
  end
  db['time'] = this_time
end


puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:29 MDT 2004>
last_time <>

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:33 MDT 2004>
last_time <Mon Sep 20 13:05:29 MDT 2004>

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:38 MDT 2004>
last_time <Mon Sep 20 13:05:33 MDT 2004>



jib:~ > cat b.rb
require 'bdb'

db = BDB::Btree.open "bdb", nil, BDB::CREATE, 0644

this_time = Time::now
last_time = nil

last_time = db['time']
db['time'] = this_time

puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

db.close

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:10:55 MDT 2004>
last_time <>

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:10:56 MDT 2004>
last_time <Mon Sep 20 13:10:55 MDT 2004>

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:11:01 MDT 2004>
last_time <Mon Sep 20 13:10:56 MDT 2004>


regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================
 
Lennon Day-Reynolds

Kristian,

If you're working with small datasets, you can use the built-in
'Marshal' module to persist data.

For example, if the variable 'data' contains your hash to be saved,
you can just do the following:

--
open('myapp.dat', 'wb') do |fh|
  Marshal.dump(data, fh)
end
--

To load your data later, you can use 'Marshal.load', which will
restore an object from either an open filehandle or a string. If you
need transactions, take a look at the 'PStore' library, which is part
of the standard distribution; it wraps a convenient database-like
interface on top of the Marshal methods, complete with transactional
access.

If you don't want to keep everything in RAM, there are also DBM, GDBM,
and SDBM bindings in the standard distribution.
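A minimal sketch of the Marshal round trip described above, assuming a reasonably recent Ruby. `save_data` and `load_data` are hypothetical helper names (not part of the standard library), and the temporary directory is only there so the example cleans up after itself:

```ruby
require 'tmpdir'

# Hypothetical helper: write any Marshal-able object to +path+ in binary mode.
def save_data(path, data)
  File.open(path, 'wb') { |fh| Marshal.dump(data, fh) }
end

# Hypothetical helper: read the object back; Marshal.load accepts an open IO.
def load_data(path)
  File.open(path, 'rb') { |fh| Marshal.load(fh) }
end

data = { 'name' => 'Kristian', 'visits' => 3, 'tags' => %w[ruby db] }

Dir.mktmpdir do |dir|
  path = File.join(dir, 'myapp.dat')
  save_data(path, data)
  restored = load_data(path)
  puts restored == data # prints "true"

  # Marshal.load also accepts a String, e.g. the result of Marshal.dump.
  puts Marshal.load(Marshal.dump(data)) == data # prints "true"
end
```

Only load Marshal data from files you trust: unmarshalling can instantiate arbitrary classes that exist in the running program.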
 
trans. (T. Onoma)

Thanks for both your suggestions! That was just what I needed! :)

There's also YAML.

require 'yaml'

# save
open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

# retrieve
data = YAML.load_file('myapp.dat')

[Note: This is off the top of my head, so it's untested. But basically like
that.]

Nice thing about YAML is that the file it creates is human readable and
editable!


T.
 
Bill Kelly

trans. (T. Onoma) said:
Thanks for both your suggestions! That was just what I needed! :)

There's also YAML. [...]
Nice thing about YAML is that the file it creates is human readable and
editable!

Additionally YAML supports a drop-in PStore equivalent, so if your code
is already structured to use PStore, you can use a YAML::Store the same way.

require 'yaml/store'

ystore = YAML::Store.new("my_datafile.ystore")

# use ystore just as you would a pstore:

my_hash = {"a"=>1, "b"=>2}
my_array = %w(dog cat elephant)

# store stuff in the database

ystore.transaction do
  ystore["my_hash"] = my_hash
  ystore["my_array"] = my_array
end

# print out all keys/values in database

ystore.transaction do
  ystore.roots.each do |key|
    puts ystore[key].inspect
  end
end


# note the above code is untested


Regards,

Bill
 
Ara.T.Howard


yes - yaml is very, very cool - i use it a lot for my own code. a couple of
things to be aware of:

- yaml is a lot slower than marshal. if your db has only 10,000 entries or
  so this is no problem

- pstore and yaml::store rely on flock, which does not work on nfs
  filesystems
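To get a feel for the Marshal/YAML speed difference on your own machine, something like the following could be used. The 10,000-entry hash of strings is made-up test data, and the timings it prints will vary with Ruby version and hardware:

```ruby
require 'yaml'
require 'benchmark'

# Made-up test data: 10,000 distinct string keys and string values.
db = {}
10_000.times { |i| db["key#{i}"] = "value #{i}" }

marshal_copy = nil
yaml_copy = nil

# Time a full dump + load cycle with each serializer.
marshal_time = Benchmark.realtime do
  marshal_copy = Marshal.load(Marshal.dump(db))
end

yaml_time = Benchmark.realtime do
  yaml_copy = YAML.load(db.to_yaml)
end

puts format('marshal: %.4fs  yaml: %.4fs', marshal_time, yaml_time)
puts marshal_copy == db && yaml_copy == db # prints "true"
```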

-a
 
Mauricio Fernández

yes - yaml is very, very cool - i use it a lot for my own code. a couple of
things to be aware of:

- yaml is a lot slower than marshal. if your db has only 10,000 entries or
  so this is no problem

- pstore and yaml::store rely on flock, which does not work on nfs
  filesystems

- syck crashes quite often :-(
 
Kristian Sørensen

Hi!

This sounds VERY cool! I will definitely have a look at this tomorrow!!
Thanks! :-D

Cheers, KS.

Bill said:
Additionally YAML supports a drop-in PStore equivalent, so if your code
is already structured to use PStore, you can use a YAML::Store the same way. [...]

 
why the lucky stiff

Mauricio said:
- syck crashes quite often :-(
Are you referring to the bug you found while working on rpa?
[ruby-core:02729] Or are you alluding to other bugs?

_why
 
Mauricio Fernández

Mauricio said:
- syck crashes quite often :-(
Are you referring to the bug you found while working on rpa?
[ruby-core:02729] Or are you alluding to other bugs?

Other bugs that look similar (assuming you fixed that one). And I've
had syck-related bugs with rpa-base quite recently (with some 1.8.2
CVS version).

I also have a proof of concept for a versioned FS datastore that
has the very nice property of crashing syck in no time :)
It's been a few weeks since I last tested it, but I hope its magic
still works -- if so, you can expect a copy shortly.
 
Ara.T.Howard

I also have a proof of concept for a versioned FS datastore that has the
very nice property of crashing syck in no time :) It's been a few weeks
since I last tested it, but I hope its magic still works -- if so, you can
expect a copy shortly.

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding for testing
only if you are interested. what's the concept of your fs db?

cheers.

-a
 
James Britt

trans. (T. Onoma) said:
Thanks for both your suggestions! That was just what I needed! :)


There's also YAML.

require 'yaml'

# save
open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

# retrieve
data = YAML.load(File.open('myapp.dat'))

[Note: This is off the top of my head, so it's untested. But basically like
that.]

Nice thing about YAML is that the file it creates is human readable and
editable!

But you still reparse the data, which the OP wanted to avoid.

James
 
trans. (T. Onoma)

But you still reparse the data, which the OP wanted to avoid.

Ah, shucks!! :) Although, I imagine you reparse at some level no matter what.
But certainly Marshal is closer to the metal.

--
( o _
// trans.
/ \ (e-mail address removed)

I don't give a damn for a man that can only spell a word one way.
-Mark Twain
 
Mauricio Fernández

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding for testing

heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
read the code to make sure but it's 1am)
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).
only if you are interested. what's the concept of your fs db?

I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

I implemented a toy version control system on top of that which could host
itself in a couple days and ~500LoCs; it had O(1) branching, could manage
renaming, used implicit deltas and transparent compression of the data.

This can work on top of any structure able to hold key -> value
associations (where both are strings), so you can use any of the dbs
(gdbm, ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want
(as done by monotone), but it could also work in-mem with a simple Hash
and serialization via Marshal, etc...
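The "index data by its digest" idea can be sketched over a plain Hash, as suggested above. `ContentStore` and `snapshot` are hypothetical names, SHA-1 is an assumed choice of digest, and the one-line-per-file tree format is made up for illustration. The sketch only shows how identical content ends up stored once and how a single root key can name a whole tree:

```ruby
require 'digest/sha1'

# Hypothetical content-addressed store: every blob is filed under the hex
# SHA-1 of its own bytes. The backend only needs [], []= and key?, so a
# plain Hash works here.
class ContentStore
  def initialize(backend = {})
    @backend = backend
  end

  # Store +blob+ (a String) and return its key. Identical blobs map to the
  # same key, so they are stored only once.
  def put(blob)
    key = Digest::SHA1.hexdigest(blob)
    @backend[key] = blob unless @backend.key?(key)
    key
  end

  def get(key)
    @backend[key]
  end

  def size
    @backend.size
  end
end

# A tree snapshot is itself a blob listing "key name" lines, so the key of
# that listing identifies the complete tree (full-tree versioning).
def snapshot(store, files)
  listing = files.sort.map { |name, content| "#{store.put(content)} #{name}" }
  store.put(listing.join("\n"))
end

store = ContentStore.new
v1 = snapshot(store, 'a.txt' => 'hello', 'b.txt' => 'world')
v2 = snapshot(store, 'a.txt' => 'hello', 'b.txt' => 'world!')
# v1 and v2 differ, but 'hello' is stored once and shared by both versions:
# 3 file blobs + 2 listings = 5 entries.
puts store.size # prints "5"
```

Because the backend is only touched through `[]`, `[]=` and `key?`, a DBM-style object could in principle replace the Hash, and the Hash itself can be persisted with a single `Marshal.dump`.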
 
Bill Kelly

James Britt said:
But you still reparse the data, which the OP wanted to avoid.

I'd thought the OP didn't want to manually write the code to parse keys
and values from a text file.... :) (As opposed to behind-the-scenes
parsing going on in a library...)

But IANTOP ;-D

Regards,

Bill
 
Ara.T.Howard

heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to read
the code to make sure but it's 1am)

yes - true.
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

perhaps not as nfs safe...
I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

I implemented a toy version control system on top of that which could host
itself in a couple days and ~500LoCs; it had O(1) branching, could manage
renaming, used implicit deltas and transparent compression of the data.

sounds very cool.
This can work on top of any structure able to hold key -> value
associations (where both are strings), so you can use any of the dbs (gdbm,
ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want (as done by
monotone), but it could also work in-mem with a simple Hash and
serialization via Marshal, etc...

any pointers to read about? sounds like a very interesting concept.


-a
 
Dick Davies

* Mauricio Fernández said:
heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
read the code to make sure but it's 1am)
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).


I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

By 'index by digest', do you mean something like Venti:

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

? I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....
 
Mauricio Fernández

By 'index by digest', do you mean something like Venti:

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

Yes, the fundamental idea is the same.
? I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....

A moving CRC will do, e.g.

if crc(buffer, offset, CRCLEN) % AVERAGE_LENGTH == 1
  chop up to current offset
  insert fragment
else
  offset += 1
  ... logic if offset >= MAX_FRAGMENT_SIZE ...
end

that gives you chunks of length averaging AVERAGE_LENGTH, in most
cases. Lower values mean higher P(node reuse) but there's a per-chunk overhead
(key + pointer to it in a list, etc).
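A runnable sketch of the chunking loop above. It follows the pseudocode in hashing a fixed-size window at each offset with a CRC (here Ruby's `Zlib.crc32`), but it simply recomputes the CRC for every window instead of updating it incrementally; a true rolling hash such as a Rabin fingerprint would avoid that cost. The constant values, the `chunk` name and the random test data are assumptions for illustration:

```ruby
require 'zlib'

WINDOW = 16              # bytes hashed at each position (CRCLEN in the pseudocode)
AVERAGE_LENGTH = 64      # declare a boundary when crc % AVERAGE_LENGTH == 1
MAX_FRAGMENT_SIZE = 256  # force a cut if no boundary shows up in time

# Split +data+ into content-defined chunks. Boundaries depend on the bytes
# inside the window rather than on absolute offsets, so an insertion near the
# start usually disturbs only the first few chunks.
def chunk(data)
  chunks = []
  start = 0
  pos = start + WINDOW # exclusive end of the window being examined
  while pos <= data.bytesize
    crc = Zlib.crc32(data.byteslice(pos - WINDOW, WINDOW))
    if crc % AVERAGE_LENGTH == 1 || pos - start >= MAX_FRAGMENT_SIZE
      chunks << data.byteslice(start, pos - start) # chop up to current offset
      start = pos
      pos = start + WINDOW
    else
      pos += 1
    end
  end
  # Whatever follows the last boundary becomes the final chunk.
  chunks << data.byteslice(start, data.bytesize - start) if start < data.bytesize
  chunks
end

data = Random.new(42).bytes(10_000)
edited = 'XYZ'.b + data # same content with three bytes inserted at the front

original_chunks = chunk(data)
edited_chunks = chunk(edited)
shared = (original_chunks & edited_chunks).size
puts "#{original_chunks.size} chunks, #{shared} shared after the insertion"
```

Once both versions hit a common boundary they produce identical chunks from there on, which is what makes node reuse possible when only part of a file changes.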
 
