Store object on disk / mini database

Kristian Sørensen · Sep 20, 2004

Hi!

Is there some way of writing e.g. a hash table to the filesystem, and
read it again? - Without having to parse in and output (and create the
table all over).

I need to store some information, which could be stored in a small
database (like sqlite - but I can't get ruby-sqlite installed properly
if sqlite is not installed in the default location). Is there an
interface to the Berkeley DB (www.sleepycat.com)?

Best regards,
Kristian Sørensen.

Ara.T.Howard · Sep 20, 2004

yes. yes.

jib:~ > cat a.rb
require 'pstore'

db = PStore::new 'db'

this_time = Time::now
last_time = nil

db.transaction do
  if db.root? 'time'
    last_time = db['time']
  end
  db['time'] = this_time
end

puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:29 MDT 2004>
last_time <>

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:33 MDT 2004>
last_time <Mon Sep 20 13:05:29 MDT 2004>

jib:~ > ruby a.rb
this_time <Mon Sep 20 13:05:38 MDT 2004>
last_time <Mon Sep 20 13:05:33 MDT 2004>

jib:~ > cat b.rb
require 'bdb'

db = BDB::Btree.open "bdb", nil, BDB::CREATE, 0644

this_time = Time::now
last_time = nil

last_time = db['time']
db['time'] = this_time

puts "this_time <#{ this_time }>"
puts "last_time <#{ last_time }>"

db.close

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:10:55 MDT 2004>
last_time <>

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:10:56 MDT 2004>
last_time <Mon Sep 20 13:10:55 MDT 2004>

jib:~ > ruby b.rb
this_time <Mon Sep 20 13:11:01 MDT 2004>
last_time <Mon Sep 20 13:10:56 MDT 2004>

regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it;
| and a weed grows, even though we do not love it.
| --Dogen
===============================================================================

Lennon Day-Reynolds · Sep 20, 2004

Kristian,

If you're working with small datasets, you can use the built-in
'Marshal' module to persist data.

For example, if the variable 'data' contains your hash to be saved,
you can just do the following:

--
open('myapp.dat', 'wb') do |fh|
  Marshal.dump(data, fh)
end
--

To load your data later, you can use 'Marshal.load', which will
restore an object from either an open filehandle or a string. If you
need transactions, take a look at the 'PStore' library, which is part
of the standard distribution; it wraps a convenient database-like
interface on top of the Marshal methods, complete with transactional
access.
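
A minimal sketch of the full save/load round trip with Marshal. The helper names (save_object, load_object) and the sample hash are made up for illustration; only Marshal.dump and Marshal.load come from the post above.

```ruby
require 'tmpdir'

# Hypothetical helper: write any marshalable object to a file.
# Binary mode matters because Marshal output is not text.
def save_object(obj, path)
  File.open(path, 'wb') { |fh| Marshal.dump(obj, fh) }
end

# Hypothetical helper: read the object back from an open filehandle.
def load_object(path)
  File.open(path, 'rb') { |fh| Marshal.load(fh) }
end

# Sample data standing in for the 'data' hash in the post.
data = { 'name' => 'example', 'counts' => [1, 2, 3] }

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'myapp.dat')
  save_object(data, path)
  load_object(path)
end

puts restored == data
```

Marshal.load will also accept a string, so Marshal.load(File.binread(path)) should behave the same way for small files.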

If you don't want to keep everything in RAM, there are also DBM, GDBM,
and SDBM bindings in the standard distribution.

Kristian Sørensen · Sep 20, 2004

Thanks for both your suggestions! That was just what I needed!

trans. (T. Onoma) · Sep 20, 2004

There's also YAML.

require 'yaml'

# save
open('myapp.dat', 'w') {|fh| fh << data.to_yaml }

# retrieve (load_file opens and closes the file for you)
data = YAML.load_file('myapp.dat')

[Note: This is off the top of my head, so it's untested. But basically like
that.]

Nice thing about YAML is that the file it creates is human readable and
editable!
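
A small sketch of the same idea as a complete round trip, mostly to show what the file looks like on disk. The helper names and the sample hash are assumptions for illustration; string keys are used so the data loads back unchanged under the stricter defaults of recent YAML versions.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical helper: dump an object as YAML text.
def save_yaml(obj, path)
  File.write(path, obj.to_yaml)
end

# Hypothetical helper: parse the YAML file back into Ruby objects.
def load_yaml(path)
  YAML.load_file(path)
end

data = { 'a' => 1, 'pets' => %w[dog cat] }

Dir.mktmpdir do |dir|
  path = File.join(dir, 'myapp.yml')
  save_yaml(data, path)
  puts File.read(path)           # plain text you could edit by hand
  puts load_yaml(path) == data
end
```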

T.

Bill Kelly · Sep 20, 2004

trans. (T. Onoma) said:
There's also YAML. [...]
Nice thing about YAML is that the file it creates is human readable and
editable!

Additionally YAML supports a drop-in PStore equivalent, so if your code
is already structured to use PStore, you can use a YAML::Store the same way.

require 'yaml/store'

ystore = YAML::Store.new("my_datafile.ystore")

# use ystore just as you would a pstore:

my_hash = {"a"=>1, "b"=>2}
my_array = %w(dog cat elephant)

# store stuff in the database

ystore.transaction do
  ystore["my_hash"] = my_hash
  ystore["my_array"] = my_array
end

# print out all keys/values in database

ystore.transaction do
  ystore.roots.each do |key|
    puts ystore[key].inspect
  end
end

# note the above code is untested

Regards,

Bill

Ara.T.Howard · Sep 20, 2004

yes - yaml is very, very cool - i use it a lot for my own code. a couple of
things to be aware of:

- yaml is a lot slower than marshal. if your db has only 10,000 entries or
  so this is no problem

- flock (used by pstore and yaml::store) does not work on nfs filesystems
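
One way to check the speed difference on your own data is to time an in-memory dump and load with each serializer. This is only a rough sketch: the 10,000-entry dataset and the helper names are made up, and the numbers will vary with the Ruby version and the shape of the data.

```ruby
require 'yaml'

# Hypothetical dataset: n small string => integer entries.
def sample_data(n = 10_000)
  (0...n).to_h { |i| ["key#{i}", i] }
end

# Run the block once and return [elapsed_seconds, block_result].
def time_round_trip
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  restored = yield
  [Process.clock_gettime(Process::CLOCK_MONOTONIC) - start, restored]
end

data = sample_data
marshal_secs, via_marshal = time_round_trip { Marshal.load(Marshal.dump(data)) }
yaml_secs, via_yaml = time_round_trip { YAML.load(YAML.dump(data)) }

puts format('marshal: %.4fs  yaml: %.4fs', marshal_secs, yaml_secs)
puts via_marshal == data && via_yaml == data
```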

-a

Mauricio Fernández · Sep 20, 2004

Ara.T.Howard said:
a couple of things to be aware of [...]

- syck crashes quite often :-(

Kristian Sørensen · Sep 20, 2004

Hi!

This sounds VERY cool! I will definitely have a look at this tomorrow!!
Thanks! :-D

Cheers, KS.

why the lucky stiff · Sep 20, 2004

Mauricio said:
- syck crashes quite often :-(

Are you referring to the bug you found while working on rpa?
[ruby-core:02729] Or are you alluding to other bugs?

_why

Mauricio Fernández · Sep 20, 2004

why the lucky stiff said:
Are you referring to the bug you found while working on rpa?
[ruby-core:02729] Or are you alluding to other bugs?

Other bugs that look similar (assuming you fixed that one). And I've
had syck-related bugs with rpa-base quite recently (with some 1.8.2
CVS version).

I also have a proof of concept for a versioned FS datastore that
has the very nice property of crashing syck in no time.

It's been a few weeks since I last tested it, but I hope its magic
still works -- if so, you can expect a copy shortly.

Ara.T.Howard · Sep 20, 2004

Mauricio Fernández said:
I also have a proof of concept for a versioned FS datastore [...]

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding for testing
only if you are interested. what's the concept of your fs db?

cheers.

-a

James Britt · Sep 20, 2004

trans. (T. Onoma) said:
There's also YAML. [...]

But you still reparse the data, which the OP wanted to avoid.

James

trans. (T. Onoma) · Sep 20, 2004

James Britt said:
But you still reparse the data, which the OP wanted to avoid.

Ah, shucks!!

Although, I imagine you reparse at some level no matter what.
But certainly Marshal is closer to the metal.

--
trans.

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

Mauricio Fernández · Sep 20, 2004

have you seen this?

http://repetae.net/~john/computer/vsdb/

super cool idea - but crashes a lot. i have a little c binding for testing

heh looks like http://cr.yp.to/cdb.html with rewrite-on-update (have to
read the code to make sure but it's 1am)
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

only if you are interested. what's the concept of your fs db?

I first learned about this approach via Eivind Eklund when talking about
OVCS. It's the method used by Subversion and monotone (AFAIR): index
data by its digest. A number of interesting things happen when you do so:
* full-tree versioning
* "implicit deltas" and fairly efficient compression of the data
* ...

In a couple of days and ~500 LoC I implemented a toy version control system
on top of that which could host itself; it had O(1) branching, could manage
renaming, and used implicit deltas and transparent compression of the data.

This can work on top of any structure able to hold key -> value
associations (where both are strings), so you can use any of the dbs
(gdbm, ndbm, sdbm, bdb, etc) or even a full-fledged rdbms if you want
(as done by monotone), but it could also work in-mem with a simple Hash
and serialization via Marshal, etc...
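
To make "index data by its digest" concrete, here is a minimal in-memory sketch over a plain Hash. The class name DigestStore and its methods are invented for illustration and are not taken from the toy VCS described above; SHA1 is just one possible digest.

```ruby
require 'digest/sha1'

# Hypothetical class: any String => String mapping (Hash, gdbm, ...) can back it.
class DigestStore
  def initialize(backend = {})
    @backend = backend
  end

  # Store a blob under the hex digest of its content and return that key.
  def put(content)
    key = Digest::SHA1.hexdigest(content)
    @backend[key] ||= content
    key
  end

  def get(key)
    @backend[key]
  end

  def size
    @backend.size
  end
end

store = DigestStore.new
v1 = store.put("hello\n")
v2 = store.put("hello\n")        # same content, same key, nothing new stored
v3 = store.put("hello world\n")

# A "tree" is just another blob listing names and the digests they point to,
# so a whole snapshot is identified by a single key.
snapshot = store.put("greeting.txt #{v1}\n")

puts v1 == v2
puts store.size
```

Because identical content always maps to the same key, unchanged files are shared between snapshots for free, which is one reading of the "implicit deltas" point above.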

Bill Kelly · Sep 20, 2004

James Britt said:
But you still reparse the data, which the OP wanted to avoid.

I'd thought the OP didn't want to manually write the code to parse keys
and values from a text file... (As opposed to behind-the-scenes parsing
going on in a library...)

But IANTOP ;-D

Regards,

Bill

Ara.T.Howard · Sep 20, 2004

Mauricio Fernández said:
heh looks like http://cr.yp.to/cdb.html with rewrite-on-update [...]

yes - true.

Mauricio Fernández said:
I believe something like rdbm would be better (http://www.fefe.de/rdbm/).

perhaps not as nfs safe...

Mauricio Fernández said:
I implemented a toy version control system on top of that [...]

sounds very cool.

Mauricio Fernández said:
This can work on top of any structure able to hold key -> value
associations (where both are strings) [...]

any pointers to read about? sounds like a very interesting concept.

-a

Dick Davies · Sep 21, 2004

Mauricio Fernández said:
It's the method used by Subversion and monotone (AFAIR): index
data by its digest. [...]

By 'index by digest', do you mean something like Venti?

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....

Mauricio Fernández · Sep 21, 2004

Dick Davies said:
By 'index by digest', do you mean something like Venti?

http://www.cs.bell-labs.com/sys/doc/venti/venti.html

Yes, the fundamental idea is the same.

Dick Davies said:
I tried playing with a ruby-based version of this a while ago, but couldn't
find a good way of chopping up files to store them efficiently.....

A moving CRC will do, e.g.

if crc(buffer, offset, CRCLEN) % AVERAGE_LENGTH == 1
  chop up to current offset
  insert fragment
else
  offset += 1
  ... logic if offset >= MAX_FRAGMENT_SIZE ...
end

that gives you chunks of length averaging AVERAGE_LENGTH, in most
cases. Lower values mean higher P(node reuse) but there's a per-chunk overhead
(key + pointer to it in a list, etc).
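
A runnable sketch of the chunking idea above, under some assumptions: the constant values are arbitrary, the function name chunk is made up, and the checksum is recomputed with Zlib.crc32 over the last CRCLEN bytes at every offset, where a real implementation would probably use a true rolling checksum for speed.

```ruby
require 'zlib'

CRCLEN = 16               # bytes of context fed to the checksum (assumed value)
AVERAGE_LENGTH = 64       # target mean chunk size in bytes (assumed value)
MAX_FRAGMENT_SIZE = 1024  # hard upper bound on a chunk (assumed value)

# Split a binary string into chunks whose boundaries depend on nearby content,
# not on absolute position in the file.
def chunk(buffer)
  chunks = []
  start = 0
  (1..buffer.bytesize).each do |offset|
    length = offset - start
    boundary =
      if length >= MAX_FRAGMENT_SIZE
        true                                  # forced cut: chunk got too long
      elsif offset >= CRCLEN
        window = buffer.byteslice(offset - CRCLEN, CRCLEN)
        Zlib.crc32(window) % AVERAGE_LENGTH == 1
      else
        false                                 # not enough bytes for a window yet
      end
    next unless boundary

    chunks << buffer.byteslice(start, length)
    start = offset
  end
  chunks << buffer.byteslice(start, buffer.bytesize - start) if start < buffer.bytesize
  chunks
end

data = Random.new(42).bytes(20_000)
original = chunk(data)
shifted = chunk('X'.b + data)   # one byte inserted at the front

puts "chunks: #{original.size}, shared after insert: #{(original & shifted).size}"
```

Since a cut depends only on the bytes just before it, inserting a byte near the start should change the first chunk and leave most later chunks identical, so they hash to the same keys in a digest-indexed store.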
