Database vs Filesystem

S

Sven Schott

I am writing a very simple web based file/asset manager. I was
initially thinking of simply creating a database with the info on each
file and keeping the files in the filesystem. However, I have begun to
wonder if having the files in a database would be faster(or slower).
Has anybody had any experience with this sort of thing? Would large
databases slow things down? Should I just keep stuff on the filesystem?
I have had some issues on the DB with encoding types. I probably
wouldn't want to base64 encode everything. ;)

Any suggestions would be appreciated.

Regards


Sven Schott
 
G

gabriele renzi

il Tue, 1 Jun 2004 09:46:22 +0900, Sven Schott
Any suggestions would be appreciated.

did you considered using PStore (a simple file based DB that workds
like an hash, included in ruby)?
What about madeleine?
 
R

Robert Klemme

Sven Schott said:
I am writing a very simple web based file/asset manager. I was
initially thinking of simply creating a database with the info on each
file and keeping the files in the filesystem. However, I have begun to
wonder if having the files in a database would be faster(or slower).
Has anybody had any experience with this sort of thing? Would large
databases slow things down? Should I just keep stuff on the filesystem?
I have had some issues on the DB with encoding types. I probably
wouldn't want to base64 encode everything. ;)

Any suggestions would be appreciated.

Generally speaking a database might be the better solution if you otherwise
ended up with thousands of files per single directory. In this scenario,
file systems typically don't perform well. I'm not sure about more recent
developments of file systems, since they evolve towards database
functionality (transactions, support for many small objects (i.e. files) and
the like).

You could have a database table like [fileName VARCHAR(1024), fileData
BLOB)] with an index on "fileName" for fast access. Maybe you need an
additional ordering column if your files are so big that they don't fit a
single blob field of the database you are using. If you want to store meta
data (file creation and modification time, size etc.) you'll need even more
columns (and / or tables).

Kind regards

robert
 
M

Michael Neumann

I am writing a very simple web based file/asset manager. I was
initially thinking of simply creating a database with the info on each
file and keeping the files in the filesystem. However, I have begun to
wonder if having the files in a database would be faster(or slower).
Has anybody had any experience with this sort of thing? Would large
databases slow things down? Should I just keep stuff on the filesystem?
I have had some issues on the DB with encoding types. I probably
wouldn't want to base64 encode everything. ;)

No, you would want to store the file in a BLOB column of your database.
There are no encodings defined on BLOB columns.

To get your files out of the database, you need a CGI script (or
something equivalent), which is for sure slower that direct file access
(especially when written in Ruby). But that depends on what you prefer,
flexibilty or performance, and the size of your files. For many "small"
files, with attached meta-data I'd prefer a database over the file
system.

For example, I stored the article images for an online shop inside a
postgres database. But I havn't measured the performance. But I can give
you hints how to write the CGI that gets the file out of the db.

Regards,

Michael
 
O

Osuka Adartse

Sven said:
I am writing a very simple web based file/asset manager. I was initially
thinking of simply creating a database with the info on each file and
keeping the files in the filesystem. However, I have begun to wonder if
having the files in a database would be faster(or slower). Has anybody
had any experience with this sort of thing? Would large databases slow
things down? Should I just keep stuff on the filesystem? I have had some
issues on the DB with encoding types. I probably wouldn't want to base64
encode everything. ;)

Any suggestions would be appreciated.

Regards


Sven Schott
Having done a gallery-like webpage, both using FS and DB(mysql), I would
say that using a DB to hold blobs/binaries would be slower, quite
slower, even adding caching and other optims of mysql, I have keep doing
it 'cos this way I hold all the data in single db(metadata and binaries)
but I plan on figuring a way to render the "site" to FS and keep the
metadata in a db(maybe using sqlite for non-binary-data).

The FS has 6000+ files and I didn't see any slowdown from the server,
now trying to navigate those on the explorer...ouch, on my experience:
*[speed] Keep the binaries on FS and other data on a DB.
*[convenience] keep it all on DB(more and easier "control").

Having said so I'm a *hobbyist*, take that into account, not a pro here,
just my limited experience, besides I read somewhere that in such case
the Db would be the bottleneck to move all the binary data, FS being
more efficient. I use Win2k,Apache,Eruby,Mysql with blobs for binaries

Best regards
Adartse
 
G

Greg Brondo

I would suggest using a database and going with sqlite (www.sqlite.org).
Very fast, very easy to use. Single file database.

Also, in the future if need be you could easily migrate to another db system
if need be.

Greg B.
 
S

Sven Schott

Pstore is a bit too simple as I need something that handles multiple
clients and I don't really want to write a transaction engine(I'm
hardly qualified), so MySQL(or Post) would be my first choice. I just
really wanted to know if using the MySQL ruby libraries would be faster
or slower than the File ruby library. How well would ruby handle
passing binary data to MySQL? How fast would it be? Should I just stick
to file operations?

Sven

P.S. I had a look at Madeleine but I don't have the memory size for it(
some of the files can be a few hundred MB and when you have several
thousand files... :) )
 
J

Joel VanderWerf

Sven said:
Pstore is a bit too simple as I need something that handles multiple
clients and I don't really want to write a transaction engine(I'm hardly
qualified), so MySQL(or Post) would be my first choice. I just really
wanted to know if using the MySQL ruby libraries would be faster or
slower than the File ruby library. How well would ruby handle passing
binary data to MySQL? How fast would it be? Should I just stick to file
operations?

As an alternative to PStore, there is my FSDB library[1]. Like PStore,
it's pure ruby. But instead of putting everything in one file, it makes
multiple files look like a database, of sorts. Also, it's thread and
process safe and has simple transactions. It allows multiple back-end
formats for objects: strings (binary or ascii), marshal, yaml, etc.

I don't expect it will be as fast as as a true database, but it is
convenient to be able to access your "database" as a normal file tree.

[1] http://redshift.sourceforge.net/fsdb-0.4
 
G

Gavin Sinclair

Pstore is a bit too simple as I need something that handles multiple
clients and I don't really want to write a transaction engine(I'm
hardly qualified), so MySQL(or Post) would be my first choice. I just
really wanted to know if using the MySQL ruby libraries would be faster
or slower than the File ruby library. How well would ruby handle
passing binary data to MySQL? How fast would it be? Should I just stick
to file operations?

Sven

I'd be extremely surprised if any database could give you the data faster
than a filesystem can. Certainly not a DB server like MySQL.

SQLite would give you the data quite fast if it could find it quickly
(e.g. with good indexes), since it's basically a file anyway. This would
be a good trade-off between performance and convenience, as the database
*is* a single file so you can take it anywhere.

If you go with the filesystem, be careful not to put too many files in one
directory. If each file is named something benign like

D0342345.dat
D0343351.dat
...

then a good strategy might be to arrange them like this:

D/03/42/34/5/D0342345.dat
D/03/43/35/1/D0343351.dat
...

That way, the database just remembers "D0343351" and the path to that file
becomes completely predictable and easy for the filesystem to resolve.

This technique applies to some filesystems more than others and I haven't
used it myself, but know someone who did with thousands of user records,
"indexed" on user ID.

Cheers,
Gavin
 
W

William Park

Sven Schott said:
Pstore is a bit too simple as I need something that handles multiple
clients and I don't really want to write a transaction engine(I'm
hardly qualified), so MySQL(or Post) would be my first choice. I just
really wanted to know if using the MySQL ruby libraries would be faster
or slower than the File ruby library. How well would ruby handle
passing binary data to MySQL? How fast would it be? Should I just stick
to file operations?

Sven

P.S. I had a look at Madeleine but I don't have the memory size for it(
some of the files can be a few hundred MB and when you have several
thousand files... :) )

In that case, keep the files as files. Next issue is, why not put your
"info" in file instead of in database.
 
A

Ara.T.Howard

Sven said:
Pstore is a bit too simple as I need something that handles multiple
clients and I don't really want to write a transaction engine(I'm hardly
qualified), so MySQL(or Post) would be my first choice. I just really
wanted to know if using the MySQL ruby libraries would be faster or
slower than the File ruby library. How well would ruby handle passing
binary data to MySQL? How fast would it be? Should I just stick to file
operations?

As an alternative to PStore, there is my FSDB library[1]. Like PStore,
it's pure ruby. But instead of putting everything in one file, it makes
multiple files look like a database, of sorts. Also, it's thread and
process safe and has simple transactions. It allows multiple back-end
formats for objects: strings (binary or ascii), marshal, yaml, etc.

I don't expect it will be as fast as as a true database, but it is
convenient to be able to access your "database" as a normal file tree.

[1] http://redshift.sourceforge.net/fsdb-0.4

i was just checking this out today joel - very cool.

marshal aint that slow for medium sized data sets:

~ > ruby pstore_test.rb 8192
schema @ 0.0243198871612549
insert @ 0.0122055444226135
select @ 0.0250020936364308


~ > ruby sqlite_test.rb 8192
schema @ 0.0509798526763916
insert @ 0.0439234171062708
select @ 0.000174661865457892


~ > cat pstore_test.rb
#
# builtin
#
require 'pstore'

n = (n or ARGV.shift).to_i

#
# connect
#
db = PStore.new 'pstore.db'
#
# schema
#
a = Time.now
db.transaction do
db['table'] = []
db['pk_index'] = {}
end
b = Time.now
puts "schema @ #{ b.to_f - a.to_f }"
#
# insert
#
a = Time.now
n.times do |i|
db.transaction do
table = db['table']
table << []
db['pk_index'] = table.size - 1
db['table'] = table
end
end
b = Time.now
puts "insert @ #{ (b.to_f - a.to_f) / n }"
#
# select
#
a = Time.now
n.times do |i|
magic = rand n
table, pk_index = db.transaction{[db['table'], db['pk_index']]}
tuple = table[pk_index[magic]]
end
b = Time.now
puts "select @ #{ (b.to_f - a.to_f) / n }"



~ > cat sqlite_test.rb
#
# raa
#
require 'sqlite'

n = (n or ARGV.shift).to_i

#
# connect
#
db = SQLite::Database.new 'sqlite.db', 0
#
# schema
#
schema = <<-sql
create table foo(bar, primary key (bar));
sql
a = Time.now
db.execute schema
b = Time.now
puts "schema @ #{ b.to_f - a.to_f }"
#
# insert
#
a = Time.now
n.times do |i|
sql = <<-sql
insert into foo values ('#{ i }');
sql
db.execute sql
end
b = Time.now
puts "insert @ #{ (b.to_f - a.to_f) / n }"
#
# select
#
a = Time.now
n.times do |i|
magic = rand n
sql = <<-sql
select * from foo where bar = '#{ magic }'
sql
db.execute sql
end
b = Time.now
puts "select @ #{ (b.to_f - a.to_f) / n }"


i realize the test is neither accurate nor complete - but interesting
nonetheless.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| A flower falls, even though we love it; and a weed grows, even though we do
| not love it. --Dogen
===============================================================================
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,146
Messages
2,570,832
Members
47,374
Latest member
anuragag27

Latest Threads

Top