Which non-SQL database?

Jorge Biquez

Hello all.

Newbie question. Sorry.

As part of my process to learn Python I am working on two personal
applications. Both will do fine with a simple structure of data
stored in files. I know there are a lot of databases around I could use,
but I would like your advice on what other options you would
consider for the job (it is training, so no pressure on performance).
One application will run as a desktop one, under Windows, Linux and
Macintosh, able to update data; not much, not complex, not many
records. The second application, running behind web pages, will do
the same, I mean, process simple data, updating and showing it; not
much info, not complex. As an exercise it is more than enough, I
guess, and will let me learn what I need for now.
Talking with a friend about what he would do (he uses C only), he
suggested taking a look at the dBase file format, since it is a stable
format, fast, and the index structure would be fine; or maybe going with
the Berkeley DB file format (I hope I understood this one
correctly). Plain files are not an option, since I would like to
be able to do rapid searches.

What would you suggest I take a look at? If possible, something
available on all three platforms.

Thanks in advance for your comments.

Jorge Biquez
 
Arnaud Delobelle

Jorge Biquez said:
<snip>

Have you considered sqlite3? It is part of the Python standard library.
It'll work under GNU/Linux, Windows and Mac OSX. For more details see:

http://docs.python.org/library/sqlite3.html
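A minimal sketch of what the stdlib sqlite3 module looks like in use (the table and column names here are invented for illustration):

```python
import sqlite3

# An in-memory database; pass a filename instead to persist to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, email TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)",
             ("Jorge", "jorge@example.com"))
conn.commit()

# Indexed lookups like this are exactly what plain files make hard.
row = conn.execute("SELECT email FROM contacts WHERE name = ?",
                   ("Jorge",)).fetchone()
print(row[0])
```

No server to install, no configuration: the whole database lives in one ordinary file (or in memory, as above), which is why it suits a small cross-platform desktop app.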
 
Jorge Biquez

Have you considered sqlite3? It is part of the Python standard library.
It'll work under GNU/Linux, Windows and Mac OSX. For more details see:

http://docs.python.org/library/sqlite3.html

Hello all.

OK, sqlite3 seems like the best option, since it is part of
Python already... you are right, of course.

I do not see a good reason for not using SQLite3, BUT if for some
reason it were not an option... what plain file scheme would
you use? I am sorry to insist. I do not know how much size
using SQLite adds to the application, but the idea is that the
application (single user for the desktop, yes) will be the smallest it
can be, since the idea is to distribute the executable only. (All this
is to redo a prototype I have done in another language.)

For the web part, yes, of course would be multiple users.

Thanks to all.

Jorge Biquez
 
CM

At 05:02 p.m. 04/12/2010, you wrote:

<snip>

SQLite itself is around 300 kilobytes. That's negligible. It is also
already in Python, so you'd have to purposefully exclude it when
creating your executable to save those 300 KB, and thus the 1/13th of a
second of additional time it would take average (3.9 MB/s) users to
download your app if it were included.
 
Jorge Biquez

Hello all.

Understood perfectly.

I will forget the other alternatives; SQLite3 is the
best option. Thanks for the explanation and your time.

SQLite for the single-user desktop app and PostgreSQL for the web will be the choice.

Thanks all.

Take care

Jorge Biquez
 
Monte Milanuk

I do not see a good reason for not using Sqlite3 BUT if for some reason
would not be an option.... what plain schema of files would you use?

Would shelve work?
 
John Nagle

Would shelve work?

There are some systems for storing key-value pairs in files.

Underneath "shelve" is some primitive database, dbm, gdbm or bsddb.
"bsddb" is deprecated and was removed from Python 3.x. "dbm" has
some classic problems. "gdbm" is an improved version of "dbm".
None of these handle access from multiple processes, or crash
recovery. We're looking at 1979 technology here.
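For what the shelve layer looks like in practice, here is a short sketch (the key names and temp-file layout are invented for illustration):

```python
import os
import shelve
import tempfile

# shelve gives a dict-like interface on top of one of the dbm backends;
# values are pickled Python objects behind the scenes.
path = os.path.join(tempfile.mkdtemp(), "data")

with shelve.open(path) as db:
    db["jorge"] = {"email": "jorge@example.com", "visits": 3}

with shelve.open(path) as db:      # reopen: the data survived the close
    visits = db["jorge"]["visits"]

print(visits)
```

Convenient for single-process use, but as noted above it inherits the underlying dbm backend's lack of multi-process safety and crash recovery.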

SQLite works right when accessed from multiple processes. SQLite
is the entry-level database technology for Python today. It handles
the hard cases, like undoing transactions after a crash and
locking against multiple accesses. Lookup performance is good;
simultaneous update by multiple processes, though, is not so
good. When you have a web site that has many processes hitting
the same database, it's time to move up to MySQL or Postgres.
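The transaction-undo behavior is easy to see from Python: the sqlite3 connection object works as a context manager that commits on success and rolls back if an exception escapes (the table and figures below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balance (owner TEXT, amount INTEGER)")
conn.execute("INSERT INTO balance VALUES ('a', 100)")
conn.commit()

try:
    # "with conn" opens a transaction and rolls it back on exception,
    # so the half-done update below simply vanishes.
    with conn:
        conn.execute("UPDATE balance SET amount = amount - 10 "
                     "WHERE owner = 'a'")
        raise RuntimeError("simulated crash mid-transaction")
except RuntimeError:
    pass

amount = conn.execute("SELECT amount FROM balance "
                      "WHERE owner = 'a'").fetchone()[0]
print(amount)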

There's a lot of interest in "non-SQL" databases for very
large distributed systems. You worry about this if you're Facebook
or Google, or are running a big game server farm.

John Nagle
 
Lie Ryan

I do not see a good reason for not using Sqlite3 BUT if for some reason
would not be an option.... what plain schema of files would you use?

Assuming you don't want SQL, you can use a filesystem-based database. Most
people don't realize that a filesystem is essentially a database (of
files) and a file explorer is (sort of) a DBMS. It is relatively easy to
create a robust (as robust as the filesystem) and fast (as fast as the
filesystem) database system by using folders and files (and optionally
hard and symbolic links) to store data in a hierarchical topology.
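A toy sketch of the idea, with keys as file names and values as file contents (the layout and helper names are invented for illustration; a real version would need to sanitize keys and handle missing files):

```python
import os
import tempfile

# The directory lookup plays the role of the index.
root = tempfile.mkdtemp()

def put(key, value):
    with open(os.path.join(root, key), "w") as f:
        f.write(value)

def get(key):
    with open(os.path.join(root, key)) as f:
        return f.read()

put("jorge", "jorge@example.com")
print(get("jorge"))
```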
 
Roy Smith

Lie Ryan said:
<snip>

Another possibility is one of the new breed of non-relational databases.
We've been using MongoDB (http://www.mongodb.org/) and so far are happy
with it. You can find a bunch of other alternatives in Wikipedia's
NoSQL article.
 
Dan Stromberg

John Nagle said:

<snip>

SQLite isn't exactly no SQL.

I've used the bsddb and gdbm modules quite a bit. I've found that
bsddb tables tend to get corrupted (whether used from CPython or C),
e.g. when a filesystem fills up. I quite like the gdbm module, though,
and have been using it in my current project.

If you find that converting your database keys and values to/from
strings is expensive, you could check out
http://stromberg.dnsalias.org/~dstromberg/cachedb.html which is a
caching wrapper around gdbm and other single-table database interfaces
supporting the same API.

As far as multiple processes go, if I'm not mistaken, gdbm supports a
single writer and multiple readers.
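In current Python the generic dbm module wraps these backends behind one interface, which makes the usage easy to sketch (key names and temp-file layout are invented for illustration):

```python
import dbm
import os
import tempfile

# dbm.open picks whichever backend is available (dbm.gnu where GNU dbm
# is installed, with the pure-Python dbm.dumb as a last resort).
path = os.path.join(tempfile.mkdtemp(), "store")

with dbm.open(path, "c") as db:    # "c": create if needed, read/write
    db[b"color"] = b"blue"         # keys and values are bytes

with dbm.open(path, "r") as db:    # reopen read-only
    color = db[b"color"]

print(color)
```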
 
Steve Holden

CM wrote:

<snip>

Just as a matter of interest where do you get the information that the
average user has a 3.9 MB/s path to the Internet?

regards
Steve
 
drygal

Jorge Biquez wrote:

<snip>

A pickled dictionary object, perhaps?
http://docs.python.org/library/pickle.html
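A quick sketch of that approach (file name and record shape invented for illustration):

```python
import os
import pickle
import tempfile

# Persist a whole dict in one file. Fine for small data, but every
# lookup means reading the entire file back in; there is no index,
# so it doesn't meet the "rapid searches" requirement on larger data.
path = os.path.join(tempfile.mkdtemp(), "data.pickle")
records = {"jorge": {"email": "jorge@example.com"}}

with open(path, "wb") as f:
    pickle.dump(records, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded["jorge"]["email"])
```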
 
CM

Just as a matter of interest where do you get the information that the
average user has a 3.9 MB/s path to the Internet?

First, I should have specified the average *U.S.* user, and in so
doing realized how dumbly "own-nation-centric" that was. My
apologies. But I got it from this:
http://arstechnica.com/telecom/news/2010/01/us-broadband-still-lagging-in-speed-and-penetration.ars

No idea how accurate that is, and I didn't break that into download
vs. upload speeds. It puts the U.S. 18th in the world rankings of
average connection speed.

Also, for what it's worth, I noticed in making an .exe yesterday that
sqlite3.dll is listed as 843 kB. That may be because it includes
pysqlite as well, I don't know. In any case, that is at least
appreciably higher than the 300 KB I listed above, though not too
bad. If someone is trying to squeeze out every last bit of savings in
making their .exe small, then sqlite3.dll is something they could
consider excluding, if they can. But in most cases, no user will
care about the extra size.

Che
 
Dave Angel

On Dec 6, 2:17 am, Steve Holden<[email protected]> wrote:

First, I should have specified the average *U.S.* user, and in so
doing realized how dumbly "own-nation-centric" that was. My
apologies. But I got it from this:
http://arstechnica.com/telecom/news/2010/01/us-broadband-still-lagging-in-speed-and-penetration.ars
<snip>

You're confusing megabits with megabytes. The article says the average
in the US is 3.9Mb/s, which is only about 0.49MB/s. Still seems high to
me, but I have no statistics.

DaveA
 
Deadly Dirk

Jorge Biquez wrote:

<snip>

Well, two NoSQL databases that I have some experience with are MongoDB
and CouchDB. The choice between them depends on your application. CouchDB
is extremely simple to set up; it is all about the web interface, and as
a matter of fact it communicates with the outside world using the HTTP
protocol, returning JSON objects. You can configure it using curl. It is
also extremely fast, but it doesn't allow you to run ad hoc queries. You
have to create something called a "view". This is more akin to what
people in the RDBMS world call a "materialized view". Views are created
by running a JavaScript function on every document in the database.
Results are stored in a B*Tree index and then modified as documents are
inserted, updated or deleted. It is completely schema free; there are no
tables, collections or "shards". The primary language for programming
Couch is JavaScript.
Much the same applies to MongoDB, which is equally fast but does allow ad
hoc queries and has quite a few options for how to do them. It allows you
to do the same kind of querying as RDBMS software, with the exception of
joins. No joins. It also allows map/reduce queries using JavaScript and
is not completely schema free. Databases have sub-objects called
"collections", which can be indexed or partitioned across several
machines ("sharding"), which is an excellent thing for building
shared-nothing clusters. Collections can be indexed and can be aggregated
using JavaScript and Google's map/reduce. Scripting languages like Python
are very well supported and linked against MongoDB, which tends to be
faster than communicating over HTTP. I find MongoDB well suited for what
is traditionally known as data warehousing.
Of course, traditional RDBMS specimens like MySQL, PostgreSQL, Firebird,
Oracle, MS SQL Server and DB2 still rule supreme, and most of the MVC
tools like Django or TurboGears are made for RDBMS schemas and can read
things like primary or foreign keys and build them into the application.
In short, there is no universal answer to your question. If price is a
consideration, Couch, Mongo, MySQL, PostgreSQL, Firebird and SQLite 3
all cost about the same: $0. You will have to learn significantly less
to start with a NoSQL database, but if you need to create a serious
application fast, an RDBMS is still the right answer. You may want to
look at the YouTube clip entitled "MongoDB is web scale":

 
Roy Smith

Deadly Dirk said:
The same thing applies to MongoDB which is equally fast but does allow ad
hoc queries and has quite a few options how to do them. It allows you to
do the same kind of querying as RDBMS software, with the exception of
joins. No joins.

Well, sort of. You can use forEach() to get some join-like
functionality. You don't get the full join optimization that SQL gives
you, but at least you get to do some processing on the server side so
you don't have to ship 40 gazillion records over the network to pick the
three you wanted.
It also allows map/reduce queries using JavaScript and
is not completely schema free.

What do you mean by "not completely schema free"?
Databases have sub-objects called "collections" which can be indexed
or partitioned across several machines ("sharding"), which is an
excellent thing for building shared-nothing clusters.

We've been running Mongo 1.6.x for a few months. Based on our
experiences, I'd say sharding is definitely not ready for prime time.
There are two issues: stability and architecture.

First, stability. We see mongos (the sharding proxy) crash a couple of
times a week. We finally got the site stabilized by rigging upstart to
monitor and automatically restart mongos when it crashes. Fortunately,
mongos crashing doesn't cause any data loss (at least not that we've
noticed). Hopefully this is something the 10gen folks will sort out in
the 1.8 release.

The architectural issues are more complex. Mongo can enforce uniqueness
on a field, but only on a non-sharded collection. Security (i.e. password
authentication) does not work in a sharded environment. If I understand
the release notes correctly, that's something which may get fixed in
some future release.
Scripting languages like Python are
very well supported and linked against MongoDB

The Python interface is very nice. In some ways, the JS interface is
nicer, only because you can get away with less quoting, i.e.

JS: find({inquisition: {$ne: 'spanish'}})
Py: find({'inquisition': {'$ne': 'spanish'}})

The PHP interface is (like everything in PHP) sucky:

PHP: find(array('inquisition' => array('$ne' => 'spanish')))

The common thread here is that unlike SQL, you're not feeding the
database a string which it parses, you're feeding it a data structure.
You're stuck with whatever data structure syntax the host language
supports. Well, actually, that's not true. If you wanted to, you could
write a front end which lets you execute:

"find where inquisition != spanish"

and have code to parse that and turn it into the required data
structure. The odds of anybody doing that are pretty low, however. It
would just feel wrong. In much the same way that SQLAlchemy's
functional approach to building a SQL query just feels wrong to somebody
who knows SQL.
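One upside of the query-is-data approach: a filter can be assembled programmatically rather than by string concatenation. A tiny sketch (the helper and field names are invented for illustration):

```python
def exclude(field, value):
    """Build a Mongo-style 'not equal' filter document."""
    return {field: {"$ne": value}}

# The result is an ordinary dict, ready to pass to a find() call.
query = exclude("inquisition", "spanish")
print(query)
```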
I find MongoDB well suited for what is
traditionally known as data warehousing.

I'll go along with that. It's a way to build a fast (possibly
distributed, if they get sharding to work right) network datastore with
some basic query capability. Compared to SQL, you end up doing a lot
more work on the application side, and take on a lot more of the
responsibility to enforce data integrity yourself.
You may want to look
at this Youtube clip entitled "MongoDB is web scale":


That's the funniest thing I've seen in a long time. The only sad part
is that it's all true.

There are some nice things about NoSQL databases (particularly the
schema-free part). A while ago, we discovered that about 200 of the
300,000 documents in one of our collections were effectively duplicates
of other documents ("document" in mongo-speak means "record" or perhaps
"row" in SQL-speak). It was trivial to add "is_dup_of" fields to just
those 200 records, and a little bit of code in our application to check
the retrieved documents for that field and retrieve the pointed-to
document. In SQL, that would have meant adding another column, or
perhaps another table. Either way would have been far more painful than
the fix we were able to do in mongo.
 
