Sam
Is MySQLdb the recommended python module for SQL database access? Are there other modules? What I want in a module is to be able to write readable and maintainable code.
As Chris said, if your needs are simple, use SQLite back-end. It's probably
already installed on your computer and Python has a nice interface to it in
its standard library.
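For reference, the stdlib interface really is small; a minimal sketch (table name and contents invented for illustration):

```python
import sqlite3

# In-memory database; pass a filename instead to persist to disk.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE greeting (id INTEGER PRIMARY KEY, msg TEXT)")
# "?" placeholders do the quoting for you.
c.execute("INSERT INTO greeting (msg) VALUES (?)", ("hello",))
conn.commit()
c.execute("SELECT msg FROM greeting WHERE id = 1")
print(c.fetchone()[0])  # -> hello
conn.close()
```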
I just checked in the Python sources and apparently you're right about
SQLite3. The Python distribution includes pysqlite which seems to be a
self-contained SQLite engine. No external dependencies. Sorry for the
confusion.
The biggest downside of SQLite3 is concurrency. I haven't dug into the
exact details of the pager system and such, but it seems to be fairly
coarse in its locking. Also, stuff gets a bit complicated when you do
a single transaction involving multiple files. So if you have lots of
processes writing to the same set of SQLite tables, you'll see pretty
poor performance.
ChrisA
I simply tested running two independent processes, started at the same
time, in parallel against the same SQLite database, and never got 20000
in that row even though I used an exclusive lock on the DB. It might be
that I did something wrong.
import sqlite3

conn = sqlite3.connect('example.db', 10.0, True, "EXCLUSIVE")
c = conn.cursor()
c.execute("SELECT val FROM msgid WHERE id = 0")
tint = int(c.fetchone()[0]) + 1
c.execute("UPDATE msgid SET val={0} WHERE id = 0".format(tint))
conn.commit()
conn.close()
Thanks
Also, you're connecting and disconnecting repeatedly... oh, I see why
it didn't work when I tried. You're also using two completely
different database names: 'messageid.db', which is named in a constant
and in the default argument, and 'example.db', which is what you
actually use. You should have a single source of truth; otherwise you
confuse the people who might otherwise be able to test your code.
Anyway. This code is extremely inefficient:
conn = sqlite3.connect('example.db', 10.0, True, "EXCLUSIVE")
c = conn.cursor()
c.execute("SELECT val FROM msgid WHERE id = 0")
tint = int(c.fetchone()[0]) + 1
c.execute("UPDATE msgid SET val={0} WHERE id = 0".format(tint))
conn.commit()
conn.close()
Much more common would be to retain a connection and repeatedly
perform queries. Then you won't need to open it EXCLUSIVE, and you can
simply query, update, and then commit (all your locks should be
released at the commit). Do that, and you should see at least
reasonable performance, plus everything should work correctly.
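A sketch of that pattern, applied to the snippet above (the `msgid` table layout is borrowed from it; parameter binding is swapped in for `str.format`, which you should never use to build SQL):

```python
import sqlite3

# One long-lived connection, default locking (no EXCLUSIVE needed).
# ":memory:" keeps the sketch self-contained; a file path works the same.
conn = sqlite3.connect(":memory:", timeout=10.0)
conn.execute("CREATE TABLE msgid (id INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO msgid VALUES (0, 0)")
conn.commit()

def next_msgid(conn):
    """Read-increment-write inside one transaction on a kept-open connection."""
    c = conn.cursor()
    c.execute("SELECT val FROM msgid WHERE id = 0")
    tint = int(c.fetchone()[0]) + 1
    # "?" binding instead of str.format: safer, and the statement
    # can be reused.
    c.execute("UPDATE msgid SET val = ? WHERE id = 0", (tint,))
    conn.commit()  # the commit releases the locks
    return tint

print(next_msgid(conn), next_msgid(conn))  # -> 1 2
```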
ChrisA
I did it just to test sqlite3's behavior; the actual test was a
simulation of a unique incremental sequence number/counter for
independently spawned tasks accessing the counter in a
non-deterministic manner.
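For that kind of counter, the separate SELECT-then-UPDATE pair can be collapsed into a single atomic `UPDATE val = val + 1`, which sidesteps the read-modify-write race entirely; a minimal sketch (same hypothetical `msgid` table as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a shared file path works the same way
conn.execute("CREATE TABLE msgid (id INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO msgid VALUES (0, 0)")
conn.commit()

def next_msgid(conn):
    c = conn.cursor()
    # The increment happens inside the database engine, so two tasks
    # can never both read the same old value and write the same new one.
    c.execute("UPDATE msgid SET val = val + 1 WHERE id = 0")
    c.execute("SELECT val FROM msgid WHERE id = 0")
    val = c.fetchone()[0]
    conn.commit()
    return val

print([next_msgid(conn) for _ in range(3)])  # -> [1, 2, 3]
```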
Already installed? I thought the point of SQLite3 being in the
Python stdlib was that Python actually included the entire engine.
Broad recommendation: Single application, tiny workload, concurrency
not an issue, simplicity desired? Go SQLite. Big complex job, need
performance, lots of things reading and writing at once, want
networked access? Go PGSQL. And don't go MySQL if PG is an option.
And definitely don't go for a non-free option (MS-SQL, DB2, etc)
unless you've looked into it really closely and you are absolutely
thoroughly *sure* that you need that system (which probably means you
need your app to integrate with someone else's, and that other app
demands one particular database).
Any opinion on Firebird? Just curiosity given how often the advice
seems to be "start with SQLite, avoid MySQL, end with PostgreSQL"
No, because I've never used it. Has anyone here? What are its
strengths and weaknesses?
Chris said:
Broad recommendation: Single application, tiny workload, concurrency
not an issue, simplicity desired? Go SQLite. Big complex job, need
performance, lots of things reading and writing at once, want
networked access? Go PGSQL. And don't go MySQL if PG is an option.
And definitely don't go for a non-free option (MS-SQL, DB2, etc)
unless you've looked into it really closely and you are absolutely
thoroughly *sure* that you need that system (which probably means you
need your app to integrate with someone else's, and that other app
demands one particular database).

I agree 100% with this. And speaking as an ex Oracle and DB2 DBA -
not to mention MS-SQL (spit), with which I occasionally had to dabble,
avoid them like the plague unless circumstances dictate.
What about clustering? Do the free alternatives have such an option?
Thanks
PostgreSQL has replication in-built now, which will do most forms of
clustering. With some third-party software like Slony (also free), you
can do even more (including replicating between different PostgreSQL
versions, so you can upgrade progressively without any downtime; PG's
internal replication has tight restrictions on that). I've used PG's
streaming replication to fairly good effect. You do need some kind of
system to decide when to promote a slave to master, though - my boss
had this weird idea that each node had to be a perfect peer with no
external authority [1], which led to unsolvable problems, but if you
have an external system that declares which of several slaves should
be promoted, it's pretty easy to do. I could whip you up a
proof-of-concept in an hour, probably; just needs a heartbeat script
and some way of signalling them to fail over to the new master.
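That proof-of-concept might look roughly like this; everything here (the injected lookup function, the command tuple, the data directory) is made up for illustration, and the real work of running a trustworthy authority service is left out:

```python
import subprocess
import time

def heartbeat_loop(node_name, get_chosen_node, pgdata,
                   interval=5.0, promote_cmd=("pg_ctl", "promote", "-D")):
    """Hypothetical standby heartbeat: poll an external authority for
    which node should be master; when this node is the chosen one,
    promote it with "pg_ctl promote" and stop looping.

    get_chosen_node is any callable returning the elected node's name
    (e.g. an HTTP fetch against the authority service).
    """
    while True:
        if get_chosen_node() == node_name:
            # This node was elected: promote the local Postgres instance.
            subprocess.run(list(promote_cmd) + [pgdata], check=True)
            return True
        time.sleep(interval)
```

Keeping the authority lookup injectable means the failover policy (who decides, and how) lives outside the script, which is exactly the "external authority" split described above.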
Clustering for performance, as opposed to reliability, is a bit
trickier. You can do read-only queries on slaves (so if you have a
many-readers-few-writers model, this can work nicely), but otherwise,
you probably need some third-party middleware. I haven't looked into
that side of things. Ultimately your biggest bottleneck is going to be
locking, which fundamentally has to be done in one place... or else
you have to deal with merge conflicts (the bane of true multi-master
replication).
So, it all depends on what you need to accomplish, and how much work
you're willing to do. Postgres offers a particular set of primitives
(including replication, promotion of a slave to master, etc), and lets
you trigger things from scripts (execute "pg_ctl promote" to make this
node become master). Advanced logic can be done by writing a Python
script that edits config files, runs programs, sends Unix signals,
whatever. There are pay-for Postgres support companies, too, if you
need that sort of thing.
tl;dr: Yeah, you can do that too.
ChrisA
[1] He had a weird issue with the concept of authority, actually. I
think his dislike of any form of government polluted his thinking so
he wouldn't accept even the IT sense of the word "authority". Never
mind that that's the best way to solve a lot of problems. But I
digress.