DBAPI Paramstyle

Bob Parnes

The following script is one person's comparison of three methods for
accessing a postgresql database using psycopg on a debian computer
running python2.3. Following it are the results of running it six
times.

===========================
from time import time, clock
import psycopg

MAX_COUNT = 50000

def pyMethod():
    # Plain Python string formatting: the value is pasted into the SQL text.
    for n in range(MAX_COUNT):
        curs.execute('''SELECT %s;''' % n)

def formatMethod():
    # "format" paramstyle: the value is passed separately as a sequence.
    for n in range(MAX_COUNT):
        curs.execute('''SELECT %s;''', [n])

def pyformatMethod():
    # "pyformat" paramstyle: named placeholder, value passed in a mapping.
    for n in range(MAX_COUNT):
        curs.execute('''SELECT %(n)s;''', {'n':n})

conn = psycopg.connect(host='localhost', database='template1')
curs = conn.cursor()

# Report elapsed wall-clock time (time) and CPU time (clock) per method.
for method, func in (('Python method: %f, %f', pyMethod),
                     ('Format method: %f, %f', formatMethod),
                     ('Pyformat method: %f, %f', pyformatMethod)):
    startTime = time()
    startClock = clock()
    func()
    print method % ((time() - startTime), (clock() - startClock))
===========================

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.288770, 3.550000
Format method: 9.457663, 3.820000
Pyformat method: 9.446390, 3.700000

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.152173, 3.400000
Format method: 9.314743, 3.760000
Pyformat method: 9.329343, 3.840000

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.262013, 3.490000
Format method: 9.344197, 3.570000
Pyformat method: 9.402157, 3.500000

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.170817, 3.860000
Format method: 9.509313, 3.260000
Pyformat method: 9.380756, 3.770000

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.271831, 3.540000
Format method: 9.375170, 3.650000
Pyformat method: 9.426898, 3.780000

bp@debian:~/demo$ ./pyformatTst.py
Python method: 9.192097, 3.720000
Format method: 9.244554, 3.690000
Pyformat method: 9.368582, 3.760000

Similar results occurred with an actual database table.

I must be missing something, so perhaps someone can explain
the benefit of a paramstyle over the usual Python formatting
style and maybe suggest a test to show it. Thanks.

Bob Parnes
 
Fredrik Lundh

Bob said:
I must be missing something, so perhaps someone can explain
the benefit of a paramstyle over the usual Python formatting
style and maybe suggest a test to show it. Thanks.

set the parameter to "0; DROP DATABASE template1;" and see what
happens.

or set it to os.urandom(1000) and run your test a couple of times to see
what happens.

</F>
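
A minimal sketch of the point being made here, under some loud assumptions: it uses the standard-library sqlite3 module with an in-memory database instead of psycopg and PostgreSQL, it is written for a current Python rather than 2.3, and the `users` table and the hostile string are invented for illustration. sqlite3 uses the qmark paramstyle (`?`) and refuses to run several statements in one execute() call, so the sketch uses an `OR` clause rather than the `DROP DATABASE` string to show the same class of problem.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE users (name TEXT, secret TEXT)")
curs.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# A hostile "parameter" that contains SQL syntax of its own.
hostile = "nobody' OR '1'='1"

# Python formatting: the value becomes part of the SQL text itself,
# so the OR clause is executed and every row matches.
unsafe_rows = curs.execute(
    "SELECT name FROM users WHERE name = '%s'" % hostile
).fetchall()

# Paramstyle: the value is passed separately and treated purely as data,
# so it is compared against the name column as one odd-looking string.
safe_rows = curs.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The timing test above cannot show this difference, because an integer loop counter never contains anything the SQL parser could misread.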
 
Bob Parnes

Fredrik said:
set the parameter to "0; DROP DATABASE template1;" and see what
happens.

or set it to os.urandom(1000) and run your test a couple of times to see
what happens.

Thanks for the suggestion. My system does not appear to contain an
os.urandom() method. It has a /dev/urandom device, but I don't know how to
use it for this purpose, except perhaps to select the first byte that it
produces.

I have a mediocre talent at programming, which is why I chose python.
For me it was a good choice. I note this so that I hope you understand why
I say that I don't know what you are driving at. My understanding is that a
paramstyle is more efficient than the traditional python approach for repeated
use of a query. If so, then I do not see how the choice of a parameter is
relevant. If it is more efficient only in a specific abstract case, then
one would have to look for other reasons to use it in a practical application.

Bob Parnes
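
For anyone in the same position, here is a rough sketch of getting 1000 random bytes straight from the /dev/urandom device mentioned above and handing them to a query as a parameter. The assumptions: a Unix-like system that exposes /dev/urandom, a current Python rather than 2.3, the standard-library sqlite3 module standing in for psycopg, and a helper name, `random_bytes`, that is made up for this example.

```python
import sqlite3

def random_bytes(count, device="/dev/urandom"):
    """Return `count` bytes read from the system's random device."""
    with open(device, "rb") as source:
        return source.read(count)

payload = random_bytes(1000)

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")

# Passed as a parameter, arbitrary bytes survive the round trip unchanged;
# the same bytes pasted into the SQL text would usually break the statement.
row = conn.execute("SELECT ?", (payload,)).fetchone()

print(len(payload), row[0] == payload)
```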
 
Tim Roberts

Bob Parnes said:
I have a mediocre talent at programming, which is why I chose python.
For me it was a good choice. I note this so that I hope you understand why
I say that I don't know what you are driving at. My understanding is that a
paramstyle is more efficient than the traditional python approach for repeated
use of a query. If so, then I do not see how the choice of a parameter is
relevant. If it is more efficient only in a specific abstract case, then
one would have to look for other reasons to use it in a practical application.

In theory, using a paramstyle allows the query to be sent to the SQL
database backend and compiled like a program. Then, successive uses of the
same query can be done by sending just the parameters, instead of sending
the entire query string to be parsed and compiled again and again. This is
commonly done with large production databases like SQL Server and Oracle.
For a complicated query, it can be a significant time savings.

However, to the best of my knowledge, none of the Python DB-API
implementations actually do that. So, the primary benefit of the
paramstyle method is that the database provider inserts whatever quoting is
required; you don't have to remember to put single quotes around the
arguments, and protect single quotes within the arguments by doubling them,
and so on.
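
The quoting benefit can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module (qmark paramstyle, `?`) with an in-memory database rather than psycopg, it targets a current Python, and the sample name is invented.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real server.
conn = sqlite3.connect(":memory:")
curs = conn.cursor()

name = "O'Brien"  # a perfectly ordinary value that contains a single quote

# Python formatting: the embedded quote ends the SQL string early,
# so the statement no longer parses.
try:
    curs.execute("SELECT '%s'" % name)
    formatting_error = None
except sqlite3.OperationalError as exc:
    formatting_error = exc

# Manual fix: double every single quote yourself before formatting.
curs.execute("SELECT '%s'" % name.replace("'", "''"))
manually_quoted = curs.fetchone()[0]

# Paramstyle: the driver takes care of it; no quoting code is needed.
curs.execute("SELECT ?", (name,))
bound_value = curs.fetchone()[0]

print(formatting_error is not None, manually_quoted, bound_value)
```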
 
woodsplitter

Tim said:
In theory, using a paramstyle allows the query to be sent to the SQL
database backend and compiled like a program. Then, successive uses of the
same query can be done by sending just the parameters, instead of sending
the entire query string to be parsed and compiled again and again. This is
commonly done with large production databases like SQL Server and Oracle.
For a complicated query, it can be a significant time savings.

However, to the best of my knowledge, none of the Python DB-API
implementations actually do that.

kinterbasdb does.
 
Scott David Daniels

Tim said:
> In theory, using a paramstyle allows the query to be sent to the SQL
> database backend and compiled like a program. Then, successive uses of the
> same query can be done by sending just the parameters, instead of sending
> the entire query string to be parsed and compiled again and again.
> This is commonly done with large production databases like SQL Server
> and Oracle. For a complicated query, it can be a significant time
> savings.

Actually, the slow part is building a query execution plan (deciding
which indices to use, join orders, and such). Identifying what is
being asked for is trivially fast (compared to any I/O). The DB
optimizer works with estimates of running time for many of the possible
query plans, and chooses the cheapest of those -- that combinatorial
problem is how a DB can chew up CPU time (or save it in some cases).

> However, to the best of my knowledge, none of the Python DB-API
> implementations actually do that.

This caching need not occur anywhere before the DB. There are database
systems (DB2 is one) that cache plans by the text of the query. If your
query is an exact match with a previously optimized plan (and it hasn't
been chucked out of the cache), the plan is retrieved.

> So, the primary benefit of the paramstyle method is that the database
> provider inserts whatever quoting is required; you don't have to
> remember to put single quotes around the arguments, and protect
> single quotes within the arguments by doubling them, and so on.

This _is_ a benefit.

Another benefit (greater in my estimation) is that you are separating
"code" and data to make a more readable (and malleable) query. If
things get slow and database access is "at fault", you can carry
your SQL to a DB expert (sometimes one on site) who can check it to
help speed up your system. The DB expert is an SQL dweeb, and will
appreciate seeing the SQL done in a straightforward manner. She or he
may be able to rewrite the query to improve your access. Perhaps
several related queries can be effectively combined (but again, you
are talking about an expert in SQL/DB -- they'll need to see all
the queries easily in order to help). Perhaps the data access
pattern will suggest a change in DB indices (in which case the
database administrator can speed up your program without your
changing any of your code).

You could also be changing the format used to send the data to the
database (of that I'm not quite as sure).

--Scott David Daniels
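
The remark about plans being cached by the text of the query can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyPlanCache` is an invented stand-in, not how DB2 or any real server or driver works, and the effect only applies when the placeholder form of the query actually reaches the database instead of being filled in on the client side.

```python
class ToyPlanCache:
    """Invented stand-in for a server cache keyed by the exact query text."""

    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def execute(self, sql, params=()):
        # A new plan is "built" only when this exact text has not been seen.
        if sql not in self.plans:
            self.compilations += 1
            self.plans[sql] = "plan for " + sql
        return self.plans[sql], params


# Python formatting: every value produces a different query text.
formatted = ToyPlanCache()
for n in range(1000):
    formatted.execute("SELECT * FROM t WHERE id = %s" % n)

# Paramstyle: the text stays constant and only the parameters change.
parameterized = ToyPlanCache()
for n in range(1000):
    parameterized.execute("SELECT * FROM t WHERE id = ?", (n,))

print(formatted.compilations, parameterized.compilations)
```

In this model the formatted queries trigger 1000 plan builds and the parameterized query triggers one, which is the kind of saving a text-keyed cache can offer.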
 
Andy Dustman

Tim said:
In theory, using a paramstyle allows the query to be sent to the SQL
database backend and compiled like a program. Then, successive uses of the
same query can be done by sending just the parameters, instead of sending
the entire query string to be parsed and compiled again and again. This is
commonly done with large production databases like SQL Server and Oracle.
For a complicated query, it can be a significant time savings.

However, to the best of my knowledge, none of the Python DB-API
implementations actually do that.

mx.ODBC does, since it is an ODBC implementation. I would be very
surprised if the Oracle adapter did not. MySQLdb does not yet, but
probably will by the end of summer (with MySQL-4.1 or newer).
 
