sqlstring -- a library to build a SELECT statement


grunar

After some thought on what I need in a Python ORM (multiple primary
keys, complex joins, case statements etc.), and after having built
these libraries for other un-named languages, I decided to start at the
bottom. What seems to plague many ORM systems is the syntactic
confusion and string-manipulation required to build the SQL Statements.
If you want to do a Left Outer Join, support nested functions, and a
nested conditional clause, you'd be building more of a string library
than an ORM. Let's not even mention syntactical differences between
databases, data types, and such.

My solution is sqlstring. A single-purpose library: to create SQL
statement objects. These objects (such as sqlstring.Select), represent
complex SQL Statements, but as Python objects. The benefit is that you
can, at run-time, "build" the statement pythonically, without
getting bogged down in String Manipulation. The theory is that once in
use, things that were complex (string magic) become simpler, and allow
the program to worry about higher-level issues.
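The idea can be sketched in a few lines. This is an illustration only, not sqlstring's actual API (the Column/Table classes and the quoting here are invented): overloaded operators let objects, rather than string pasting, assemble the statement.

```python
class Column:
    def __init__(self, table, name):
        self.table, self.name = table, name
    def __eq__(self, value):
        # Comparing a column yields an SQL fragment, not a bool.
        return "%s.%s = '%s'" % (self.table, self.name, value)

class Table:
    def __init__(self, name):
        self.name = name
    def __getattr__(self, colname):
        # Any unknown attribute is treated as a column reference.
        return Column(self.name, colname)
    def select(self, where=None):
        sql = "SELECT %s.* FROM [%s] %s" % (self.name, self.name, self.name)
        if where:
            sql += " WHERE " + where
        return sql

person = Table("person")
print(person.select(person.first_name == 'Smith'))
# SELECT person.* FROM [person] person WHERE person.first_name = 'Smith'
```

The calling code never concatenates SQL itself; it only combines objects.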

An Example:

SELECT
person.*
FROM
[person] person

SELECT
person.*
FROM
[person] person
WHERE person.first_name = 'Smith'

print person_smith("party_id", "first_name") \
<< model.address ** (\
model.address.party_id == model.person.party_id)

SELECT
party_id party_id,
first_name first_name,
address.*
FROM
[person] person
LEFT OUTER JOIN [address] address ON
address.party_id = person.party_id
WHERE person.first_name = 'Smith'

Things can get much more interesting than this, including nested
sub-selects (anywhere), conditional groups (and/or groups nested using
& and |) and even CASE statements.

Some of this stuff has been around for a while (using "magic" objects
to build where clauses, etc.). But I'm trying to take it all the
way--to a legit Select statement.

While still in the early stages, it does work with a great many SQL
statements, as seen in the test suite. Currently supported are CASE
statements, nested conditional clauses, nested queries and most join
types. At this point, I'm interested in getting feedback from the
community on several fronts:

1. The Operator Overload model. I've chosen to overload Python's
operators to give a short-hand syntax to most of the things you'd
want to do with a select statement. The rest are accessible via
methods. Currently ** is the "where" operator, // is the "in"
operator, % the "like" operator and ^ aliases columns. Other
overloads are as you'd expect: + / - * == all result in Expression
Objects that dish out the right SQL string. The question is, is the
"leap" in syntax too confusing? Is there a cleaner way to do this?
(Functions, for example)
2. How to best add further SQL function support? Adding magic
callable objects to columns came to mind, but this has its own set
of issues. I'm leaning towards a magic object in the sqlstring
module. For example:

sqlstring.F.substring(0, 4, person.first_name)

would result in: substring(0, 4, person.first_name). The F object
could be put in the local scope for short-hand.
3. I'm undecided on how best to handle database specific
overwrites. I want this to be as easy as possible. I'm thinking about
subclassing Expressions with a naming scheme on the Sub-Class (such as
CaseExpression_oracle). Then the __init__ factory could dish out the
right version of the object based on the requestor. This brings up
lots of questions, such as how to support multiple types of databases
at the same time.
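For point 2, a magic F object along those lines can be built with __getattr__. This is a hypothetical sketch (the _F class and its rendering rules are invented here; real code would need proper quoting rules for string arguments):

```python
class _F:
    # Any attribute access yields a callable that renders a function call.
    def __getattr__(self, funcname):
        def call(*args):
            # Arguments are rendered naively; quoting decisions are skipped.
            return "%s(%s)" % (funcname, ", ".join(str(a) for a in args))
        return call

F = _F()
print(F.substring(0, 4, "person.first_name"))
# substring(0, 4, person.first_name)
```

Because __getattr__ is the hook, no function names need to be pre-registered; F.upper, F.coalesce, or a stored-procedure name all work the same way.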

Eventually the library should include all of the basic SQL
Statements, including UPDATE, INSERT and CREATE statements. This is
mostly for completeness, though. SELECT statements tend to be the most
complex.

The library can be downloaded at http://betur.net/download.php

Any other thoughts or comments are very much appreciated.
 

Jason Stitt

On Oct 19, 2005, at 9:18 PM, (e-mail address removed) wrote:

My solution is sqlstring. A single-purpose library: to create SQL
statement objects. These objects (such as sqlstring.Select), represent
complex SQL Statements, but as Python objects.

<snip>

First of all, I like this idea. I've been thinking about doing
something similar but am stuck with SQLObject for the moment. The
ability to construct complex expressions in pieces and then mix and
match them would be killer.

I think some operator overloading, especially the obvious cases like
==, is cleaner than using only functions because it lets you order
things normally. But some of the operator choices are non-intuitive.
Personally, I would make something like 'alias' a function or class,
rather than overloading XOR. Not sure about ** for where.

Using // for 'in' looks really weird, too. It's too bad you can't
overload Python's 'in' operator. (Can you? It seems to be hard-coded
to iterate through an iterable and look for the value, rather than
calling a private method like some other builtins do.)

Also, you have to be really careful of operator precedence.

Have you ever used Pyparsing? Its use of a combination of classes and
operator overloading works pretty well.

For SQL function calls, don't you also want to support stored
procedures? In that case, you don't want pre-set magic functions so
much as a function that takes a string parameter for a function name
and then a list of function parameters.
<< model.address ** (\
model.address.party_id == model.person.party_id)
SELECT
party_id party_id,
first_name first_name,
address.*
FROM
[person] person
LEFT OUTER JOIN [address] address ON
address.party_id = person.party_id
WHERE person.first_name = 'Smith'

See, this is where I'm not sure about operator precedence. If <<
binds tighter than **, it works, because the << operator would make a
'join' object and then the ** could be interpreted as an 'on' clause.
But if ** binds tighter you get an extra 'where' clause on the
address table, and then a join...

Some functions might be more workable.

- Jason
 
G

grunar

Jason said:
I think some operator overloading, especially the obvious cases like
==, is cleaner than using only functions because it lets you order
things normally. But some of the operator choices are non-intuitive.
Personally, I would make something like 'alias' a function or class,
rather than overloading XOR. Not sure about ** for where.
My strategy is to do both: have a "where" method that could be called
instead of ** (i.e. person.where(person.last_name == "smith")), but also
allow for the ** syntax. After using it for a while, I'm finding the
** and ^ for alias very clear. Alias function brings up the issue of
name-space (since table.alias could be a column if not a special
method.) I'm assuming people don't use where, select, or other SQL key
words as column names, and if they do, they have a table["column"]
syntax to fall back on. But I digress. A method makes sense, though.
Using // for 'in' looks really weird, too. It's too bad you can't
overload Python's 'in' operator. (Can you? It seems to be hard-coded
to iterate through an iterable and look for the value, rather than
calling a private method like some other builtins do.)

// was a bit of a stretch. I'd initially thought it for the "where"
clause, because it's lower precedence than ** (I think), and really
late at night // kind of looks like a W. I decided against it because
it looks too close to a comment in some other languages.

Python "in" clause doesn't seem exploitable in any way--probably a good
thing. I did add a "in_" method (name is arguable), which does the
same thing, also a not_in.
Have you ever used Pyparsing? Its use of a combination of classes and
operator overloading works pretty well.
I took a look at it. Seems like a happy balance there--with the
overloading. Dejavu
(http://www.aminus.org/blogs/index.php/fumanchu/2005/08/11/where_dejavu_fits_in_the_orm_cosmos)
is another cool solution--using lambda expressions. But it goes into
scary bytecode stuff (though it's well contained).
For SQL function calls, don't you also want to support stored
procedures? In that case, you don't want pre-set magic functions so
much as a function that takes a string parameter for a function name
and then a list of function parameters.
This is what I had in mind (echo.py in the distribution). The only
issue becomes knowing when to quote the parameters (is a string always
a string?). I've ended up quoting everything, and forcing the user to
supply table.column notation if they don't want it quoted.
print person_smith("party_id", "first_name") \
<< model.address ** (\
model.address.party_id == model.person.party_id)
SELECT
party_id party_id,
first_name first_name,
address.*
FROM
[person] person
LEFT OUTER JOIN [address] address ON
address.party_id = person.party_id
WHERE person.first_name = 'Smith'

See, this is where I'm not sure about operator precedence. If <<
binds tighter than **, it works, because the << operator would make a
'join' object and then the ** could be interpreted as an 'on' clause.
But if ** binds tighter you get an extra 'where' clause on the
address table, and then a join...
There's a little dark magic going on with the precedence here. The **
is higher precedence, so it happens first, and is applied to the
address table object, which is then joined into person (with the <<
operator), and pulling in the existing where on address. This sounds
bad, but it allows for where clauses to trickle up through the python
expression, which is how I get a + b >> c to behave as expected.
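The precedence claim is easy to verify: ** does bind tighter than <<, so the right-hand sub-expression fires first. A toy sketch (Node and the ops log are invented for illustration) that records the order in which Python applies the two operators:

```python
ops = []  # records (operator, operand) in the order Python applies them

class Node:
    def __init__(self, name):
        self.name = name
    def __pow__(self, other):      # stand-in for sqlstring's "where"
        ops.append(("**", self.name))
        return self
    def __lshift__(self, other):   # stand-in for the join operator
        ops.append(("<<", self.name))
        return self

person, address, cond = Node("person"), Node("address"), Node("cond")
person << address ** cond
print(ops)
# [('**', 'address'), ('<<', 'person')]
```

The ** is applied to address before << joins it into person, matching the behavior described above.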


Thanks for the positive feedback!

Runar
 

Steven Bethard

Jason said:
Using // for 'in' looks really weird, too. It's too bad you can't
overload Python's 'in' operator. (Can you? It seems to be hard-coded
to iterate through an iterable and look for the value, rather than
calling a private method like some other builtins do.)
[snip]

Python "in" clause doesn't seem exploitable in any way

Sure it is. Just override __contains__.

STeVe
 

fumanchu

These objects (such as sqlstring.Select), represent
complex SQL Statements, but as Python objects. The benefit is that you
can, at run-time, "build" the statement pythonically, without
getting bogged down in String Manipulation. The theory is that once in
use, things that were complex (string magic) become simpler, and allow
the program to worry about higher-level issues.
...
Some of this stuff has been around for a while (using "magic" objects
to build where clauses, etc.). But I'm trying to take it all the
way--to a legit Select statement.

While still in the early stages, it does work with a great many sql
statements, as seen in the test suite. Currently supported are CASE
statements, Nested conditional clauses, nested queries and most join
types. At this point, I'm interested in getting feedback from the
community on several fronts:

1. The Operator Overload model. I've chosen to overload Python's
operators to give a short-hand syntax to most of the things you'd
want to do with a select statement. The rest are accessible via
methods. Currently ** is the "where" operator, // is the "in"
operator, % the "like" operator and ^ aliases columns. Other
overloads are as you'd expect: + / - * == all result in Expression
Objects that dish out the right SQL string. The question is, is the
"leap" in syntax too confusing? Is there a cleaner way to do this?
(Functions, for example)

The big operator question will be: how will "and" and "or" be
implemented? This is always a sticking point because of Python's
short-circuiting behaviors regarding them (the resultant bytecode will
include a JUMP).

An alternative is to stuff the representation into a string, which can
then be parsed however one likes.

For Dejavu (http://projects.amor.org/dejavu), I didn't do either
one--instead I used lambdas to express the where clause, so that:

f = logic.Expression(lambda x: ('Rick' in x.Name) or
(x.Birthdate == datetime.date(1970, 1, 1)))
units = sandbox.recall(Person, f)

might produce, in the bowels of the ORM:

"SELECT * FROM [Person] WHERE [Person].[Name] Like '%Rick%' or
[Person].[Birthdate] = #1/1/1970#"

Note that the tablename is provided in a separate step. The translation
is based on the codewalk.py and logic.py modules, which are in the
public domain if you want to use any part of them. See
http://projects.amor.org/dejavu/svn/trunk/
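For contrast, part of what such a lambda expresses can be captured without bytecode, by calling it with a symbolic proxy object (a hypothetical sketch; Sym and Field are invented names). Note its limits: the `x.Birthdate == ...` half works, but `'Rick' in x.Name` and `or` cannot be intercepted this way, since `in` coerces its result to bool and `or` short-circuits -- which is precisely why Dejavu decompiles the bytecode instead.

```python
class Sym:
    # Stands in for the lambda's argument; attribute access yields fields.
    def __getattr__(self, name):
        return Field(name)

class Field:
    def __init__(self, name):
        self.name = name
    def __eq__(self, value):
        # Comparison renders SQL text instead of returning a bool.
        return "[Person].[%s] = %r" % (self.name, value)

f = lambda x: x.Birthdate == '1/1/1970'
print(f(Sym()))
# [Person].[Birthdate] = '1/1/1970'
```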
2. How to best add further SQL function support? Adding magic
callable objects to columns came to mind, but this has its own set
of issues. I'm leaning towards a magic object in the sqlstring
module. For example:

sqlstring.F.substring(0, 4, person.first_name)

would result in: substring(0, 4, person.first_name). The F object
could be put in the local scope for short-hand.

This is a hard problem, since your sqlstring module doesn't control the
result sets, and so can't provide fallback mechanisms if a given
database does not support a given function (or operator, or minute
detail of how a function or operator works; for example, LIKE is
case-insensitive in MS SQL Server but case-sensitive in PostgreSQL). If
you're going to use subclasses to handle "database-specific overwrites"
(below), then you'll probably want to stick such functions in that base
class (and override them in subclasses), as well.
3. I'm undecided on how best to handle database specific
overwrites. I want this to be as easy as possible. I'm thinking about
subclassing Expressions with a naming scheme on the Sub-Class (such as
CaseExpression_oracle). Then the __init__ factory could dish out the
right version of the object based on the requestor. This brings up
lots of questions, such as how to support multiple types of databases
at the same time.

See the Adapter and SQLDecompiler classes in
http://projects.amor.org/dejavu/svn/trunk/storage/db.py (and the
store*.py modules) for some examples of using subclassing to produce
database-specific syntax. There, it's one Adapter class per supported
DB-type; you might consider keeping the Expression objects themselves
free from SQL, and transform the Expressions to SQL in a separate
class, which you could then subclass.
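A minimal sketch of that one-adapter-per-DB shape (the class and method names here are invented); the LIKE example above is a natural test case, since PostgreSQL's case-insensitive variant is spelled ILIKE:

```python
class Adapter:
    # Base adapter: ANSI-flavored rendering of a LIKE test.
    def like(self, col, pattern):
        return "%s LIKE '%s'" % (col, pattern)

class PostgresAdapter(Adapter):
    # PostgreSQL's LIKE is case-sensitive; ILIKE is its case-insensitive
    # spelling, so this dialect overrides the base rendering.
    def like(self, col, pattern):
        return "%s ILIKE '%s'" % (col, pattern)

def render(adapter, col, pattern):
    # The Expression side stays SQL-free; the adapter owns the syntax.
    return adapter.like(col, pattern)

print(render(Adapter(), "name", "%Rick%"))          # name LIKE '%Rick%'
print(render(PostgresAdapter(), "name", "%Rick%"))  # name ILIKE '%Rick%'
```

Supporting several databases at once then just means holding several adapter instances, rather than baking one dialect into the Expression objects.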

Just a couple of thoughts from someone who's done the
string-manipulation dance once before. ;) I must admit I've always
punted when it came time to produce complex joins or CASE
statements--Dejavu simply doesn't provide that level of expressivity,
preferring instead to hide it behind the object layer.


Robert Brewer
System Architect
Amor Ministries
(e-mail address removed)
 

Steve Holden

Jason said:
On Oct 19, 2005, at 9:18 PM, (e-mail address removed) wrote:




<snip>

First of all, I like this idea. I've been thinking about doing
something similar but am stuck with SQLObject for the moment. The
ability to construct complex expressions in pieces and then mix and
match them would be killer.

I think some operator overloading, especially the obvious cases like
==, is cleaner than using only functions because it lets you order
things normally. But some of the operator choices are non-intuitive.
Personally, I would make something like 'alias' a function or class,
rather than overloading XOR. Not sure about ** for where.

Using // for 'in' looks really weird, too. It's too bad you can't
overload Python's 'in' operator. (Can you? It seems to be hard-coded
to iterate through an iterable and look for the value, rather than
calling a private method like some other builtins do.)
... def __contains__(self, thing):
...     print "Do I have a", thing, "?"
...     return True
...
Do I have a Steev ?
True
[...]

regards
Steve
 

Jason Stitt

Jason Stitt wrote:
... def __contains__(self, thing):
...     print "Do I have a", thing, "?"
...     return True
...

I stand corrected. <excuse>Python.org was intermittently down
yesterday</excuse> so I was trying to play around with the
interactive interpreter and missed it.

For future reference:
http://www.python.org/doc/ref/specialnames.html

However 'in' seems to coerce the return value of __contains__ to True
or False, even if you return an object reference.
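That coercion is easy to demonstrate (Where is a hypothetical class): whatever object __contains__ returns, the `in` expression itself evaluates to a plain bool, so no expression object can escape.

```python
class Where:
    # Hypothetical condition-building class.
    def __contains__(self, item):
        return "an SQL fragment"   # truthy, but not what 'in' yields

w = ("x" in Where())
print(w, type(w))
# True <class 'bool'>
```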

- Jason
 

Jason Stitt

// was a bit of a stretch. I'd initially thought it for the "where"
clause, because it's lower precedence than ** (I think), and really
late at night // kind of looks like a W. I decided against it because
it looks too close to a comment in some other languages.

Python "in" clause doesn't seem exploitable in any way--probably a good
thing. I did add a "in_" method (name is arguable), which does the
same thing, also a not_in.

What about modifying the overloaded == to produce 'in' if the right-
hand side is a list? Then you can more easily generate statements
dynamically:

def makeCond(name):
    return someOtherCond & (model.table.name == name)

makeCond("foo")
makeCond(["foo", "bar"])

And it doesn't require two different functions.

As long as there is no case where you might actually want to test if
a column value equals a list, it should work. Is there? Some DBs
support an Array type, but in general that might be better handled
with an Array class, anyway.
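Jason's suggestion might look like this sketch (Column is a hypothetical stand-in for a real column object, and quoting rules are glossed over):

```python
class Column:
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        if isinstance(other, list):
            # A list on the right-hand side becomes an IN clause.
            return "%s IN (%s)" % (
                self.name, ", ".join("'%s'" % v for v in other))
        return "%s = '%s'" % (self.name, other)

name = Column("table.name")
print(name == "foo")           # table.name = 'foo'
print(name == ["foo", "bar"])  # table.name IN ('foo', 'bar')
```

The one __eq__ method covers both the scalar and the list case, so callers like makeCond need no special casing.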

- Jason, fingers crossed that all this black magic doesn't affect
one's chances in the afterlife ;)
 

grunar

The big operator question will be: how will "and" and "or" be
implemented? This is always a sticking point because of Python's
short-circuiting behaviors regarding them (the resultant bytecode will
include a JUMP).

I'm using the Boolean | and & operators for logical groups, eg (a | b |
(b & c)). This might seem ugly to purists, but solves all of the
short-circuit issues. It does require the user to use excessive
parentheses, because | evaluates before ==. Another option is to use
functions-- AND(EQ(a, 1), OR(IN(B,(1,2,3)))) -- but I find this hard to
read. But mixing the two is sometimes clean: EQ(a,1) & LT(b,2). But
having too many ways of doing things doesn't seem very pythonic.
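The | and & grouping described here can be sketched with __or__/__and__ on a condition object (Cond is invented for illustration); the extra parentheses around each comparison really are required, because | and & bind tighter than == in Python:

```python
class Cond:
    def __init__(self, sql):
        self.sql = sql
    def __and__(self, other):
        return Cond("(%s AND %s)" % (self.sql, other.sql))
    def __or__(self, other):
        return Cond("(%s OR %s)" % (self.sql, other.sql))

a, b, c = Cond("a = 1"), Cond("b = 2"), Cond("c = 3")
print((a | b | (b & c)).sql)
# ((a = 1 OR b = 2) OR (b = 2 AND c = 3))
```

Unlike `and`/`or`, the bitwise operators never short-circuit, so both operands always exist to be combined into SQL.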
An alternative is to stuff the representation into a string, which can
then be parsed however one likes.

For Dejavu (http://projects.amor.org/dejavu), I didn't do either
one--instead I used lambdas to express the where clause, so that:

f = logic.Expression(lambda x: ('Rick' in x.Name) or
(x.Birthdate == datetime.date(1970, 1, 1)))
units = sandbox.recall(Person, f)

might produce, in the bowels of the ORM:

"SELECT * FROM [Person] WHERE [Person].[Name] Like '%Rick%' or
[Person].[Birthdate] = #1/1/1970#"

Note that the tablename is provided in a separate step. The translation
is based on the codewalk.py and logic.py modules, which are in the
public domain if you want to use any part of them. See
http://projects.amor.org/dejavu/svn/trunk/

This is a very elegant solution, so much so that I almost didn't go
down the path of sqlstring. Having support for lambda expressions is
still an option, though I wanted to try object operator
overloading/methods first--to see if I could avoid the bytecode issue.
This is a hard problem, since your sqlstring module doesn't control the
result sets, and so can't provide fallback mechanisms if a given
database does not support a given function (or operator, or minute
detail of how a function or operator works; for example, LIKE is
case-insensitive in MS SQL Server but case-sensitive in PostgreSQL). If
you're going to use subclasses to handle "database-specific overwrites"
(below), then you'll probably want to stick such functions in that base
class (and override them in subclasses), as well.
Good point. These things should be able to be "intercepted" in the
database-specific modules, so the library has a documented way
functions should be used (ANSI if applicable), but database-specific
overwrites allow us to deal with issues or hacks (to emulate a
function) in particular databases.

See the Adapter and SQLDecompiler classes in
http://projects.amor.org/dejavu/svn/trunk/storage/db.py (and the
store*.py modules) for some examples of using subclassing to produce
database-specific syntax. There, it's one Adapter class per supported
DB-type; you might consider keeping the Expression objects themselves
free from SQL, and transform the Expressions to SQL in a separate
class, which you could then subclass.
Thanks. Your approach here had already inspired me; I'll take a look
at it again. Pulling the SQL out of the Expression objects cuts both
ways, but might be a way to cleanly support db syntax nuances. I'll
keep you posted.

Runar
 

grunar

Using // for 'in' looks really weird, too. It's too bad you can't
overload Python's 'in' operator.

// was a bit of a stretch. I'd initially thought it for the "where"
clause, because it's lower precedence than ** (I think), and really
late at night // kind of looks like a W. I decided against it because
it looks too close to a comment in some other languages.

Python "in" clause doesn't seem exploitable in any way--probably a good
thing. I did add a "in_" method (name is arguable), which does the
same thing, also a not_in.

What about modifying the overloaded == to produce 'in' if the right-
hand side is a list? Then you can more easily generate statements
dynamically:

def makeCond(name):
    return someOtherCond & (model.table.name == name)

makeCond("foo")
makeCond(["foo", "bar"])

And it doesn't require two different functions.

As long as there is no case where you might actually want to test if
a column value equals a list, it should work. Is there? Some DBs
support an Array type, but in general that might be better handled
with an Array class, anyway.

This is a great idea, and should be the default behaviour for lists.
It does present a problem if the right hand expression is a SELECT
object, though. Both of these are valid syntax:

id = (select max(id) from table)
id in (select id from table)

Also, SQLite allows for "column in table_name" syntax. I've never seen
that before, but if I wanted to support it, there'd be no way of
knowing in vs. ==.

On this line of thought, what about the += operator? That might be
more intuitive than //. I could even use -= for not in.

Runar
 

Pierre Quentel

(e-mail address removed) wrote:
My solution is sqlstring. A single-purpose library: to create SQL
statement objects. These objects (such as sqlstring.Select), represent
complex SQL Statements, but as Python objects. The benefit is that you
can, at run-time, "build" the statement pythonically, without
getting bogged down in String Manipulation. The theory is that once in
use, things that were complex (string magic) become simpler, and allow
the program to worry about higher-level issues.
With the same starting point - I don't like writing SQL strings inside
Python code either - I have tested a different approach: use the Python
list comprehension / generator expression syntax for select requests.

I have published a recipe on the Python Cookbook :
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/442447

For instance:

s = query(r.name for r in planes if r.speed > 500)
for item in s:
    print item

query is a class whose instances are created with the generator
expression as argument. The matching SQL request is built in the
__init__ method, here:

SELECT r.name FROM planes AS r WHERE r.speed > 500

On two tables:

s = query(r.name for r in planes for c in countries if r.country ==
c.country and c.continent == 'Europe')

is translated into:

SELECT r.name FROM countries AS c ,plane AS r WHERE (r.country =
c.country AND c.continent = 'Europe')

For the moment the implementation is not very elegant, especially for
getting the source code of the generator expression (it would be nice
if they had an attribute for that!), and I'm not sure if it could work
for all the forms of the SELECT syntax. But it should cover at least
the most usual kinds of requests, with a Pythonic syntax.
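Pierre's recipe recovers and parses the generator expression's source. Just to illustrate the surface mechanics, a much cruder sketch can instead drive the genexp with symbolic row objects and record conditions as side effects (Col, Row, query and the hard-coded "planes AS r" are all invented simplifications; this is not how the Cookbook recipe works):

```python
CONDS = []  # side-effect log of conditions seen while draining the genexp

class Col:
    def __init__(self, table, name):
        self.table, self.name = table, name
    def __gt__(self, other):
        CONDS.append("%s.%s > %r" % (self.table, self.name, other))
        return True           # let the genexp's "if" filter pass
    def __str__(self):
        return "%s.%s" % (self.table, self.name)

class Row:
    def __init__(self, alias):
        self.alias = alias
    def __getattr__(self, name):
        return Col(self.alias, name)

def query(gen):
    cols = ", ".join(str(c) for c in gen)   # draining records the conditions
    return "SELECT %s FROM planes AS r WHERE %s" % (cols, " AND ".join(CONDS))

planes = [Row("r")]
sql = query(r.name for r in planes if r.speed > 500)
print(sql)
# SELECT r.name FROM planes AS r WHERE r.speed > 500
```

The side-effect trick breaks down for `and`/`or` and `in` (short-circuiting and bool coercion again), which is why the real recipe works from the source text instead.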

Regards,
Pierre
 

Tom Anderson

On this line of thought, what about the += operator? That might be more
intuative than //. I could even use -= for not in.

You're going to have to explain to me how using an assignment operator for
something other than assignment is intuitive!

-1 on this one from me, i'm afraid.

Using 'in' would be good. It does require some truly puke-inducing
contortions, though; since 'in' calls __contains__ on the right-hand
operand, and that's likely to be a list, or some other type that's not
under your control, you have to cross your fingers and hope that whatever
it is implements __contains__ with equality tests with the probe object on
the left-hand side and the candidates on the right (as lists do, at least
in 2.4.1). Then you just have to make your table names do the right thing
when compared to strings.

It's a shame (sort of) that you can't define entirely new operators in
python. What we need is a __operate__(self, op, arg) special method, so
you could do:

... def __operate__(self, op, arg):
...     print "operating with", op, "on", arg
...
operating with <~> on foo

I'm sure that would do *wonders* for program readability :).

tom
 

Tom Anderson

(e-mail address removed) wrote:


With the same starting point - I don't like writing SQL strings inside Python
code either - I have tested a different approach : use the Python list
comprehension / generator expression syntax for the select requests

For instance :

s = query(r.name for r in planes if r.speed > 500)
for item in s:
    print item

query is a class whose instances are created with the generator
expression as argument. The matching SQL request is built in the
__init__ method, here :

SELECT r.name FROM planes AS r WHERE r.speed > 500

That, sir, is absolute genius.

Evil as ****, but still absolute genius.

tom
 

grunar

Tom said:
You're going to have to explain to me how using an assignment operator for
something other than assignment is intuitive!

-1 on this one from me, i'm afraid.
Point. I do think it looks strange, because we're used to seeing += in
code. But the concept is more along the lines of the == and !=
comparison operators.
Python does expose other nice things, such as &= and %=, which (since
people aren't used to seeing them used much) might be better
candidates. Does %= seem more agreeable? (I'm already using % for a
like statement.)

So, a statement could look like this:

person ** (
    (person.type_id == 'customer')
    & (person.id %= phone(phone.person_id))
)

becomes:

select * from person
where person.type_id = 'customer'
and person.id in (select person_id from phone)

Using 'in' would be good. It does require some truly puke-inducing
contortions, though; since 'in' calls __contains__ on the right-hand
operand, and that's likely to be a list, or some other type that's not
under your control, you have to cross your fingers and hope that whatever
it is implements __contains__ with equality tests with the probe object on
the left-hand side and the candidates on the right (as lists do, at least
in 2.4.1). then, you just have to make your table names do the right thing
when compared to strings.
__contains__, while allowing side-effects on the object in question
(even though it's on the right), only returns true/false (not a
custom object) afaik, so it breaks in a complex expression -- (a ==
b) & (c in d) won't work. You could modify d, but you can't pass that
value to the whole Condition Expression.
 

grunar

person ** (
    (person.type_id == 'customer')
    & (person.id %= phone(phone.person_id))
)
Never mind. This doesn't work, because all of the X= operators in
question are assignment operators, and therefore generate a syntax
error if used in a nested expression. I think I've settled on just
doing a table.column.IN(blah) syntax. This should be obvious to anyone
reading the code, and doesn't require mangling of the name (since it's
capitalized). Then we'd have similar functions for other non-intuitive
things, such as LIKE, EXISTS (on the table) and even a WHERE:

person.WHERE(
    (person.type_id == 'customer')
    & (person.id.IN(phone(phone.person_id)))
)
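That settled table.column.IN(...) / table.WHERE(...) style can be sketched end-to-end (Table, Column and Cond here are invented stand-ins, and the subselect is passed as a plain string for brevity):

```python
class Cond:
    def __init__(self, sql):
        self.sql = sql
    def __and__(self, other):
        return Cond("%s\nand %s" % (self.sql, other.sql))

class Column:
    def __init__(self, table, name):
        self.table, self.name = table, name
    def __eq__(self, value):
        return Cond("%s.%s = '%s'" % (self.table, self.name, value))
    def IN(self, subquery):
        # Named method instead of an overloaded operator: obvious to read,
        # and capitalization avoids clashing with real column names.
        return Cond("%s.%s in (%s)" % (self.table, self.name, subquery))

class Table:
    def __init__(self, name):
        self.name = name
    def __getattr__(self, colname):
        return Column(self.name, colname)
    def WHERE(self, cond):
        return "select * from %s\nwhere %s" % (self.name, cond.sql)

person = Table("person")
sql = person.WHERE(
    (person.type_id == 'customer')
    & person.id.IN("select person_id from phone")
)
print(sql)
# select * from person
# where person.type_id = 'customer'
# and person.id in (select person_id from phone)
```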
 
