Aaron Brady
Hi all,
I think Python should have a relation class in the standard library.
Fat chance. I want to write a recipe for it, but I don't know how. I
want your advice on some of the trade-offs, what it should look like,
what the pitfalls are, different strengths and weaknesses, etc.
Fundamentally, a relation is a set of tuples. A simple generator
expression or itertools filter can fulfill the requirements of a
query. But the fun stops there unless you want faster-than-linear
lookup times. I do.
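As a baseline, here is a minimal sketch of the linear-time version, assuming a relation is just a set of named tuples. The `Sale` schema, the sample rows, and the `select_linear` helper are all made up for illustration; none of them is a proposed API.

```python
from collections import namedtuple

# Hypothetical schema and sample rows, for illustration only.
Sale = namedtuple("Sale", ["model", "sales"])
Sales = {
    Sale("foopad", 450),
    Sale("barpad", 300),
    Sale("bazpad", 700),
}

def select_linear(relation, columns, predicate):
    """Yield the requested columns of every row that satisfies predicate.

    Every tuple is visited, so each query costs O(n).
    """
    for row in relation:
        if predicate(row):
            yield tuple(getattr(row, c) for c in columns)

# The predicate is an opaque callable, so there is nothing to index on.
recordset = select_linear(Sales, ["model"], lambda r: 400 < r.sales < 600)
result = sorted(recordset)
```

This returns an iterator, as wanted, but it scans the whole relation on every query.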
For matching equality, you need per-column hash tables
(dictionaries). If I have a "WHERE lastname= 'Newton'" clause, I can
find the matching records with one expected constant-time lookup. For
matching inequality, you need a per-column sorted index (a sorted list
or a balanced tree). If I have a "WHERE sales> 400" clause, I can get
one matching record in O(log n) time, and all k of them in
O(log n + k) time.
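A rough sketch of the two kinds of per-column index, assuming a static list of rows. The `Person` schema, the sample data, and the helper names `where_lastname_eq` and `where_sales_gt` are hypothetical. The sorted index here is a plain sorted list searched with `bisect`, standing in for a balanced tree.

```python
from bisect import bisect_right
from collections import defaultdict, namedtuple

# Hypothetical schema and sample rows, for illustration only.
Person = namedtuple("Person", ["lastname", "sales"])
rows = [
    Person("Newton", 500),
    Person("Leibniz", 350),
    Person("Newton", 420),
    Person("Euler", 900),
]

# Equality index: column value -> list of rows holding that value.
by_lastname = defaultdict(list)
for row in rows:
    by_lastname[row.lastname].append(row)

# Ordered index: rows sorted by the column, plus a parallel list of keys
# so bisect can search on the key alone.
by_sales = sorted(rows, key=lambda r: r.sales)
sales_keys = [r.sales for r in by_sales]

def where_lastname_eq(value):
    """WHERE lastname = value: one expected O(1) dictionary lookup."""
    return by_lastname.get(value, [])

def where_sales_gt(threshold):
    """WHERE sales > threshold: O(log n) to find the start, O(k) to slice."""
    start = bisect_right(sales_keys, threshold)
    return by_sales[start:]

newtons = where_lastname_eq("Newton")
big_sellers = where_sales_gt(400)
```

Inserting into a sorted list costs O(n) because later elements have to shift, which is why a balanced tree is the better fit once the relation is updated often.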
That's not the fun part either. The fun part is the exact nuance of
query structure in a select statement. I want to return an iterator
from the following function calls.
recordset= Parts.select( [ "part" ], model== 'foopad' )
recordset= Sales.select( [ "model" ], sales>400 and sales<600 )
recordset= (Parts+Sales).select( [ "part" ], sales>400 and sales<600 )
The third of these is a join statement. It selects every part that
was used in at least one model that sold between 400 and 600 units. It
might need something more explicit, especially for the different types
of joins: 'Parts.join( Sales ).select' and
'Parts.innerjoin( Sales ).select', or
'relation.innerjoin( Parts, Sales ).select'.
Unfortunately, even in this form, it won't work. While the statements
are valid expressions, the second argument is evaluated too soon:
Python computes it before 'select' is ever called, so 'select' sees
only the result. If I put a lambda in front of it, 'select' gets an
opaque function that it can only call row by row, so I lose the
ability to beat linear-time lookups, and I might as well just use
'ifilter'. I want live Python objects in the tuples and queries, so I
can't just convert everything to a string. What are my options? This
is Python, so they can't all be bad.
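To make the problem concrete, here is a small demonstration of the eager evaluation. The `select` function is a hypothetical stand-in that just hands back whatever condition it receives; it is not a real relation method.

```python
def select(columns, condition):
    # Hypothetical stand-in for a relation's select method: it simply
    # returns the condition argument so we can inspect what arrived.
    return condition

# 'sales' has to be an ordinary variable already; if it were undefined,
# the call below would raise NameError before select even ran.
sales = 500

# The comparison is computed first, so select receives a plain bool.
received = select(["model"], sales > 400 and sales < 600)

# With a lambda, select receives a callable. It can apply it to each row,
# but it cannot see that the test is a range check on one column.
deferred = select(["model"], lambda sales: sales > 400 and sales < 600)
```

In the first call the query structure is gone by the time 'select' runs; in the second it is still there but hidden inside the function, which is what forces the linear scan.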
P.S. Recurring topic!