How to make Ruby _THE_ scripting language of choice, fold in SQLite

J

John Carter

Much has been written about the OOP / RDBMS impedance mismatch....

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level... the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In some cases the results are returned as strings in other cases
various levels of conversion can be ommitted.

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

Suggestion to make Ruby the undisputed leader of the scripting languages...
1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

2) incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.

3) and perhaps optionally generate string queries for
(mysql/postgres/oracle...) backends.

4) provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

5) Hide the distinction between Object "handles" and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

# Attaches, unless told otherwise to sqlite db __FILE__.db
require 'rubysql'

class Customer
attr_primary_key :name
attr_reader :street, :city

# Map this class to a sql table called Customer with primary key :name and fields street and city.
include persistent
end

class Invoice
attr_foreign :customer, Customer

end

# It's persisted, so even if this appears a useless and can be garbage
# collected, it will at least be added to Customer table
Customer.new( "Joe Blogs", "52 hell", "Doggcity")

jane = Customer.new( "Jane Blag", "52 Sell", "Dodgcity")

a = Invoice.new

# The following three lines should be equivalent...
a.customer = "Jane Blag"
a.customer = jane
a.customer = select * from Customer where name == "Jane Blag"

# An both should transparently and automagically allow ...
assert_equal( "Dodgcity", a.customer.city)

and a.customer.city be the same as...
select city from Customer where name == a.customer;
 
A

ara.t.howard

Much has been written about the OOP / RDBMS impedance mismatch....

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level... the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In some cases the results are returned as strings in other cases
various levels of conversion can be ommitted.

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

Suggestion to make Ruby the undisputed leader of the scripting
languages...
1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

2) incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.

3) and perhaps optionally generate string queries for
(mysql/postgres/oracle...) backends.

4) provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

5) Hide the distinction between Object "handles" and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

# Attaches, unless told otherwise to sqlite db __FILE__.db
require 'rubysql'

class Customer
attr_primary_key :name
attr_reader :street, :city

# Map this class to a sql table called Customer with primary
key :name and fields street and city.
include persistent
end

class Invoice
attr_foreign :customer, Customer

end

# It's persisted, so even if this appears a useless and can be garbage
# collected, it will at least be added to Customer table
Customer.new( "Joe Blogs", "52 hell", "Doggcity")

jane = Customer.new( "Jane Blag", "52 Sell", "Dodgcity")

a = Invoice.new

# The following three lines should be equivalent...
a.customer = "Jane Blag" a.customer = jane
a.customer = select * from Customer where name == "Jane Blag"

# An both should transparently and automagically allow ...
assert_equal( "Dodgcity", a.customer.city)

and a.customer.city be the same as...
select city from Customer where name == a.customer;

i think that's looking backwards. something like couchdb takes only a
few hundred lines of code to interface with and already does a ton of
that - the rest would be trivial. imho sql is a dead-end road,
nevertheless much of what you are describing is already done by sequel
(the ruby orm).

cheers.

a @ http://codeforpeople.com/
 
D

David Masover

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

The tasks SQL wins for, it wins mostly because it is executed inside the
database engine. I doubt very much that, given the choice between writing SQL
and writing Ruby, I would rather write SQL.

CouchDB will let me write custom queries in JavaScript.

And there was a similar, Ruby-based project (rddb?), but I haven't seen any
activity there in awhile. Still, the idea is the same -- I can write queries
in pure ruby.
at some primitive level... the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In other words, you're wanting to fundamentally change the language -- syntax,
parser, interpreter, top to bottom -- for a performance hack, to make it
perform better with SQL databases.

SQL databases aren't the only kind of database, nor is a database the only way
to store data. Nor is SQLite the only SQL database, or necessarily the best
to optimize for.
Suggestion to make Ruby the undisputed leader of the scripting languages...

I don't think any of the other scripting languages are suffering greatly
because of lack of SQL.

In fact, Ruby has more often been slower than other languages, in my
experience. Often several orders of magnitude slower than Python, for
instance -- and Python does have Django.

Consider that -- Ruby sees plenty of production use _despite_ being slow.
1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

Sequel does this, somewhat. So does ActiveRecord, for that matter.

Neither of which is bound to one database engine.

Now, yes, they currently generate strings, at some point. However, since you
don't have to directly interact with the strings, your step 2 is nothing more
than a performance hack, which only applies to SQLite.
4) provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

In other words, an ORM.

Like ActiveRecord, DataMapper, Og, Sequel, etc.

Only difference you're proposing is to make the persistence transparent.
Programmers working with ActiveRecord, in particular, don't seem to think
about concurrency issues enough -- how many of us actually use transactions?
But you're suggesting we bury that problem even further in abstractions.

What happens when two programs try to access the same object?
5) Hide the distinction between Object "handles" and foreign
keys.

Yeah, like ActiveRecord, DataMapper...

The foreign keys are there if you need them, but you don't have to deal with
them directly.
# It's persisted, so even if this appears a useless and can be garbage
# collected, it will at least be added to Customer table
Customer.new( "Joe Blogs", "52 hell", "Doggcity")

Great -- what happens if, somewhere very soon after this point, an exception
is raised? What about validations?

Never mind that it's kind of ugly that you're using positional arguments
there...

In a modern ORM, I'd do something like:

cust = Customer.new( :name => 'Joe blogs')
cust.save!

Maybe an address wasn't required, and this will work. Maybe an address was
required, in which case, an exception will be raised, and the record won't be
saved. It's useful to separate this from creation for other reasons, too --
suppose I did provide an address, and now want to remove it:

cust = Customer.find_by_name 'Joe blogs'
cust.address = nil
cust.save!

In this case, again, if address was required, I'll get a validation error, and
this particular change to the record won't be saved.
 
C

Clifford Heath

David said:
SQL databases aren't the only kind of database, nor is a database the only way
to store data. Nor is SQLite the only SQL database, or necessarily the best
to optimize for.

However, most of the non-SQL things that are included in such definitions
of "database" are not actually transactional, and hence don't qualify
as databases in any true sense of the word. The purpose of a database is
to do things that filesystems do not - ACID. Any "database" that doesn't
provide strong guarantees of ACID behaviours doesn't deserve the title.

SQL sucks, but it is strongly identified with true transactional behaviour,
an identification that only the dills who wrote MyISAM seem to have been
able to ignore and break down.
Programmers working with ActiveRecord, in particular, don't seem to think
about concurrency issues enough -- how many of us actually use transactions?

If by "transactions" you mean ActiveRecord::Base.transaction, possibly no-one.
Attempting rollback on exceptions from a user-mode process without using
a two-phase locking protocol (as many/most of AR's adapters do) is utterly
flawed and not transactional at all.

Transactional is when you can pull the power cord at any time and your data
is guaranteed consistent, and nothing that you committed has been lost.

Sorry to butt in like this, and you (David) probably know it all anyhow, but
I wanted to reinforce the true meaning of "database" and "transaction", since
they're so widely abused, berated and misunderstood.

Clifford Heath, Data Constellation.
Agile information Management and Design.
 
J

Jeff Davis

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

I suggest you look at Sequel http://sequel.rubyforge.org/

This isn't just an ORM or DBI style layer. This is using Ruby syntax to
form relational expressions with an SQL flavor.

The python equivalent is SQLAlchemy.

These are really using the right approach, they do not try to map some
not-very-well-defined object model onto SQL, they just offer a SQL-ish
language inside of Ruby that generates real SQL, and sends it off for
evaluation.
Suggestion to make Ruby the undisputed leader of the scripting languages...
1) Extend Ruby syntax to accept a useful subset of SQLite SQL.

SQLite is pretty far from the SQL standard, last time I checked. I don't
consider it the pinnacle of database semantics.
2) incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.

You haven't shown why _any_ of this needs to be in the interpreter
itself. What are you trying to accomplish, and why can't you accomplish
it through Ruby's rich extensibility?
3) and perhaps optionally generate string queries for
(mysql/postgres/oracle...) backends.

What is the language you're trying to use for this purpose, and how is
this simpler than current systems that can generate strings to send to
SQL DBMSs?
4) provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

The devil is in the details. "Automagically"? Is it more automagical
than current solutions like AR?

What about constraints and concurrency? These automagical persistence
engines tend to have difficulty in a centralized environment unless your
data is 100% semantically independent.
5) Hide the distinction between Object "handles" and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

You're not the first person to have thought of this. The problem is,
there _is_ a distinction between those two concepts, and for good
reason.

What about a summary object, that represents the result of some GROUP BY
query? What is the object ID of that, and what is the database key?
# Attaches, unless told otherwise to sqlite db __FILE__.db

What about ruby code that's not stored in a file?

The code example you gave looks pretty much like every other ORM. How is
yours fundamentally different, aside from not being able to connect to a
remote server?

Regards,
Jeff Davis
 
J

Jeff Davis

i think that's looking backwards. something like couchdb takes only a
few hundred lines of code to interface with and already does a ton of
that - the rest would be trivial. imho sql is a dead-end road,
nevertheless much of what you are describing is already done by sequel
(the ruby orm).

Sequel is fundamentally different than a traditional ORM, and in my
opinion, much better. SQLAlchemy is a conceptually similar project for
python, and both are much more promising than any ORM.

As far as SQL being a "dead-end road", it needs to have a replacement,
first. Sequel/SQLAlchemy are a start in that direction, but the SQL
DBMSs provide a lot of things aside from just language.

Regards,
Jeff Davis
 
D

David Masover

However, most of the non-SQL things that are included in such definitions
of "database" are not actually transactional, and hence don't qualify
as databases in any true sense of the word.

However, many things that I would consider are, in fact, quite transactional.
Not as much as they could be, perhaps, but close enough.

First example: Filesystems. With ordered write mode, you can simply write to a
temporary file, then rename said temporary file into place. With unordered
writes, add an fsync between those two steps.

Any modern, journaled filesystem will make sure that the rename either
succeeds or doesn't, and you've made sure all data is successfully on disk
before you rename.

Implementation: Maildir.



Second example: Amazon's Dynamo. This powers S3 and SimpleDB, among other
things. They've got a paper on it. Some implementations can, indeed, be
considered purely-ACID.

What sets it apart from (most) SQL is, conflicts are expected as part of
normal operation, and are left to the application to sort out. That is:
Rather than preventing anyone else from modifying a record while I update it,
simply allow two versions of a record to exist, and provide common algorithms
for either choosing which version "wins", or for combining the two versions
into a third.

Thus, inconsistency is allowed, but only temporarily. It is never exposed to
the end-user, or even to (most of) the application.



Third example: Distributed version control, like Git. I mention this mostly
because it reflects the same philosophy as Dynamo above, albeit with more
human intervention -- but also because this is the one most developers are
likely to be intimately familiar with (or should be).


If by "transactions" you mean ActiveRecord::Base.transaction, possibly
no-one.

Precisely so.

I remember that one of the first things I wrote, when developing a new Rails
app, was this method:

def double_save
transaction do
save(false)
yield
save!
end
end

This to allow the creation of circular record structures -- for example, every
domain must have an SOA record, and every record (of any kind) must have a
domain. Since I'd made these fields NOT NULL on the database, I would often
do this hack with new records of this kind:

d = Domain.new(...)
d.soa_id = -1
d.double_save do
d.soa = Soa.create!(...)
end

That -1, of course, would've killed my validations.
Attempting rollback on exceptions from a user-mode process without using
a two-phase locking protocol (as many/most of AR's adapters do) is utterly
flawed and not transactional at all.

I actually don't see a lot of that -- more simple assumptions that nothing
will go wrong, or that dangling records aren't a problem.

Quite a lot of defensive programming, too, based on the idea that the data
WILL get corrupted somehow, someday, and being able to handle corrupt data is
more important than avoiding the corruption in the first place.
 
A

ara.t.howard

On Sep 25, 2008, at 9:34 PM, David Masover wrote:

not trying to be difficult but...
However, many things that I would consider are, in fact, quite
transactional.
Not as much as they could be, perhaps, but close enough.

First example: Filesystems. With ordered write mode, you can simply
write to a
temporary file, then rename said temporary file into place. With
unordered
writes, add an fsync between those two steps.

Any modern, journaled filesystem will make sure that the rename either
succeeds or doesn't, and you've made sure all data is successfully
on disk
before you rename.

Implementation: Maildir.

rename is not atomic on NFS, nor many other network filesystems. on
NFS you have only softlink and rmdir as atomic operations.

a @ http://codeforpeople.com/
 
C

Clifford Heath

ara.t.howard said:
rename is not atomic on NFS, nor many other network filesystems. on
NFS you have only softlink and rmdir as atomic operations.

Yes, the Walter Tichy method, first used (to my knowledge) by him in RCS.

However, without the fsync, its still not transactional, as you can pull
the power cord after the operation and there's no guarantee that the disk
blocks that make up the file content are actually on disk. All the FS
journalling does is to ensure that the block allocation tables and inodes
are clean.

Also I don't know the defined POSIX semantics now, but the system-wide
"sync" call would return as soon as every dirty block was in the disk I/O
queues, and before they were on disk. You had to call it twice before
being sure, as the scan for dirty blocks couldn't complete until the
queues were empty. (I don't believe fsync has this flaw).

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

Neither is Amazon's approach, and neither is DVCS, despite their success
in solving *different* problems.

Clifford Heath.
 
D

David Masover

However, without the fsync, its still not transactional, as you can pull
the power cord after the operation and there's no guarantee that the disk
blocks that make up the file content are actually on disk. All the FS
journalling does is to ensure that the block allocation tables and inodes
are clean.

Unless you've got data journaling and ordered writes.
I mentioned the ordered writes; I forgot to mention the data journaling.

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

Definitions paraphrased from Wikipedia:

(A)tomicity: Either all of the tasks are performed or none of them are.

In this case, I'm assuming that the temp directory is regularly cleaned, often
on boot. But I'm not sure that's relevant -- extra tempfiles waste space, but
they don't constitute a half-finished transaction, if the actual _data_ being
operated on is the target file/directory.

Counting a dangling tempfile as a point against atomicity is a bit like
counting an unfinished journal (which must be replayed and likely rolled
back) the same way.


(C)onsistency: the database remains in a consistent state after the
transaction is over (whether successful or not).

Since only completely-flushed (fsynced) files are renamed, assuming rename is
atomic, the directory in question will always be consistent -- you'll never
have half an email actually committed in maildir.

Again, I'm not counting a dangling tempfile towards consistency any more than
I would count a dirty journal which must be played back / cleaned up after.


(I)solation: other operations cannot see or access the data in an intermediate
state during the transaction.

Granted, if you really wanted to, you could access anothers' tempfile -- but
then, if you really wanted to, you could read the database/filesystem's
journal. But this would have to be deliberate -- it is not difficult to
generate unique tempfile names, without knowledge of what other processes are
doing. (See Maildir.)


I realize it's an ugly, brutal hack, and I realize it's ancient. I wanted to
demonstrate that relational databases do not have a monopoly on transactional
properties -- not only that, such transactions are not actually all that
difficult to get right.

Neither is Amazon's approach, and neither is DVCS, despite their success
in solving *different* problems.

I'm up too late to try to figure out what you mean here.
It's entirely possible I'm wrong.

I suspect a more rigorous definition is in order, though. After all, from
Wikipedia, Durability implies:

"once the user has been notified of success, the transaction will persist, and
not be undone."

DVCS and Amazon both allow for such transactions to be "undone", in a sense,
by further changes overwriting the result of the transaction. But the
transaction itself is never forgotten.
 
C

Clifford Heath

I might be labouring a point here, and I really don't mean to
give you a hard time. However, perhaps I can clarify further...

Most or all transactional systems rely on the ability to perform
some finite-sized atomic operation on persistent storage. By finite
size, I mean more than just flipping a single bit. Often it's
expected that if a disk block is written, the whole block is written.
Some systems put a LSN on the start and end of the block, and if
they're both intact on read, they assume that all the stuff in
between got written too. If you have the memory bandwidth for it,
you can also use "torn page detection", where the contents is
checksummed and written in a header, which does check every bit
to whatever level of confidence your checksum warrants.

This ability to perform single operations with ACID semantics is
assumed - because without it we can't do any more - and hence
isn't interesting. If the chunk you choose is a whole file, it
still isn't interesting, because it's still all or nothing on a
single object. No-one rewrites an entire database to save one
record.

The interesting cases arise when you must coordinate updates to
more than one such object. It's *then* that isolation, consistency
and atomicity actually seriously mean something.

The examples you gave do not provide a use case where all 4 ACID
properties are supported in such multi-atom situations - they only
support or require some subset. There are interesting applications
that can live with such less-than-transactional behaviour, but
don't confuse those implementations with databases. They're useful,
but they're not transactional in any non-trivial sense.

Clifford Heath.
 
D

David Masover

This ability to perform single operations with ACID semantics is
assumed - because without it we can't do any more - and hence
isn't interesting. If the chunk you choose is a whole file, it
still isn't interesting, because it's still all or nothing on a
single object. No-one rewrites an entire database to save one
record.

No, the point here is that a file is not a whole database, but is rather one
record. In this way, Maildir is a step up from mbox, in that maildir allows
single emails to be created, destroyed, or moved, without rewriting the
entire database.
The interesting cases arise when you must coordinate updates to
more than one such object.

That invalidates the filesystem example, and (to some extent) S3, but not
Dynamo or DVCS, as I understand it.
The examples you gave do not provide a use case where all 4 ACID
properties are supported in such multi-atom situations - they only
support or require some subset.

Again, it's difficult to argue against this when you haven't told me which
ones aren't supported, and why.

I'll go read the Dynamo paper again, then...
 
C

Clifford Heath

David said:
That invalidates the filesystem example, and (to some extent) S3, but not
Dynamo or DVCS, as I understand it.
Again, it's difficult to argue against this when you haven't told me which
ones aren't supported, and why.

DVCS exists because isolation isn't needed, since tools like patch
can often resolve conflicts and when they can't, a human can.
I'll go read the Dynamo paper again, then...

From <http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html>:
"To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios."

Clifford Heath.
 
D

David Masover

=20
DVCS exists because isolation isn't needed, since tools like patch
can often resolve conflicts and when they can't, a human can.

DVCS does, in fact, provide isolation (most checkouts are not public until=
=20
pushed to a public server).

And tools like patch are severely limited by not actually knowing much abou=
t=20
the application at hand. Most newer VCSes are much smarter about resolving=
=20
conflicts, but at a certain point, since the VCS doesn't actually know what=
a=20
certain line of code does, it can't necessarily know which version of that=
=20
line is the correct one -- it can only guess from context.

Compare this to application-specific conflict resolution -- it's quite=20
possible that simply examining the two conflicting states is enough to deci=
de=20
which is correct.

=20
From <http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html>:
"To achieve this level of availability, Dynamo sacrifices consistency und=
er=20
certain failure scenarios."

=46rom the same article:

"Dynamo is used to manage the state of services that have very high=20
reliability requirements and need tight control over the tradeoffs between=
=20
availability, consistency, cost-effectiveness and performance."

But you're right here, and I withdraw any claim of solid ACID properties of=
=20
Dynamo. I do agree with this, though:

"Experience at Amazon has shown that data stores that provide ACID guarante=
es=20
tend to have poor availability. This has been widely acknowledged by both t=
he=20
industry and academia."
 
E

Erik Hollensbe

John said:
Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level... the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

I think the fundamental problem with changing this (and this applies to
most of the arguments I've seen in this thread) is that if you put the
burden of getting the data to disk on ruby's doorstep, you're begging
for incompatibility with other implementations (not everyone is going
to, or should have to, use ruby for everything).

The alternative is that ruby generates some kind of intermediate
language that's sent to the database... We *already* have this, SQL. SQL
has fundamental problems that are largely borne by the competition in
RDBMSs out there, but it works, it's not *that* hard to generate and
it's *nice* to ensure your database interface isn't causing you trouble.
Anyone that's attempted to write bindings against libdb[2-4]? will
understand that very well. Additionally, while many think of ORMs when
tackling this problem, there are plenty of (perhaps advanced) reasons to
skip the ORM and write the query manually. Not everything in a database
is or even can be represented by the relationships that most ORMs
provide, and "not everything" gets considerably larger in size when
performance is an issue.

I guess what I'm saying is, the problems really don't surround the
relationship between ruby and sql, or even ruby and the database, as
much as they involve the inherent inability of SQL to handle new
features and remain compatible with the specification that defines it.
It's probably better handled there.
 
C

Clifford Heath

David said:
DVCS does, in fact, provide isolation (most checkouts are not public until
pushed to a public server).

That's not the definition of isolation. Isolation means that
two transactions that update the same piece of data will only
ever do so in a way that guarantees that the resultant state
*of the whole store* is valid, assuming that the two transactions
were valid when run independently. DVCS cannot do this, and most
importantly, do not even attempt to, though it's remarkably
effective at getting close most of the time.

Go check Wikipedia or anywhere else to verify this definition
before trying to create your own personal definition.
"Experience at Amazon has shown that data stores that provide ACID guarantees
tend to have poor availability. This has been widely acknowledged by both the
industry and academia."

This doesn't bear on the ACID properties, but for the record, many
DBMSs sustain high transaction loads with astonishing availability.
Consider for example the availability of most major stock exchanges,
which run transaction loads in the many hundreds per second, every
trading day of the year, with almost no lack of availability. Many
other databases run high loads 24x7 for years without being restarted.
That's the sort of market where Tandem made their cash.

There's no chance in a million years that Amazon will ever achieve
the same availability measures, and their claim is either radically
out of context or just plain lies.

Clifford Heath.
 
J

John Carter

Thanks for the many responses, all made Good and perfectly valid
points... alas, mostly in the sense of very astute folk standing a
little too close to an elephant... :)

So rather going into a point by point rebuttal (since, as I said,
everybodies responses were perfectly valid and hence I have no
objection) I rather highlight the response that glimpsed the full
Elephant.

As far as SQL being a "dead-end road", it needs to have a replacement,
first. Sequel/SQLAlchemy are a start in that direction, but the SQL
DBMSs provide a lot of things aside from just language.

Let's face it there are a _lot_ more SQL users out there than Ruby
users...pity SQL is _so_ ugly.

Big Money brooks no dead ends. SQL is a huge, very useful, successful
money spinner. Hence it will be and _is_ being extended both on the
client _and_ server side to do what scripting languages like Ruby does.

No sane language designer would choose such a path, the SQL extended
languages I have seen are as ugly as sin... but as I say, nothing
stops the money.

So tell me, assuming you, like millions of developers worldwide, have
to use some form of SQL database.

Which would you rather code in?

One of the SQLextended (distended)? abortions that comes with the DB?

Or Ruby?

Silly question (in this forum):). The answer obviously is Ruby.

Is the integration between Ruby and SQL as slick as you could imagine?

The current adapters are damn Good, but not as slick as I could imagine.

Neither in terms of syntax, Object / Relational integration, exception
handling nor performance.

Can you use Ruby in your server side stored procedures? Probably not.

So let's talk a bit about what SQL gives, that I'd like to see cleanly
and comfortably and slickly integrated in ruby...

* Relational Data model. Yup. I'm aware of the perfectly valid
criticisms that SQL doesn't go anywhere near close enough to Codd's
Laws... but it's one of the closest practical things that does. And
yes, I'd love it if Ruby had a true relational algebra embedded within it.

Despite what various flavours of Object Oriented Analysis guru's
will tell you....

A class is _not_ a relation, an Object _is_ not a row, a Hash map
from primary key to class instances is _not_ a table. So ORM / Db
adapters try mash these notions together, but the semantics aren't
quite the same.

I bet we could use Ruby Mixins to designate and extend certain
classes as tables.

I bet we could find a really Good Ruby way of defining the Data
Model that would be better than "create table". Indeed some of the
ORM libraries mention are excellent examples of this.

* Data Persistence. Yip, marshal and restore gives us that. But is
that enough?
- Indexes / updates and deletes?
- Concurrent access?
- Even if an application can persist it's object model it's
incredibly useful to query, and access that same data model from a
query tool, or (perhaps indirectly via view or a projection) by
another (loosely related) application.

* Data Model / Database Management. Whoops. Adding / deleting /
renaming columns to a data table created via marshal / restore is
painful.

* Set operations (on things too big to fit in ram). We're Ruby
programmers. We think procedurally. Can I recommend the book "The
Art of SQL"? http://books.google.co.nz/books?id=HfcMDvxb43AC

It is a Good Reminder that screens and screens of procedural code can be
rewritten as set op one liners.

* ACID - Well, actually that would be kind of handy in a number of
other areas not traditionally "Database".

* Query Language - selects, joins, group by's, sorts. Rich and
expressive stuff indeed.

* Client / Server / Concurrency - Actually Ruby is very Good at that
already, except there is no support for concurrency at the marshal
and restore level.

Does this involve a multiyear rewrite of the core of Ruby?

Nah.

It involves folding in preexisting public domain code. Sqlite. No,_not_
at the C API level...
http://sqlite.org/capi3.html

but folding in the appropriate parts of (select,update,delete,insert) of Sqlite's parse.y
http://www.sqlite.org/br3317/artifact/405
into
http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/parse.y?view=markup


John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand
 
D

David Masover

Big Money brooks no dead ends. SQL is a huge, very useful, successful
money spinner. Hence it will be and _is_ being extended both on the
client _and_ server side to do what scripting languages like Ruby does.

No sane language designer would choose such a path, the SQL extended
languages I have seen are as ugly as sin... but as I say, nothing
stops the money.

In other words: Much like COBOL.

COBOL is, in fact, very successful. There's a huge amount of old COBOL code
lying around, and there's quite a lot of high-paying jobs for maintaining it.
Because of the sheer size of these codebases, it's been extended to do all
kinds of things.

But it's still COBOL, and still ugly as sin.

So tell me, assuming you, like millions of developers worldwide, have
to use some form of SQL database.

To continue the analogy, suppose I had to code in COBOL.

What are the reasons I would have to use COBOL?
Most likely, it's because I'm supporting a legacy app.

When was the last time you had to maintain a legacy Ruby app?
Generally, if I'm being paid to write Ruby, it's because I'm developing a
relatively new app, and with a corporation that has the presence of mind to
choose the best tool for the job -- not the most popular tool for the job.

So, if I'm being paid to write a brand-new Ruby app, don't you think there'd
be discussion of whether or not to use a SQL database? Of whether a
document-based database might be a better choice?

In my discussion with Clifford Heath on this thread, I learned that there are
applications which are suitable to a traditional transactional database
(probably SQL), and not to a document-based database. I would guess that such
applications are still the vast minority.

Is the integration between Ruby and SQL as slick as you could imagine?

Define "slick".

The current adapters are damn Good, but not as slick as I could imagine.

Neither in terms of syntax, Object / Relational integration, exception
handling nor performance.

Extraordinary claims require extraordinary proof.

So let's talk a bit about what SQL gives, that I'd like to see cleanly
and comfortably and slickly integrated in ruby...

* Relational Data model. Yup. I'm aware of the perfectly valid
criticisms that SQL doesn't go anywhere near close enough to Codd's
Laws... but it's one of the closest practical things that does. And
yes, I'd love it if Ruby had a true relational algebra embedded within
it.

Google isn't giving me a lot of help here. What's this about, exactly? What
would it look like in Ruby?

So ORM / Db
adapters try mash these notions together, but the semantics aren't
quite the same.

Which differences in semantics are bothering you?

I bet we could use Ruby Mixins to designate and extend certain
classes as tables.

Or maybe inheritance.

I know! We can make all classes that want to be tables inherit from a common
base class! I think I'll call it ActiveRecord::Base.

I bet we could find a really Good Ruby way of defining the Data
Model that would be better than "create table". Indeed some of the
ORM libraries mention are excellent examples of this.

In other words, an ORM library could do this -- nothing needs to be changed in
Ruby itself.

* Data Persistence. Yip, marshal and restore gives us that.

Marshal and restore aren't generally used for persistence. Generally, the ORM
in question provides its own semantics for that -- certain properties of it
correspond directly to columns, and telling the object to persist updates the
row with those values.

But, suppose we're talking about something like CouchDB, or RDDB:
- Indexes / updates and deletes?

Done, done, and done. Calling "save" on any model will update the record with
the values which have changed, or insert a new record if one doesn't exist.
Calling "destroy" will delete it.
- Concurrent access?

This is a more fundamental problem, and it's not unique to Ruby. But there is
native (and easy) support for optimistic and pessimistic locking.
- Even if an application can persist it's object model it's
incredibly useful to query, and access that same data model from a
query tool, or (perhaps indirectly via view or a projection) by
another (loosely related) application.

Or within the same. But yes, that's kind of inherent in the fact that a SQL
database is used here:

* Data Persistence. Yip, marshal and restore gives us that. But is
that enough?

Suppose we split it into one record per file. I'm not advocating this -- I
would much rather
- Indexes / updates and deletes?

Indexes: Yes, maintained separately.

Updates: Unmarshal the record you need, change it, then re-marshal and save.

Deletes: Delete
- Concurrent access?

Can be done on a filesystem. Maildir gives a very simple example of how this
might be done.

For that matter, concurrent access in most databases cannot be done more
finely than by table or by row. A system like flock should be more than
adequate for this purpose.
- Even if an application can persist it's object model it's
incredibly useful to query, and access that same data model from a
query tool, or (perhaps indirectly via view or a projection) by
another (loosely related) application.

Define "loosely related" -- why not simply load up the model in question into
this other application?
* Data Model / Database Management. Whoops. Adding / deleting /
renaming columns to a data table created via marshal / restore is
painful.

Not really. Maybe if you're marshalling/restoring the objects directly...

But consider: Adding a "column" is trivial; you need only provide a default
value, and adjust the model class to know about this. You don't actually have
to change any data at all, until people start setting values for
that "column". Existing document-based databases work exactly this way.

Deleting a "column" could be done the same way -- let the model class know it
should be deleted.

Renaming would be the most difficult. Here, you'd have to version your data,
and give the model class a procedure to run when loading records older than a
certain version.

* ACID - Well, actually that would be kind of handy in a number of
other areas not traditionally "Database".

Not all apps need or want ACID. In fact, it's much easier to scale most apps
if you don't make them ACID-compliant.

* Query Language - selects, joins, group by's, sorts. Rich and
expressive stuff indeed.

And confusing, archaic, and fundamentally different than how we program
everything else.

For that matter, we do have this, somewhat -- we have Enumerable#select and
Enumerable#sort.

* Client / Server / Concurrency - Actually Ruby is very Good at that
already, except there is no support for concurrency at the marshal
and restore level.

No, it's done by serializing with something else -- generally, an ORM like
ActiveRecord. But as I've shown above, you can, in fact, have concurrency
with marshal and restore.

Does this involve a multiyear rewrite of the core of Ruby?

Nah.

It involves folding in preexisting public domain code. Sqlite.

If it is so trivial, fork Ruby and do it yourself. Show us that it can be
done, and that it is useful, and that it's worth polluting Ruby's core
namespace with SQL junk.

And don't be surprised if, when you're done, nobody's interested, or nobody's
willing to take the risk of breaking old code. After all, Ruby 1.9 barely
changes syntax at all, and look how long it's taken -- it's more than just a
technical problem. It's a design problem.

but folding in the appropriate parts of (select,update,delete,insert) of
Sqlite's parse.y

Ruby is already a powerful and expressive language for creating new domain
specific languages. The Sequel library already provides a Ruby DSL for SQL
queries.

There are many domains which might benefit from customizing Ruby's parser. I
imagine that Rake wouldn't mind being able to do this:

foo.o: foo.c
system 'gcc foo.c -o foo.o'

Not the best Make example, but you get the idea. Instead, they've chosen to
leave it as Ruby, which means it looks like this:

file 'foo.c' => 'foo.o' do
system 'gcc foo.c -o foo.o'
end

Much cleaner, and actually valid Ruby.



And you still haven't addressed perhaps the most disturbing part of your
proposal: Even if we agree with you that SQL is so important that it needs to
be part of the language, you're essentially dictating one true SQL engine and
flavor of SQL. Why should Ruby embed SQLite, when almost no Ruby programs use
SQLite in production? What does this mean for users of other databases?

And what does this mean for those who don't want to use SQL at all?
When I call Array#select, will it do what I expect?



Oh, and for what it's worth: There's a language called Judo, which includes
SQL. I'm curious what you think of it:

http://judoscript.org/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,810
Latest member
Kassie0918

Latest Threads

Top