N
Nobody
Except nobody has yet shown an alternative which is easier to get right.
For SQL, use stored procedures or prepared statements. For HTML/XML, use a
DOM (or similar) interface.
Except nobody has yet shown an alternative which is easier to get right.
Jorgen Grahn said:I thought it was well-known that the solution is *not* to try to
sanitize the input -- it's to switch to an interface which doesn't
involve generating an intermediate executable. In the Python example,
that would be something like os.popen2(['zcat', '-f', '--', untrusted]).
That???s what I mean. Why do people consider input sanitization so hard?
Yes. And then some.
Among web developers, the median level of programming knowledge amounts to
the first 3 chapters of "Learn PHP in 7 Days".
It doesn't help the the guy who wrote PHP itself wasn't much better.
Yep. Search the BugTraq archives for "SQL injection". And most of those
are for widely-deployed middleware; the zillions of bespoke site-specific
scripts are likely to be worse.
Also: http://xkcd.com/327/
Well known by anyone with a reasonable understanding of the principles of
programming, but somewhat less well known by the other 98% of web
developers.
There's a world of difference between a skilled chef and the people
flipping burgers for a minimum wage. And between a chartered civil
engineer and the people laying the asphalt. And between what you
probably consider a programmer and the people doing most web development.
A long while ago I came up with this macro:
#define Descr(v) &v, sizeof v
making the correct version of the above become
snprintf(Descr(buf), foo);
I get worried when people talk about how easy it is to do something
safely. Let me suggest a couple of things you might not have considered:
1) Somebody is running your application (or the database server) with
the locale set to something unexpected. This might change how numbers,
dates, currency, etc, get formatted, which could change the meaning of
your constructed SQL statement.
2) Somebody runs your application with a different PYTHONPATH, which
causes a different (i.e. malicious) urllib module to get loaded, which
makes urllib.quote() do something you didn't expect.
Carl Banks said:Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this a direct binary interface to the
library):
results = rdb_query(table = model,
columns = [model.name, model.number])
results = rdb_inner_join(tables = [records,tags],
joins = [(records.id,tags.record_id)]),
columns = [record.name, tag.name])
Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
That’s the same fallacious argument I pointed out earlier.
Except for the huge number that deal with text protocols or languages.
Because SQL is a text language.
Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this a direct binary interface to the
library):results = rdb_query(table = model,
columns = [model.name, model.number])results = rdb_inner_join(tables = [records,tags],
joins = [(records.id,tags.record_id)]),
columns = [record.name, tag.name])Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
Well, the answer to that one is simple. SQL, in the hands of somebody
like me, can be used to express a few pathetic joins and what I do with
it could probably be handled with the kind of API you're describing.
But, the language has far more expressivity than that, and a
domain-specific language is really a good fit for what it can do.
The problem is not so much that SQL queries are described as text
strings,
but that the distinction between program and data gets lost if
you build the query as one big string.
Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this a direct binary interface to the
library):results = rdb_query(table = model,
columns = [model.name, model.number])results = rdb_inner_join(tables = [records,tags],
joins = [(records.id,tags.record_id)]),
columns = [record.name, tag.name])Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
Well, the answer to that one is simple. SQL, in the hands of somebody
like me, can be used to express a few pathetic joins and what I do with
it could probably be handled with the kind of API you're describing.
But, the language has far more expressivity than that, and a
domain-specific language is really a good fit for what it can do.
I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions. Statements don't express anything very complex,
and could straightforwardly be represented by function calls. But
it's a fair point.
Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this a direct binary interface to the
library):results = rdb_query(table = model,
columns = [model.name, model.number])results = rdb_inner_join(tables = [records,tags],
joins = [(records.id,tags.record_id)]),
columns = [record.name, tag.name])Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
Well, the answer to that one is simple. SQL, in the hands of somebody
like me, can be used to express a few pathetic joins and what I do with
it could probably be handled with the kind of API you're describing.
But, the language has far more expressivity than that, and a
domain-specific language is really a good fit for what it can do.
I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions. Statements don't express anything very complex,
and could straightforwardly be represented by function calls. But
it's a fair point.
Seriously, almost every other kind of library uses a binary API. What
makes databases so special that they need a string-command based API?
How about this instead (where this a direct binary interface to the
library):
results = rdb_query(table = model,
columns = [model.name, model.number])
results = rdb_inner_join(tables = [records,tags],
joins = [(records.id,tags.record_id)]),
columns = [record.name, tag.name])
Well, we know the real reason is that C, Java, and friends lack
expressiveness and so constructing a binary query is an ASCII
nightmare. Still, it hasn't stopped binary APIs in other kinds of
libraries.
Well, the answer to that one is simple. SQL, in the hands of somebody
like me, can be used to express a few pathetic joins and what I do with
it could probably be handled with the kind of API you're describing.
But, the language has far more expressivity than that, and a
domain-specific language is really a good fit for what it can do.I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions. Statements don't express anything very complex,
and could straightforwardly be represented by function calls. But
it's a fair point.
Off the top of my head, I can think of a few things that would be
tricky to turn into an API:
* Aggregation (GROUP BY, aggregate functions over arbitrary
expressions, HAVING clauses).
* CASE expressions.
* Subqueries.
* Recursive queries (in DBMSes that support them).
* Window clauses (likewise).
* Set operations between queries (UNION, DIFFERENCE, INTERSECT).
* A surprisingly rich set of JOIN clauses beyond the obvious inner
natural joins.
* Various DBMS-specific locking hints.
* Computed inserts and updates.
* Updates and deletes that include joins.
* RETURNING lists on modification queries.
* Explicit (DBMS-side) cursors.
This is by no means an exhaustive list.
Of course, it's possible to represent all of this via an API rather
than a language, and libraries like SQLAlchemy make a reasonable
attempt at doing just that. However, not every programming language has
the kind of structural flexibility to do that well: a library similar
to SQLalchemy would be incredibly clunky (if it worked at all) in, say,
Java or C#, and it'd be nearly impossible to pull off in C.
I'm not the biggest expert on SQL ever, but the only thing I can think
of is expressions. Statements don't express anything very complex,
and could straightforwardly be represented by function calls.
I don't know the exact details of all of these, but I'm going to opine
that at least some of these are easily expressible with a function
call API. Perhaps more naturally than with string queries. For
instance, set operations:
query1 = rdb_query(...)
query2 = rdb_query(...)
final_query = rdb_union(query1,query2)
or
final_query = query1& query2
Then there's the type of SQL that results in DBA's having jobs-- and
deservedly so. Its *really* a very flexible and powerful language
capable of doing quite a lot to bend, flex, twist, and interleave that
data in the server while building up a result set for you.
But, see, that's not actually what's going on behind the scenes in the
database. Unless your "query1" and "query2" objects are opaque
pseudo-objects which do not actually represent results -- the query
planners do a *lot* of stuff by looking at the whole query and
computing just how to go about executing all of the instructions.
Kindly stop inventing straw men to attack; I deny the position you're
painting for me.
But, see, that's not actually what's going on behind the scenes in the
database. Unless your "query1" and "query2" objects are opaque
pseudo-objects which do not actually represent results
What am I passing, then?
This is off-topic, but I believe snprintf() in C can *never* safely be
the only thing you do to the buffer: you also have to NUL-terminate it
manually in some corner cases. See the documentation.
Want to reply to this thread or ask your own question?
You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.