Prepare Statements VS Statements

V

Vince

Hi everybody

for one of my customer I need to program an adapter that basically
receives about 25'000 MQ Messages, parses them and then finally writes
the data into an Oracle DB.

I must say at this point that every message has a variable length and
thus variable amount of data/fields that must be returned to the DB.

During a workshop I got an argue with another developer because he said
we should use PrepareStatements instead of normal Statements, they've
got a better performance - compiling once on DB Server. I generally
agreed, but said that in our case the PrepareStatements won't help in
terms of performance since we have a variable amount of columns for each
SQL Insert Statement. What IMHO would lead to an overhead because we
would have to concatenate the PrepareStatements individually (25000
times) plus we would then have to assign a value for each question
mark.. We basically better concatenate the Insert Statement once for
each row of data and then use a batch and commit it to the DB...

My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable amount of
columns?

Thanks for your opinion and help

Vince
 
A

Arne Vajhøj

Vince said:
for one of my customer I need to program an adapter that basically
receives about 25'000 MQ Messages, parses them and then finally writes
the data into an Oracle DB.

I must say at this point that every message has a variable length and
thus variable amount of data/fields that must be returned to the DB.

During a workshop I got an argue with another developer because he said
we should use PrepareStatements instead of normal Statements, they've
got a better performance - compiling once on DB Server. I generally
agreed, but said that in our case the PrepareStatements won't help in
terms of performance since we have a variable amount of columns for each
SQL Insert Statement. What IMHO would lead to an overhead because we
would have to concatenate the PrepareStatements individually (25000
times) plus we would then have to assign a value for each question
mark.. We basically better concatenate the Insert Statement once for
each row of data and then use a batch and commit it to the DB...

My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable amount of
columns?

Some comments:

1) PreparedStatement usually but not always performs better than
Statement went executed multiple times.

2) My experience is that the gain for Oracle of using prepared
statements can be really huge.

3) I will assume that you can reuse some prepared statements. 25000
distinct INSERT statements does not sound likely.

4) Besides the performance issue PreparedStatement is also much
better to handle date formats and strings with quotes in.

5) Multiple prepare statement calls can be batched just like
multiple ordinary statement calls can.

So based on the little info available I will recommend you to
look into prepared statements.

Arne
 
L

Lew

1) PreparedStatement usually but not always performs better than
Statement [when] executed multiple times.

2) My experience is that the gain for Oracle of using prepared
statements can be really huge.

3) I will assume that you can reuse some prepared statements. 25000
distinct INSERT statements does not sound likely.

All statements get "prepared" by the database engine, in the sense that the
SQL string gets sort-of-compiled by the engine just to run it. The
performance advantage of PreparedStatement comes perhaps from the ability of
the JDBC driver to understand that a statement will be reused, and certainly
from any ability the DBMS engine has to reuse its sort-of-compiled version of
the statement.

Some engines, e.g., Postgres, only take full advantage of a prepared (in its
sense) statement within a single "session", which I believe corresponds to the
lifetime of the Java connection to the database. If understand the
implications correctly, closing a connection would flush Postgres's cache of
prepared statements from that connection. This is likely not a problem if the
JDBC connections are pooled, as they should be in production. Pooled
connections rarely close from the DBMS's point of view. The point of this
whole paragraph is twofold: that getting the full performance benefit of
PreparedStatement depends on the tuning of a lot of operational factors, and
that operations is an art unto itself.
4) Besides the performance issue PreparedStatement is also much
better to handle date formats and strings with quotes in.

Often it seems that performance is pursued before correctness is assured.
Arne makes the point here that PreparedStatement has advantages irrespective
of its performance comparison to other kinds of Statement.
5) Multiple prepare statement calls can be batched just like
multiple ordinary statement calls can.

So based on the little info available I will recommend you to
look into prepared statements.

In addition to Arne's excellent advice, and expanding on point 4 above,
PreparedStatements provide a degree of type safety to run-time data in SQL
actions. This is particularly important for data provided by the public -
well-known "SQL injection" attacks will not work on PreparedStatement forms
because the text is better encapsulated than in the more casual forms.

Even in more internal usage, it's a lot harder to insert, say, a misconfigured
date to a query when the argument must be of type java.sql.Date to start.
PreparedStatement lets you enforce certain guarantees on dynamic data.
 
M

Martin Gregorie

Lew said:
1) PreparedStatement usually but not always performs better than
Statement [when] executed multiple times.

2) My experience is that the gain for Oracle of using prepared
statements can be really huge.

3) I will assume that you can reuse some prepared statements. 25000
distinct INSERT statements does not sound likely.

All statements get "prepared" by the database engine, in the sense that
the SQL string gets sort-of-compiled by the engine just to run it. The
performance advantage of PreparedStatement comes perhaps from the
ability of the JDBC driver to understand that a statement will be
reused, and certainly from any ability the DBMS engine has to reuse its
sort-of-compiled version of the statement.

Some engines, e.g., Postgres, only take full advantage of a prepared (in
its sense) statement within a single "session", which I believe
corresponds to the lifetime of the Java connection to the database. If
understand the implications correctly, closing a connection would flush
Postgres's cache of prepared statements from that connection. This is
likely not a problem if the JDBC connections are pooled, as they should
be in production. Pooled connections rarely close from the DBMS's point
of view. The point of this whole paragraph is twofold: that getting the
full performance benefit of PreparedStatement depends on the tuning of a
lot of operational factors, and that operations is an art unto itself.
4) Besides the performance issue PreparedStatement is also much
better to handle date formats and strings with quotes in.

Often it seems that performance is pursued before correctness is
assured. Arne makes the point here that PreparedStatement has advantages
irrespective of its performance comparison to other kinds of Statement.
5) Multiple prepare statement calls can be batched just like
multiple ordinary statement calls can.

So based on the little info available I will recommend you to
look into prepared statements.

In addition to Arne's excellent advice, and expanding on point 4 above,
PreparedStatements provide a degree of type safety to run-time data in
SQL actions. This is particularly important for data provided by the
public - well-known "SQL injection" attacks will not work on
PreparedStatement forms because the text is better encapsulated than in
the more casual forms.

Even in more internal usage, it's a lot harder to insert, say, a
misconfigured date to a query when the argument must be of type
java.sql.Date to start. PreparedStatement lets you enforce certain
guarantees on dynamic data.
Adding to the Arne and Lew's good advice, I'd add that the issue of NULL
columns is not a problem for PreparedStatement usage because it provides
a setNull() method.

As a table has a fixed number of columns, you can declare a
PreparedStatement for each table that stores MQ messages. As each
message is stored you:
- select the appropriate table's PreparedStatement
- put each field in the message into the appropriate column
- set the rest to NULL
- use executeUpdate() to store the row.

Apart from that, its up to you, your system recovery requirements and
performance considerations whether you let transaction commitment
default to automatic (commit after every SQL statement) or setAutoCommit
off and handle commits yourself. Committing batches of inserts should be
faster than committing every transaction.

Database committal can be tied to MQ's transactional capability, which
should centralise application restart & recovery in the program sending
the MQ messages.
 
R

Roedy Green

My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable amount of
columns?

Presumably there would be a number of common patterns.
Each one could have its own prepared statement.
 
R

Roedy Green

Hi everybody

for one of my customer I need to program an adapter that basically
receives about 25'000 MQ Messages, parses them and then finally writes
the data into an Oracle DB.

I must say at this point that every message has a variable length and
thus variable amount of data/fields that must be returned to the DB.

During a workshop I got an argue with another developer because he said
we should use PrepareStatements instead of normal Statements, they've
got a better performance - compiling once on DB Server. I generally
agreed, but said that in our case the PrepareStatements won't help in
terms of performance since we have a variable amount of columns for each
SQL Insert Statement. What IMHO would lead to an overhead because we
would have to concatenate the PrepareStatements individually (25000
times) plus we would then have to assign a value for each question
mark.. We basically better concatenate the Insert Statement once for
each row of data and then use a batch and commit it to the DB...

My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable amount of
columns?

You might invent a PreparedStatementCache. You can lookup using a key
based on the string you would feed to construct the Prepared
Statement. You would also maintain an LRU chain of the
PreparedStatements.. When you used a prepared stament you would
migrate it to the head of the chain. When the cache got too fat, you
would prune the least recently used element.

If it turned out using PreparedStatement had a higher overhead that
plain, you could avoid creating a PreparedStatement until you had seen
n instances where optimal n is to be found by experiment. Until
then, you use a regular Stament and record the usage.
 
V

Vince

Martin said:
Lew said:
Vince said:
My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable
amount of columns?
1) PreparedStatement usually but not always performs better than
Statement [when] executed multiple times.

2) My experience is that the gain for Oracle of using prepared
statements can be really huge.

3) I will assume that you can reuse some prepared statements. 25000
distinct INSERT statements does not sound likely.

All statements get "prepared" by the database engine, in the sense
that the SQL string gets sort-of-compiled by the engine just to run
it. The performance advantage of PreparedStatement comes perhaps from
the ability of the JDBC driver to understand that a statement will be
reused, and certainly from any ability the DBMS engine has to reuse
its sort-of-compiled version of the statement.

Some engines, e.g., Postgres, only take full advantage of a prepared
(in its sense) statement within a single "session", which I believe
corresponds to the lifetime of the Java connection to the database.
If understand the implications correctly, closing a connection would
flush Postgres's cache of prepared statements from that connection.
This is likely not a problem if the JDBC connections are pooled, as
they should be in production. Pooled connections rarely close from
the DBMS's point of view. The point of this whole paragraph is
twofold: that getting the full performance benefit of
PreparedStatement depends on the tuning of a lot of operational
factors, and that operations is an art unto itself.
4) Besides the performance issue PreparedStatement is also much
better to handle date formats and strings with quotes in.

Often it seems that performance is pursued before correctness is
assured. Arne makes the point here that PreparedStatement has
advantages irrespective of its performance comparison to other kinds
of Statement.
5) Multiple prepare statement calls can be batched just like
multiple ordinary statement calls can.

So based on the little info available I will recommend you to
look into prepared statements.

In addition to Arne's excellent advice, and expanding on point 4
above, PreparedStatements provide a degree of type safety to run-time
data in SQL actions. This is particularly important for data provided
by the public - well-known "SQL injection" attacks will not work on
PreparedStatement forms because the text is better encapsulated than
in the more casual forms.

Even in more internal usage, it's a lot harder to insert, say, a
misconfigured date to a query when the argument must be of type
java.sql.Date to start. PreparedStatement lets you enforce certain
guarantees on dynamic data.
Adding to the Arne and Lew's good advice, I'd add that the issue of NULL
columns is not a problem for PreparedStatement usage because it provides
a setNull() method.

As a table has a fixed number of columns, you can declare a
PreparedStatement for each table that stores MQ messages. As each
message is stored you:
- select the appropriate table's PreparedStatement
- put each field in the message into the appropriate column
- set the rest to NULL
- use executeUpdate() to store the row.

Apart from that, its up to you, your system recovery requirements and
performance considerations whether you let transaction commitment
default to automatic (commit after every SQL statement) or setAutoCommit
off and handle commits yourself. Committing batches of inserts should be
faster than committing every transaction.

Database committal can be tied to MQ's transactional capability, which
should centralise application restart & recovery in the program sending
the MQ messages.

I'm not sure if I'll be capable to prepare a preparedStatement for each
type of product that goes through this adapter. One requirement of the
adapter is that whenever a new product is introduced, the application
must react dynamically. Basically no change to the code (thus no new
release)is allowed.

Nevertheless, I'm convinced now that the preparedStatement will be the
best choice. Not because of the performance but much more because of the
accuracy of the data (setString, setInt, setNull a.s.o). Latter will
simplify the process distinctively..

Thanks for your more than valuable opinions guys!
 
L

Lew

Vince said:
I'm not sure if I'll be capable to prepare a preparedStatement for each
type of product that goes through this adapter. One requirement of the
adapter is that whenever a new product is introduced, the application
must react dynamically. Basically no change to the code (thus no new
release)is allowed.

A new product should be only a data change, therefore no need to change code.
The same PreparedStatement should work for any products, new, former,
current, whatever - the data structure should not change with the new
products, and it's the data structure that is reflected in the
PreparedStatement, not the data content.

So feel free to use PreparedStatement - it should not need any modification as
you add new rows to your tables.
 
A

Arne Vajhøj

Lew said:
A new product should be only a data change, therefore no need to change
code. The same PreparedStatement should work for any products, new,
former, current, whatever - the data structure should not change with
the new products, and it's the data structure that is reflected in the
PreparedStatement, not the data content.

"should" does not always mean "will".

Arne
 
L

Lew

Arne said:
"should" does not always mean "will".

And thus we provide the OP an opportunity to explain why data changes would
require code changes in their particular scenario.

They shouldn't, so if they do there is probably a mistake there.
 
V

Vince

Not true in my case, I've got 1600 possible attributes. A product
(financial transactions like options, swaps, futures etc) will only use
a subset of the 1600 possible attributes. Thus the data structure will
change for each product (~400 different product available). The point is
that I don't get a package of product X first and then Y and then Z. The
products will come through MQ fully mixed up.
 
M

Mark Rafn

Vince said:
I must say at this point that every message has a variable length and
thus variable amount of data/fields that must be returned to the DB.

During a workshop I got an argue with another developer because he said
we should use PrepareStatements instead of normal Statements, they've
got a better performance - compiling once on DB Server.

He's right. The other MAJOR advantage you get is automatic quoting. If
you're executing statments via string concatenation, you have to deal with crap
like quotes and SQL reserved words in the data.
We basically better concatenate the Insert Statement once for
each row of data and then use a batch and commit it to the DB... ....
My question now: How would you solve it? Would the PrepareStatement
still improve the performance, considering though the variable amount of
columns?

I'd still plan on using as few distinct statements, even if it means you have
a number of messages that are explicitly inserting nulls as opposed to
generating a custom insert that ignores irrelevant columns (which still
inserts nulls). Even if you generate them by concatenation, you can make use
of caching - Oracle and some others will only have to do a soft-parse rather
than a full parse if the text of the statement is identical to one in
the cache. C3p0 or other pooling system will do client-side caching based
on string comparisons.
 
M

Martin Gregorie

Vince said:
Not true in my case, I've got 1600 possible attributes. A product
(financial transactions like options, swaps, futures etc) will only use
a subset of the 1600 possible attributes. Thus the data structure will
change for each product (~400 different product available). The point is
that I don't get a package of product X first and then Y and then Z. The
products will come through MQ fully mixed up.
We never said you needed a PreparedStatement per product - just one per
table which gets re-used for each message that's stored in the table it
accesses.

We never mentioned any dependence on message/product ordering either.

Maybe some pseudo code will make things a bit clearer:

INITIALIZE
FOR EACH TABLE
CREATE PreparedStatement
LABEL with the associated product code(s) for this table
END-FOR
SET AUTO-COMMIT off
END-INITIALIZE

FOR EACH MESSAGE
SELECT PreparedStatement USING product code
PARSE message
FOR EACH FIELD
ATTACH it to the prepared statement column
END-PARSE
FOR EACH UNUSED COLUMN
SET prepared statement column to null
END-FOR
INSERT ROW
COMMIT
END-FOR
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top