Recordsets vs. Arrays?

B

Bob Barrows [MVP]

Chris said:
Can you post your code?

Sure. It's pretty raw, no comments ...


Public Declare Function GetTickCount& Lib "kernel32" ()

Dim cn As New ADODB.Connection
Dim sSQL As String

Private Sub Form_Load()
Dim rs As New Recordset

sSQL = "Select CompanyName,ContactName,City FROM Northwind..Suppliers"
cn.Open "provider=sqloledb;data source=clnsqldev7;" & _
integrated security=SSP1;initial catalog=northwind"

'use disconnected recordset
'rs.CursorLocation = adUseClient
'rs.Open sSQL, cn, adOpenStatic, adLockReadOnly, 1
'Set rs.ActiveConnection = Nothing

TestRSNewRS
TestArNewRS

cn.Close: Set cn = Nothing

'TestArray rs
'rs.MoveFirst
'TestRSloop rs
'Set rs = Nothing
End Sub
Sub TestArray(pRS As Recordset)
Dim arData As Variant
Dim i As Integer
Dim loops As Long
Dim ticks As Long
Dim s As String
ticks = GetTickCount
If Not pRS.EOF Then arData = pRS.GetRows
If IsArray(arData) Then
For loops = 0 To 10000
For i = 0 To UBound(arData, 2)
s = arData(0, i)
Next
Next
End If
txtArray.Text = GetTickCount - ticks
End Sub

Sub TestRSloop(pRS As Recordset)
Dim loops As Long
Dim ticks As Long
Dim s As String
Dim fld As ADODB.Field
Set fld = pRS.Fields(0)
ticks = GetTickCount
For loops = 0 To 1000000
Do Until pRS.EOF
s = fld.Value 'pRS ("CompanyName")
pRS.MoveNext
Loop
Next
txtRsLoop.Text = GetTickCount - ticks
End Sub

Sub TestArNewRS()
Dim rs As New Recordset
Dim arData As Variant
Dim i As Integer
Dim loops As Long
Dim ticks As Long
Dim s As String
ticks = GetTickCount
'rs.CursorLocation = adUseClient
rs.Open sSQL, cn, adOpenForwardOnly, adLockReadOnly, 1
'Set rs.ActiveConnection = Nothing
If Not rs.EOF Then arData = rs.GetRows
rs.Close: Set rs = Nothing
If IsArray(arData) Then
For loops = 0 To 100000
For i = 0 To UBound(arData, 2)
s = arData(0, i)
Next
Next
End If
txtArray.Text = GetTickCount - ticks


End Sub
Sub TestRSNewRS()
Dim rs As New Recordset
Dim loops As Long
Dim ticks As Long
Dim s As String
Dim fld As ADODB.Field

ticks = GetTickCount
rs.Open sSQL, cn, adOpenForwardOnly, adLockReadOnly, 1
Set fld = rs.Fields(0)
For loops = 0 To 100000
Do Until rs.EOF
s = rs("CompanyName")
rs.MoveNext
Loop
Next
txtRsLoop.Text = GetTickCount - ticks

End Sub
 
C

Chris Hohmann

Bob Barrows said:
Sure. It's pretty raw, no comments ...


Public Declare Function GetTickCount& Lib "kernel32" ()

Dim cn As New ADODB.Connection
Dim sSQL As String

Private Sub Form_Load()
Dim rs As New Recordset

sSQL = "Select CompanyName,ContactName,City FROM Northwind..Suppliers"
cn.Open "provider=sqloledb;data source=clnsqldev7;" & _
integrated security=SSP1;initial catalog=northwind"

'use disconnected recordset
'rs.CursorLocation = adUseClient
'rs.Open sSQL, cn, adOpenStatic, adLockReadOnly, 1
'Set rs.ActiveConnection = Nothing

TestRSNewRS
TestArNewRS

cn.Close: Set cn = Nothing

'TestArray rs
'rs.MoveFirst
'TestRSloop rs
'Set rs = Nothing
End Sub
Sub TestArray(pRS As Recordset)
Dim arData As Variant
Dim i As Integer
Dim loops As Long
Dim ticks As Long
Dim s As String
ticks = GetTickCount
If Not pRS.EOF Then arData = pRS.GetRows
If IsArray(arData) Then
For loops = 0 To 10000
For i = 0 To UBound(arData, 2)
s = arData(0, i)
Next
Next
End If
txtArray.Text = GetTickCount - ticks
End Sub

Sub TestRSloop(pRS As Recordset)
Dim loops As Long
Dim ticks As Long
Dim s As String
Dim fld As ADODB.Field
Set fld = pRS.Fields(0)
ticks = GetTickCount
For loops = 0 To 1000000
Do Until pRS.EOF
s = fld.Value 'pRS ("CompanyName")
pRS.MoveNext
Loop
Next
txtRsLoop.Text = GetTickCount - ticks
End Sub

Sub TestArNewRS()
Dim rs As New Recordset
Dim arData As Variant
Dim i As Integer
Dim loops As Long
Dim ticks As Long
Dim s As String
ticks = GetTickCount
'rs.CursorLocation = adUseClient
rs.Open sSQL, cn, adOpenForwardOnly, adLockReadOnly, 1
'Set rs.ActiveConnection = Nothing
If Not rs.EOF Then arData = rs.GetRows
rs.Close: Set rs = Nothing
If IsArray(arData) Then
For loops = 0 To 100000
For i = 0 To UBound(arData, 2)
s = arData(0, i)
Next
Next
End If
txtArray.Text = GetTickCount - ticks


End Sub
Sub TestRSNewRS()
Dim rs As New Recordset
Dim loops As Long
Dim ticks As Long
Dim s As String
Dim fld As ADODB.Field

ticks = GetTickCount
rs.Open sSQL, cn, adOpenForwardOnly, adLockReadOnly, 1
Set fld = rs.Fields(0)
For loops = 0 To 100000
Do Until rs.EOF
s = rs("CompanyName")
rs.MoveNext
Loop
Next
txtRsLoop.Text = GetTickCount - ticks

End Sub

Here are some preliminary observations:
1. Your retrieving 3 columns but only using 1.
2. In TestArNewRS you only need to call UBound once, assign it to a local
variable then use that value in the limiting clause of the For Loop.
3. In TestRSNewRS, you went through the trouble of explicitly creating a
reference to the field object, but then you forget to use it in the loop.
4. In TestArNewRS , you could close and deallocate the connection object
immediately after the call to GetRows. That's one of the key benefits of
GetRows.
5. What kind of numbers are you getting from this? Does recordset iteration
beat GetRows? In the tests I've done, GetRows continues to consistently beat
Recordset Iteration. In fact, I've identified a flaw in my original model.
Namely, there's a call to Server.HTMLEncode inside the loops for both
GetRows and Recordset iteration, but only one call in GetString. In my new
model, I've eliminate the Server.HTMLEncode calls and the results are
surprising. It turns out GetRows is just as fast and sometimes faster than
GetString. Both GetString/GetRows perform about three times (3x) as fast as
Recordset Iteration. I do want to tweak it some more. I like the idea of
eliminating the Response.Writes altogether like you did above. I'm going to
modify my code accordingly and then I'll post what I come up with.
 
C

Chris Hohmann

Here's my modified test code:

<%
Option Explicit
Sub BaseLine(db,sql)
Dim cn, rs
Set cn = CreateObject("ADODB.Connection")
cn.CursorLocation = 3
cn.Open db
Set rs = cn.Execute(sql,,1)
rs.Close: Set rs = Nothing
cn.Close: Set cn = Nothing
End Sub

Sub SaveXML(db,sql)
Dim s, cn, rs
Set cn = CreateObject("ADODB.Connection")
cn.CursorLocation = 3
cn.Open db
Set rs = cn.Execute(sql,,1)
rs.Save Response, 1 '1 = adPersistXML
rs.Close: Set rs = Nothing
cn.Close: Set cn = Nothing
End Sub

Sub GetString(db,sql)
Dim cn, rs, s
Set cn = CreateObject("ADODB.Connection")
cn.CursorLocation = 3
cn.Open db
Set rs = cn.Execute(sql,,1)
Do While NOT rs.EOF
s = rs.GetString(2,2000)
Loop
rs.Close: Set rs = Nothing
cn.Close: Set cn = Nothing
End Sub

Sub GetRows(db,sql)
Dim cn, rs, arr, j, jMax, s
Set cn = CreateObject("ADODB.Connection")
cn.CursorLocation = 3
cn.Open db
Set rs = cn.Execute(sql,,1)
arr = rs.GetRows
rs.Close: Set rs = Nothing
cn.Close: Set cn = Nothing
jMax = UBound(arr,2)
For j = 0 To jMax
s = arr(0,j)
Next
End Sub

Sub Recordset(db,sql)
Dim cn, rs, field0, s
Set cn = CreateObject("ADODB.Connection")
cn.CursorLocation = 3
cn.Open db
Set rs = cn.Execute(sql,,1)
Set field0 = rs.Fields(0)
Do While Not rs.EOF
s = field0.Value
rs.MoveNext
Loop
rs.Close: Set rs = Nothing
cn.Close: Set cn = Nothing
End Sub

Dim db, sql, i, iMax, start, bl, sx, gs, gr, rs
db = <<Your Connection String Here>>
sql = "SELECT TOP 1000000 C1.dt FROM Calendar AS C1, Calendar C2"
iMax = 1

start = Timer : For i = 1 To iMax : Baseline db, sql : Next : bl = Timer -
start
start = Timer : For i = 1 To iMax : SaveXML db, sql : Next : sx = Timer -
start
start = Timer : For i = 1 To iMax : GetString db, sql : Next : gs = Timer -
start
start = Timer : For i = 1 To iMax : GetRows db, sql : Next : gr = Timer -
start
start = Timer : For i = 1 To iMax : Recordset db, sql : Next : rs = Timer -
start

Response.Clear
Response.Write "<br>Baseline: " & bl
Response.Write "<br>SaveXML: " & sx
Response.Write "<br>GetString: " & gs
Response.Write "<br>GetRows: " & gr
Response.Write "<br>Recordset: " & rs
%>

And here are the results:
Baseline: 1.34375
SaveXML: 6.328125
GetString: 4.984375
GetRows: 3.734375
Recordset: 9.75

Notes:
1. I used a Cartesian product of my Calendar table which has 1000+ rows as
the datasource. This allows me to range the result set from 1 to 1 million+
rows. Any table will do.
2. I've added a Baseline procedure to calculate the cost of the data
retrieval common to all methods.
3. I also added a SaveXML procedure that persists the recordset to the
Response stream. I've wanted to incorporate this method into the mix for
sometime. For very large resultsets (1 million+), with all methods
outputting to the Response object, the SaveXML method was actually the best
performer. I think this has to do with the fact that the Recordset.Save
method sends the data directly to the IStream interface implemented by the
Response object.
4. Most surprisingly, the GetRows method is now the best overall performer.
I guess in hindsight, this is not that surprising. The GetString method has
to deal with the performance issued associated with string concatenation. I
implemented the burst output approach in the GetString procedure to mitigate
this performance issue. GetString and GetRows now perform equally well for
small to medium resultsets, but GetRows starts to pull away for larger
resultsets. However the tradeoff is that GetRows requires more memory than
GetString.
 
B

Bob Barrows [MVP]

Chris said:
Here are some preliminary observations:
1. Your retrieving 3 columns but only using 1.

True, but I wanted to cause some "overhead" for the field access.
2. In TestArNewRS you only need to call UBound once, assign it to a
local variable then use that value in the limiting clause of the For
Loop.

I think the compiler does that for you, but I could be wrong ... I'll check
it both ways tomorrow.
3. In TestRSNewRS, you went through the trouble of explicitly
creating a reference to the field object, but then you forget to use
it in the loop.

Oops - told you the code was raw ...
4. In TestArNewRS , you could close and deallocate the connection
object immediately after the call to GetRows. That's one of the key
benefits of GetRows.

True, but that's not what I was testing. :)
5. What kind of numbers are you getting from this? Does recordset
iteration beat GetRows? In the tests I've done, GetRows continues to
consistently beat Recordset Iteration. In fact, I've identified a
flaw in my original model. Namely, there's a call to
Server.HTMLEncode inside the loops for both GetRows and Recordset
iteration, but only one call in GetString. In my new model, I've
eliminate the Server.HTMLEncode calls and the results are surprising.
It turns out GetRows is just as fast and sometimes faster than
GetString. Both GetString/GetRows perform about three times (3x) as
fast as Recordset Iteration. I do want to tweak it some more. I like
the idea of eliminating the Response.Writes altogether like you did
above. I'm going to modify my code accordingly and then I'll post
what I come up with.

I was seeing faster results from the recordset loops, which definitely
surprised me. Maybe it's a VB vs vbscript difference ... ?

Bob Barrows
 
M

Mark J. McGinty

I modified Chris' ASP script to call my timer object (with a few interim
timings added in.) Below the results is the ASP script as modified.
Needless to say, you'd need to download (and build, if you wish) and
register the high-res timer component to run it.

These times are in milliseconds (which is all that ResetTimer returns --
thinking about adding a property to set the increment it returns, and then
adding an Elapsed property to take the place of Seconds, Milliseconds,
MicroSeconds and NanoSeconds.)

-Mark


GetString
4.17743301330265
5.79479685138404
2.40294423462406
10.5239937256359
3.04610371280849
0.701303824289221

27.6612958205446


--------------------------------------------------------------------------------

GetRows
0.610937259932325
0.580072819783542
1.86451804236982
5.30469612568738
3.30435596654149

12.9090753173806


--------------------------------------------------------------------------------

Recordset
0.67432548799617
1.19991424559347
22.4348305814506

25.5832393012445


total: 67.1136596050295



''''''''''''''''''''''''''''''''''''''''''''


<%

Dim hrt, hrt2, hrt3, sConn
Set hrt = CreateObject("HighResTimer.HRTimer")
Set hrt3 = CreateObject("HighResTimer.HRTimer")
sConn =
"Provider=SQLOLEDB;Server=(local);Database=misc;Trusted_Connection=Yes;"

Function GetString
Dim cn, rs, res, tdat(5)

hrt.StartTimer
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")

tdat(0) = hrt.ResetTimer

With cn

.Open sConn
tdat(1) = hrt.ResetTimer

cn.TheBigList rs
tdat(2) = hrt.ResetTimer

res = rs.GetString(2,," ",vbCRLF,"Category:")

tdat(3) = hrt.ResetTimer

Response.Write Server.HTMLEncode(res)
rs.Close: Set rs = Nothing

tdat(4) = hrt.ResetTimer

End With
cn.Close: Set cn = Nothing

tdat(5) = hrt.ResetTimer

GetString = tdat
End Function

Function GetRows
Dim cn, rs, res, tdat(4)

hrt.StartTimer
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")

tdat(0) = hrt.ResetTimer

With cn
.Open sConn
tdat(1) = hrt.ResetTimer

cn.TheBigList rs
tdat(2) = hrt.ResetTimer

Dim arr: arr = rs.GetRows
rs.Close: Set rs = Nothing
tdat(3) = hrt.ResetTimer

End With
cn.Close: Set cn = Nothing
jMax = UBound(arr,2)
For j = 0 To jMax
If vartype(arr(0,j)) <> 8 Then
Response.Write arr(0,j)
Response.Write " "
Response.Write Server.HTMLEncode(arr(1,j))
Else
Response.Write "Category: "
Response.Write arr(1,j)
End If
Response.Write vbCRLF
Next

tdat(4) = hrt.ResetTimer

GetRows = tdat
End Function

Function Recordset
Dim cn, rs, res, tdat(2)

hrt.StartTimer
Set cn = CreateObject("ADODB.Connection")
Set rs = CreateObject("ADODB.Recordset")

tdat(0) = hrt.ResetTimer

With cn
.Open sConn
cn.TheBigList rs
tdat(1) = hrt.ResetTimer
set fld0 = rs.Fields(0)
set fld1 = rs.Fields(1)
set fld2 = rs.Fields(2)

With rs
Do While Not .EOF
With .Fields
If vartype(fld0.Value) <> 8 Then
Response.Write fld0.Value
Response.Write " "
Res = fld1.Value
Response.Write Server.HTMLEncode(res)
Else
Response.Write "Category: "
Response.Write fld1.Value
End If
End With
Response.Write vbCRLF
.MoveNext
Loop
End With

tdat(2) = hrt.ResetTimer

rs.Close: Set rs = Nothing
End With
cn.Close: Set cn = Nothing
Recordset = tdat

End Function

Set hrt2 = CreateObject("HighResTimer.HRTimer")
hrt2.StartTimer

Response.write "<pre>"
hrt3.StartTimer
gs = GetString()
t1 = hrt3.ResetTimer
gr = GetRows()
t2 = hrt3.ResetTimer
rs = Recordset()
t3 = hrt3.ResetTimer

Response.Clear
Response.Write "<hr>GetString<br>" & join(gs, "<br>") & "<br><b>" & t1 &
"</b><br>"
Response.Write "<hr>GetRows<br>" & join(gr, "<br>") & "<br><b>" & t2 &
"</b><br>"
Response.Write "<hr>Recordset<br>" & join(rs, "<br>") & "<br><b>" & t3 &
"</b><br>"

Response.Write "<br><br>total: " & hrt2.MilliSeconds


Response.write "</pre>"
%>
</body>
</html>
 
M

Michael D. Kersey

I'm puzzled about some things:
- The benchmarks don't appear to consider that the database caches SQL
requests. The first SQL request in one benchmark is done by the
user-defined Function GetString (not to be confused with the GetString()
method) and thus that method may be penalized somewhat. The initial
request for a given SQL statement requires more processing time (for SQL
compilation) than subsequent requests. To balance this it is necessary
to either wait until the database de-caches (if that's a word) the
compiled SQL or, prior to the benchmark timings, perform an initial
set-up SQL query that caches the SQL statement. Once that set-up query
executes, successive calls using the various methods should be timed
without bias. Merely swapping the order of subroutine calls in the
benchmark (or taking the timings from only a second run of the
benchmark) might also show whether this is a problem.

- The result set (a million rows) seems awfully large: perhaps testing
also with a range of smaller values (and some variety of column counts,
also - in my experience most result sets are more than two columns)
would be useful?

- As one poster stated earlier, the size of the result set may push the
server to extremes (out of physical memory). This is a problem that is
easily fixed these days: memory is cheap, programming isn't. [Well, it
wasn't until recently!8-(] How much memory is on the IIS server? The
database server?

- One of the benchmarks had multiple rs.GetString() method calls, which
may bias the test results. While multiple GetString() method calls are
interesting, a second subroutine that performs only a single GetString()
method call (as is done in another of the benchmarks) would be nice for
comparison.
 
M

Mark J. McGinty

Bob Barrows said:
I tried to dl it, but IE never prompted for a destination folder. I'm not
sure why ..
I'll try again tomorrow.

My bad, the virtual dir was set for immediate expiry, it's been corrected.


I've added an alternative interface to it, that's more streamlined and more
functional, details are below. (I just updated the zip, latest DLL version
is 1.0.0.2) An updated copy of the ASP script is included in the zip.

I'm thinking if this new interface works out well, I'll copy the code into
another component with just one interface.


-Mark


---------------------------------------------------------------

Interface: HighResTimer.HiResTimer
Thread Model: Both

-----------------------

Methods:
-----------

HiResTimer.Start()

Internally stores start time, the reference point from which
time elapsed is calculated


Properties:
-----------

HiResTimer.Elapsed

This property returns the time elapsed since start time, in
increments as set in the Units property. (See Units below.)
Return value type is double.

Note that this property does not cause the start time to be reset.
Therefore it can be called more than once consecutively, to
return progressive time-elapsed values if desired.


HiResTimer.Reset

Calculates time elapsed between start time and current time, and
resets the start time, effectively restarting the timer. Results
are returned in increments as defined by the value of the
Units property (default is microseconds.)


HiResTimer.Units

Sets or retrieves the incremental units that are represented
by the results. The default value is 1000000, which causes
the elapsed time to be returned as microseconds. Valid values
must be even multiples of 10, up to and including 1000000000
(1 billion.) Below is a list of typical values (which is a
subset of all valid values:)

1 = Seconds
1,000 = Milliseconds
1,000,000 = Microseconds (default)
1,000,000,000 = Nanoseconds



HiResTimer.FrequencyMHz

Returns the frequency, in megahertz, upon which this timer is
based, thereby indicating its resolution.(Typically this will be
the CPU clock speed.)


HiResTimer.QueryCounter

Returns the raw value obtained by calling QueryPerformanceCounters
and casting the 64 bit int to a double.


---------------------------------------------------------------

Example (VBS):
-------------

set hrt = CreateObject("HighResTimer.HiResTimer")
WScript.Echo "Reality check (in seconds)"

' set to return seconds
hrt.Units = 1
hrt.Start : t = Timer
for i = 0 to 1000000 : Next
t2 = Timer
WScript.Echo hrt.Elapsed
WScript.Echo t2 - t

WScript.Echo "Timer Frequency: " & hrt.FrequencyMHz & " MHz"
 
C

Chris Hohmann

Bob Barrows said:
True, but I wanted to cause some "overhead" for the field access.
That may unfairly penalize the GetRows method since it is loading all three
fields. You could specify just the "CompanyName" field in the third
parameter of the GetRows method.
True, but that's not what I was testing. :)
Fair enough. :) Although it might be useful for us to come to a consensus on
what is being tested. All three of us are running separate test with varying
results. I'm reminder of the story of three blind men describing an
elephant.
I was seeing faster results from the recordset loops, which definitely
surprised me. Maybe it's a VB vs vbscript difference ... ?

Perhaps, but it seems unlikely to me. I'm interested to see your findings
when the code is a little less "raw". :)
 
C

Chris Hohmann

Michael D. Kersey said:
I'm puzzled about some things:
- The benchmarks don't appear to consider that the database caches SQL
requests. The first SQL request in one benchmark is done by the
user-defined Function GetString (not to be confused with the GetString()
method) and thus that method may be penalized somewhat. The initial
request for a given SQL statement requires more processing time (for SQL
compilation) than subsequent requests. To balance this it is necessary to
either wait until the database de-caches (if that's a word) the compiled
SQL or, prior to the benchmark timings, perform an initial set-up SQL
query that caches the SQL statement. Once that set-up query executes,
successive calls using the various methods should be timed without bias.
Merely swapping the order of subroutine calls in the benchmark (or taking
the timings from only a second run of the benchmark) might also show
whether this is a problem.

You make a very compelling point. It's one that echoes what Mark said
earlier about not clouding the issue by including object creation and
opening time in our metrics. I suggested that object creation and opening
time be measured separately and we've both adopted this in our respective
testing models. When you discount object creation and opening times, Marks
results are as follows (approximately)

GetString:
10.52
03.05
00.70
------
14.27


GetRows:
5.30
3.30
-----
8.60


Recordset:
22.43


This is roughly in line with what I observed in my new testing model,
although the GetString time seems a little high. It may be that the
GetString procedure is in fact being penalized for being first to the party
and GetRows/Recordset are getting a free ride on the coattails of GetStrings
hard work. :) It may be worthwhile to redesign the testing model to open the
recordset once and perform all three tests on the same recordset.

- The result set (a million rows) seems awfully large: perhaps testing
also with a range of smaller values (and some variety of column counts,
also - in my experience most result sets are more than two columns) would
be useful?

Whose test are you talking about? I believe Mark's test uses the same data
as the original article which uses approximately 400 rows and two columns.
As far as I can tell, the only changes he made was to replace the Timer
calls with call to his higher resolution timer component. And he used
explicit field object references which he discussed earlier in the thread.

- As one poster stated earlier, the size of the result set may push the
server to extremes (out of physical memory). This is a problem that is
easily fixed these days: memory is cheap, programming isn't. [Well, it
wasn't until recently!8-(] How much memory is on the IIS server? The
database server?

Yes, this is a valid point. Memory utilization should be taken into account
when deciding among these various methods. This is especially true when the
selected method will be scale for many concurrent users. I mention it in
passing in the original article but it could certainly do with a more
thorough treatment. However, for the purposes of the original test execution
speed was the only metric.
- One of the benchmarks had multiple rs.GetString() method calls, which
may bias the test results. While multiple GetString() method calls are
interesting, a second subroutine that performs only a single GetString()
method call (as is done in another of the benchmarks) would be nice for
comparison.

Oh, I see, you're talking about the revised code I posted in response to
Bob. That's what the million row reference was about as well? The purpose of
that code was to incorporate some of the observations that had been made
thus far in the thread. Specifically:
1. Mitigate the 15ms Timer() resolution issue by using a larger resultset. I
opted for this instead of an inner loop suggested by Mark, since it was more
representative of how one moves through a recordset. Namely, in one pass. It
also eliminates any possible contamination of results by introducing
addition loop variable costs.
2. Create a baseline measure for object creation and opening time.
3. Used explicit field object references in the recordset iteration method.
4. Added the SaveXML procedure to test the performance of persisting data
directly to Response using the Recordset.Save method. I've always been
curious to see how this would perform.
5. Use multiple calls to GetString to mitigate the effects of large string
concatenation. I did start out using a single call, but the script was
timing out. Large string concatenations in VBScript are terrible performers.
 
C

Chris Hohmann

Yes, I think creating a COM and calling it from ASP would be better.

You caught that I created and published this component, yeah?

http://www.databoundzone.com/HighResTimer.zip.

[Chris] Yes, but I had the same problem Bob had. Also, I don't have VB6 at
work and I use Linux at home.

You make a very interesting point about the memory allocation requirements
for GetRows. I had never considered the fact that the values themselves
are
variant and as such take up much more memory than would be expected.
However, the field values in a Recordset are also stored as variants. I
believe the reason a Recordset requires less memory is because only a
"page"
worth of data is loaded at any one time.

Server cursors yes, but client cursors read all the data into the recordset
at once, right? Given that, and looking back at my test results, something
doesn't quite add-up:

I reviewed the structure definition for variants, it looks to me that the
actual size of it (not counting string, array, object or other data to which
a variant may point is about 128 bits. (It's a structure with one member
that's a nested union.)

Anyway the point is, ADO must store data for a client cursor in a more
efficient block, and wrap variants around only the fields in the current
row, otherwise it would take 4x as much space.

Just a theory...

[Chris] I think the data is stored in it's native binary format, perhaps a
bytearray, and a PAGE worth of data is cast into variants as you move
through the recordset. I suspect that the difference between server-side and
client-side cursors is that with server-side cursors request multiple binary
chunks as need, whereas client-side cursors request one huge binary chunk.
 
C

Chris Hohmann

Mark J. McGinty said:
I modified Chris' ASP script to call my timer object (with a few interim
timings added in.) Below the results is the ASP script as modified.
Needless to say, you'd need to download (and build, if you wish) and
register the high-res timer component to run it.

These times are in milliseconds (which is all that ResetTimer returns --
thinking about adding a property to set the increment it returns, and then
adding an Elapsed property to take the place of Seconds, Milliseconds,
MicroSeconds and NanoSeconds.)

-Mark


GetString
4.17743301330265
5.79479685138404
2.40294423462406
10.5239937256359
3.04610371280849
0.701303824289221

27.6612958205446


--------------------------------------------------------------------------------

GetRows
0.610937259932325
0.580072819783542
1.86451804236982
5.30469612568738
3.30435596654149

12.9090753173806


--------------------------------------------------------------------------------

Recordset
0.67432548799617
1.19991424559347
22.4348305814506

25.5832393012445


total: 67.1136596050295

So if I'm interpreting these numbers correctly, and we discount the object
creation and opening times, the results are as follows:

GetString:
10.52
03.05
00.70
------
14.27


GetRows:
5.30
3.30
-----
8.60


Recordset:
22.43

This is in line with the result of my new testing model as well. To quote
from my response to Michael, It may be that the
GetString procedure is in fact being penalized for being first to the party
and GetRows/Recordset are getting a free ride on the coattails of
GetString's hard work. :) It may be worthwhile to redesign the testing model
to open the recordset once and perform all three tests on the same
recordset. I'd still like to see an example of recordset iteration
outperforming GetRows, but both our results seem to confirm that this is
unlikely.
 
B

Bob Barrows [MVP]

Chris said:
Fair enough. :) Although it might be useful for us to come to a
consensus on what is being tested.

Yes. Part of the reason the code was so raw (it really wasn't intended to be
distributed) was I kept doing new tests, without saving old ones.
Perhaps, but it seems unlikely to me.

Frankly, I'm somewhat baffled by what I was seeing.
I'm interested to see your
findings when the code is a little less "raw". :)

When I get a few minutes free, I'll revisit this. I need to clean things up,
especially regarding the reporting of results. I'll also utilize Mark's dll
instead of GetTickCount so that confounding factor will be eliminated. Then
I'll need to write a similar test for asp ... did I say I need to get a
"few" minutes free?
:)

Bob
 
B

Bob Barrows [MVP]

Chris said:
[Chris] I think the data is stored in it's native binary format,
perhaps a bytearray, and a PAGE worth of data is cast into variants
as you move through the recordset. I suspect that the difference
between server-side and client-side cursors is that with server-side
cursors request multiple binary chunks as need, whereas client-side
cursors request one huge binary chunk.

David Sceppa addresses this somewhat in his "Programming ADO" book. I'll
need to look it up and refresh my memory about it when I get home. I DO
remember that when a client-side cursor is constructed, a server-side
firehose cursor is used to populate it. Hiow does that information affect
your theory?

Bob Barrows
 
B

Bob Barrows [MVP]

Chris said:
It may be that the
GetString procedure is in fact being penalized for being first to the
party and GetRows/Recordset are getting a free ride on the coattails
of GetString's hard work. :) It may be worthwhile to redesign the
testing model to open the recordset once and perform all three tests
on the same recordset.

Or randomize the order of tests to remove that confounding factor ...

Bob Barrows
 
M

Mark J. McGinty

Bob Barrows said:
Or randomize the order of tests to remove that confounding factor ...

Worth trying but I doubt it'll make a lot of difference, the recordset is
created by calling a stored procedure, so it's query plan is already
compiled and cached.

To eliminate conjecture, I'm using the same table schema, and a stored
procedure with the same name, but I had no access to Chris' original data,
so I just grabbed 500 rows I knew would be non-repetitive and roughly the
same size to stuff that table with. So the SP is just a select statement,
and its query plan is very uncomplicated.

(btw, I forget who said it -- Chris I think -- but I like the analogy of 3
blind guys describing an elephant... we've definitely got a huge beast
before us, and we've all got quite a bit invested in feeling around it...
funny.) :)

-Mark
 
C

Chris Hohmann

Mark J. McGinty said:
To eliminate conjecture, I'm using the same table schema, and a stored
procedure with the same name, but I had no access to Chris' original data,
so I just grabbed 500 rows I knew would be non-repetitive and roughly the
same size to stuff that table with. So the SP is just a select statement,
and its query plan is very uncomplicated.

http://aspfaq.com/inserts_2467.txt
 
M

Michael D. Kersey

Mark said:
Worth trying but I doubt it'll make a lot of difference, the recordset is
created by calling a stored procedure, so it's query plan is already
compiled and cached.

I don't know which database you're using but IIRC with SQL Server 7 and
2000 Microsoft changed to a scheme whereby stored procedures are no
longer precompiled. Instead both dynamic SQL and stored procedures are
dynamically compiled and cached. Here's a URL that references the change:
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx

[I'm somewhat curious to know the reasons this change was made. All I
can think of is that it simplified compilation (now done in one place
instead of two) and always produces an up-to-date execution plan.]
 
M

Mark J. McGinty

Michael D. Kersey said:
I don't know which database you're using but IIRC with SQL Server 7 and
2000 Microsoft changed to a scheme whereby stored procedures are no longer
precompiled. Instead both dynamic SQL and stored procedures are
dynamically compiled and cached. Here's a URL that references the change:
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx

That guy is something of a zealot, yeah? :) (I'm running 2K.) He raises
some interesting points however, not all cache objects are equal. I've seen
some instances of query plan cache pollution that get pretty severe, to the
extent that the overhead involved in looking for a cached plan exceeds
recompiling. I've also seen many, many quasi-redundant cache entries, that
properly should've reused the same plan, but had discrete lexical
differences that would be ignored by a query plan, but are separately
distinguished in plan cache. The query plan for a SP has got to be a simple
object id lookup

SELECT objtype, cacheobjtype, COUNT(*) AS Expr1
FROM syscacheobjects
GROUP BY objtype, cacheobjtype
ORDER BY objtype

Adhoc Executable Plan 15
Adhoc Compiled Plan 24
Default Parse Tree 1
Prepared Compiled Plan 47
Prepared Executable Plan 45
Proc Executable Plan 41
Proc Compiled Plan 43
Proc Extended Proc 3
SysTab Parse Tree 17
View Parse Tree 7

Also his rant about role security is a little out there, how are you going
to implement impersonation via roles? A context that executes ad hoc sql
(even dynamic sql in a SP) must have access privileges to all db objects
referenced. If that context provides a way for the user to compose and
execute SQL, it allows the user access to anything in those db objects. (For
purposes of this rant let's say that row-level security is impractical,
because really it sort-of is.)

With stored procedures using inheritance, precisely restricted access to
data can be provided without having to grant blanket access to those
underlying db objects. The user with exec permissions can only access data
as provided by the SP, s/he neither needs not has direct explicit access to
the tables.

Plus there are developmental issues, like reuse and consistency... they
surely still have their advantageous places... to me the prospect of "going
stored procedure free" seems a little whacked, but then again that might
just be me. :)
[I'm somewhat curious to know the reasons this change was made. All I can
think of is that it simplified compilation (now done in one place instead
of two) and always produces an up-to-date execution plan.]

The BOL topics about sp_recompile and WITH RECOMPILE are interesting... I'd
say they did it to blur the lines, but even so, the lines are still there to
some extent.

Anyway, struggling to get back to the topic, what I was trying to say is,
after the first pass through the tests, wouldn't the advantage be gone? And
also, at least on my end, that query plan consists of one step, a clustered
index scan... is that even worth caching?


-Mark
 
M

Michael D. Kersey

Mark said:
I don't know which database you're using but IIRC with SQL Server 7 and
2000 Microsoft changed to a scheme whereby stored procedures are no longer
precompiled. Instead both dynamic SQL and stored procedures are
dynamically compiled and cached. Here's a URL that references the change:
http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx


That guy is something of a zealot, yeah? :) (I'm running 2K.) He raises
some interesting points however, not all cache objects are equal. I've seen
some instances of query plan cache pollution that get pretty severe, to the
extent that the overhead involved in looking for a cached plan exceeds
recompiling. I've also seen many, many quasi-redundant cache entries, that
properly should've reused the same plan, but had discrete lexical
differences that would be ignored by a query plan, but are separately
distinguished in plan cache. The query plan for a SP has got to be a simple
object id lookup

SELECT objtype, cacheobjtype, COUNT(*) AS Expr1
FROM syscacheobjects
GROUP BY objtype, cacheobjtype
ORDER BY objtype

Adhoc Executable Plan 15
Adhoc Compiled Plan 24
Default Parse Tree 1
Prepared Compiled Plan 47
Prepared Executable Plan 45
Proc Executable Plan 41
Proc Compiled Plan 43
Proc Extended Proc 3
SysTab Parse Tree 17
View Parse Tree 7

Also his rant about role security is a little out there, how are you going
to implement impersonation via roles? A context that executes ad hoc sql
(even dynamic sql in a SP) must have access privileges to all db objects
referenced. If that context provides a way for the user to compose and
execute SQL, it allows the user access to anything in those db objects. (For
purposes of this rant let's say that row-level security is impractical,
because really it sort-of is.)

With stored procedures using inheritance, precisely restricted access to
data can be provided without having to grant blanket access to those
underlying db objects. The user with exec permissions can only access data
as provided by the SP, s/he neither needs not has direct explicit access to
the tables.

Plus there are developmental issues, like reuse and consistency... they
surely still have their advantageous places... to me the prospect of "going
stored procedure free" seems a little whacked, but then again that might
just be me. :)

[I'm somewhat curious to know the reasons this change was made. All I can
think of is that it simplified compilation (now done in one place instead
of two) and always produces an up-to-date execution plan.]


The BOL topics about sp_recompile and WITH RECOMPILE are interesting... I'd
say they did it to blur the lines, but even so, the lines are still there to
some extent.

Anyway, struggling to get back to the topic, what I was trying to say is,
after the first pass through the tests, wouldn't the advantage be gone?
Yes. Once the query is cached it probably would remain cached for the
duration of the test. A dummy initial call to the SP should cause the
query to be cached.
And
also, at least on my end, that query plan consists of one step, a clustered
index scan... is that even worth caching?

-Mark
Actually I was thinking more of the SQL compilation time than the query
plan build time, but I have no idea how these times compare.

About http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx

My saved URL info for the article didn't fully indicate (or remind me)
of how much territory/controversy he covers at that URL. I apologize
because that was somewhat misleading - I had only intended to refer to
the few paragraphs where he describes the changes in SQL caching in SQL
Server 7 and 2000. I later went back and found myself caught up in
reading the full article, as I see you did also! But I apologize for
that because most of it is orthogonal to this thread. I thought (and
still do) that it was a very interesting discussion, but I really had
intended to refer to it only as a reference for the change in caching
behavior.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,158
Messages
2,570,882
Members
47,414
Latest member
djangoframe

Latest Threads

Top