Database Statistics... Best way to maintain stats???


Lucas Tam

Hi all,


I have an application which logs a considerable amount of data. Each day,
we log about 50,000 to 100,000 rows of data.

We'd like to report on this data... currently I'm using a stored procedure to
calculate the statistics, but since the reports are ad hoc, they take a while
to generate.

So how do you guys handle large amounts of data? Is there a good way to
precalculate a set of statistics to handle ad hoc queries (i.e. By Hour,
By Day, By Week, By Month)? Our application also provides near-realtime
statistics... so precalculation has to be done on a continual basis. Does
.NET have any statistics classes that might help out with this sort of
thing? I don't think performance counters will work since they don't log
persistent data.

Any ideas?

Thanks!
 

Bruce Barker

you might look at using cubes for rollups. if you need realtime adhoc stats
(with slice and dice), then you want to use a snowflake or star schema.

since you are only doing 100k rows a day (pretty small actually), a simple
star schema should do it (with some rollups). with volume this low, i'd
update the rollup tables in real time (simple trigger). i'd expect no query
to take over a second or two (unless it spans years of detail).
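
for example, a rough sketch of the kind of trigger-maintained rollup i mean
(log_detail, log_date, amount and the rollup table are made-up names -- adapt
to your own schema):

    -- hourly rollup table kept current by an insert trigger
    create table rollup_hourly (
        stat_hour    datetime      not null primary key,  -- hour bucket
        row_count    int           not null,
        total_amount decimal(18,2) not null
    )
    go

    create trigger trg_log_detail_rollup on log_detail
    after insert
    as
    begin
        set nocount on

        -- fold the newly inserted rows into existing hour buckets
        update r
        set row_count    = r.row_count + i.cnt,
            total_amount = r.total_amount + i.total
        from rollup_hourly r
        join (select dateadd(hour, datediff(hour, 0, log_date), 0) as stat_hour,
                     count(*) as cnt,
                     sum(amount) as total
              from inserted
              group by dateadd(hour, datediff(hour, 0, log_date), 0)) i
          on i.stat_hour = r.stat_hour

        -- create hour buckets that don't exist yet
        insert into rollup_hourly (stat_hour, row_count, total_amount)
        select dateadd(hour, datediff(hour, 0, log_date), 0), count(*), sum(amount)
        from inserted
        where dateadd(hour, datediff(hour, 0, log_date), 0)
              not in (select stat_hour from rollup_hourly)
        group by dateadd(hour, datediff(hour, 0, log_date), 0)
    end
    go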

-- bruce (sqlwork.com)
 

John Rivers

or, if you don't want to get into hypercubes, you can "squash" the data from
the first table into a second table with an aggregate query which adds, at
minimum, a RecordCount column and removes as much unnecessary detail as
possible

then, if you need to include both tables in a query, you can create a union
query that simply adds a RecordCount of 1 to the first table so it has the
same number of columns, ie:

    select field1, field2, field3, 1 as recordcount from table1
    union all
    select field1, field2, field3, recordcount from table2

there is an efficient way to do the transfer: you create a transactional
stored procedure that uses a temporary table to index the primary keys of
the records to be transferred (usually based on date, say 10,000 every
5 minutes or whatever fits), so it looks roughly like this:

select top 10000 primarykey into #temp from table1 order by date
--
insert into table2 (field1, field2, field3, recordcount)
select field1, field2, field3, count(*)
from table1
where primarykey in (select primarykey from #temp)
group by field1, field2, field3
--
delete
from table1
where primarykey in (select primarykey from #temp)

(you will have to add the transaction and error handling bits yourself)
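
for illustration, those bits could look roughly like this inside the
procedure (plain @@error checks; a sketch of the structure, not the exact
code i ran):

    begin tran

    insert into table2 (field1, field2, field3, recordcount)
    select field1, field2, field3, count(*)
    from table1
    where primarykey in (select primarykey from #temp)
    group by field1, field2, field3

    if @@error <> 0
    begin
        rollback tran
        return
    end

    delete from table1
    where primarykey in (select primarykey from #temp)

    if @@error <> 0
    begin
        rollback tran
        return
    end

    commit tran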

i used that when getting around 100,000 records per day with heaps of
detail; it compressed nicely, about 10 to 1, and my nastiest olap-style
t-sql queries (12 months of data) took around 15 seconds

and you can easily simulate OLAP's intermediate calculations with a single
table refreshed every day or on demand
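
for example, assuming the squashed table keeps its date column, the refresh
could be as simple as this (summary_daily and logdate are made-up names):

    -- rebuild a daily summary from the squashed table, on a schedule or on demand
    truncate table summary_daily

    insert into summary_daily (stat_day, field1, recordcount)
    select dateadd(day, datediff(day, 0, logdate), 0), field1, sum(recordcount)
    from table2
    group by dateadd(day, datediff(day, 0, logdate), 0), field1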
 
