Kevin Frey
At our work, a bit of a "philosophical" debate is underway as to how much
data (in terms of number of rows) should be considered "enough" to present
to a user via a web application. This is not a web-based report (an
entirely different issue) but a UI that allows a user to pick a record and
work on it in a transactional system.
What I mean is: if a particular table is extremely large (say 3 MILLION
rows), should we consider it "reasonable" to let users page through all 3
million rows, if that's what they want to do?
In the past I have built desktop applications that would happily let you
page/scroll through millions of records (the concept is relatively similar).
But nowadays I'm of the opinion that there is a fairly small threshold
(let's say a few thousand rows, and certainly no more than 5,000) beyond
which a user who needs to see more data is either using the wrong mechanism
(they should be running a report instead) or working so unproductively that
they shouldn't be doing it at all. In the latter case, the user should
instead use a search mechanism to increase the selectivity of what they
require and narrow the number of records displayed.
My technical reason for asking is that paging through X million records is
only feasible when you follow well-defined navigational paths (indexes) in
the database. That allows you to use a "TOP N" approach and grab small
chunks of data to fill each "page-full" being displayed.
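To make the "TOP N" idea concrete, here is a minimal sketch of keyset-style
paging, assuming a hypothetical Customer table with a clustered primary key
CustomerId, where @LastCustomerId is the key of the last row on the
previous page:

    -- Fetch the next page of 50 rows following the last key already shown.
    -- The seek on the clustered index makes this cheap even at 3 million rows.
    SELECT TOP 50 CustomerId, CustomerName
    FROM dbo.Customer
    WHERE CustomerId > @LastCustomerId
    ORDER BY CustomerId;

Because the WHERE clause seeks straight to the continuation point on the
index, page 50,000 costs roughly the same as page 1.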
Such a technical "restriction" produces a dichotomy when you also want to
present the data in a grid and let the user click a column heading to sort
it. My view is that if someone sorts the data and then clicks "Next Page",
they *should* see the next chunk of data in that sort order. Supporting
that for every possible column heading of a data table would require a
bucket-load of indexes. And even then an index often cannot help, because
the user wants to sort on a field reached via a foreign key rather than the
key itself (for example, Customer connects to Customer Type, and the user
wants to sort by the Customer Type Description, which is not the key).
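For illustration, the problem case looks something like this (table and
column names are hypothetical):

    -- Sorting by the description on the joined Customer Type table.
    -- No index on Customer covers ct.Description, so the server must
    -- join and sort the full result before it can return even one page.
    SELECT TOP 50 c.CustomerId, c.CustomerName, ct.Description
    FROM dbo.Customer c
    JOIN dbo.CustomerType ct ON ct.CustomerTypeId = c.CustomerTypeId
    ORDER BY ct.Description, c.CustomerId;

The trailing c.CustomerId in the ORDER BY is only there to make the sort
deterministic so that paging stays stable between requests.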
I know there exist strategies like materialised views and indexed views,
but are these really appropriate solutions to this kind of problem? To make
matters worse, in our case we want users to have some degree of
customisation of what data they see, which means [within the limits of what
makes logical sense in the model] they can "join up" more foreign keys or
remove them. Even if we did use indexed views, simply indexing "everything"
would be wasteful.
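For what it's worth, the indexed-view route would look roughly like the
sketch below (SQL Server syntax, hypothetical names). The point is that
every sortable joined-in column would need its own nonclustered index on
the view, which is exactly the wastefulness I mean:

    -- Schema-bound view materialising the joined description...
    CREATE VIEW dbo.CustomerWithType
    WITH SCHEMABINDING
    AS
    SELECT c.CustomerId, c.CustomerName,
           ct.Description AS CustomerTypeDescription
    FROM dbo.Customer c
    JOIN dbo.CustomerType ct ON ct.CustomerTypeId = c.CustomerTypeId;
    GO
    -- ...made "indexed" by a unique clustered index,
    CREATE UNIQUE CLUSTERED INDEX IX_CustomerWithType
        ON dbo.CustomerWithType (CustomerId);
    GO
    -- ...plus one nonclustered index per column the user may sort by.
    CREATE NONCLUSTERED INDEX IX_CustomerWithType_TypeDesc
        ON dbo.CustomerWithType (CustomerTypeDescription);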
Appreciate any comments you wish to make regarding the various strategies
available.