Kevin Frey
At our work, a bit of a "philosophical" debate is underway as to how much
data (in terms of number of rows) should be considered "enough" to present
to a user via a web application. This is not a web-based report (an
entirely different issue) but a UI that allows a user to pick a record and
work on it in a transactional system.
What I mean is: if a particular table is extremely large (say 3 MILLION
rows), should we consider it "reasonable" to let users page through all 3
million rows, if that's what they want to do?
In the past I have built desktop applications that would happily let you
page/scroll through millions of records (the concept is relatively similar).
But nowadays I'm of the opinion that there is a fairly small threshold
(let's say a few thousand rows, and certainly no more than 5,000) beyond
which a user who needs to see more data is either using the wrong mechanism
(they should be running a report instead) or working so unproductively that
they shouldn't be doing it at all. In the latter case, the user should
instead use a search mechanism to increase the selectivity of what they
require and narrow the number of records displayed.
My technical reason for asking is that paging through X million records is
only feasible when you follow well-defined navigational paths (indexes) in
the database. That allows you to use a "TOP N" approach and grab small
chunks of data to fill each "page-full" being displayed.
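To make the "TOP N" idea concrete, here is a minimal sketch of keyset-style
paging, assuming a hypothetical Customer table with a clustered primary key
CustomerId, where @LastCustomerId is the key of the last row on the
previous page:

    -- Fetch the next page of 50 rows following the last key already shown.
    -- The seek on the clustered index makes this cheap even at 3 million rows.
    SELECT TOP 50 CustomerId, CustomerName
    FROM dbo.Customer
    WHERE CustomerId > @LastCustomerId
    ORDER BY CustomerId;

Because the WHERE clause seeks straight to the continuation point on the
index, page 50,000 costs roughly the same as page 1.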
Such a technical "restriction" produces a dichotomy when you also want to
present the data in a grid and let the user click a column heading to sort
it. My view is that if someone sorts the data and then clicks "Next Page",
they *should* see the next chunk of data in that sort order. Supporting
that for every possible column heading of a data table would require a
bucket-load of indexes. And even then an index often cannot help, because
the user wants to sort on a field reached via a foreign key rather than the
key itself (for example, Customer connects to Customer Type, and the user
wants to sort by the Customer Type Description, which is not the key).
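For illustration, the problem case looks something like this (table and
column names are hypothetical):

    -- Sorting by the description on the joined Customer Type table.
    -- No index on Customer covers ct.Description, so the server must
    -- join and sort the full result before it can return even one page.
    SELECT TOP 50 c.CustomerId, c.CustomerName, ct.Description
    FROM dbo.Customer c
    JOIN dbo.CustomerType ct ON ct.CustomerTypeId = c.CustomerTypeId
    ORDER BY ct.Description, c.CustomerId;

The trailing c.CustomerId in the ORDER BY is only there to make the sort
deterministic so that paging stays stable between requests.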
I know there exist strategies like materialised views and indexed views,
but are these really appropriate solutions to this kind of problem? To make
matters worse, in our case we want users to have some degree of
customisation of what data they see, which means [within the limits of what
makes logical sense in the model] they can "join up" more foreign keys or
remove them. Even if we did use indexed views, simply indexing "everything"
would be wasteful.
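For what it's worth, the indexed-view route would look roughly like the
sketch below (SQL Server syntax, hypothetical names). The point is that
every sortable joined-in column would need its own nonclustered index on
the view, which is exactly the wastefulness I mean:

    -- Schema-bound view materialising the joined description...
    CREATE VIEW dbo.CustomerWithType
    WITH SCHEMABINDING
    AS
    SELECT c.CustomerId, c.CustomerName,
           ct.Description AS CustomerTypeDescription
    FROM dbo.Customer c
    JOIN dbo.CustomerType ct ON ct.CustomerTypeId = c.CustomerTypeId;
    GO
    -- ...made "indexed" by a unique clustered index,
    CREATE UNIQUE CLUSTERED INDEX IX_CustomerWithType
        ON dbo.CustomerWithType (CustomerId);
    GO
    -- ...plus one nonclustered index per column the user may sort by.
    CREATE NONCLUSTERED INDEX IX_CustomerWithType_TypeDesc
        ON dbo.CustomerWithType (CustomerTypeDescription);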
Appreciate any comments you wish to make regarding the various strategies
available.