"Flat" interpolation

Roedy Green

Consider this interpolation problem:

Each day you measure the cumulative hits on a web page since the
beginning of time, and plot the daily hits.

Ideally you sample every day at noon, but some days you are early or
late.

If you sample hourly you discover there are patterns, a peak in the
morning and one in late afternoon. You also discover each day of the
week has its own profile, with flatter patterns on the weekends.

You might tackle this two ways:

1. Try somehow to normalise a non-noon reading to noon by massaging
it with the information continuously collected and revised about the
profile.

2. Think of the problem as how to smooth the derivative of a plot of
the cumulative hits, where x is accurate to the minute. You want an
interpolation that flattens much like a Staedtler Mars snake does --
not polynomial interpolation, which, in trying to pass exactly through
the given points, goes nuts in between.

There is a similar problem smoothing weight measurements. There is
likely a daily pattern that could be used to normalise readings to the
same time each day.

The other wrinkle is salt intake, which increases body weight
temporarily. There should be a way to normalise for that by using a
body-fat measure (which actually measures conductivity).

To ask the question more mathematically: if I know there is a
predictable periodic pattern, I would like to normalise that out of
each reading, then flatten the remaining result, with the degree of
flattening as an adjustable parameter.
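
A minimal sketch of that in Java, assuming hourly samples and a weekly
period; the per-slot profile averaging and the smoothing factor alpha
(the adjustable degree of flattening) are illustrative choices, not a
known-good method:

/**
 * Sketch: remove a known periodic pattern, then smooth the residual
 * with an exponential moving average. alpha is the adjustable degree
 * of flattening; period would be 24*7 for hourly samples over a week.
 * All names here are illustrative assumptions.
 */
public class PeriodicSmoother {

    /** Average each slot of the period over all cycles to estimate the profile. */
    static double[] estimateProfile(double[] samples, int period) {
        double[] sum = new double[period];
        int[] count = new int[period];
        for (int i = 0; i < samples.length; i++) {
            sum[i % period] += samples[i];
            count[i % period]++;
        }
        double[] profile = new double[period];
        for (int s = 0; s < period; s++) {
            profile[s] = count[s] == 0 ? 0.0 : sum[s] / count[s];
        }
        return profile;
    }

    /** Subtract the profile, smooth what is left, then add the profile back. */
    static double[] smooth(double[] samples, int period, double alpha) {
        double[] profile = estimateProfile(samples, period);
        double[] out = new double[samples.length];
        double ema = samples[0] - profile[0];
        for (int i = 0; i < samples.length; i++) {
            double residual = samples[i] - profile[i % period];
            ema = alpha * residual + (1 - alpha) * ema; // smaller alpha = flatter
            out[i] = ema + profile[i % period];
        }
        return out;
    }
}
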
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Learning is not compulsory... neither is survival."
~ Dr. W. (William) Edwards Deming (born: 1900-10-14 died: 1993-12-20 at age: 93)
 
Roedy Green

Also, I'm not sure why you consider this a Java programming
question ...

The community that hangs out here contains people with backgrounds in
such things, who perhaps would know the proper names for the sort of
problem I am trying to solve.

--
Roedy Green Canadian Mind Products
http://mindprod.com

"Learning is not compulsory... neither is survival."
~ Dr. W. (William) Edwards Deming (born: 1900-10-14 died: 1993-12-20 at age: 93)
 
Roedy Green

If that's not it, I can't help without a better idea of
what you're trying to compute. (And my ability to help is not
a sure thing even then ...) An example of the sort of statement
you'd like to be able to make after computing your answer would
probably be a good thing.


The broad question is: how would you graph daily hits in a smoothed way
so you can more easily see trends? How do you massage the data to
compensate for the fact that it is not sampled at the same time each
day? How do you interpolate points in an aesthetically pleasing,
smooth way that dampens the oscillations and freakish daily blips?

How do you mathematically approximate the curve you would get if you
had used a Staedtler Mars snake (a set of flexible steel bands that
slide against each other, encased in a dark green plastic coating) to
interpolate points? In the olden days this was how you interpolated a
graph drawn with a pencil.

I asked this question back in the 1960s of the famous mathematician
Anthony Ralston. He too upbraided me for asking a mathematically
meaningless question, and further chastised me, saying nobody would
ever be interested in such a curve.

I was interested at the time because we were playing with pen plotters
to draw cartoons, and the interpolation we were using was very unstable
and needed a great many points to eliminate spurious spikes.

Yet surely there are practical solutions. Without them, how would
computer-assisted animation be possible?

I have seen spline curves. They seem better behaved than polynomials,
but I would think them still a little too untamed and unpredictable.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Learning is not compulsory... neither is survival."
~ Dr. W. (William) Edwards Deming (born: 1900-10-14 died: 1993-12-20 at age: 93)
 
Martin Gregorie

Roedy Green said:
The broad question is: how would you graph daily hits in a smoothed way
so you can more easily see trends? How do you massage the data to
compensate for the fact that it is not sampled at the same time each
day? How do you interpolate points in an aesthetically pleasing,
smooth way that dampens the oscillations and freakish daily blips?

I used to be a physical inorganic chemist, not a mathematician, so take
the following with a grain of anhydrous sodium chloride:

My thesis topic involved a lot of Mössbauer spectroscopy. Spectra were
generated by recording gamma photon counts in a 400-channel multi-channel
analyser, with each channel corresponding to the emitter velocity. This
was very messy data, with peaks almost never exceeding 3 standard
deviations of the background. We found we could get nicely fitted curves
by using the Gaussian curve-fitting method.

Maybe this would work for your data too?
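
Fitting Gaussians to peaks properly needs nonlinear least squares,
which is more than a sketch can carry. A simpler relative that often
smooths this kind of data nicely is a Gaussian-weighted moving average;
the Java sketch below shows that substitute technique, with the kernel
width sigma as an illustrative parameter rather than anything from
Martin's actual method:

/** Sketch: Gaussian kernel smoothing. sigma, in samples, controls
    how hard it smooths; weights are renormalised at the edges. */
static double[] gaussianSmooth(double[] y, double sigma) {
    int radius = (int) Math.ceil(3 * sigma); // weights are ~0 beyond 3 sigma
    double[] out = new double[y.length];
    for (int i = 0; i < y.length; i++) {
        double sum = 0, weightSum = 0;
        int lo = Math.max(0, i - radius);
        int hi = Math.min(y.length - 1, i + radius);
        for (int j = lo; j <= hi; j++) {
            double w = Math.exp(-(j - i) * (j - i) / (2 * sigma * sigma));
            sum += w * y[j];
            weightSum += w;
        }
        out[i] = sum / weightSum; // normalise so the edges aren't biased low
    }
    return out;
}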
 
Daniel Pitts

Roedy said:
The community that hangs out here contains people with backgrounds in
such things, who perhaps would know the proper names for the sort of
problem I am trying to solve.
comp.programming has more :)
 
Owen Jacobson

Roedy Green said:
The broad question is: how would you graph daily hits in a smoothed way
so you can more easily see trends? How do you massage the data to
compensate for the fact that it is not sampled at the same time each
day? How do you interpolate points in an aesthetically pleasing,
smooth way that dampens the oscillations and freakish daily blips?
[snip]

Patricia's suggestions are very good, and if you want to do it Right,
start with those. However, I have a simple, easy-to-understand
alternative that's also easy to hack up on short notice.

Let's say you group activity into minute-sized "buckets" for reporting.
The "10:05" bucket counts every event that happened where 10:04:30 <= t
< 10:05:30. You'll get a fairly chaotic graph with some trends visible.

One approach to smoothing is to use wider, overlapping buckets: for the
10:05 bucket, start at 10:00:30 and end at 10:09:30. You still have
exactly as many buckets as before, but each event is in several
adjacent buckets rather than in one. The resulting curve is much
smoother: spikes and dips are spread across a time period and mixed
with the underlying trend a little better.
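
A quick Java sketch of the overlapping buckets; halfWidth = 4 gives the
nine-minute window of the 10:05 example, and the names are made up for
illustration:

/** Sketch: widen minute buckets so each output bucket i sums the
    2*halfWidth+1 minute buckets centred on it. Each event therefore
    lands in several adjacent output buckets, which smooths the curve. */
static long[] widenBuckets(long[] minuteCounts, int halfWidth) {
    long[] out = new long[minuteCounts.length];
    for (int i = 0; i < minuteCounts.length; i++) {
        long sum = 0;
        int lo = Math.max(0, i - halfWidth);
        int hi = Math.min(minuteCounts.length - 1, i + halfWidth);
        for (int j = lo; j <= hi; j++) {
            sum += minuteCounts[j];
        }
        out[i] = sum;
    }
    return out;
}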

It's a hack, but like all good hacks it's simple and it works in a pinch.

Cheers,

-o
 
Karl Uppiano

Martin Gregorie said:
I used to be a physical inorganic chemist, not a mathematician, so take
the following with a grain of anhydrous sodium chloride:

My thesis topic involved a lot of Mössbauer spectroscopy. Spectra were
generated by recording gamma photon counts in a 400-channel multi-channel
analyser, with each channel corresponding to the emitter velocity. This
was very messy data, with peaks almost never exceeding 3 standard
deviations of the background. We found we could get nicely fitted curves
by using the Gaussian curve-fitting method.

Maybe this would work for your data too?

I used to work in digital audio, in which sampling, rate conversion and
filtering (smoothing) are important. So I have a couple of important points,
and some suggestions:

In a sampled data system, regular sampling is crucial. Irregular sampling
gives rise to "jitter", which is erroneous data based on the fact that you
are higher or lower on the curve than you expect to be, due to the time
offset. If you know how far off time you are *and* you know the slope of the
data, you might be able to correct for this, but both, and especially the
latter, are big "ifs" -- not least because, if you knew the slope, you
probably wouldn't be trying to measure it.
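
In code the first-order correction is a one-liner; the names, and the
assumption that you have a slope estimate at all, are purely for
illustration:

/** Sketch: first-order jitter correction. dtSeconds is how far past
    the nominal sample time the reading was taken; slope is an estimate
    of dy/dt near the sample (itself uncertain, as noted above). */
static double correctForJitter(double sample, double dtSeconds, double slope) {
    return sample - slope * dtSeconds; // project back to the nominal time
}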

In a sampled data system, you must sample more than twice as frequently as
the highest frequency you are trying to capture (note that even aperiodic
signals have frequencies associated with them as described by Fourier and
Laplace analysis). This is known as the Nyquist criterion. So you cannot
sample once a day and obtain useful data on events that happen on an hourly
basis. In fact, once a day is only adequate for events that happen once
every two days, or aperiodic events which have spectral decompositions that
have most of their amplitude at or below that frequency. Any claims to the
contrary are "inventing" data, or are measuring what is known as "aliasing".

Assuming you have set up a regular sampling interval, and it happens
frequently enough to satisfy Nyquist, then you can filter the data using a
Finite Impulse Response (FIR) filter. This is a filter consisting of a ring
buffer containing the data. The data passes through the ring buffer in the
usual way -- in one end and out the other, as the pointer spins around. Each
buffer element has a filter coefficient obtained from a sinc function
(sin(x)/x) where sinc(1) := 1, centered symmetrically on the ring buffer.
The "frequency" of the sinc function in terms of your sample rate determines
the cutoff frequency of your filter -- its smoothing characteristics. This
type of filter is sometimes called an interpolation filter. The filter
output is the sum of the product of each buffer element multiplied by its
coefficient. The sum of products is calculated for each sample in the data
set. You can run this in batch mode or in continuous real-time. To avoid
errors due to truncation of the sinc function and/or the data, you might
have to "window" your filter, i.e., "fade in" and "fade out" the
coefficients, and the data. The need for windowing is less if you can make
your filter sufficiently long, such that the truncation effect becomes less
than your error tolerance by a factor of ten or so. There is software
available for designing FIR filters, so I would only suggest doing it by
hand as an exercise, if you are so inclined.
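
A minimal windowed-sinc FIR sketch in Java, assuming regularly sampled
data and a cutoff expressed as a fraction of the sample rate; the
Hamming window and the tap count are illustrative choices:

/** Sketch: windowed-sinc low-pass FIR filter. cutoff is the cutoff
    frequency as a fraction of the sample rate (0 < cutoff < 0.5);
    taps should be odd so the filter is symmetric about its centre. */
static double[] firLowPass(double[] x, double cutoff, int taps) {
    double[] h = new double[taps];
    int mid = taps / 2;
    double norm = 0;
    for (int n = 0; n < taps; n++) {
        int k = n - mid;
        // sinc(0) := 1, taken as the limit (see the correction below)
        double sinc = (k == 0) ? 1.0
                : Math.sin(2 * Math.PI * cutoff * k) / (2 * Math.PI * cutoff * k);
        double hamming = 0.54 - 0.46 * Math.cos(2 * Math.PI * n / (taps - 1));
        h[n] = 2 * cutoff * sinc * hamming; // window the truncated sinc
        norm += h[n];
    }
    for (int n = 0; n < taps; n++) h[n] /= norm; // unity gain at DC

    double[] y = new double[x.length]; // sum of products for each sample
    for (int i = 0; i < x.length; i++) {
        double acc = 0;
        for (int n = 0; n < taps; n++) {
            int j = i - mid + n;
            if (j >= 0 && j < x.length) acc += h[n] * x[j];
        }
        y[i] = acc;
    }
    return y;
}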

HTH,
 
Karl Uppiano

Karl Uppiano said:
[snip]

Each buffer element has a filter coefficient obtained from a sinc
function (sin(x)/x) where sinc(1) := 1, centered symmetrically on the
ring buffer.

[snip]

GAGH! Typo!

sinc(0) = sin(0)/0 := 1. You can't evaluate sinc(0) directly; you have to
take the limit as x -> 0.
 
