Efficient evaluation of step functions

B

Bill Woessner

I've been chewing on a problem and haven't come up with a good solution.
I'm hoping some of you have thought about it and come up with something
better than I have.

Suppose I have a mathematical step function like this:

f(x) = { 0, x <= x_1
1, x_1 <= x < x2
...
n, x >= x_n }

What's an efficient way to evaluate this in C++? Obviously, you can have
a huge if...else if...else statement. But after a while, that gets to be
really inefficient. I thought about using perfect hash (gperf), as you
would for optimizing a case statement, but the hash function would have to
be monotone since I'm dealing with ranges and that's not the case.
Currently, I've implemented it using a lookup table. It's significantly
faster but this only works because the knots (x_1, x_2, ..., x_n) are
integers. If, for example, x_1 = 1 and x_2 = pi, this wouldn't work.

This seems like a fairly ubiquitous problem so I'm hoping some of you
have come up with better solutions than I. Thanks in advance.

--Bill
 
V

Victor Bazarov

Bill Woessner said:
I've been chewing on a problem and haven't come up with a good solution.
I'm hoping some of you have thought about it and come up with something
better than I have.

Suppose I have a mathematical step function like this:

f(x) = { 0, x <= x_1
1, x_1 <= x < x2
...
n, x >= x_n }

What's an efficient way to evaluate this in C++? Obviously, you can have
a huge if...else if...else statement. But after a while, that gets to be
really inefficient. I thought about using perfect hash (gperf), as you
would for optimizing a case statement, but the hash function would have to
be monotone since I'm dealing with ranges and that's not the case.
Currently, I've implemented it using a lookup table. It's significantly
faster but this only works because the knots (x_1, x_2, ..., x_n) are
integers. If, for example, x_1 = 1 and x_2 = pi, this wouldn't work.

This seems like a fairly ubiquitous problem so I'm hoping some of you
have come up with better solutions than I. Thanks in advance.

It's a problem of searching in a sorted list of values. Use binary
search, it's the fastest.

Victor
 
D

David Fisher

Bill Woessner said:
Suppose I have a mathematical step function like this:

f(x) = { 0, x <= x_1
1, x_1 <= x < x2
...
n, x >= x_n }

What's an efficient way to evaluate this in C++?

If the x_ values are reasonably spread out and are not too sparse (eg. 1.5,
2.8, 5.7, 8.9 ...), then one way might be to create an array that can be
used to map the integer part of (x times a constant K) (where K is the
minimum difference between any pair of x_ values; see below if this value is
small) to a struct like:

struct StepData
{
int value;
double cutOffPoint;
};

Then you can have something like:

StepData stepData[] = { ... };
const double minXValue = ...;
const double maxXValue = ...;
const double K = ...;

int f(double x)
{
assert(x >= minXValue && x <= maxXValue);
int index = floor((x - minXValue) / K); // (use floor() instead of
casting to int so negative values are handled OK)
const StepData &s = stepData[index];
return (x < s.cutOffPoint ? s.value : s.value + 1);
}

The value of "cutOffPoint" should be the next highest x value if it does not
apply (ie. if there is no "step" between (minXValue + index * K) and
(minXValue + (index + 1) * K).

If the minimum difference between any pair of x_ values is relatively small,
then multiple steps will map to the same index of stepData[] ... maybe have
a smaller StepData array again ? (ie. a recursive data structure that is
locally more fine grained where necessary). If the elements are mostly far
apart (eg. 1.0, 1000.0, 1000000.0 ...), then this approach might help as
well (to avoid creating an array with a million entries which are mostly
duplicates). Another space saving could be to have a single definition of
each StepData value (in a separate array) and to fill the stepData[] array
with pointers into it.

Obviously lots of refinements possible ...

Hope this helps,

David F
 
D

David Fisher

Bill Woessner said:
Suppose I have a mathematical step function
...
I thought about using perfect hash (gperf), as you
would for optimizing a case statement, but the hash function would have to
be monotone since I'm dealing with ranges and that's not the case.

Another thought:

If your function has some sort of predictable shape (eg. exponential), you
could "preprocess" the value before hashing it (eg. take the log) to make it
closer to a straight line. The preprocessing function would not have to be
completely accurate (eg. an approximation to "log" would do), just have
consistent results ...

David F
 
D

David Fisher

Bill Woessner said:
If the x_ values are reasonably spread out and are not too sparse (eg. 1.5,
2.8, 5.7, 8.9 ...), then one way might be to create an array that can be
used to map the integer part of (x times a constant K) (where K is the
minimum difference between any pair of x_ values; see below if this value is
small) to a struct <snip>
...

I tried it out, and binary search (using std::lower_bound()) is definitely
faster than this for small values (because of the division by K or
multiplication by 1 / K). The performance was about the same for N = 400,
and only significantly better for much larger values ...

The approach I described is only really useful if you can cast to an integer
(ie. for K = 1.0).

David F
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,148
Messages
2,570,838
Members
47,385
Latest member
Joneswilliam01

Latest Threads

Top