xhoster
Alf McLaughlin said (quoting Anno's earlier reply):
It seems to me the data structure you are looking for is a sorted list.
Then you can use binary search to find the first element above the
lower end of the interval, and the last element below the upper end.
The elements in between are all the list elements in the interval.
Of course, sorting only pays when you have many intervals to check
against the same set of data. Otherwise, a linear search would be
faster.
Anno
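To make Anno's suggestion concrete, here is a minimal sketch of that binary search in Perl; the subroutine name and interface are my own illustration, not something from this thread. Given a sorted array, it returns the index of the first element at or above a bound, so calling it with the lower end of the interval gives one boundary and calling it with the upper end gives the other.

sub first_at_or_above {
    my ($bound, $data) = @_;            # $data: ref to an ascending sorted array
    my ($lo, $hi) = (0, scalar @$data);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($data->[$mid] < $bound) {
            $lo = $mid + 1;             # everything at or below $mid is too small
        } else {
            $hi = $mid;                 # $mid is a candidate; keep it in range
        }
    }
    return $lo;                         # equals @$data when every element is below $bound
}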
Thanks, Anno. I am realizing after posting with Xho that I did not
state my question precisely, but I think you deduced what I am
interested in. Basically, I have a list of 500 values, and I would
like to take another list of values (which happens to be 80 long at
the moment, but could be longer) and, for each of those 80 values,
find which two of the 500 values are the next greater value and the
next lesser value. I've been doing this (in pseudocode, so please take
syntax mistakes and finer points lightly):
my @val1 = ( ... );   # my 80 values
my @val2 = ( ... );   # my 500 values (assumed sorted ascending, as the consecutive-pair test implies)

foreach my $i (@val1) {
    for (my $j = 0; $j < @val2 - 1; $j++) {
        my $current = $val2[$j];
        my $next    = $val2[$j + 1];
        if ($i > $current && $i < $next) {
            # then this value is between $current and $next
            last;    # found the bracketing pair; stop scanning for this value
        }
    }
}
In one example I have, it was necessary to loop through this 8000
times.
With a different 500-value database set and a different 80-value
querent set each of those 8000 times?
I may be interested in running this program perhaps 1 million
times on a different set of 80 values;
Maybe this is a naive question, but how would that be different than
running the problem one time on a set of 80 million querent values?
you can start to see how many
times I will have to loop through the 500 values and how much work the
computer will have to do. With one example I have, it took me 16
seconds to compute this 1000 times. Since I am interested in running
this perhaps millions of times, coding the problem this way in Perl may
end up taking my program an additional day or two to run.
I hope I have described the problem more clearly now. Does it still
sound as if a binary search on a sorted array is the way to go?
It is certainly a good candidate, both for performance and ease. Another
choice would be to sort both the database array (the 500) and the querent
array (the 80); then your linear method might be faster than the binary
search, because each linear scan through the 500 can pick up where the
previous one left off rather than restarting at the beginning each time.
You would have to test it to see, but I wouldn't bother: the code involved
is more subtle than the binary search. As you suggest below, if the binary
search isn't good enough I'd switch languages before resorting to such
desperate measures.
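For what it's worth, a rough sketch of that merge-style walk (both arrays sorted, the scan picking up where it left off); the variable names and details are my own guess at what such code could look like, not tested code from this thread:

my @queries  = sort { $a <=> $b } @val1;   # the 80, ascending
my @database = sort { $a <=> $b } @val2;   # the 500, ascending

my $j = 0;
foreach my $q (@queries) {
    # advance to the first database element >= $q; $j never moves
    # backwards, so the whole pass touches each database element only once
    $j++ while $j < @database && $database[$j] < $q;
    my $lesser  = $j > 0         ? $database[$j - 1] : undef;   # next lesser value
    my $greater = $j < @database ? $database[$j]     : undef;   # next value at or above $q
    # ... record ($q, $lesser, $greater) here ...
}

The subtlety Xho mentions is exactly the bookkeeping around $j and the ends of the array, plus mapping the results back to the original (unsorted) query order.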
Also, I plan on re-coding the same program in C or C++ to see if the
code speeds up; I suspect the gain in performance may make the program
run in a reasonable amount of time without having to use a more
sophisticated algorithm.
Binary searches aren't *that* sophisticated. With a size of 500, I would
definitely choose binary over linear. I recently benchmarked a finely
tuned linear search against a finely tuned binary search in C, and the
crossover point (where binary started doing better than linear) was at an
array size of more than about 120. (That is with g++ -O3; with g++ -O0,
the crossover was something like 10.) So I think you can safely assume it
is worth doing binary on arrays of 500 or more.
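Putting the two posts together, a hedged sketch of what the binary-search version of the original loop might look like, reusing the first_at_or_above routine sketched earlier (again, my own illustration rather than code from the thread):

my @sorted = sort { $a <=> $b } @val2;        # sort the 500 once per database set

foreach my $i (@val1) {                       # each of the 80 query values
    my $k = first_at_or_above($i, \@sorted);  # O(log 500) per lookup
    my $lesser  = $k > 0       ? $sorted[$k - 1] : undef;   # next lesser value
    my $greater = $k < @sorted ? $sorted[$k]     : undef;   # next value at or above $i
    # ... use ($i, $lesser, $greater) ...
}

Sorting the 500 is paid once per database set, and each of the 80 lookups then costs about nine comparisons instead of up to 500.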
Xho