R
Rich S.
Hi
In an earlier posting I was asking how to read thru millions of data records
keeping the top 2000 (where the top values are based upon a field in the
record) and niklasb suggested using a priority_queue like so
------------------- start old message ---------------------
A more efficient way than what you describe would be
to use priority_queue. Define the predicate in such
a way that the top() always returns the lowest scoring
object.
Let q be the priority_queue and suppose you want to
find the top N scores (where N is 2000 in your example),
the pseudo-code would be something like this:
q.reserve(N);
// add the first N elements to the priority queue
for (i = 0; i < N; ++i)
{
if (!ReadRecord(&obj))
break;
q.push(obj);
}
// invariant: q contains highest N elements
// q.top() is the lowest of the highest N elements
while (ReadRecord(&obj))
{
if (obj > q.top())
{
q.pop();
q.push(obj);
}
}
------------------- end of old message ---------------------
This is working well for me but I'm getting cases where I have duplicate
records in the top 2000 and I'm not quite sure how to fix that problem.
I was thinking of running a loop around the pop until the obj is NOT greater
than the top and then pushing obj but I'm convinced thats the best way.
Thanks for any help.
Regards.
In an earlier posting I was asking how to read thru millions of data records
keeping the top 2000 (where the top values are based upon a field in the
record) and niklasb suggested using a priority_queue like so
------------------- start old message ---------------------
A more efficient way than what you describe would be
to use priority_queue. Define the predicate in such
a way that the top() always returns the lowest scoring
object.
Let q be the priority_queue and suppose you want to
find the top N scores (where N is 2000 in your example),
the pseudo-code would be something like this:
q.reserve(N);
// add the first N elements to the priority queue
for (i = 0; i < N; ++i)
{
if (!ReadRecord(&obj))
break;
q.push(obj);
}
// invariant: q contains highest N elements
// q.top() is the lowest of the highest N elements
while (ReadRecord(&obj))
{
if (obj > q.top())
{
q.pop();
q.push(obj);
}
}
------------------- end of old message ---------------------
This is working well for me but I'm getting cases where I have duplicate
records in the top 2000 and I'm not quite sure how to fix that problem.
I was thinking of running a loop around the pop until the obj is NOT greater
than the top and then pushing obj but I'm convinced thats the best way.
Thanks for any help.
Regards.