R
Rich S.
I'm reading through records in a file creating bitset objects from them.
Each bitset object has a score and I'm trying to collect just the top 2000
scores from over a million records.
I tihnk the best way to go is to create a vector, add the objects to it in
ascending order while keeping track of the size and lowest score. If the
size reaches the 2000 mark then evaluate the score and if it's lower than
the lowest score, discard the object otherwise traverse the vector and
insert it at its appropriate space and finally chop off the beginning of the
vector so that we're still in the 2000 range.
When I say best way I mean the way I thought of given my somewhat limited
experience with stl containers. I was also thinking about determining if the
object belongs in the 2000 set based upon it's score and then tossing it in
the end of the vector and doing a vector sort but I figured I'd skip the
sort and insert it myself at the proper position.
I didn't see anything that sets a vectors size as fixed and only inserts if
above this range type of thing but I guess that's where you override the
allocater? I haven't been down that road but nows a good time to jump in I
guess.
Thanks
This convoluted method below if you care to read and process is what I came
up with.
iSize = m_BitSetVector.size();
// if the vector is empty then just add it and collect the score
if(iSize == 0)
{
iLowestTotal = currentSet.GetScore();
m_BitSetVector.push_back(currentSet);
continue;
}
bool bAtFullCapacity = false;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
{
bAtFullCapacity = true;
// if the score is less than the lowest in the vector and
// we're at full capacity then continue
if(currentSet.GetScore() < m_BitSetVector.begin()->GetScore())
continue;
}
BitSetVector::iterator p;
bool bInserted = false;
for(p = m_BitSetVector.begin(); p != m_BitSetVector.end(); ++p)
{
if(currentSet.GetScore() < p->GetScore())
{
// insert at this location
m_BitSetVector.insert(p, currentSet);
bInserted = true;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;
// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
break;
}
}
// if it's not inserted yet then
if(!bInserted)
{
m_BitSetVector.push_back(currentSet);
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;
// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
}
Each bitset object has a score and I'm trying to collect just the top 2000
scores from over a million records.
I tihnk the best way to go is to create a vector, add the objects to it in
ascending order while keeping track of the size and lowest score. If the
size reaches the 2000 mark then evaluate the score and if it's lower than
the lowest score, discard the object otherwise traverse the vector and
insert it at its appropriate space and finally chop off the beginning of the
vector so that we're still in the 2000 range.
When I say best way I mean the way I thought of given my somewhat limited
experience with stl containers. I was also thinking about determining if the
object belongs in the 2000 set based upon it's score and then tossing it in
the end of the vector and doing a vector sort but I figured I'd skip the
sort and insert it myself at the proper position.
I didn't see anything that sets a vectors size as fixed and only inserts if
above this range type of thing but I guess that's where you override the
allocater? I haven't been down that road but nows a good time to jump in I
guess.
Thanks
This convoluted method below if you care to read and process is what I came
up with.
iSize = m_BitSetVector.size();
// if the vector is empty then just add it and collect the score
if(iSize == 0)
{
iLowestTotal = currentSet.GetScore();
m_BitSetVector.push_back(currentSet);
continue;
}
bool bAtFullCapacity = false;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
{
bAtFullCapacity = true;
// if the score is less than the lowest in the vector and
// we're at full capacity then continue
if(currentSet.GetScore() < m_BitSetVector.begin()->GetScore())
continue;
}
BitSetVector::iterator p;
bool bInserted = false;
for(p = m_BitSetVector.begin(); p != m_BitSetVector.end(); ++p)
{
if(currentSet.GetScore() < p->GetScore())
{
// insert at this location
m_BitSetVector.insert(p, currentSet);
bInserted = true;
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;
// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
break;
}
}
// if it's not inserted yet then
if(!bInserted)
{
m_BitSetVector.push_back(currentSet);
if(m_BitSetVector.size() > m_iTotalScoresToKeep)
bAtFullCapacity = true;
// now chop off the end
if(bAtFullCapacity)
{
m_BitSetVector.erase(m_BitSetVector.begin());
iLowestTotal = m_BitSetVector.begin()->GetScore();
}
}