Fast Container

T

Thomas Lehmann

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.
 
V

Victor Bazarov

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

You didn't say whether the order of the objects in a container is
important. If you can define deletion from your custom vector as
swapping with the last element and resizing by -1, it's fast. Insertion
at the end is not too bad, and causes reallocation only once in a while
(when the vector grows past its capacity), and you can kind of control
that by managing the capacity at "calm" times (shooter's rule: reload
when you have a chance, don't wait 'til you're empty).

Also, consider std::deque, it's appends are even faster than vector's.

V
 
J

jacob navia

Le 02/11/11 16:43, Leigh Johnston a écrit :
[snip]

if you want fast inserts/deletes
anywhere but also want fast indexing then consider using my
"segmented_array" container which can be found at:

http://i42.co.uk/stuff/segmented_array.htm

/Leigh

Actually this is a list of vectors. Why should be faster
than using list and vector?

As far as I can read from your code
you do not use those containers. Why?

Thanks for publishing that code, it is full of ideas.

jacob
 
W

Werner

There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

Imagine a filter that decides on each update the a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

NOW, what is the right way to implement a fast container for this?

Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?
- Using a std::set I have no index access. Or is there a way to handle this properly?
- Somebody told me something about a weighted balanced tree as possible solution.
- Somebody else told me to use standard containers only.

I think you can use a combination of list and vector.
The list is used for fast insertions, deletions etc of the data
elements,
and the vector contains iterators that are sorted iaw. criteria as
defined by the actual data elements, and not the iterator (list
unsorted).

The vector always remains sorted (insertion of iterator via
lower_bound).

You have the benefit of random access iterators for fast access on
sorted sequence.
You have the benefit of fast insertion, removal.

Adding data elements happen by pushing back onto the list, and adding
the iterator to the sorted sequence (vector).
Removing data elements happen by finding the item in the (sorted)
vector
and using the iterator to erase the data element from the list. The
"referring" iterators in the vector remain valid.

Kind regards,

Werner
 
W

Werner

I think you can use a combination of list and vector.
The list is used for fast insertions, deletions etc of the data
elements,
and the vector contains iterators that are sorted iaw. criteria as
defined by the actual data elements, and not the iterator (list
unsorted).

typedef std::list<DataElement> ElementList;
typedef std::vector<ElementList::iterator> ElementRefSequence;
ElementList elementList;
ElementRefSequence elementRefSequence;
 
T

Thomas Lehmann

Maybe I'm doing something wrong but comparing to vector
and deque your segmented_array is not faster.

- I'm compiling on Windows Ultimate 64 Bit
- I've tried with 64 and 1024 segments.
- 200000 elements (with push_back)
- took 5ms (vector took 2ms, deque took 5ms)
- The code looks same for deque and vector:

<code>
BOOST_AUTO_TEST_CASE(SegmentedArrayAddValues)
{
const unsigned int limit = 200000;

SegmentedArrayTest<int>* container = new SegmentedArrayTest<int>();
container->createContainer();

// creating content
for (unsigned int ix = 0; ix < limit; ++ix)
container->addValue(ix);

BOOST_CHECK_EQUAL(container->size(), (size_t)limit);

// cleanup
delete container;
}
</code>
 
N

Nick Keighley

On Nov 2, 3:21 pm, Thomas Lehmann

I'm feeling I've missed something...
There are written so many articles but no article is really providing
the answer I'm looking for.

Given a scenario with 100000 data elements the day (500000 the week).
Potential we might have 3000-5000 updates for individual elements
each second.

how are "elements" identified?
Imagine a filter that decides on each update [that] a data element will
be added, updated or removed.

The final container which is sorted by a user defined criteria must be
able to handle this amount of data.

Info: basically each value of that sorted container is a pointer to the
real data.

Final scope: we display the data in a table which provides index based
positions (row)

"final scope"? So when it's sorted it goes in a "table" that's
indexed?
NOW, what is the right way to implement a fast container for this?

use a relational database.

I was thinking std::map but I don't know how your elements are
identified. std::map has reasonable performance for insertion and
deletion- anywhere in the container. And you can choose what to index
it with. at the end you can copy it into whatever your user finds
conventient.
Some of the problems I was confronted with:
- Using a std::vector deletion/insertion is expensive, is it?

yes. Elements are always contiuous so if you delete say the 50th
element of a 100 element vector its going to have to move elements
51-99 down one. At 500000 elements that gets expensive.
- Using a std::set I have no index access. Or is there a way to handle this properly?

so you want to be able to index it? How are these indexes assigned?
Time?
- Somebody told me something about a weighted balanced tree as possible solution.

some sort of tree. (which is what a std::map usually is under the
hood). B-tree? (that's not a binary tree).
- Somebody else told me to use standard containers only.

why? I mean its a good place to start but is it an absolute
constraint- and why?

Do you have any performance criteria?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,149
Members
46,695
Latest member
StanleyDri

Latest Threads

Top