Jonathan said:
Would you explain what the segmented sequence optimization is? My
googling only turns up two other references (both by you) which assume
the reader already knows what it is.
I thought I explained it in some Usenet articles already. However, here
we go:
There are a bunch of segmented sequences in the standard library and the
user might provide own containers which might also be segmented. With a
segmented sequence I mean something which is split into some lower level
segments. 'std::deque' is the obvious example: it is a sequence which is
actually represented by an array of subsequences, i.e. an array of segments.
Other examples include 'std::[io]streambuf_iterator's (the segments here are
the internal character buffers), certain hashes (the segments are the
buffers), etc. The problems with these sequences is that iterators typically
need to do additional checks and compilers cannot really remove them because
they don't know about the structure.
The idea of the optimization is rather simple: instead of processing the
sequence uniformly, the segments are individually processed. That is, the
algorithm actually just iterates over the segments and delegates processing
of the segments to an algorithm using lower level iterators. For example,
rather than using 'std::deque<T>::iterator', the segments can be processed
using 'T*' which is much faster on most systems.
The major trick is to define an interface which tells the algorithm that it
is supposed to handle a segmented sequence and how to access the individual
segments. I don't have a complete implementation of this stuff available
but when I used it, I used something like 'segment_traits' which provided
two kinds of iterators:
- an iterator for the segments
- an iterator for the contents of a segment
The segment iterator would be obtained from the passed 'begin()' and 'end()'
iterators and provide access to the begin and end of the current segment. I
just looked at my 'for_each()' which provides this optimization but it is
pretty confusing. You can have a look at it at
<
http://cvs.sourceforge.net/viewcvs.py/cxxrt/cxxrt-source/include/cxxrt/for_each.h?view=markup>.
The actual logic is hidden in the '_Cxxrt_iterhelper' (which is also
available from the CVS). Since then I have decided for myself that
optimizations like this cannot really be factored out into class but that
algorithms should aggressively reuse other algorithms and I would implement
the optimization now right in one algorithms which is internally used by
other algorithms.
Matt Austern coined the term "segmented sequences" and gave a presentation
on the idea at a workshop on Generic Programming at Schloß Dagstuhl. He
come up with the idea in the context of hashes he was implementing.