Juha Nieminen
Reading about the so-called short string optimization used in some
implementations of std::string, I notice that many articles out there
seem to contrast it with the copy-on-write technique, as if the two
were mutually exclusive.
The short string optimization is a low-level trick where, if the
string is short enough, it's stored in the std::string object itself
rather than in separately allocated memory. (A std::string usually has
as members a pointer, an integer holding the size of the string,
usually another integer holding the current capacity, and perhaps a
few more bytes for good measure, so there's plenty of space in the
object itself to store short strings. For example, on a 64-bit system
you could store a string of up to 22 characters or so in this space;
about half of that on a 32-bit system.)
The copy-on-write technique, on the other hand, is a way of making
copying/assigning even large strings efficient: a deep copy of the
string data is made only when the data is modified, rather than when
the string is copied. (The advantages and disadvantages of this are
two-fold. Clearly, if you have very large strings which get copied and
assigned around a lot, but whose copies are seldom modified, CoW will
be enormously more efficient. On the other hand, all modifying
operations become more expensive, especially in a multi-threaded
environment, where the shared reference count needs locking or atomic
updates. Always deep-copying the string can be more expensive if the
copying is done needlessly, but it is more efficient for strings that
do not get copied around a lot but are modified a lot.)
Anyway, I was wondering why these articles talk as if the two
techniques were mutually exclusive. I don't see why that would be so,
or why you couldn't implement *both* of them in the same std::string
class if you so wanted. You could still have the pointer, size and
capacity members "re-used" for the short string optimization, *and*,
if the string is longer (and thus requires a separate memory
allocation), use CoW on that.
Well, anyway, I suppose that with C++11 the need for CoW has been
greatly diminished thanks to move constructors. With C++98 it was a
useful implementation technique that greatly sped up e.g. sorting a
vector of large strings (or inserting a new string into such a vector),
but with move constructors that has become even more efficient than it
was with CoW strings. Of course there are still situations where move
constructors cannot be used and CoW would increase efficiency...