std::string on "const char *"

Jarek Blakarz

Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

std::string allocates new memory on the heap.
I would like to force the std::string to initially work directly on the string
pointed to by "const char *str" and not allocate anything on the heap.
Of course, when the string is written to later on, COW is allowed.

Can I do that? If so, how?

Thanks for any answers.
 
Nobody

I would like to force the std::string to initially work directly on a string
pointed to by "const char *str" and not allocate anything on a heap.
Of course later on when writing to a string occurs the COW is allowed.

Can I do that ? If so, HOW ?

No; std::string doesn't support that. It always allocates its own memory.

You can write your own string class, but that won't work with functions
which expect a std::string argument. And subclassing std::string won't
work as none of its methods are virtual, including its destructor.
 
Öö Tiib

Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

That is not a C++ program; it does not compile with any C++ compiler I know of.
You probably meant:

#include <string>
int main()
{
const char *str = "my name";
std::string s(str);
}

std::string allocates the new memory on a heap.

In a lot (the majority?) of implementations it does not. Many std::string
implementations use the short string optimization, which means that
"my name" is copied into a buffer inside s itself, and s resides on the
stack. Another reason is that a C++ compiler may optimize the above code
away entirely, since it does nothing externally observable.
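
One way to check the short string optimization claim on a given implementation is to test whether the character data lies inside the std::string object itself. This is only a sketch: stored_inline, aliases_source and sso_demo are hypothetical names, and the result for short strings depends on the standard library in use.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: reports whether the character data of 's' lives
// inside the std::string object itself (small-string optimization)
// rather than in a separate allocation.
bool stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    // std::less gives a total order over pointers, even unrelated ones.
    std::less<const char*> before;
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}

// Hypothetical helper: true only if the string aliases the source pointer.
bool aliases_source(const std::string& s, const char* source) {
    return s.data() == source;
}

void sso_demo() {
    const char* str = "my name";
    std::string s(str);
    assert(!aliases_source(s, str));  // std::string always owns its own copy

    std::string big(1000, 'x');
    assert(!stored_inline(big));      // 1000 chars cannot fit inside the object
}
```

Whether stored_inline(s) is true for "my name" is implementation-specific, so the sketch does not assert it.
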
I would like to force the std::string to initially work directly on a string
pointed to by "const char *str" and not allocate anything on a heap.

Then use str directly; why do you need s? A class that depends on external
management of the resources that it "holds" is not worth making.

Of course later on when writing to a string occurs the COW is allowed.

Can I do that ? If so, HOW ?

Can you tell us why you need such a monster? I have not run into many
performance issues with std::string during the past 15 years or so. My
personal feeling is that you are fixing something that is not broken by
creating something that IS broken.
 
Marcel Müller

Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

Can I do that ? If so, HOW ?

As the others suggested you need to write your own string class for this
purpose.

But if you want to go this way, I have some hints. (I have already done
this before.)

You will need *two* separate classes. One for ordinary strings and one
for compile time constant strings. The reason is quite easy: both accept
const char* as source for construction, but only one of them requires
that the lifetime of the storage behind the pointer exceeds the lifetime
of your string and maybe also copies of the string. C++11's constexpr
could be helpful.
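
A rough sketch of what the second class (the one for compile-time constants) might look like. The name literal_string and its members are hypothetical. The constructor takes an array reference so that a plain const char* is rejected at compile time, but a local char array would still bind, so the lifetime requirement remains the caller's responsibility.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning type for compile-time constant strings.
class literal_string {
public:
    // Binds only to arrays of known size, so a plain 'const char*' does
    // not compile. Assumption: callers pass string literals or other
    // static arrays that outlive this object.
    template <std::size_t N>
    constexpr literal_string(const char (&text)[N])
        : data_(text), size_(N - 1) {}

    constexpr const char* data() const { return data_; }
    constexpr std::size_t size() const { return size_; }

    // Converting to an owning string is where the copy happens.
    std::string to_string() const { return std::string(data_, size_); }

private:
    const char* data_;
    std::size_t size_;
};

void literal_demo() {
    static const char name[] = "my name";
    literal_string s(name);
    assert(s.data() == name);            // no copy was made
    assert(s.size() == 7);
    assert(s.to_string() == "my name");  // explicit, allocating conversion
}
```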

Furthermore, you need to decide whether you convert your constant strings
to mutable strings at some point, or whether you modify your mutable string
class in a way that it does not free the storage of your constant
strings. The latter requires that the length information and possibly a
reference count are not allocated in the same chunk of memory as the
string content. Fortunately this is common practice.

In fact you need a good reason to do all that, since it breaks
compatibility with std::string. Of course, you could provide a
conversion to std::string, but this would require a new allocation on
each conversion - a really bad idea.

In my case the reason was a C style plug-in interface that did not allow
dynamic allocations of storage that is shared between plug-in and main
program before an initialization function has been called.


Marcel
 
Seungbeom Kim

Note that a program needs to convert static strings into std::strings
only once. Static strings are part of the binary image on disk. Reading
the data in from the disk is orders of magnitude slower than performing
the in-memory copy for initializing std::string, so there is hardly any
point in trying to optimize the latter. If loading static data is too
slow you most probably need to get a faster hard drive instead.

But there's a difference in the memory footprint. With another layer
of dynamic allocation, the process consumes twice the address space
for each such string that could have remained only in the read-only
segment. In low-memory situations, this could cause other pages to be
swapped out to disk.

Of course, it is another story how likely is a program with such a
large amount of static data to affect the overall system performance.
 
Richard Damon

But there's a difference in the memory footprint. With another layer
of dynamic allocation, the process consumes twice the address space
for each such string that could have remained only in the read-only
segment. In low-memory situations, this could cause other pages to be
swapped out to disk.

Of course, it is another story how likely is a program with such a
large amount of static data to affect the overall system performance.

The other side of the issue is that the constructor for this string
needs to know if its parameter really is a static string that will stay
around "forever", or is a temporary buffer that does need to be copied.

You can't really even count on using the const attribute, as it isn't
too hard to get that applied to a temporary buffer. Take the following
code as an example:


string makestring(const char* data) {
string s(data);
return s;
}



....
char buffer[30];
strcpy(buffer, "String 1");

string s1 = makestring(buffer);
strcpy(buffer, "String 2");


If the string constructor just relies on the fact that its parameter has
type const char*, then at the end of the code s1 holds the value
"String 2", since it would have assumed that its input was a const
static string when it wasn't, and thus would not have copied its input.
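
To make the hazard concrete, here is a minimal sketch that puts a hypothetical non-copying type (borrowed_string, not a real library class) next to std::string on the same buffer. The helper names are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical string that trusts 'const char*' and never copies.
struct borrowed_string {
    const char* data;
    explicit borrowed_string(const char* text) : data(text) {}
};

borrowed_string make_borrowed(const char* data) { return borrowed_string(data); }
std::string make_owned(const char* data) { return std::string(data); }

void aliasing_demo() {
    char buffer[30];
    std::strcpy(buffer, "String 1");

    borrowed_string b1 = make_borrowed(buffer);
    std::string s1 = make_owned(buffer);
    std::strcpy(buffer, "String 2");  // reuse the temporary buffer

    assert(std::strcmp(b1.data, "String 2") == 0);  // borrowed value changed
    assert(s1 == "String 1");                       // owned copy is unaffected
}
```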


Also, if it did somehow have a way to really distinguish static strings
from other character buffers, then it would need to somehow store a flag
to determine if the old data pointer needs to be deleted, which may well
add a cost to every occurrence of the class.
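
A sketch of what such a flag could look like, assuming a hypothetical class flagged_string where the caller states explicitly whether the storage is borrowed or must be copied. The extra member makes every object larger than a bare pointer, and the destructor has to test it every time.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical string that either borrows static storage or owns a copy.
class flagged_string {
public:
    // Assumption: 'text' outlives the object (e.g. a string literal).
    static flagged_string borrow(const char* text) {
        return flagged_string(text, false);
    }
    // Copies 'text' into a new[] buffer owned by the object.
    static flagged_string copy(const char* text) {
        char* buffer = new char[std::strlen(text) + 1];
        std::strcpy(buffer, text);
        return flagged_string(buffer, true);
    }
    flagged_string(flagged_string&& other) noexcept
        : data_(other.data_), owns_(other.owns_) {
        other.data_ = "";
        other.owns_ = false;
    }
    flagged_string(const flagged_string&) = delete;
    flagged_string& operator=(const flagged_string&) = delete;
    ~flagged_string() {
        if (owns_) delete[] data_;  // the flag is checked on every destruction
    }
    const char* c_str() const { return data_; }
    bool owns() const { return owns_; }

private:
    flagged_string(const char* data, bool owns) : data_(data), owns_(owns) {}
    const char* data_;
    bool owns_;
};

// The per-object cost of the flag: the object is bigger than one pointer.
static_assert(sizeof(flagged_string) > sizeof(const char*),
              "flag adds storage to every instance");

void flag_demo() {
    flagged_string a = flagged_string::borrow("static text");
    char buffer[] = "temporary";
    flagged_string b = flagged_string::copy(buffer);
    buffer[0] = 'T';  // does not affect the owned copy
    assert(!a.owns() && b.owns());
    assert(std::strcmp(b.c_str(), "temporary") == 0);
}
```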
 
88888 Dihedral

On Tuesday, January 22, 2013 at 12:46:47 PM UTC+8, Richard Damon wrote:
On 2013-01-21 09:37, Paavo Helde wrote:

But there's a difference in the memory footprint. With another layer
of dynamic allocation, the process consumes twice the address space
for each such string that could have remained only in the read-only
segment. In low-memory situations, this could cause other pages to be
swapped out to disk.

Of course, it is another story how likely is a program with such a
large amount of static data to affect the overall system performance.

The other side of the issue is that the constructor for this string
needs to know if its parameter really is a static string that will stay
around "forever", or is a temporary buffer that does need to be copied.

You can't really even count on using the const attribute, as it isn't
too hard to get that applied to a temporary buffer. Take the following
code as an example:

string makestring(const char* data) {
string s(data);
return s;
}

...
char buffer[30];
strcpy(buffer, "String 1");

string s1 = makestring(buffer);
strcpy(buffer, "String 2");

If the string constructor just relies on the fact that its parameter has
type const char*, then at the end of the code s1 holds the value
"String 2", since it would have assumed that its input was a const
static string when it wasn't, and thus would not have copied its input.

Also, if it did somehow have a way to really distinguish static strings
from other character buffers, then it would need to somehow store a flag
to determine if the old data pointer needs to be deleted, which may well
add a cost to every occurrence of the class.

There are subtle differences between writing C++ code that is
compiled into a library to be used by others and writing a
main program in which everything is compiled together and
heavily optimized, so that those constants are never
exposed to others.
 
Jorgen Grahn

The other side of the issue is that the constructor for this string
needs to know if its parameter really is a static string that will stay
around "forever", or is a temporary buffer that does need to be copied.

You can't really even count on using the const attribute, as it isn't
too hard to get that applied to a temporary buffer.
[snip]

It's simply the usual old semantics: 'const Foo* foo;' doesn't in any
way guarantee that '*foo' cannot legally change. It just says you
cannot legally change it by using just 'foo'.
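
A tiny illustration of that rule; the function and variable names are made up for the example. The same object is reached through a pointer-to-const and through a non-const pointer, and the change made through the second is visible through the first.

```cpp
#include <cassert>

// Reads through a pointer-to-const before and after the object is
// modified through a second, non-const pointer.
int change_seen_through_const(const int* view, int* alias) {
    const int before = *view;
    *alias += 1;            // legal: the pointed-to object is not itself const
    return *view - before;  // nonzero when both pointers name one object
}

void const_demo() {
    int value = 41;
    assert(change_seen_through_const(&value, &value) == 1);
    assert(value == 42);
}
```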

/Jorgen
 
Richard Damon

Note that once the data has been copied, the read-only pages can be
swapped out from the working set again (i.e. discarded). This is done
automatically by the OS in case of low memory AFAIK. So the memory
consumption is not doubled in principle, it is only just the read-write
pages are more expensive to deal with in case of memory exhaustion. IOW,
if the memory is exhausted and all programs start thrashing, a program
using dynamic memory will thrash worse than the one using static memory.
Not sure if this scenario is worth optimizing.

Cheers
Paavo

It is worth pointing out that not all machines work this way. Most of
the programs I write will never meet a hard disk, and memory
availability is limited.

In this environment, I strongly try to avoid making a std::string out
of static strings. If I have a string member that is always
initialized to a static string, it may be better to make that member a
char const*.

If only most cases are static strings, then it may be worth
looking at ways to hold the result for the few dynamic cases, so that
the member can still be a char const*.
 
Jeff Flinn

Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

std::string allocates the new memory on a heap.
I would like to force the std::string to initially work directly on a string
pointed to by "const char *str" and not allocate anything on a heap.
Of course later on when writing to a string occurs the COW is allowed.

Can I do that ? If so, HOW ?

Not sure it helps in your case, but:

Google "string_ref". There is a standard proposal:
www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3442.html, and a
recently added implementation in Boost.

string_ref is separate from std::string, though IIUC it has a common
interface and will be usable by templated code where a const
std::string& could be used.
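
For readers with a C++17 compiler: the string_ref idea was later standardized as std::string_view. A minimal sketch, assuming C++17 is available; count_spaces and view_demo are made-up names for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Accepts literals, char buffers and std::string without copying the data.
std::size_t count_spaces(std::string_view text) {
    std::size_t spaces = 0;
    for (char c : text) {
        if (c == ' ') ++spaces;
    }
    return spaces;
}

void view_demo() {
    static const char name[] = "my name";
    std::string_view view(name);
    assert(view.data() == name);  // refers to the original bytes, no copy
    assert(count_spaces(view) == 1);
    assert(count_spaces(std::string("a b c")) == 2);  // std::string converts implicitly
}
```

Like any non-owning type, the view is only valid while the storage it refers to is alive and unchanged.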

Jeff
 
Gerald Breuer

It is possible when you write your own allocator; but it's not
just three lines of code.
 
Bo Persson

Jarek Blakarz wrote 2013-01-21 10:47:
Hi

Consider the following program:

const char *str = "my name";
std::string s(str);

std::string allocates the new memory on a heap.
I would like to force the std::string to initially work directly on a string
pointed to by "const char *str" and not allocate anything on a heap.
Of course later on when writing to a string occurs the COW is allowed.

For such a short string there is generally no heap allocation. Most
string implementations use a small-string-optimization where short
strings are stored inside the std::string object.

Here is a post in another forum, showing that constructing or copying a
small string only uses 4-5 machine instructions, and executes in
nanoseconds.

http://stackoverflow.com/a/11639305/597607



Bo Persson
 
Nobody

For such a short string there is generally no heap allocation. Most string
implementations use a small-string-optimization where short strings are
stored inside the std::string object.

For a counterpoint, GNU libstdc++ always allocates. Specifically, a
std::string consists of a single pointer which points to the first byte of
the string data, which is preceded by a 3-word header containing the
length, the capacity and a reference count.

The advantages are that a std::string is only as large as a pointer, and
can in fact be cast to a char*, so if you have a pointer to an array of
std::string you can pass it to a function expecting a "const char * const *".
 
Jarek Blakarz

For a counterpoint, GNU libstdc++ always allocates. Specifically, a
std::string consists of a single pointer which points to the first byte of
the string data, which is preceded by a 3-word header containing the
length, the capacity and a reference count.

The advantages are that a std::string is only as large as a pointer, and
can in fact be cast to a char*, so if you have a pointer to an array of
std::string you can pass it to a function expecting a "const char * const *".

Thanks to all of you for pointing out a lot of interesting details.
Currently I'm refactoring C-style C++ code.
The code contains a lot of static strings that are pointed to by "const char*"
class members. There are a lot of such static strings, but not megabytes
of them. Now, thanks to the information you provided, I'm
considering not changing those "const char*" members at all.
 
Öö Tiib

Are you seriously being that nitpicky?

No, I meant it jokingly. The major reason was that a piece of real code was
needed to support the other points (stack? optimizations?) that I made in my
answer. Two lines without any context didn't form enough substance to discuss
what compilers do with them.
 
Öö Tiib

For a counterpoint, GNU libstdc++ always allocates. Specifically, a
std::string consists of a single pointer which points to the first byte of
the string data, which is preceded by a 3-word header containing the
length, the capacity and a reference count.

The advantages are that a std::string is only as large as a pointer, and
can in fact be cast to a char*, so if you have a pointer to an array of
std::string you can pass it to a function expecting a "const char * const *".

It feels terrible if someone really casts a std::string to 'char const*'.
That would not pass my review: "use c_str()".

If someone casts a pointer to the first element of a vector of strings to
'char const* const*' and it works thanks to an extension, then I
would require commented static_asserts close by that detect that it is
indeed an implementation with such an extension.
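
For comparison, a portable alternative to the cast is to build the pointer array explicitly from c_str(). This is only a sketch: c_str_array is a made-up helper, and total_length is a hypothetical stand-in for a C-style function that takes 'const char* const*'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Builds the pointer array explicitly instead of reinterpreting the vector.
// The result is valid only while 'strings' is alive and unmodified.
std::vector<const char*> c_str_array(const std::vector<std::string>& strings) {
    std::vector<const char*> pointers;
    pointers.reserve(strings.size());
    for (const std::string& s : strings) {
        pointers.push_back(s.c_str());
    }
    return pointers;
}

// Hypothetical stand-in for a C-style API taking 'const char* const*'.
std::size_t total_length(const char* const* items, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += std::strlen(items[i]);
    }
    return total;
}

void array_demo() {
    const std::vector<std::string> names = {"my", "name"};
    const std::vector<const char*> pointers = c_str_array(names);
    assert(total_length(pointers.data(), pointers.size()) == 6);
}
```

This costs one extra array of pointers but does not depend on the layout of std::string.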

Also, I haven't seen useful functions that take 'char const* const*'
as parameters for a decade or so. The length information readily available
in std::string is in most cases worth its price, so strings commonly
outperform raw char const*. Can you give an example where
a lot of strings are used but their size does not matter?
 
