Legacy APIs which output C-style strings: Opportunity to use move semantics?

  • Thread starter null hypothesis

null hypothesis

Greetings,

Assuming that the APIs provide me with a way to
query the length of the output string
before the actual copy request is made, I can either:

* allocate a buffer
* call the API to fill the buffer, then copy the data into a basic_string
* delete said buffer
* return the basic_string

or:

* create a vector with a size argument
* have the API copy the data into the vector
* return a basic_string copied out of the vector's data

(Scott Meyers's solution from Effective STL)
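The two approaches above can be sketched as follows. This is an illustration only: the legacy API is simulated by the hypothetical functions legacy::get_length() and legacy::copy_string(), assumed to write a NUL-terminated string into a caller-supplied buffer of at least length + 1 characters, and the raw buffer is held by C++11's std::unique_ptr instead of a manual delete[]. In both versions the final std::string construction is the copy the question is about.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a legacy C API: get_length() reports the
// length (without terminator), copy_string() writes the NUL-terminated
// string into a buffer of out_size chars.
namespace legacy {
const char kValue[] = "C:\\Windows;C:\\Windows\\System32";
std::size_t get_length() { return sizeof kValue - 1; }
void copy_string(char* out, std::size_t out_size) {
    if (out_size == 0) return;
    std::size_t n = get_length();
    if (n >= out_size) n = out_size - 1;  // truncate like most C APIs
    std::memcpy(out, kValue, n);
    out[n] = '\0';
}
}  // namespace legacy

// Approach 1: raw buffer (owned by unique_ptr instead of a manual delete[]).
std::string via_buffer() {
    std::size_t n = legacy::get_length();
    std::unique_ptr<char[]> buf(new char[n + 1]);  // left uninitialized
    legacy::copy_string(buf.get(), n + 1);         // API writes the data
    return std::string(buf.get(), n);              // final copy into the string
}

// Approach 2: vector as the buffer. vector<char>(n + 1) zero-fills first.
std::string via_vector() {
    std::size_t n = legacy::get_length();
    std::vector<char> buf(n + 1);                  // zero-filled
    legacy::copy_string(&buf[0], buf.size());      // API writes the data
    return std::string(&buf[0], n);                // final copy into the string
}
```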

Is there a way I can eliminate the final copying?

I would have liked to see basic_string with a move
constructor and a move enabled assign at the very least.

Or, is there a fundamental problem with the move
approach that I have overlooked?
 
null hypothesis

Is there a way I can eliminate the final copying?

I would have liked to see basic_string with a move
constructor and a move enabled assign at the very least.

I meant move semantics with members/ctors which take a
pointer to char/wchar_t as a parameter. I am aware that
there are move ctors/op=/assign etc. for basic_string
parameters.
 
Francesco S. Carta

I meant move semantics with members/ctors which take a
pointer to char/wchar_t as a parameter. I am aware that
there are move ctors/op=/assign etc. for basic_string
parameters.

IIUIC, an efficient move operation from a basic_string to a
null-terminated string would imply that the basic_string uses a
null-terminated string as the internal representation of the data - and
the move semantic would be efficient because there would be just some
sort of pointer mangling.

If the internal representation happens to be different, the move
operation would necessarily imply copying the data - which in turn would
mean that the move operation would not eliminate the final copy you are
aiming to elide from your sequence.
 
Francesco S. Carta

Small correction, where I say that the basic_string would have "a
null-terminated string" as the internal representation I really meant to
say just "a single contiguous storage of chars", not necessarily
null-terminated.
 
Kai-Uwe Bux

Francesco said:
Small correction, where I say that the basic_string would have "a
null-terminated string" as the internal representation I really meant to
say just "a single contiguous storage of chars", not necessarily
null-terminated.

Although the current standard does not guarantee that basic_string uses
contiguous memory for the character sequence, most if not all implementations
actually do that. Moreover, C++0X will turn this into a guarantee (see
[21.4.1/5] in n3092).


Best

Kai-Uwe Bux
 
Francesco S. Carta

Although the current standard does not guarantee that basic_string uses
contiguous memory for the character sequence, most if not all implementations
actually do that. Moreover, C++0X will turn this into a guarantee (see
[21.4.1/5] in n3092).

Ah, that's really good to know!

Is it planned to add move operations from and to plain C-strings as the
OP asks?
 
null hypothesis

IIUIC, an efficient move operation from a basic_string to a
null-terminated string would imply that the basic_string uses a
null-terminated string as the internal representation of the data - and
the move semantic would be efficient because there would be just some
sort of pointer mangling.

Yes, I understand. It appears that I have not been able to make
myself very clear. I was actually asking if we can enable move
semantics for null-terminated string to basic_string conversions.

As Kai-Uwe Bux already mentioned the internal representation will
probably be mandated in the next standard (as was done with vectors).

So, I am looking at something like:

/*
** The function replaces the string controlled by *this
** with a string of length strlen(str) whose elements
** are a copy of the string controlled by str. Leaves str
** in a valid but unspecified state.
*/
basic_string<charT,traits,Allocator>&
    assign(charT *str);

Or, more generally:

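#include <cstddef>
#include <cstring>
#include <utility>

struct mystr {
    std::size_t len;
    char *b;
    mystr() : len(0), b(0) {}
    mystr(mystr const& s)
        : len(s.len),
          b(new char[len + 1]) {
        if (s.b)
            std::memcpy(b, s.b, len + 1);
        else
            b[0] = '\0';               // s is empty (default-constructed or moved-from)
    }
    mystr(mystr&& s)
        : len(s.len), b(0)             // keep the length of the moved-from string
    {
        std::swap(b, s.b);
        s.len = 0;
    }
    mystr(char *s)                     // adopts s: mystr now treats the buffer as its own
        : len(std::strlen(s)), b(0) {
        std::swap(b, s);
    }
    /**
         Others omitted for brevity
    */
};

int main()
{
    char s[] = "hello";
    char *l = "world";                 // literal to char*: deprecated in C++03, removed in C++11
    mystr ms(s);
    mystr ms2(l);
    mystr ms3("jjj");                  // same literal-to-char* conversion
}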

This of course leaves us with the issue of string literals
(e.g. instantiating ms2/ms3 may lead to undefined behavior later
on until and unless we guarantee these will never be written to?).
I have a feeling that type_traits may help somehow to identify
string literals though I can't figure a way out for the time
being. Any help would be greatly appreciated though!
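On telling string literals apart: overload resolution (rather than type_traits) can distinguish some of these cases, because a string literal has type const char[N] while char s[] = "hello"; has type char[N]. A minimal sketch, with the hypothetical names classify() and Source; it cannot tell a literal from a const char array, and it does nothing for ownership, since s still lives on the stack.

```cpp
#include <cstddef>
#include <type_traits>

enum class Source { mutable_array, const_array_or_literal };

// Binds only to non-const arrays, e.g. char s[] = "hello";
template <std::size_t N>
Source classify(char (&)[N]) { return Source::mutable_array; }

// Binds to string literals (type const char[N]) and to const char arrays.
// For a non-const array the overload above is the better match.
template <std::size_t N>
Source classify(const char (&)[N]) { return Source::const_array_or_literal; }

// No char* overload on purpose: once an array has decayed to a pointer
// the information is gone, and a non-template const char* overload would
// be preferred over the template for string literals.
static_assert(std::is_same<decltype("jjj"), const char (&)[4]>::value,
              "a string literal is an lvalue of type const char[4]");
```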
 
Bo Persson

null said:
IIUIC, an efficient move operation from a basic_string to a
null-terminated string would imply that the basic_string uses a
null-terminated string as the internal representation of the data
- and the move semantic would be efficient because there would be just some
sort of pointer mangling.

Yes, I understand. It appears that I have not been able to make
myself very clear. I was actually asking if we can enable move
semantics for null-terminated string to basic_string conversions.

As Kai-Uwe Bux already mentioned the internal representation will
probably be mandated in the next standard (as was done with
vectors).

This of course leaves us with the issue of string literals
(e.g. instantiating ms2/ms3 may lead to undefined behavior later
on until and unless we guarantee these will never be written to?).
I have a feeling that type_traits may help somehow to identify
string literals though I can't figure a way out for the time
being. Any help would be greatly appreciated though!

I have a feeling that this will run into the same problems that
COW-implementations had, and then some. :)

If the string is about to modify (insert/erase/replace) its content,
or return a non-const reference or iterator, it would have to remember
if the buffer is const or not. A lot of bookkeeping that sometimes only
postpones the copying.

There are also ownership problems: Should the buffer be deleted in the
string destructor? Are there any other strings constructed from the
same char[] buffer? How do we know that the buffer lives at least as
long as the string?


In your examples with short strings, the currently popular
short-string-optimization is a definite win. Instead of allocating a
buffer, the space needed for a pointer and a capacity (perhaps 16
bytes on a 64-bit system) can be used to store the "hello world"
inside the std::string.

Just copy once, and be done with it! :)



Bo Persson
 
Kai-Uwe Bux

null said:
Yes, I understand. It appears that I have not been able to make
myself very clear. I was actually asking if we can enable move
semantics for null-terminated string to basic_string conversions.

As Kai-Uwe Bux already mentioned the internal representation will
probably be mandated in the next standard (as was done with vectors).

Well, I don't think that contiguous memory for the character sequence will
be sufficient.

a) Memory governed by a string is handled via an allocator. If you move from
a char* the information about how the memory for the char* was allocated
(and has to be deallocated) is lost.

b) String implementations have to manage size information (e.g., because
strings are allowed to contain 0-characters). It is not ruled out that the
size information is put into the same contiguous memory as the character
sequence, which then has to be sizeof(size_type) longer. In moving from
char* to string, it might be impossible to obtain this additional piece of
memory in the right place. Moving the other way, you run into problems when
it comes to deallocating the char*.
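To make point b) concrete, here is a minimal sketch of one such layout: a hypothetical PrefixedStr class that stores its size in a header directly in front of the characters, within a single allocation. It is not how any particular std::string implementation works; it only shows why a char* obtained elsewhere cannot simply be adopted.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical representation: one allocation laid out as
//   [ size_t size ][ c0 c1 ... c(n-1) '\0' ]
// with data_ pointing at c0.
class PrefixedStr {
public:
    explicit PrefixedStr(const char* s) {
        std::size_t n = std::strlen(s);
        char* block = new char[sizeof n + n + 1];
        std::memcpy(block, &n, sizeof n);         // header in front
        std::memcpy(block + sizeof n, s, n + 1);  // characters + NUL
        data_ = block + sizeof n;
    }
    ~PrefixedStr() { delete[] (data_ - sizeof(std::size_t)); }
    PrefixedStr(const PrefixedStr&) = delete;
    PrefixedStr& operator=(const PrefixedStr&) = delete;

    std::size_t size() const {
        std::size_t n;
        std::memcpy(&n, data_ - sizeof n, sizeof n);  // read the header
        return n;
    }
    const char* c_str() const { return data_; }

    // Deliberately no PrefixedStr(char* adopt) constructor: a foreign
    // buffer has no header in front of it, so size() would read memory
    // that is not ours and the destructor would delete[] the wrong address.

private:
    char* data_;
};
```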

So, I am looking at something like:

/*
** The function replaces the string controlled by *this
** with a string of length strlen(str) whose elements
** are a copy of the string controlled by str. Leaves str
** in a valid but unspecified state.
*/
basic_string<charT,traits,Allocator>&
assign(_Elem *str);

Or, more generally:

struct mystr {
    size_t len;
    char *b;
    mystr() : len(0), b(0) {}
    mystr(mystr const& s)
        : len(s.len),
        b(new char[len + 1]) {
            memcpy(&b[ 0 ], &s.b[ 0 ], len + 1);
    }
    mystr(mystr&& s)
        : len(0), b(0)
    {
        swap(b, s.b);
        s.len = 0;
    }
    mystr(char *s)
        : len(strlen(s)), b(0) {
            swap(b, s);
    }
    /**
         Others omitted for brevity
    */
};

Note that this implementation does not take care of the allocator issue by
implicitly assuming the char* member and the free char* are to be deallocated
the same way.


Best

Kai-Uwe Bux
 
null hypothesis

a) Memory governed by a string is handled via an allocator. If you move from
a char* the information about how the memory for the char* was allocated
(and has to be deallocated) is lost.

Assume we move a char * to a mystr S allocated with allocator A: Is it
too difficult for the compiler to:
*) free the original contents of S by calling A.destroy()
*) know full well that it is moving a char * with some (probably magic)
allocator and mark it as such?
b) String implementations have to manage size information (e.g., because
strings are allowed to contain 0-characters). It is not ruled out that the
size information is put into the same contiguous memory as the character
sequence, which then has to be sizeof(size_type) longer.

And the reverse is equally true -- the implementation can choose to
keep this as a separate member of the basic_string_impl struct. Then all
we need to do is swap the data member of this struct and initialize
length = capacity to the length of the string.
In moving from
char* to string, it might be impossible to obtain this additional piece of
memory in the right place.

When moving from char * to strings, why would I even consider anything
beyond the first null terminator?
Moving the other way, you run into problems when
it comes to deallocating the char*.

Yes, absolutely. I should have stated this, but I did not intend that
basic_strings could be moved to a char *. Such semantics would be as
limited as c_str() is.
So, I am looking at something like:
/*
**    The function replaces the string controlled by *this
**    with a string of length strlen(str) whose elements
**    are a copy of the string controlled by str. Leaves str
**    in a valid but unspecified state.
*/
basic_string<charT,traits,Allocator>&
    assign(_Elem *str);
Or, more generally:
struct mystr {
    size_t len;
    char *b;
    mystr() : len(0), b(0) {}
    mystr(mystr const& s)
        : len(s.len),
        b(new char[len + 1]) {
            memcpy(&b[ 0 ], &s.b[ 0 ], len + 1);
    }
    mystr(mystr&& s)
        : len(0), b(0)
    {
        swap(b, s.b);
        s.len = 0;
    }
    mystr(char *s)
        : len(strlen(s)), b(0) {
            swap(b, s);
    }
    /**
         Others omitted for brevity
    */
};

Note that this implementation does not take care of the allocator issue by
implicitly assuming the char* member and the free char* are to be deallocated
the same way.

Not that this did not occur to me, but I was trying to explain
what I was trying to devise: a one-way char * to string move semantics
for the string library. I intentionally left the allocator out for
simplicity.

BTW: Why doesn't basic_string have a ctor analogous to
vector(size_type n)?
 
null hypothesis

null hypothesis wrote: [...]
I have a feeling that this will run into the same problems that
COW-implementations had, and then some. :)

I had the exact same feeling -- am I trying to reinvent COW? :)
If the string is about to modify (insert/erase/replace) its content,
or return a non-const reference or iterator, it would have to remember
if the buffer is const or not. A lot of bookkeeping that sometimes only
postpones the copying.

Exactly why I wanted to identify if we are dealing with a string
literal or an array initialized with a string literal.
There are also ownership problems: Should the buffer be deleted in the
string destructor?

Yes. We moved the buffer to the string, so yes, the string should be
responsible for cleanup.
Are there any other strings constructed from the
same char[] buffer? How do we know that the buffer lives at least as
long as the string?

Copy constructed: Yes, possible. But not moved.
In your examples with short strings, the currently popular
short-string-optimization is a definite win. [...]

I didn't feel like typing out a 702-character PATH variable to prove
my point ;-) But yes, the short string optimizations are okay.
However, I am dealing with OS APIs where I typically run into strings
in the range of 30 - 1000 characters and hence the call for move.
 
Kai-Uwe Bux

null said:
Assume we move a char * to a mystr S allocated with allocator A: Is it
too difficult for the compiler to:
*) free the original contents of S by calling A.destroy()
*) know full well that it is moving a char * with some (probably magic)
allocator and mark it as such?

No, that might not be too difficult. But it does not solve the problem.

How should the destructor of the string go about releasing the memory? After
all, even if the compiler passes the information that the memory was
allocated via weird_alloc_method_from_library_X to the string, how could it
guess successfully the required deallocation function?
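The deallocation problem can be illustrated with a small sketch. Assume a hypothetical legacy function legacy_dup() that returns a malloc()'d string the caller must free(). Taking ownership safely means carrying the matching deallocator along with the pointer, which C++11's std::unique_ptr can do through its deleter; std::string has no such slot, so turning the buffer into a std::string still costs one copy.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>
#include <utility>

// Hypothetical legacy function: returns a malloc()'d NUL-terminated copy
// of s (or a null pointer on failure); the caller must release it with free().
char* legacy_dup(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p) std::memcpy(p, s, n);
    return p;
}

// The deleter records *how* the buffer must be released.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};
using OwnedCStr = std::unique_ptr<char, FreeDeleter>;

// Moving an OwnedCStr transfers the buffer and its deallocator together.
OwnedCStr adopt(char* p) { return OwnedCStr(p); }

// std::string's buffer is managed by its own allocator, so getting a
// std::string out of an OwnedCStr still means one copy.
std::string to_std_string(const OwnedCStr& p) {
    return p ? std::string(p.get()) : std::string();
}
```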
And the reverse is equally true -- the implementation can choose to
keep this as a separate member of the basic_string_impl struct. Then all
we need to do is swap the data member of this struct and initialize
length = capacity to the length of the string.

True, but mandating move constructors in the standard would essentially
force this implementation. I can see why the committee decided not to go
that way.
When moving from char * to strings, why would I even consider anything
beyond the first null terminator?

The problem is not the space beyond the first null terminator but the space
_before_ the character sequence. That is a place where the string
implementation (in the memory it manages via the allocator) may store the
size information. With a char* provided from the outside, that space might
not be available.
Note that this implementation does not take care of the allocator issue
by implicitly assuming the char* member and the free char* are to be
deallocated the same way.

Not that this did not occur to me, but I was trying to explain
what I was trying to devise: a one-way char * to string move semantics
for the string library. I intentionally left the allocator out for
simplicity.

Even without the allocator, the rub comes with the destructor.

BTW: Why doesn't basic_string have a ctor analogous to
vector(size_type n)?

It does:

basic_string(size_type n,
             charT c,
             const Allocator& a = Allocator());

The difference is only that you may not omit the charT parameter c.


Best

Kai-Uwe Bux
 
Bo Persson

Kai-Uwe Bux said:
It does:

basic_string(size_type n,
             charT c,
             const Allocator& a = Allocator());

The difference is only that you may not omit the charT parameter c.

I believe std::vector (and other containers) got this separate
constructor for C++0x, to get away with the CopyConstructible
requirement for its value_type.

With std::string we don't have that problem, as the char types are
trivially copyable anyway.


Bo Persson
 
Kai-Uwe Bux

Bo said:
I believe std::vector (and other containers) got this separate
constructor for C++0x, to get away with the CopyConstructible
requirement for its value_type.

Do you mean "do away" instead of "get away"?
With std::string we don't have that problem, as the char types are
trivially copyable anyway.

Best

Kai-Uwe Bux
 
null hypothesis

No, that might not be too difficult. But it does not solve the problem.

How should the destructor of the string go about releasing the memory? After
all, even if the compiler passes the information that the memory was
allocated via weird_alloc_method_from_library_X to the string, how could it
guess successfully the required deallocation function?

Ah! Brilliant point! I hadn't thought about this. Instead of supplying
you with a half-baked solution, let me rephrase to understand the problem
better: Does that mean that move semantics is inherently unsuitable for
any type that does not provide us with a clear notion of the underlying
allocator?
True, but mandating move constructors in the standard would essentially
force this implementation.

I was/am under the impression that the forthcoming standard *mandates*
(in the sense that it'd like more people to use move semantics where
possible) move semantics?
I can see why the committee decided not to go
that way.

Okay, now I have absolutely no idea what you mean by this! Can you
kindly elaborate?
The problem is not the space beyond the first null terminator but the space
_before_ the character sequence. That is a place where the string
implementation (in the memory it manages via the allocator) may store the
size information. With a char* provided from the outside, that space might
not be available.

I think I already replied to this. But I get your point: allowing move
semantics would necessarily limit the implementers' choice of design.
Am I correct?
BTW: Why doesn't basic_string have a ctor analogous to
vector(size_type n)?

It does:

basic_string(size_type n,
             charT c,
             const Allocator& a = Allocator());

The difference is only that you may not omit the charT parameter c.

Hm. I am aware of this. In essence, none of the STL containers/the
string library (I state them separately since the latter is not
considered part of the STL by some) allow us to allocate without
initialization. I guess that's good, and that essentially forces a
copy. So, with Scott Meyers's solution I actually end up with two writes
to the vector's memory (once during creation and once during the actual
writing via the call to a legacy API). Suddenly, it appears that
creating a char/wchar_t buffer and copying it out to a basic_string is
more efficient than what we learn from Effective STL! Thoughts?
 
Kai-Uwe Bux

null said:
null said:
[...]
a) Memory governed by a string is handled via an allocator. If you
move from a char* the information about how the memory for the char*
was allocated (and has to be deallocated) is lost.
Assume we move a char * to a mystr S allocated with allocator A: Is it
too difficult for the compiler to:
*) free the original contents of S by calling A.destroy()
*) know full well that it is moving a char * with some (probably magic)
allocator and mark it as such?

No, that might not be too difficult. But it does not solve the problem.

How should the destructor of the string go about releasing the memory?
After all, even if the compiler passes the information that the memory
was allocated via weird_alloc_method_from_library_X to the string, how
could it guess successfully the required deallocation function?

Ah! Brilliant point! I hadn't thought about this. Instead of supplying
you with a half-baked solution, let me rephrase to understand the problem
better: Does that mean that move semantics is inherently unsuitable for
any type that does not provide us with a clear notion of the underlying
allocator?

I think so.

I was/am under the impression that the forthcoming standard *mandates*
(in the sense that it'd like more people to use move semantics where
possible) move semantics?

Yes, but C++0X does not provide a move constructor from char* to
std::string. The above is part of a possible rationale (independent of the
allocator issue) for that decision.
Okay, now I have absolutely no idea what you mean by this! Can you
kindly elaborate?

I think we are just talking about different things. I was pondering the question
whether the standard should provide a move constructor from char* to
std::string. The point here is that (besides the allocator issue) such a
constructor would restrict possible implementations of std::string.
I think I already replied to this. But I get your point: allowing move
semantics would necessarily limit the implementers' choice of design.
Am I correct?

Yes! That's exactly what I was trying to say.

BTW: Why doesn't basic_string have a ctor analogous to
vector(size_type n)?

It does:

basic_string(size_type n,
             charT c,
             const Allocator& a = Allocator());

The difference is only that you may not omit the charT parameter c.

Hm. I am aware of this. In essence, none of the STL containers/the
string library (I state them separately since the latter is not
considered part of the STL by some) allow us to allocate without
initialization. I guess that's good, and that essentially forces a
copy. So, with Scott Meyers's solution I actually end up with two writes
to the vector's memory (once during creation and once during the actual
writing via the call to a legacy API). Suddenly, it appears that
creating a char/wchar_t buffer and copying it out to a basic_string is
more efficient than what we learn from Effective STL! Thoughts?

What about:

std::string result ( api_get_length(), '\0' );
api_get_str( &result[0] );

With a little bit of luck the compiler might even optimize away the filling
with 0 when it detects that the full content of the string is overwritten
right away (that may only happen when api_get_str() is inlined, and it may
not happen at all).
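The suggestion above, fleshed out as a sketch. The API here is hypothetical (legacy::api_get_length() and legacy::api_get_str(char*, std::size_t)) and, unlike many C APIs, is assumed to write exactly the reported number of characters with no terminating NUL. Writing through &result[0] relies on contiguous storage, which C++0x guarantees and, as noted earlier in the thread, most implementations already provide.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical legacy API: reports a length and copies exactly that many
// characters into the caller's buffer, without a terminating NUL.
namespace legacy {
const char kText[] = "example output from a legacy API";
std::size_t api_get_length() { return std::strlen(kText); }
void api_get_str(char* out, std::size_t n) { std::memcpy(out, kText, n); }
}  // namespace legacy

// Size the string up front and let the API write straight into its buffer.
// The only redundant work is the '\0' fill done by the constructor.
std::string read_legacy() {
    std::size_t n = legacy::api_get_length();
    std::string result(n, '\0');
    legacy::api_get_str(&result[0], n);  // contiguous storage assumed (C++0x)
    return result;                       // moved or elided, not copied
}
```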


Best

Kai-Uwe Bux
 
Bo Persson

Kai-Uwe Bux said:
Do you mean "do away" instead of "get away"?

Probably. :) "do away with" or "get away from"?

A C++0x vector should be able to store objects that are only
DefaultConstructible and movable. That introduced a couple of
signatures that std::string doesn't need. Unfortunately it also made
the containers' interfaces different.


Bo Persson
 
null hypothesis

null said:
So, with Scott Meyers's solution I actually end up with two writes to
the vector's memory (once during creation and once during the actual
writing via the call to a legacy API). Suddenly, it appears that
creating a char/wchar_t buffer and copying it out to a basic_string is
more efficient than what we learn from Effective STL! Thoughts?

What about:

  std::string result ( api_get_length(), '\0' );
  api_get_str( &result[0] );

With a little bit of luck the compiler might even optimize away the filling
with 0 when it detects that the full content of the string is overwritten
right away (that may only happen when api_get_str() is inlined, and it may
not happen at all).

Interesting. I'd like to see some compiler do that. Do you suggest any
specific flags that I can try with, say, gcc? Another point: most C APIs
would ask you to provide a buffer large enough to hold the null
terminator, which, not being required for a string, would necessitate a
final erase for the string to be consistent.
 
Kai-Uwe Bux

null said:
null said:
So, with Scott Meyers's solution I actually end up with two writes to
the vector's memory (once during creation and once during the actual
writing via the call to a legacy API). Suddenly, it appears that
creating a char/wchar_t buffer and copying it out to a basic_string is
more efficient than what we learn from Effective STL! Thoughts?

What about:

std::string result ( api_get_length(), '\0' );
api_get_str( &result[0] );

With a little bit of luck the compiler might even optimize away the
filling with 0 when it detects that the full content of the string is
overwritten right away (that may only happen when api_get_str() is
inlined, and it may not happen at all).

Interesting. I'd like to see some compiler do that. Do you suggest any
specific flags that I can try with, say, gcc?

I'd like to know myself :) In the absence of knowledge, I would just try
the various optimization options.
Another point: most C APIs would ask you to provide a buffer large
enough to hold the null terminator, which, not being required for a
string, would necessitate a final erase for the string to be consistent.

Right. That's just another one of those cases where the committee leaves
some freedom to the implementation. After all, a possible implementation
would add the needed terminator (and even the memory it needs) only when
data() is called.

I ran into the same problem with a strprintf() function (just like the
various xxxprintf() functions, now with a std::string as the target). C++0X
does make the nice guarantee that memory is contiguous, so one can use
vsnprintf() to do the job, but the terminating 0 is still a problem.

So, to accommodate the API, one would do:

std::string::size_type length = api_get_length();
std::string result ( length+1, '\0' );
api_get_str( &result[0] );
result.resize( length );

Three observations:
a) Downsizing the string will not cause reallocation hence no copy.
b) The previous code is wedged in between the additional stuff. Hence the
data flow analysis required of the compiler to optimize away the
initialization has not become any harder.
c) With the strprintf() function mentioned above, I just checked the string
implementation and added a valgrind-based unit test to catch problems with
changes in the string implementation. I will switch to the "right"
implementation of strprintf() when it becomes necessary.
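A minimal sketch of such a strprintf(), assuming C++11's std::vsnprintf and va_copy; it is not the implementation described above, only an illustration of the same pattern. vsnprintf() always writes a terminating NUL, so the string is sized length + 1 for the call and then shrunk back with resize().

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// printf-style formatting straight into a std::string.
std::string strprintf(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list args2;
    va_copy(args2, args);  // a va_list can only be traversed once

    // First pass: measure. A null buffer with size 0 is allowed.
    int len = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (len < 0) {         // formatting error
        va_end(args2);
        return std::string();
    }

    // Second pass: room for len chars plus the NUL vsnprintf insists on.
    std::string result(static_cast<std::size_t>(len) + 1, '\0');
    std::vsnprintf(&result[0], result.size(), fmt, args2);
    va_end(args2);

    result.resize(static_cast<std::size_t>(len));  // drop the NUL slot
    return result;
}
```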


Best

Kai-Uwe Bux
 
null hypothesis

I ran into the same problem with a strprintf() function (just like the
various xxxprintf() functions, now with a std::string as the target). C++0X
does make the nice guarantee that memory is contiguous, so one can use
vsnprintf() to do the job, but the terminating 0 is still a problem.

So, to accommodate the API, one would do:

  std::string::size_type length = api_get_length();
  std::string result ( length+1, '\0' );
  api_get_str( &result[0] );
  result.resize( length );

Three observations:
a) Downsizing the string will not cause reallocation hence no copy.

Correct me if I am wrong, but the very unusual iterator invalidation
semantics that string has also pertain to resize() (the first call to
any non-const member function and those that are equivalent to erase
or insert). Wouldn't that imply that the implementation *may* actually
reallocate for a resize call to invalidate iterators?
b) The previous code is wedged in between the additional stuff. Hence the
data flow analysis required of the compiler to optimize away the
initialization has not become any harder.

Yes, but I am dealing with OS APIs and seriously doubt that they are
ever inlined.
c) With the strprintf() function mentioned above, I just checked the string
implementation and added a valgrind-based unit test to catch problems with
changes in the string implementation. I will switch to the "right"
implementation of strprintf() when it becomes necessary.

I am curious, what is the difference between the multiple
implementations that you manage -- fiddling with the 0 terminator?
 
