Concatenating multiple std::strings

A B

Hi,

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays. In particular, operator+ was
supposed to concatenate strings on the fly and let the compiler figure
out the low-level details. Here is a short code sample that I tested:

#include <string>
#include <iostream>

using namespace std;


int main() {

const string name("sample");
const string ext("txt");


const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();

cout << "srcName, when created: " << srcName << endl;

const string trgNameStr = name + string("_1.") + ext;

cout << "srcName, after trgNameStr has been created: " << srcName
<< endl;

const char* const trgName = trgNameStr.c_str();

cout << "trgName, when created: " << trgName << endl;

return 0;
}


Under GCC 4.4.3, this produces

srcName, when created: file_sample.txt
srcName, after trgNameStr has been created: sample_1.txt
trgName, when created: sample_1.txt

So, the mere fact of creating a "target file name" string modifies the
"source file name" string, and makes it equal to the target??? Despite
the fact that everything is constant? This is not the behavior I
expected from my limited C++ knowledge. Could one of the gurus please
clarify?

Thanks
 
red floyd

Hi,

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays.  In particular, operator+ was
supposed to concatenate strings on the fly and let the compiler figure
out the low-level details. Here is a short code sample that I tested:

#include <string>
#include <iostream>

using namespace std;

int main() {

    const string name("sample");
    const string ext("txt");

    const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();

    cout << "srcName, when created: " << srcName << endl;

    const string trgNameStr = name + string("_1.") + ext;

    cout << "srcName, after trgNameStr has been created: " << srcName
<< endl;

    const char* const trgName = trgNameStr.c_str();

    cout << "trgName, when created: " << trgName << endl;

    return 0;

}

Under GCC 4.4.3, this produces

srcName, when created: file_sample.txt
srcName, after trgNameStr has been created: sample_1.txt
trgName, when created: sample_1.txt

So, the mere fact of creating a "target file name" string modifies the
"source file name" string, and makes it equal to the target??? Despite
the fact that everything is constant? This is not the behavior I
expected from my limited C++ knowledge. Could one of the gurus please
clarify?

You've got UB. I believe that once the temporaries involved in creating
srcName go away, you've got a pointer to deallocated memory.

Sort of like:

char *s = new char[100];
strcpy(s,"Hello");
char *t = s;
delete[] s;
// t still holds the old address, but reading through it is now undefined
 
Andrey

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays.  In particular, operator+ was
supposed to concatenate strings on the fly and let the compiler figure
out the low-level details. Here is a short code sample that I tested:
#include <string>
#include <iostream>
using namespace std;
int main() {
    const string name("sample");
    const string ext("txt");
    const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();
    cout << "srcName, when created: " << srcName << endl;
    const string trgNameStr = name + string("_1.") + ext;
    cout << "srcName, after trgNameStr has been created: " << srcName
<< endl;
    const char* const trgName = trgNameStr.c_str();
    cout << "trgName, when created: " << trgName << endl;
    return 0;
}

Under GCC 4.4.3, this produces
srcName, when created: file_sample.txt
srcName, after trgNameStr has been created: sample_1.txt
trgName, when created: sample_1.txt
So, the mere fact of creating a "target file name" string modifies the
"source file name" string, and makes it equal to the target??? Despite
the fact that everything is constant? This is not the behavior I
expected from my limited C++ knowledge. Could one of the gurus please
clarify?

You've got UB. I believe that once the temporaries involved in creating
srcName go away, you've got a pointer to deallocated memory.

Sort of like:

char *s = new char[100];
strcpy(s,"Hello");
char *t = s;
delete[] s;


Pardon my ignorance, but what is "UB"?
I suspected it has something to do with temporary objects, I just
thought the compiler should at least give some kind of warning.
Concatenating several strings seems to be such a common operation.
What is the recommended approach for the example I wrote? Concatenate
strings one by one?
 
Ian Collins

Hi,

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays. In particular, operator+ was
supposed to concatenate strings on the fly and let the compiler figure
out the low-level details. Here is a short code sample that I tested:

#include<string>
#include<iostream>

using namespace std;


int main() {

const string name("sample");
const string ext("txt");


const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();

Here srcName points to a buffer in a temporary object (the result of the
concatenation).

cout << "srcName, when created: " << srcName << endl;

const string trgNameStr = name + string("_1.") + ext;

Here you create another temporary object, probably reusing the same
chunk of memory.

Under GCC 4.4.3, this produces

srcName, when created: file_sample.txt
srcName, after trgNameStr has been created: sample_1.txt
trgName, when created: sample_1.txt

So, the mere fact of creating a "target file name" string modifies the
"source file name" string, and makes it equal to the target???

There isn't a "source file name" string! All you have is a pointer to
some memory.

I can show this using Sun CC, which has an option not to destroy
temporary objects until they go out of scope:

CC x.cc -features=no%tmplife
./a.out
srcName, when created: file_sample.txt
srcName, after trgNameStr has been created: file_sample.txt
trgName, when created: sample_1.txt
 
Jonathan Lee

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays.
snip

    const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();

I think you're misunderstanding this line.

c_str() returns a pointer to memory which is managed by the std::string
you call c_str on, but in this case the string is a temporary. The
temporary is destroyed before execution of the next line of code, and
the memory c_str() pointed to is freed.

Since that memory can be reused, it is (in your case) being picked up by
the next std::string you make, trgNameStr. The constructor sets the
memory to be "sample_1.txt", which is why srcName shows the same string.

Note that this is in no way guaranteed behavior. The value returned by
c_str() is no longer valid once the temporary string is destroyed, i.e.,
in your program srcName is never a valid pointer. A lot of different
things can happen.

You could, of course, do something like

std::string bleh = string("file_") + name + string(".") + ext;
const char* srcName = bleh.c_str();

and srcName will be valid as long as "bleh" is unchanged.

Or simply don't mix C-style strings and std::strings.

--Jonathan
 
Andrey

I think you're misunderstanding this line.

c_str() returns a pointer to memory which is managed by the std::string
you call c_str on, but in this case the string is a temporary. The
temporary is destroyed before execution of the next line of code, and
the memory c_str() pointed to is freed.

Since that memory can be reused, it is (in your case) being picked up by
the next std::string you make, trgNameStr. The constructor sets the
memory to be "sample_1.txt", which is why srcName shows the same string.

Note that this is in no way guaranteed behavior. The value returned by
c_str() is no longer valid once the temporary string is destroyed, i.e.,
in your program srcName is never a valid pointer. A lot of different
things can happen.

You could, of course, do something like

   std::string bleh = string("file_") + name + string(".") + ext;
   const char* srcName = bleh.c_str();

and srcName will be valid as long as "bleh" is unchanged.

Or simply don't mix C-style strings and std::strings.

--Jonathan

Jonathan, I think this line

std::string bleh = string("file_") + name + string(".") + ext;

was the answer I was looking for. I was trying to do everything at
once (i.e. a single line of code), and spent hours debugging as a
result.

Unfortunately, I have to convert from C++ to C strings, because many
functions (such as "rename" for files in stdio.h) only accept C
strings.

Thanks
 
Ian Collins

Jonathan, I think this line

std::string bleh = string("file_") + name + string(".") + ext;

was the answer I was looking for. I was trying to do everything at
once (i.e. a single line of code), and spent hours debugging as a
result.

Unfortunately, I have to convert from C++ to C strings, because many
functions (such as "rename" for files in stdio.h) only accept C
strings.

Then pass bleh.c_str() to the C function.
 
Jonathan Lee

Unfortunately, I have to convert from C++ to C strings, because many
functions (such as "rename" for files in stdio.h) only accept C
strings.

In such cases, just hold onto the std::string and use c_str() when
calling functions from <cstdio>, etc.

e.g.,
std::string file_name = string("file_") + name + "." + ext;

remove(file_name.c_str()); // c_str() is okay here

--Jonathan
 
Goran Pusic

Hi,

I recently came across this strange behavior (from my point of view)
with C++ strings. I thought it was supposed to be intuitively easy to
use, unlike C-style char arrays.  In particular, operator+ was
supposed to concatenate strings on the fly and let the compiler figure
out the low-level details. Here is a short code sample that I tested:

#include <string>
#include <iostream>

using namespace std;

int main() {

    const string name("sample");
    const string ext("txt");

    const char* const srcName = (string("file_") + name + string(".")
+ ext).c_str();

    const char* const trgName = trgNameStr.c_str();
So, the mere fact of creating a "target file name" string modifies the
"source file name" string, and makes it equal to the target??? Despite
the fact that everything is constant? This is not the behavior I
expected from my limited C++ knowledge. Could one of the gurus please
clarify?

The answer lies in the documentation ;-). E.g.
http://www.cplusplus.com/reference/string/string/c_str/ says:

The returned array points to an internal location with the required
storage space for this sequence of characters plus its terminating
null-character, but the values in this array should not be modified in
the program and are only granted to remain unchanged until the next
call to a non-constant member function of the string object.

Perhaps that's not clear in your case, but the string object is the
temporary string created through concatenation, i.e. this:

(string("file_") + name + string(".") + ext) [1]

Once your assignment statement is finished, in:

const char* const srcName = ...;

a destructor is called on a temporary object [1]. You could say that
this destructor is the non-const member function that was called. At
that point, your srcName points to freed memory and all hell breaks
loose ;-).

Mixing char* and std::string is involved ;-). Each time you take a
char* out of a std::string, you have to make sure that said pointer
does not outlive said string or any of its modifications. That's a
direct consequence of manual memory handling in C and C++ that anyone
must get right.

Goran.
 
James Kanze

Then pass bleh.c_str() to the C function.

If the only use of the char const* is to be passed to
a C function which doesn't save the pointer, temporaries work
well, e.g.:

cFunc((stringA + stringB).c_str());

The problem only occurs if you start using char const* variables
(whose life extends beyond that of the temporary) in the C++
code. The key, in such cases, is to defer the c_str() to the
last possible moment.
 
red floyd

Pardon my ignorance, but what is "UB"?
I suspected it has something to do with temporary objects, I just
thought the compiler should at least give some kind of warning.
Concatenating several strings seems to be such a common operation.
What is the recommended approach for the example I wrote? Concatenate
strings one by one?

UB = Undefined Behavior.
 
Jorgen Grahn

....
There isn't a "source file name" string! All you have is a pointer to
some memory.

I can show this using Sun CC, which has an option not to destroy
temporary objects until they go out of scope:
....

The poster is apparently working on Linux on x86, so he has a pretty
good tool already. When I ran his code through valgrind, it listed 92
errors.

/Jorgen
 
Ian Collins

...

The poster is apparently working on Linux on x86, so he has a pretty
good tool already. When I ran his code through valgrind, it listed 92
errors.

92? If I try it on Sun's dbx, I get the one that matters:

Read from unallocated!
 
Jorgen Grahn

From a standards perspective that's not the same thing.

Uh, I didn't say it *was*.

Undefined behavior means only that the standard doesn't say what happens.
It doesn't preclude an implementation from specifying what happens.

I said this distinction was not interesting in most cases. Andrey
simply needed to be told his code didn't work, and why.

/Jorgen
 
Jorgen Grahn

92? If I try it on Sun's dbx, I get the one that matters:

Read from unallocated!

I guess you're joking somehow, but of course that is what valgrind
(which unlike dbx the poster *does* have access to) says too, only in
much more detail.


/Jorgen
 
