perl-like string concatenation

  • Thread starter Christof Warlich

Christof Warlich

Hi,

is there any danger in overloading operator+ as follows:

template<typename T> string operator+(const string &x, T y) {
    ostringstream tmp;
    tmp << x << y;
    return tmp.str();
}

apart from the fact that it does conflict with the definition from the
standard string library, which already provides dedicated overloads for:

string operator+ (const string& lhs, const string& rhs);
string operator+ (const char* lhs, const string& rhs);
string operator+ (char lhs, const string& rhs);
string operator+ (const string& lhs, const char* rhs);
string operator+ (const string& lhs, char rhs);

As an (ugly) workaround, I overloaded operator&() instead, i.e.

template<typename T> string operator&(const string &x, T y) {
    ostringstream tmp;
    tmp << x << y;
    return tmp.str();
}

but I wonder if there was a good reason that the standard only provides
the dedicated overloads for operator+() on strings instead of the more
generic one employing a template for the second parameter.
 

James Kanze

> is there any danger in overloading operator+ as follows:
> template<typename T> string operator+(const string &x, T y) {
>     ostringstream tmp;
>     tmp << x << y;
>     return tmp.str();
> }

Only that it results in unreadable and unmaintainable code, and
that it's not really very useful.
 

Christof Warlich

James said:
> Only that it results in unreadable and unmaintainable code, and
> that it's not really very useful.

Hmm... why unreadable code? I'm writing a text processing application
where I quite frequently need to concatenate strings from other strings,
integers, characters, C character strings, ..., so I found it much more
readable to write something like:

int i = 22;
string newString = string("Hello") + i + "whatever";

instead of fiddling around with ostringstream or sprintf() every now
and then. And consider scripting languages like Perl: they do offer the
same idiom. Could you give me an example of what I may be missing
w.r.t. readability?

But my concern is whether I'm about to risk shooting myself in the
foot: it looks just too simple to be a good idea, particularly since the
standard string library obviously does _not_ use a template function.
Thus, I suspect that you are right w.r.t. unmaintainability, but so far
I couldn't think of a concrete problem with it, which is why I was
asking for comments from the community.
 

Alf P. Steinbach

* Christof Warlich:
> Hmm... why unreadable code? I'm writing a text processing application
> where I quite frequently need to concatenate strings from other strings,
> integers, characters, C character strings, ..., so I found it much more
> readable to write something like:
>
> int i = 22;
> string newString = string("Hello") + i + "whatever";
>
> instead of fiddling around with ostringstream or sprintf() every now
> and then. And consider scripting languages like Perl: they do offer the
> same idiom. Could you give me an example of what I may be missing
> w.r.t. readability?
>
> But my concern is whether I'm about to risk shooting myself in the
> foot: it looks just too simple to be a good idea, particularly since the
> standard string library obviously does _not_ use a template function.
> Thus, I suspect that you are right w.r.t. unmaintainability, but so far
> I couldn't think of a concrete problem with it, which is why I was
> asking for comments from the community.

I don't know James' reasoning and until he clarifies I'd disagree with it.

*However*, adopting string concatenation as an idiomatic way to build strings is
IMHO ungood because it easily leads to O(n^2) behavior, e.g. for adding in
things in a loop.
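To make the O(n^2) point concrete, here is a small sketch (not from the thread; the function names are hypothetical, and std::to_string is C++11, used only for brevity). Both functions build the same text. The first creates a new string from the whole accumulated text on every iteration, so the total copying grows quadratically with the length of the result; the second appends in place.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "0;1;2;...;" by repeated concatenation.
// Each `s + ...` creates a new string holding a full copy of s, so the
// total amount of copying grows quadratically with the result length.
std::string build_by_concatenation(int n) {
    std::string s;
    for (int i = 0; i < n; ++i) {
        s = s + std::to_string(i) + ";";
    }
    return s;
}

// Hypothetical helper: builds the same text by appending in place.
// s keeps its buffer between iterations; in typical implementations the
// capacity grows geometrically, so the total cost is amortized linear.
std::string build_by_appending(int n) {
    std::string s;
    for (int i = 0; i < n; ++i) {
        s += std::to_string(i);
        s += ';';
    }
    return s;
}
```

The results are identical; only the amount of copying differs, which is why the difference shows up mainly for long results built in loops.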

Instead I prefer to overload operator <<, C++ "output", to a string, modifying
the string. This works well even with a temporary e.g.

foo( string().append("") << blah << 42 << gnurk );

and if that first sub-expression is defined as a macro TEMP_STR

foo( TEMP_STR << blah << 42 << gnurk );

and even better if it's defined as a three- or four-line class

foo( TempStr() << blah << 42 << gnurk );

without the O(n^2) behavior so common in scripting languages.
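The post describes TempStr only as a "three- or four-line class". The sketch below is one guess at what such a class could look like; its body, the implicit conversion operator, and the use of std::ostringstream to format single values are assumptions, not Alf's code, and foo is a hypothetical stand-in. The property that matters is that each << formats only the new value and appends it, so the accumulated text is never copied.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical TempStr: accumulates text in a single std::string member.
class TempStr {
public:
    // Formats only the incoming value, then appends it in place;
    // the text collected so far is never copied.
    template <typename T>
    TempStr& operator<<(const T& value) {
        std::ostringstream os;
        os << value;
        text_ += os.str();
        return *this;
    }

    // Implicit conversion (an assumption) so that foo(TempStr() << ...)
    // can pass the result to a function taking const std::string&.
    operator std::string() const { return text_; }

private:
    std::string text_;
};

// Hypothetical stand-in for foo() in the post.
std::size_t foo(const std::string& s) { return s.size(); }
```

Calling a non-const member function on the temporary TempStr() is allowed, and the returned reference stays valid until the end of the full expression, which is all foo() needs. An explicit str() member instead of the conversion operator would be the more cautious choice, given the objections to implicit conversions raised later in the thread.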


Cheers & hth.,

- Alf
 

Christof Warlich

Alf said:
> *However*, adopting string concatenation as an idiomatic way to build
> strings is IMHO ungood because it easily leads to O(n^2) behavior, e.g.
> for adding in things in a loop.
>
> Instead I prefer to overload operator <<, C++ "output", to a string,
> modifying the string. This works well even with a temporary e.g.
>
> foo( string().append("") << blah << 42 << gnurk );
>
> and if that first sub-expression is defined as a macro TEMP_STR
>
> foo( TEMP_STR << blah << 42 << gnurk );
>
> and even better if it's defined as a three- or four-line class
>
> foo( TempStr() << blah << 42 << gnurk );
>
> without the O(n^2) behavior so common in scripting languages.

Did I get you right that you are suggesting something like this, i.e.
returning a reference to a string instead?

template<typename T> string &operator<<(string &x, T y) {
    ostringstream tmp;
    tmp << x << y;
    x = tmp.str();
    return x;
}

It avoids creating a new string over and over again when being used in a
loop, so it should run faster. But a quick test did not show a big
difference:

#include <string>
#include <iostream>
#include <sstream>
using namespace std;

template<typename T> string &operator<<(string &x, T y) {
    ostringstream tmp;
    tmp << x << y;
    x = tmp.str();
    return x;
}

template<typename T> string operator&(const string &x, T y) {
    ostringstream tmp;
    tmp << x << y;
    return tmp.str();
}

int main(void) {
    int i;
    string tmp;
    for(i = 0; i < 30000; i++) {
        tmp = tmp << i << ";";
    }
    cout << tmp << endl;
    tmp = string();
    for(i = 0; i < 30000; i++) {
        tmp = tmp & i & ";";
    }
    cout << tmp << endl;
    return 0;
}

Compiled with gcc, both loops seem to run more or less equally long.
And my first solution had the advantage that the source string remains
unmodified.
 

Alf P. Steinbach

* Christof Warlich:
> Did I get you right that you are suggesting something like this, i.e.
> returning a reference to a string instead?
>
> template<typename T> string &operator<<(string &x, T y) {
>     ostringstream tmp;
>     tmp << x << y;
>     x = tmp.str();
>     return x;
> }

Not quite.

You have to specialize for various types, especially char const*.

Otherwise you're incurring a heck of an overhead of the ordinary sort, as you're
doing.

And you have to absolutely avoid copying the lhs string, otherwise you're
incurring the O(n^2) algorithmic overhead, which you're also doing, which means
that this implementation is not just inefficient but /incorrect/ wrt. the goal.

But apart from the extreme inefficiency of that concrete implementation, and
apart from its incorrectness, then yes, sort of. :)

> It avoids creating a new string over and over again when being used in a
> loop, so it should run faster. But a quick test did not show a big
> difference:
>
> #include <string>
> #include <iostream>
> #include <sstream>
> using namespace std;
>
> template<typename T> string &operator<<(string &x, T y) {
>     ostringstream tmp;
>     tmp << x << y;
>     x = tmp.str();
>     return x;
> }
>
> template<typename T> string operator&(const string &x, T y) {
>     ostringstream tmp;
>     tmp << x << y;
>     return tmp.str();
> }
>
> int main(void) {
>     int i;
>     string tmp;
>     for(i = 0; i < 30000; i++) {
>         tmp = tmp << i << ";";

This should just be

    tmp << i << ";";

>     }
>     cout << tmp << endl;
>     tmp = string();
>     for(i = 0; i < 30000; i++) {
>         tmp = tmp & i & ";";
>     }
>     cout << tmp << endl;
>     return 0;
> }
>
> Compiled with gcc, both loops seem to run more or less equally long.

See above.

I leave re-testing with that fix (removing incorrect usage), plus a fix of the
earlier mentioned incorrectness of the operator<< implementation, to you. ;-)

> And my first solution had the advantage that the source string remains
> unmodified.

That's not an advantage, it's a disadvantage. With the "&" operator above you
don't have a choice about whether to create a new string or not, you always have
to create a new string. With the "<<" above the client code can choose in each
case -- and as mentioned, when it's used correctly it avoids that dreaded
O(n^2) behavior (there's no way to guard against silly client code, though).
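Following the two corrections above (dedicated overloads for char const* and similar text types, and no copy of the left-hand string) plus the loop fix, a corrected inserter might look like this sketch. The exact overload set is an assumption; only the fallback for non-text types goes through std::ostringstream, and it formats just the new value.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Text types are appended directly: no stream, no copy of the lhs.
// For string literals and chars these non-templates win over the template
// below, because overload resolution prefers a non-template on a tie.
inline std::string& operator<<(std::string& s, const char* text) { return s.append(text); }
inline std::string& operator<<(std::string& s, const std::string& text) { return s.append(text); }
inline std::string& operator<<(std::string& s, char c) { s.push_back(c); return s; }

// Fallback for other streamable types (int, double, ...): format only the
// new value, then append it to s in place.
template <typename T>
std::string& operator<<(std::string& s, const T& value) {
    std::ostringstream os;
    os << value;
    return s.append(os.str());
}
```

Used as suggested, the loop body is just `tmp << i << ";";`, with no assignment back to tmp, so the accumulated text is never copied.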

Cheers & hth.,

- Alf
 

James Kanze

Christof Warlich wrote:
> Hmm... why unreadable code? I'm writing a text processing
> application where I quite frequently need to concatenate
> strings from other strings, integers, characters, C character
> strings, ..., so I found it much more readable to write
> something like:
> int i = 22;
> string newString = string("Hello") + i + "whatever";

And what is that supposed to mean? Who knows what concatenating
an int to a string should mean: how many characters, what base,
etc.

More generally, an int is not a string, and confusing the two
leads to confusing code. If you want formatting, you know where
to get it, but it certainly isn't (and shouldn't be) the role of
std::string.
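The "how many characters, what base" question can be made concrete with plain iostreams: the same int produces different text depending on the stream settings, and any generic operator+ would have to pick one of them silently. The helper names below are made up for illustration.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helpers: the same value rendered under different stream settings.
std::string as_decimal(int n) {
    std::ostringstream os;
    os << n;                                           // default: base 10, no padding
    return os.str();
}

std::string as_hex(int n) {
    std::ostringstream os;
    os << std::hex << n;                               // base 16, no "0x" prefix
    return os.str();
}

std::string as_octal(int n) {
    std::ostringstream os;
    os << std::oct << n;                               // base 8
    return os.str();
}

std::string as_zero_padded(int n, int width) {
    std::ostringstream os;
    os << std::setw(width) << std::setfill('0') << n;  // pad on the left with '0'
    return os.str();
}
```

For 22 these give "22", "16", "26" and, with width 5, "00022".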

> instead of fiddling around with ostringstream or sprintf()
> every now and then. And consider scripting languages like
> Perl: they do offer the same idiom. Could you give me an
> example of what I may be missing w.r.t. readability?

I don't think you can argue readability from perl, in any way,
shape, form or fashion. But perl is also a different language
than C++. It doesn't have types (at least not in the sense C++
does); everything is a string. So what makes sense in perl
(supposing that there are things that make sense in perl) may
not make sense in C++.

> But my concern is whether I'm about to risk shooting myself
> in the foot: it looks just too simple to be a good idea,
> particularly since the standard string library obviously
> does _not_ use a template function. Thus, I suspect that you are
> right w.r.t. unmaintainability, but so far I couldn't think of
> a concrete problem with it, which is why I was asking for
> comments from the community.

It's an implicit conversion, and a lossy one at that.
Experience has shown that implicit conversions are a real source
of errors; they pop up when you don't want them to, etc. And as
I said before, converting an arbitrary type into a string is
often ambiguous; there are many different strings which could
logically result from the conversion.
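One concrete hazard of this kind (an illustration added here, not raised in the thread): std::string::operator+= has an overload taking char, and an int converts to char implicitly. Next to the generic operator& from the start of the thread, the same int therefore appends different text depending on which operator is used. The helper function names are hypothetical, and the 'A' result assumes an ASCII character set.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// The generic operator& from the start of the thread.
template <typename T>
std::string operator&(const std::string& x, T y) {
    std::ostringstream tmp;
    tmp << x << y;
    return tmp.str();
}

// std::string::operator+= picks its char overload: the int is silently
// converted to a char, so 65 appends 'A' (assuming ASCII).
std::string append_with_plus_equals(std::string s, int n) {
    s += n;
    return s;
}

// The generic operator& formats the int as decimal digits instead.
std::string append_with_ampersand(const std::string& s, int n) {
    return s & n;
}
```

With "abc" and 65, the first gives "abcA" and the second "abc65".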

If you need the conversion, you might want to consider
boost::lexical_cast. This suffers from the last problem as
well, of course---the authors decided on one particular
representation, and that's all you're going to get---but at
least, it's not an implicit conversion. The reader can see that
you're converting, and knows where to look to find out what this
conversion means.
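Without Boost, one way to keep the conversion visible at the call site is a small named function. The to_text template below is hypothetical and only in the spirit of boost::lexical_cast<std::string>; it is not Boost's API. Since C++11, std::to_string covers the arithmetic types in the standard library.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical explicit conversion helper: the call site names the
// conversion, and the formatting rules live in one place.
template <typename T>
std::string to_text(const T& value) {
    std::ostringstream os;
    os << value;  // default stream formatting; adjust here if needed
    return os.str();
}
```

With it, the earlier example reads `string newString = "Hello" + to_text(i) + "whatever";` and uses only the standard operator+ overloads.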
 

Alf P. Steinbach

* James Kanze:
> And what is that supposed to mean? Who knows what concatenating
> an int to a string should mean: how many characters, what base,
> etc.

Assuming for the sake of discussion that that is a problem, then isn't that a
problem also with e.g. std::ostringstream::operator<<?

I don't think such reasonable defaults are problematic.

On the contrary, having to specify every little customizable detail can IMHO be
very counter-productive, but it's my impression that it's almost impossible to
convince the IBM-oriented (e.g. Lotus Notes interface) and/or math-oriented
(e.g. C++0x random generators) and/or whatever-oriented guys that such extreme
verbosity is in conflict with goals like clarity, conciseness & correctness. And
in the other direction, it's almost impossible to convince some people that
having all kinds of implicit conversions (e.g. to char const* or to bool) is
ungood, that explicit code is better. But regarding that, a '+' is explicit.

Not that I think operator+ is a good choice for building up strings, because its
natural semantics -- not the syntax -- are IMO unsuitable for this.

I've commented on that separate issue else-thread.


Cheers,

- Alf
 

Christof Warlich

Jeff said:
> Are you familiar with Boost.Lexical Cast?

... not before James mentioned it in his second answer.
Thanks for pointing me to this.
 

James Kanze

* Alf P. Steinbach:

> [...]
> Assuming for the sake of discussion that that is a problem,
> then isn't that a problem also with e.g.
> std::ostringstream::operator<<?

Except that the role of ostream *is* to format, and it provides
everything you need to specify exactly what you want.

> I don't think such reasonable defaults are problematic.

It depends. In the case of int, maybe, maybe not. In the case
of unsigned, probably not (I usually want hex). And in the case
of floating point, almost never.
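The floating-point case is easy to see with plain iostreams: the default format uses six significant digits and switches to scientific notation for large exponents, which is rarely exactly what a given output needs. The helper names are hypothetical; the hex helper mirrors the remark about unsigned values.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: default formatting (6 significant digits, %g-like).
std::string default_format(double x) {
    std::ostringstream os;
    os << x;
    return os.str();
}

// Hypothetical helper: fixed-point with an explicit number of decimals.
std::string fixed_format(double x, int decimals) {
    std::ostringstream os;
    os << std::fixed << std::setprecision(decimals) << x;
    return os.str();
}

// Hypothetical helper: an unsigned value in hexadecimal.
std::string hex_format(unsigned u) {
    std::ostringstream os;
    os << std::hex << u;
    return os.str();
}
```

For example, 1e10 comes out as "1e+10" by default but as "10000000000.0" with fixed_format(1e10, 1).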

> On the contrary, having to specify every little customizable
> detail can IMHO be very counter-productive,

That's what formatting is all about: specifying every little
detail. Normally, you'll use custom manipulators for this, and
probably output almost nothing except label strings without a
custom manipulator before it, to indicate the semantics of the
value which follows. (How the value is formatted depends on the
semantics it has in the program.)
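A common way to tie formatting to the meaning of a value is a small wrapper type with its own operator<<. The sketch below illustrates that idea and is not James's code. It uses a wrapper type rather than a true state-setting manipulator, and the name Percent is hypothetical. It restores the stream's flags and precision afterwards, so later output is unaffected.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical semantic wrapper: the type says what the value means
// (a ratio shown as a percentage), and its operator<< decides the format.
struct Percent {
    double ratio;  // 0.25 means 25%
};

std::ostream& operator<<(std::ostream& os, Percent p) {
    std::ios_base::fmtflags old_flags = os.flags();
    std::streamsize old_precision = os.precision();
    os << std::fixed << std::setprecision(1) << p.ratio * 100.0 << '%';
    os.flags(old_flags);          // restore the caller's formatting state
    os.precision(old_precision);
    return os;
}
```

Writing `os << "done: " << Percent{0.256} << ", raw: " << 0.256;` produces "done: 25.6%, raw: 0.256": the label says what the value is, and the wrapper decides how it looks.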

> but it's my impression that it's almost impossible to convince
> the IBM-oriented (e.g. Lotus Notes interface) and/or
> math-oriented (e.g. C++0x random generators) and/or
> whatever-oriented guys that such extreme verbosity is in
> conflict with goals like clarity, conciseness & correctness.
> And in the other direction, it's almost impossible to convince
> some people that having all kinds of implicit conversions (e.g.
> to char const* or to bool) is ungood, that explicit code is
> better. But regarding that, a '+' is explicit.

But the conversion of int to string isn't.

> Not that I think operator+ is a good choice for building up
> strings, because its natural semantics -- not the syntax --
> are IMO unsuitable for this.

Yes. I may be naïve, but I'd normally expect any overloaded
operator+ to be commutative. On the other hand, there's an
enormous history in other languages (well, at least Basic), and
it seems to be an established convention that + means
concatenate when applied to strings.
 
