std::string performance (Sun implementation)

mlimber

Default said:
Often with real-time projects it isn't the speed of operations that's
the problem. The reason std::string and others can't be used (or have
to be used carefully) is that memory allocation during the program run is
often forbidden. Consistency of program cycles is more important than
speed. You have to know what the bounds of your program's run cycle
are; you can't have them vary much.

You can use standard containers if they're sized at start-up and no
operations are performed that will cause the size to shift. That means
avoid many of the cool operators and such.

Agreed. Or you can use containers/strings with a custom allocator,
e.g., one that uses a pre-allocated area of memory for its operations.
In any case, efficiency, not memory allocation, was the chief
constraint laid out by the OP.
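
As a rough illustration of the custom-allocator idea, here is a minimal sketch assuming a C++11 or later compiler. The names Arena, ArenaAllocator, ArenaString and string_lives_in_arena are hypothetical, made up for this example; they are not from any library mentioned in the thread. The arena hands out memory from a buffer reserved up front and never frees individual blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical bump allocator over a caller-supplied buffer.
// Assumes the buffer itself is suitably aligned for the types stored in it.
class Arena {
public:
    Arena(char* buffer, std::size_t size) : buf_(buffer), size_(size), used_(0) {}

    void* allocate(std::size_t n, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || n > size_ - start) throw std::bad_alloc();
        used_ = start + n;
        return buf_ + start;
    }

    std::size_t used() const { return used_; }

private:
    char* buf_;
    std::size_t size_;
    std::size_t used_;
};

// Minimal C++11 allocator that forwards to an Arena.
template <typename T>
struct ArenaAllocator {
    typedef T value_type;

    explicit ArenaAllocator(Arena& a) : arena(&a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {}  // blocks are never returned individually

    Arena* arena;
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

typedef std::basic_string<char, std::char_traits<char>, ArenaAllocator<char> >
    ArenaString;

// Returns true if the string's storage was taken from the arena.
bool string_lives_in_arena() {
    static char storage[1024];  // reserved once, before any string work
    Arena arena(storage, sizeof storage);
    ArenaString s("long enough to defeat any small-string optimization",
                  ArenaAllocator<char>(arena));
    s += '!';  // may reallocate, but still from the arena
    return arena.used() > s.size();
}
```

Because deallocate does nothing, memory is only recovered when the whole arena is discarded, so a scheme like this suits strings with a bounded lifetime rather than long-running, growing ones.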

Cheers! --M
 
Roland Pibinger

Also, someone else recommended the use of a different
library implementation such as STLPort, whose allocators are supposed
to be superior.

"Standard C++ string class. This class has performance
characteristics very much like vector<>, meaning, for example, that it
does not perform reference-count or copy-on-write, and that
concatenation of two strings is an O(N) operation."

quoted from: STLport-4.6.2\stlport\stl\_string.h

Best regards,
Roland Pibinger
 
rapool

Markus Moll wrote:

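#include <string.h>
#include <stdio.h>

int main()
{
    /* String literals are read-only, so point at them through const char*. */
    const char *s = "Hello world. This is an unnecessarily long string.";

    unsigned i;
    unsigned total = 0;
    for (i = 0; i != 1000000; ++i)
        total += (unsigned)strlen(s);  /* strlen rescans the string on every call */
    printf("%u\n", total);             /* %u matches the unsigned argument */
    return 0;
}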

I just changed this source code like this:

int main()
{
    const char *s = "Hello world. This is an unnecessarily long string.";

    unsigned i;
    unsigned total = 0;
    unsigned size = (unsigned)strlen(s);  /* length computed once, outside the loop */

    for (i = 0; i != 1000000; ++i)
        total += size;
    printf("%u\n", total);
    return 0;
}

The result is three times faster than the std::string version.
I did another test.

#include <cstdio>
#include <cstring>
#include <string>

int function1 ()
{
    unsigned i;
    unsigned total = 0;

    for (i = 0; i != 10000000; ++i)
    {
        const char *s = "Hello world. This is an unnecessarily long string.";
        total += static_cast<unsigned>(std::strlen(s));
    }

    std::printf("%u\n", total);
    return 0;
}

int function2 ()
{
    unsigned total = 0;

    for (unsigned i = 0; i != 10000000; i++)
    {
        // Constructs (and typically heap-allocates) a new std::string on every pass.
        std::string s = "Hello world. This is an unnecessarily long string.";
        total += static_cast<unsigned>(s.length());
    }

    std::printf("%u\n", total);
    return 0;
}

function1 is ten times faster than function2.
In my opinion, although std::string can be a good solution
for developing applications, it is not one for a framework.
 
jortizclaver

Yesterday I tried using Sun Workshop 10 and linking with the STLPort
libraries it includes, and I have to say execution times have dropped
by about 30%, and now the difference between the char* implementation and
std::string is not that big. Using a very, very basic test program, it's
a 1:1.5 ratio (no doubt std::string will beat char* in other tasks).
Those results are perfectly acceptable for me. I think I'll use strings.

Thanks,
Jorge
 
jortizclaver

Using char* implies managing memory. I don't trust in unexperienced
Amen and amen. But you're making my case for me!

I'm not discussing which one is better in terms of memory management or
ease of use; I'm just trying to evaluate std::string performance so I
can keep my job after this project. I do prefer using std::string, but I
need to be sure it doesn't slow my process down dramatically.
Depending on your knowledge of how the
framework will be used, you may be able to measure for true (rather
than speculative) bottlenecks and hand tune those parts of the code.

That's the case. I've been working in this company for five years, so I
know exactly where the problems are. And the performance of passing data
back and forth is one of them. That's why I'm concerned about std::string
performance.

Regards,
Jorge
 
Patrick Kowalzick

Hello Jorge,
I think I'll use strings.

You could use your own typedefs:

*** jcstring.h ***

#include <string>

namespace jc
{
typedef std::string string;
typedef std::wstring wstring;
}

This will make it easier to change to a different allocator later. On the
other hand, it gets a little nasty if you have interfaces that take a
std::string and want to feed them a jc::string (and vice versa). So, keep it
consistent :).

Even if performance is fine now, there are other benefits to having your own
allocator (e.g. lower fragmentation for long-term stability).

Good luck,
Patrick
 
Nitin Motgi

Ben said:
(e-mail address removed) wrote:
Yes. The other things that are really easy with C-Style strings are
buffer overruns and much other undefined behaviour.
Reece H. Dunn has a fixed_string implementation that tries to solve
this problem (buffer overflow) while still using C-style strings. It's
currently being reviewed for inclusion in the Boost library.
By the time you have correctly, managed your C-Style strings, ensuring
to keep track of the length, add your null termination, feed the
functions with the buffer length less 1, in some situations etc., etc.,
you will probably find that, apart from your code becoming cluttered
with many more lines managing the char array, speed is similar.

The notion behind fixed_string is really good. It's neat and does not look
cluttered. It adds a little overhead, but I have been using it and so
far it seems good.
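
To give a feel for the fixed-capacity idea, here is a minimal sketch. It is not the interface of the fixed_string library discussed above; the class name FixedString and its members are hypothetical, chosen only for illustration. The buffer lives inside the object, and appends are clipped to the capacity instead of overrunning it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity string: storage is part of the object,
// so no heap allocation ever takes place.
template <std::size_t N>
class FixedString {
public:
    FixedString() : len_(0) { buf_[0] = '\0'; }

    // Appends as much of `s` as fits; returns false if the text was truncated.
    bool append(const char* s) {
        assert(s != 0);
        std::size_t n = std::strlen(s);
        std::size_t room = N - len_;
        std::size_t count = n < room ? n : room;
        std::memcpy(buf_ + len_, s, count);
        len_ += count;
        buf_[len_] = '\0';  // always keep the buffer null-terminated
        return count == n;
    }

    const char* c_str() const { return buf_; }
    std::size_t size() const { return len_; }
    std::size_t capacity() const { return N; }

private:
    char buf_[N + 1];  // one extra byte for the terminating null
    std::size_t len_;
};
```

With a FixedString<8>, appending "Hello" succeeds, and a following append of ", world" reports truncation and leaves "Hello, w" in the buffer. Whether silent clipping, an error code, or an exception is the right policy depends on the application.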


Thank you,
Nitin Motgi
[ Just My Thoughts ]
 
persenaama

This code, and the one that Markus posted earlier don't really show
either std::strlen or std::string::length to be any faster than the
other, just ways to use either poorly, and that description only holds
when it makes any real world difference.

You can write less than good code with either, it is also possible to
write good code with either. All it takes is experience and a brain to
capitalize on it... agreed?
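
A minimal sketch of what a like-for-like comparison might look like, with the string built once by the caller rather than inside the loop. The function names are hypothetical. Note that std::string::length() does not rescan the text: C++11 requires it to run in constant time, and implementations that store the length behaved that way before then.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Adds up the length of `text` `iterations` times; strlen walks the
// characters on every call unless the compiler hoists it.
std::size_t total_with_strlen(const char* text, unsigned iterations) {
    assert(text != 0);
    std::size_t total = 0;
    for (unsigned i = 0; i != iterations; ++i)
        total += std::strlen(text);
    return total;
}

// Same sum, but the std::string already exists, so each pass only
// reads the stored length.
std::size_t total_with_string(const std::string& text, unsigned iterations) {
    std::size_t total = 0;
    for (unsigned i = 0; i != iterations; ++i)
        total += text.length();
    return total;
}
```

Timing these two with optimization enabled measures the length query itself, rather than the cost of constructing and destroying a string on each pass.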

The biggest advantage of std::string is the management of the object's
member data. It makes higher-level code easier to write and read,
basically with less maintenance required, so I personally tend to go with that
approach. Most of the code I write isn't performance critical; actually
very little is. On the other hand, most applications or parts of
applications I write could easily be bottlenecks (I write graphics
drivers for embedded graphics processors), but then again, in that field
code is never fast enough, so it's better to make architectural decisions that
are wise and leave the implementation less room to cause problems. But
that's not very interesting to the rest of the world. :)
 
persenaama

Am I the only one to miss the point of these performance comparisons
you posted? Why compile the code to be as slow as possible when
trying to decide which is faster? I don't get it.
 
Ben Pope

persenaama said:
Am I the only one to miss the point of these performance comparisons
you posted? Why compile the code to be as slow as possible when
trying to decide which is faster? I don't get it.

I believe I made the same comment, no optimisations in the code. I
don't understand.

Ben Pope
 
mlimber

jortizclaver said:
Yesterday I tried using Sun Workshop 10 and linking with the STLPort
libraries it includes, and I have to say execution times have dropped
by about 30%, and now the difference between the char* implementation and
std::string is not that big. Using a very, very basic test program, it's
a 1:1.5 ratio (no doubt std::string will beat char* in other tasks).
Those results are perfectly acceptable for me. I think I'll use strings.

Thanks,
Jorge

Glad we could help. :)

Cheers! --M
 
Daniel T.

"jortizclaver said:
Hi,

I'm about to develop a new framework for my corporate applications,
and my first decision point is what kind of strings to use: std::string
or classical C char*.

Performance in my system is quite important (it's not a real-time
system, but almost) and I'm concerned about std::string performance in
terms of speed. No doubt the std implementation is a lot easier to use, but
I can't sacrifice speed.

I'm using Sun Workshop 6. A very basic test shows processing with
std::string can be 3 times slower than using char*. Is there any
improvement in later versions?

The C++ standard does not specify the complexity of basic_string
operations, as such there is no guarantee that std::string will be as
fast as using a C char*.

Of course, this also means you can implement your own string classes and
have them conform to the standard rather easily. Back when I first
started using string classes, the popular thing to do was to use
reference counting internally, lately most strings are implemented
basically like a vector<char>, one could however implement it in terms
of a deque<char>.

What is it you do with your strings in your program? In most programs,
different string like objects are used in different ways; some must be
able to resize, some are always a fixed size but must be able to mutate
their contents, and some always have a fixed size and fixed contents.
Some must be able to efficiently handle insertions in the middle (more
efficiently than vector<char>).

A reasonably large company with very tight performance requirements
would be best served by having several different string implementations,
using std::string for everything would probably not be the best choice;
nor would just using C char* for everything.
 
persenaama

It would be interesting to see some statistics, but it's unlikely they exist
because most people have better things to do. But here's what I have observed
about how string classes are used (from personal experience; I'm not trying to
impose my opinion on anyone).

The most typical use I observe is a string object storing the names of
objects and similar: basically, just storing text. <- doh? ;-)

Another use is stringstream / sprintf-like usage, where strings are
created, typically (again, just an observation) from binary data.
A common example is writing to a file in an ASCII format.

I can think of countless other uses as well, but those two in my
experience are present practically everywhere.

For use #1 the performance is rarely critical when constructing the
string object. When using the name for indexing, some kind of fast
comparison might be useful (insertion into a map might be one place where
this is a useful optimization to have?)

I could go on, but I recognize the futility of iterating my own
personal experience in lack of statistics with very large number of
samples. :()

I don't remember string performance ever being a real problem in any
application I've written. I suppose I haven't written enough
applications to worry about this. I can think of instances where it may
pose a problem, but I haven't actually written such software (yet), so I'd
rather not comment on that. However, comments by those who have are
welcome!
 
roberts.noah

jortizclaver said:
Hi,

I'm about to develop a new framework for my corporate applications,
and my first decision point is what kind of strings to use: std::string
or classical C char*.

Performance in my system is quite important (it's not a real-time
system, but almost) and I'm concerned about std::string performance in
terms of speed. No doubt the std implementation is a lot easier to use, but
I can't sacrifice speed.

I'm using Sun Workshop 6. A very basic test shows processing with
std::string can be 3 times slower than using char*. Is there any
improvement in later versions?

You should use generic methods of accessing an object that acts like a
std::string, then you can reimplement as you need. I myself found that
profiling rarely focuses on string operations, if ever.

I did do some performance tests of std::string vs. char[]. Using
std::string poorly, such as creating unnecessary temporaries (don't use
operator +), can greatly reduce execution speed if in fact you are in
an area of code that needs speed. stringstream was much faster than
sprintf, but strcat was also quite a bit faster than append().
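
One way to avoid the temporaries mentioned above is to reserve the final size once and append in place. This is a minimal sketch; the function name join is hypothetical, and how much it saves in practice depends on the library implementation and the data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Joins `parts` with `sep` using a single reservation and in-place appends,
// instead of chaining operator+ (which builds a temporary string per step).
std::string join(const std::vector<std::string>& parts, char sep) {
    // Final length: all characters plus one separator between each pair.
    std::size_t total = parts.empty() ? 0 : parts.size() - 1;
    for (std::size_t i = 0; i != parts.size(); ++i)
        total += parts[i].size();

    std::string out;
    out.reserve(total);  // at most one allocation for the result
    for (std::size_t i = 0; i != parts.size(); ++i) {
        if (i != 0) out += sep;
        out += parts[i];
    }
    assert(out.size() == total);
    return out;
}
```

Joining "a", "b" and "c" with ',' gives "a,b,c". The equivalent loop written as out = out + sep + parts[i] would create and discard intermediate strings on every pass.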

Really it's a balancing act. There are many costs of using char[] that
have nothing to do with execution speed, including debug time caused by
buffer overruns (code riddled with static sized char arrays can really
cost when it comes time to change features). In the end the cost of
using std::string is negligible (when used reasonably) in execution
speed but of major benefit in development time when compared to char[].
If std::string is costing too much you probably need to look closer at
your algorithm, not the string implementation.

And remember, profile before optimizing. I get into arguments with
coworkers about std::string vs. char[] all the time (their claim is the
cost of allocation, which I found to be negligible; they are big believers in
the Clib part of C++) and std::string is never at the top of a
profile...it is always something else. On the other hand it took me 12
hours to track down a buffer overflow in a char[]...
 
Alex Vinokur

Ben Pope said:
I believe I made the same comment, no optimisations in the code. I
don't understand.
[snip]

The performance comparisons I posted are an example of using the C++ Program Perfometer. The user can use the Perfometer to
compare various algorithms at various levels of optimization.
 
Roland Pibinger

The C++ standard does not specify the complexity of basic_string
operations, as such there is no guarantee that std::string will be as
fast as using a C char*.

Right, std::string is under-specified. That forces users to program
against a concrete string implementation, not against an interface.
Should you return a string by value? Depends on the implementation.
Of course, this also means you can implement your own string classes and
have them conform to the standard rather easily. Back when I first
started using string classes, the popular thing to do was to use
reference counting internally,

The most important design decision is whether copying and assignment
should be 'cheap' or 'expensive'.
lately most strings are implemented
basically like a vector<char>, one could however implement it in terms
of a deque<char>.

What is it you do with your strings in your program? In most programs,
different string like objects are used in different ways; some must be
able to resize, some are always a fixed size but must be able to mutate
their contents, and some always have a fixed size and fixed contents.
Some must be able to efficiently handle insertions in the middle (more
efficiently than vector<char>).

A reasonably large company with very tight performance requirements
would be best served by having several different string implementations,
using std::string for everything would probably not be the best choice;
nor would just using C char* for everything.

Well said! Call them (immutable) String, StringBuilder, and
(stack-based) Buffer.

Best regards,
Roland Pibinger
 
Daniel T.

Right, std::string is under-specified. That forces users to program
against a concrete string implementation, not against an interface.
Should you return a string by value? Depends on the implementation.
What is the performance of vector<string>? Depends on the
implementation.

Hum... You should return by value when you need to, otherwise don't...
The performance of any class depends on the implementation, the fact
that vector et al have known performance characteristics and string
doesn't is a result of the fact that the standard committee requires a
particular implementation for vector but not for string.

This is actually a boon to string implementors/users. They can customize
their particular string class to work best with their particular problem
space.
deque<char> is an interesting idea, albeit .c_str() could become
expensive.

.c_str() need only be cheap if the program in question uses many
libraries that only work with 'const char*' and we often need to convert
from string to 'const char*'. In a program where such conversion is not
needed, and which has lots of insertions/removals at the beginning/middle of
strings, implementing string in terms of a deque makes more sense.
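
The trade-off described here can be sketched with a small wrapper. This is an illustration only, not a standards-conforming string; the class name DequeString and its members are hypothetical. Insertion at the front is cheap, but because a deque is not contiguous, producing something usable as a 'const char*' requires an O(N) copy.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical deque-backed string showing the cost asymmetry.
class DequeString {
public:
    explicit DequeString(const char* s) {
        assert(s != 0);
        while (*s) chars_.push_back(*s++);
    }

    void push_front(char c) { chars_.push_front(c); }  // constant time
    void push_back(char c) { chars_.push_back(c); }    // constant time
    std::size_t size() const { return chars_.size(); }

    // No contiguous buffer exists, so a C-string view needs an O(N) copy;
    // call .c_str() on the returned std::string.
    std::string to_contiguous() const {
        return std::string(chars_.begin(), chars_.end());
    }

private:
    std::deque<char> chars_;
};
```

Starting from "ello" and calling push_front('H') yields "Hello" without shifting the existing characters, which a vector-like string would have to do.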

I expect that a string implementation in terms of 'list' would not be a
good idea in any case (though still technically standards conforming). :)
Well said! Call them (immutable) String, StringBuilder, and
(stack-based) Buffer.

Thanks.
 
Ian Collins

Daniel said:
Hum... You should return by value when you need to, otherwise don't...
The performance of any class depends on the implementation, the fact
that vector et al have known performance characteristics and string
doesn't is a result of the fact that the standard committee requires a
particular implementation for vector but not for string.

This is actually a boon to string implementors/users. They can customize
their particular string class to work best with their particular problem
space.
Which is the case with this compiler: two standard libraries are
provided, one with a reference-counted string that is very efficient for
passing fixed strings by value, and one that is not reference counted (stlport).
 
