std::string performance (Sun implementation)

jortizclaver

Hi,

I'm about to develop a new framework for my corporate applications
and my first decision point is what kind of strings to use: std::string
or classical C char*.

Performance in my system is quite important - it's not a realtime
system, but almost - and I'm concerned about std::string performance in
terms of speed. No doubt using the std implementation is a lot easier, but
I can't sacrifice speed.

I'm using Sun Workshop 6. A very basic test shows processing with
std::string can be 3 times slower than using char*. Is there any
improvement in later versions?

Thanks,
Jorge Ortiz
 
Marcelo Pinto

jortizclaver wrote:
> Hi,
>
> I'm about to develop a new framework for my corporate applications
> and my first decision point is what kind of strings to use: std::string
> or classical C char*.
>
> Performance in my system is quite important - it's not a realtime
> system, but almost - and I'm concerned about std::string performance in
> terms of speed. No doubt using the std implementation is a lot easier, but
> I can't sacrifice speed.

AFAIK, one of the main problems with string performance is the time
spent (re)allocating space for the string. If you know beforehand the
size of the strings you will be dealing with, you could use the reserve
member function of the string class.
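
To illustrate the reserve suggestion, here is a minimal sketch. The function name `join_with_reserve` and its interface are hypothetical, made up for this example. It adds up the final length first and reserves once, so the appends that follow should not need to reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates all parts into one string,
// reserving the full length up front so the appends need not reallocate.
std::string join_with_reserve(const std::vector<std::string>& parts)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i != parts.size(); ++i)
        total += parts[i].size();

    std::string result;
    result.reserve(total);               // at most one allocation, done here
    assert(result.capacity() >= total);  // reserve guarantees at least this much

    for (std::size_t i = 0; i != parts.size(); ++i)
        result += parts[i];              // stays within the reserved capacity
    return result;
}
```

Whether this helps measurably depends on the library's growth policy and allocator, so it is worth profiling before and after.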

If I were in your place I would use strings for simplicity and security.
If I then detected a performance problem (by profiling), I would
work on solving it ("premature optimization is the root of all evil",
Hoare).

> I'm using Sun Workshop 6. A very basic test shows processing with
> std::string can be 3 times slower than using char*. Is there any
> improvement in later versions?

The difference in performance could be due to the allocation problem I
mentioned, but I don't know Sun Workshop 6.

> Thanks,
> Jorge Ortiz

HTH,

Marcelo Pinto
 
mlimber

Reposting since Google Groups seems to be having problems.

jortizclaver said:
> Hi,
>
> I'm about to develop a new framework for my corporate applications
> and my first decision point is what kind of strings to use: std::string
> or classical C char*.
>
> Performance in my system is quite important - it's not a realtime
> system, but almost - and I'm concerned about std::string performance in
> terms of speed. No doubt using the std implementation is a lot easier, but
> I can't sacrifice speed.
>
> I'm using Sun Workshop 6. A very basic test shows processing with
> std::string can be 3 times slower than using char*. Is there any
> improvement in later versions?
>
> Thanks,
> Jorge Ortiz

First, speed is implementation-dependent -- both compiler and library.
Since I presume you don't want to change compilers, you may be able to
find a faster library implementation (e.g., STLPort, Dinkumware, etc.).
Also, you might try fiddling with the switches for your compiler. On
some compilers, if you don't specify an optimization level, the
compiler doesn't even inline functions, which is a speed killer for the
C++ standard library in general. For more on these sorts of concerns,
you may want to consult a newsgroup (or list or whatever) that is more
familiar with your particular compiler and library.

Second, we might question the validity of your test. If you didn't
factor in the extra (read: manual, error-prone, often tedious) work
you'll have to do with arrays to validate lengths, prevent buffer
overflows, allocate and deallocate, etc., then it might not be a fair
test. See this FAQ for some more thoughts on why standard containers
should be preferred over arrays:

http://www.parashift.com/c++-faq-lite/containers.html#faq-34.1

Finally, let me remind you to beware premature optimization. Guru
Sutter reminds us about the rules for optimizing
(http://www.gotw.ca/publications/mill09.htm):

"1. Don't optimize early. 2. Don't optimize until you know that it's
needed. 3. Even then, don't optimize until you know *what* [is] needed,
and *where*.

"By and large, programmers--that includes you and me--are notoriously
bad at guessing the actual space/time performance bottlenecks in their
own code. If you don't have performance profiles or other empirical
evidence to guide you, you can easily spend days optimizing something
that doesn't need optimizing and that won't measurably affect runtime
space or time performance. What's even worse, however, is that when you
don't understand what needs optimizing you may actually end up
pessimizing (degrading your program) by of saving a small cost while
unintentionally incurring a large cost. Once you've run performance
profiles and other tests, and you actually know that a particular
optimization will help you in your particular situation, then it's the
right time to optimize."

Cheers! --M
 
Ben Pope

> C style strings is more faster that std::string,

Are they, in general?

> and almost things that
> u can do with std::string u can do with C style strings
Yes. The other things that are really easy with C-Style strings are
buffer overruns and much other undefined behaviour.

By the time you have correctly managed your C-style strings, making sure
to keep track of the length, add your null termination, feed the
functions the buffer length less 1 in some situations, etc.,
you will probably find that, apart from your code becoming cluttered
with many more lines managing the char array, speed is similar.
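
As a rough sketch of that bookkeeping, compare a checked append on a char buffer with the std::string version. Both function names (`checked_append`, `append`) are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: appends src to the NUL-terminated string in dst,
// where dst_size is the full size of the dst buffer in bytes.
// Returns false (leaving dst untouched) if the result would not fit.
bool checked_append(char* dst, std::size_t dst_size, const char* src)
{
    assert(dst != 0 && src != 0 && dst_size > 0);
    const std::size_t used = std::strlen(dst);
    const std::size_t extra = std::strlen(src);
    if (used >= dst_size || extra > dst_size - used - 1) // keep room for '\0'
        return false;
    std::memcpy(dst + used, src, extra + 1);             // copies the '\0' too
    return true;
}

// The std::string equivalent: the class tracks length and storage itself.
std::string append(const std::string& a, const std::string& b)
{
    return a + b;
}
```

The C version performs two strlen scans and a bounds check per call; dropping them is where the "marginally faster" code and the undefined behaviour both come from.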

Of course, if you don't mind a bit of undefined behaviour here and
there, you don't need to bother with the checks, in which case, yes,
C-style strings are often marginally faster to mess around with.


Alternatively, get the code written correctly in much less time by using
std::string, and if the performance is not good enough, spend the time
saved to ensure your strings are allocated with enough space to avoid
reallocations when you concatenate, and feel free to play with the
allocator if you can do a better job than the provided one.

Ben Pope
 
Kai-Uwe Bux

jortizclaver said:
> I'm about to develop a new framework for my corporate applications
> and my first decision point is what kind of strings to use: std::string
> or classical C char*.
>
> Performance in my system is quite important - it's not a realtime
> system, but almost - and I'm concerned about std::string performance in
> terms of speed. No doubt using the std implementation is a lot easier, but
> I can't sacrifice speed.

Library design is tricky. Usually, std::string implementations will beat
char* solutions on common problems, unless the char* solution is carefully
optimized. The reason is that the std::string implementation can use fancy
stuff like reference-counted copy-on-write and/or short string optimization
to make many operations faster than their naive char* counterparts.
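
One way to see whether a given library uses the short string optimization is to check where the characters live. The sketch below is only a probe: the helper name `stored_inline` is hypothetical, and the result for short strings is implementation-specific (libraries with a small-string buffer typically report true for a few characters, others report false).

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: reports whether s keeps its characters inside the
// std::string object itself (short string optimization) rather than in a
// separately allocated buffer. The answer is implementation-specific.
bool stored_inline(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.data();
    assert(data != 0);
    std::less<const char*> before;   // total order even for unrelated pointers
    return !before(data, begin) && before(data, end);
}
```

A string much longer than sizeof(std::string) cannot fit inside the object, so for it the probe should report false on any implementation.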

On the other hand, the library implementor has no idea about your particular
application. Thus, the optimizations in the library may turn out to be
pessimizing in a particular case.

> I'm using Sun Workshop 6. A very basic test shows processing with
> std::string can be 3 times slower than using char*. Is there any
> improvement in later versions?

I have a hard time believing that. Could you post your benchmarking code?


Best

Kai-Uwe Bux
 
jortizclaver

> C style strings is more faster that std::string, and almost things that
> u can do with std::string u can do with C style strings

You're right, no doubt. But let's say you're designing a framework
that will be used by programmers of many different profiles (from newbies
to people with ten years of experience). Using STD containers and strings puts system
robustness far away from inexperienced hands.

> Also, you might try fiddling with the switches for your compiler. On
> some compilers, if you don't specify an optimization level, the
> compiler doesn't even inline functions, which is a speed killer for the
> C++ standard library in general.

Already done. I've compiled the test programs with optimization flags.

> Second, we might question the validity of your test. If you didn't
> factor in the extra (read: manual, error-prone, often tedious) work
> you'll have to do with arrays to validate lengths, prevent buffer
> overflows, allocate and deallocate, etc., then it might not be a fair
> test.

It's an interesting point of view, but my comparison already includes
all this validation. The test probably does not include exactly the
same functionality for both implementations, but that doesn't justify
the performance difference.

> See this FAQ for some more thoughts on why standard containers
> should be preferred over arrays:
> http://www.parashift.com/c++-faq-lite/containers.html#faq-34.1

I agree containers are a lot better than arrays. My dilemma is between
std::string and char*. Map/vector performance is fine.

> "1. Don't optimize early. 2. Don't optimize until you know that it's
> needed. 3. Even then, don't optimize until you know *what* [is] needed,
> and *where*.

I can't agree with that. Most of the time, when you realize you have
performance problems it is too late to go back. Sometimes you can't
predict how your system will work, but at least you should start from
an advantageous point.

Regards,
Jorge
 
jortizclaver

> Library design is tricky. Usually, std::string implementations will beat
> char* solutions on common problems, unless the char* solution is carefully
> optimized. The reason is that the std::string implementation can use fancy
> stuff like reference counted copy on write and/or short string optimization
> to make many operations faster than their naive char* counter parts.

I'm sure it works that way with the g++ implementation or other
libraries, but Sun's doesn't seem to be very good.

> I have a hard time believing that. Could you post your benchmarking code?

Believe me, I'd like to be wrong :)

I've made many tests. One of them is a very basic test program someone
sent to another C++ newsgroup:
http://groups.google.com/group/comp...8?lnk=st&q=test_three&rnum=4#e7d1ffef4fd97668

1570000:using assign,append
650000:using C strings

Regards,
Jorge
 
mlimber

jortizclaver said:
> You're right, no doubt. But let's think you're designing a framework
> that will be used by many different profiled programmers (from newbies
> to ten years experienced). Using STD containers and strings put system
> robustness far away from inexperienced hands.

First, please don't quote multiple authors in the same post without
identifying them. Respond to each post individually or at least give
proper attribution. Second, I have no idea what your last sentence
means. Please clarify.

In any case, if you want people to *use* this framework, it must work,
and as the saying goes, "It's far easier to make a correct program fast
than to make a fast program correct." In that regard standard
containers and strings are invaluable (see below).

> Already done. I've compiled test programs with optimization flags.

At the risk of some redundancy, what about checking different
implementations of the standard library?

> It's an interesting point of view but my comparison already includes
> all this validation process. Probably test does not include exactly the
> same functionality for both implementations but it doesn't justify
> performance difference.

Show us the code you used to gather your metrics. Then we can discuss
it more fully.

> I agree containers are a lot better than arrays. My dilemma is between
> std::string and char*. Map/vector performance is fine.

Ok, ok: std::string is not technically a standard container per se.
However, in this context, I would suggest that a comparison of

>> "1. Don't optimize early. 2. Don't optimize until you know that it's
>> needed. 3. Even then, don't optimize until you know *what* [is] needed,
>> and *where*.
>
> I can't agree on that. Most of the times when you realize you have
> performance problems is too late for going back. Sometimes you can't
> predict how your system will work but at least you should start from
> an advantageous point.

<Booming voice>: Who dares contradict Guru Sutter (and Uber-Gurus Knuth
and Hoare)?!

Sutter (with Alexandrescu) advises: "When writing libraries, it's
harder to predict what operations will end up being used in
performance-sensitive code. But even library authors run performance
tests against a broad range of client code before committing to
obfuscating optimizations." (_C++ Coding Standards_, p. 17).

And the same book argues that we should nearly always "[a]void
implementing array abstractions with C-style arrays, pointer
arithmetic, and memory management primitives. Using vector or string
not only makes your life easier, but also helps you write safer and
more scalable software.... Buffer overruns and security flaws are,
hands down, a front-running scourge of today's software.... Most of
these are caused by using bare C-level facilities--such as built-in
arrays, pointers and pointer arithmetic, and manual memory
management--as a substitute for higher-level concepts such as buffers,
vectors, or strings." (p. 152).

You can disagree with them all you want, but I would suggest that more
people trust their opinion than yours, and for good reason.

Cheers! --M
 
Gavin Deane

jortizclaver said:
> I agree containers are a lot better than arrays. My dilemma is between
> std::string and char*. Map/vector performance is fine.

What's the difference between "string vs char*" where you seem to
favour reinventing the wheel and "map/vector vs handcoded equivalent"
where you seem happy to use the wheel you've been given?

>> "1. Don't optimize early. 2. Don't optimize until you know that it's
>> needed. 3. Even then, don't optimize until you know *what* [is] needed,
>> and *where*.
>
> I can't agree on that.

Then I expect you are in the minority, with much weight of advice
against you. For an appropriate definition of "early" of course.

> Most of the times when you realize you have
> performance problems is too late for going back. Sometimes you can't
> predict how your system will work but at least you should start from
> an advantageous point.

Yes you should start from an advantageous point, but not the point you
are thinking of. It is advantageous to start your optimisation effort
from the point of having correct working code (perhaps with unit tests
around it if appropriate). Not only is it generally much easier to
target bottlenecks and speed up correct code than it is to fix the more
complex and buggy code that results from reinventing the wheel, but
also, on all those occasions where it turns out that your correct code
happens already to be fast/small enough, you have reached your
destination in the quickest and most painless way.

Gavin Deane
 
Markus Moll

Hi

> C style strings is more faster that std::string, and almost things that
> u can do with std::string u can do with C style strings

I can easily imagine cases where std::string outperforms C-style strings by
lengths, so I would be careful with statements like that. (Honestly, I
doubt that any good std::string implementation would be slower than the
C-style equivalent).

Besides, C-style strings make it quite easy to invoke undefined behavior.

E.g:

(C code using char*)
--- snip ---

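#include <string.h>
#include <stdio.h>

int main(void)
{
    /* String literals are read-only, so point at them through const. */
    const char *s = "Hello world. This is an unnecessarily long string.";
    unsigned i;
    unsigned total = 0;
    /* strlen walks the string to the terminating '\0' on every call. */
    for (i = 0; i != 1000000; ++i)
        total += (unsigned)strlen(s);
    printf("%u\n", total);   /* %u matches the unsigned argument */
    return 0;
}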

--- snip ---

times
real 0m0.044s
user 0m0.038s
sys 0m0.000s

where

(C++ code using std::string)
--- snip ---

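#include <string>
#include <cstdio>

int main()
{
    std::string s = "Hello world. This is an unnecessarily long string.";
    unsigned total = 0;
    // length() returns the stored size; it does not rescan the characters.
    for (unsigned i = 0; i != 1000000; ++i)
        total += static_cast<unsigned>(s.length());
    std::printf("%u\n", total);   // %u matches the unsigned argument
}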

--- snip ---

times

real 0m0.006s
user 0m0.004s
sys 0m0.002s

(both compiled with GCC 3.3.5 and -O2)
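
For anyone who wants to repeat this kind of comparison inside one program rather than with the shell's time command, here is a minimal sketch using std::chrono (C++11 or later). The `Timing` struct and both function names are hypothetical. Note that an optimizer may hoist the strlen call out of the loop, so the numbers say as much about the compiler as about the string type.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical result type: the summed lengths plus elapsed wall-clock time.
struct Timing {
    std::size_t total;
    std::chrono::steady_clock::duration elapsed;
};

// Sums strlen(s) over the given number of iterations.
Timing time_strlen(const char* s, unsigned iterations)
{
    assert(s != 0);
    const auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (unsigned i = 0; i != iterations; ++i)
        total += std::strlen(s);         // nominally scans to the '\0' each time
    return Timing{total, std::chrono::steady_clock::now() - start};
}

// Sums s.length() over the given number of iterations.
Timing time_length(const std::string& s, unsigned iterations)
{
    const auto start = std::chrono::steady_clock::now();
    std::size_t total = 0;
    for (unsigned i = 0; i != iterations; ++i)
        total += s.length();             // reads the stored length
    return Timing{total, std::chrono::steady_clock::now() - start};
}
```

Printing `std::chrono::duration_cast<std::chrono::microseconds>(t.elapsed).count()` for each result gives comparable figures; the totals should match, which also keeps the loops from being discarded as dead code.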

cheers
Markus
 
jortizclaver

>> You're right, no doubt. But let's think you're designing a framework
>
> First, please don't quote multiple authors in the same post without
> identifying them. Respond to each post individually or at least give
> proper attribution. Second, I have no idea what your last sentence
> means. Please clarify.

Using char* implies managing memory. I don't trust inexperienced
hands with that task. If I can hide memory management using std::string
I'd feel a lot better.

> You can disagree with them all you want, but I would suggest that more
> people trust their opinion than yours, and for good reason.

Yeah, I'm brave! Probably my "mistake" was to understand those words in
a different way.

I'll use a very simple example. Let's say you need to implement a
real-time system where speed is definitely the main issue and where
you need to run thousands of concurrent threads to accomplish really
complex tasks. Would you use Java? I wouldn't, as I know C/C++
performance can be a lot better (of course, there are other important
details that could make me change that decision). How do you apply your
guru's teachings to this case? Would you call this premature
optimization?

That's what I meant. It's important to choose the components of your
architecture carefully, and the libraries and third-party tools you'll use
are part of them.

Regards,
Jorge

P.S: BTW, man, do you have all those books you talk about in your mind?
I'm impressed:)
 
Ben Pope

jortizclaver said:
> I agree containers are a lot better than arrays. My dilemma is between
> std::string and char*. Map/vector performance is fine.

Then use std::vector<char> ;-)

Ben Pope
 
Ben Pope

Alex said:
> jortizclaver said:
>> Hi,
>>
>> I'm about to develop a new framework for my corporate applications
>> and my first decision point is what kind of strings to use: std::string
>> or classical C char*.
>
> [snip]
>
> Look at C/C++ Performance Tests:
> http://alexvn.freeservers.com/s1/perfo/tests/pftests.htm

"Compilation : No optimization"

Is that how most people release code? I thought -O2 would be a minimum
for most release code concerned about performance.

What have I missed?

Ben Pope
 
mlimber

jortizclaver said:
> Using char* implies managing memory. I don't trust inexperienced
> hands with that task. If I can hide memory management using std::string
> I'd feel a lot better.

Amen and amen. But you're making my case for me!

> I'll use a very simple example. Let's say you need to implement a
> real-time system where speed is definitely the main issue and where
> you need to run thousands of concurrent threads to accomplish really
> complex tasks. Would you use Java? I wouldn't, as I know C/C++
> performance can be a lot better (of course, there are other important
> details that could make me change that decision). How do you apply your
> guru's teachings to this case? Would you call this premature
> optimization?

If speed is definitively the main issue, then yes you should use
arrays/pointers. If correctness, usability, marketability, ease of
maintenance, quality, etc. were major factors in your project, I would
suggest you should use std::string throughout. The Gurus' (plural)
advice on the matter would likely be that with such a complex program,
it will be nearly impossible to track down all the security flaws,
buffer overflows, etc. that will be incurred by even the most careful
and meticulous programmer. Depending on your knowledge of how the
framework will be used, you may be able to measure for true (rather
than speculative) bottlenecks and hand tune those parts of the code.

On the other hand, if you insist on doing things by hand, Java might be
the way to go since its garbage collection would help clean up after
your mistakes. ;-)

> It's important to choose carefully the components
> of your architecture and the libraries and third party tools you'll use
> are part of them.

Agreed, which is why we recommend you use robust, standard tools that
are part of the language rather than building your software on a
more fragile foundation.

Cheers! --M
 
mlimber

jortizclaver said:
> Believe me, I'd like to be wrong :)
>
> I've made many tests. One of them is a very basic test program someone
> sent to another C++ newsgroup:
> http://groups.google.com/group/comp...8?lnk=st&q=test_three&rnum=4#e7d1ffef4fd97668

A number of people noted flaws in that test program. See the rest of
the thread. Also, someone else recommended the use of a different
library implementation such as STLPort, whose allocators are supposed
to be superior. Now, I hate to repeat myself over and over, but you
have not responded to a similar suggestion in two of my posts above.
May I humbly recommend you try a different implementation?

BTW, the standard committee's "Technical Report on C++ Performance"
(http://www.research.att.com/~bs/performanceTR.pdf) has this caveat
about std::string for unchangeable strings: "The Standard class
std::string is not a lightweight component. Because it has a lot of
functionality, it comes with a certain amount of overhead.... In many
applications, strings are created, stored, and referenced, but never
changed. As an extension, or as an optimization, it might be useful to
create a lighter-weight, unchangeable string class."
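
A minimal sketch of what such a lighter-weight, unchangeable string might look like follows. The class name `ConstString` and its interface are hypothetical, not taken from the Technical Report; the class only refers to an existing string literal and never allocates or copies. (C++17 later standardized a similar non-owning type, std::string_view.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical class: a non-owning, read-only view of a string literal.
// It never allocates; it must not outlive the characters it refers to.
class ConstString {
public:
    // Deduces the length from the array type, so no strlen call is needed.
    // Intended for string literals, whose last array element is the '\0'.
    template <std::size_t N>
    ConstString(const char (&literal)[N]) : data_(literal), size_(N - 1) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

    char operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }

    bool operator==(const ConstString& other) const
    {
        return size_ == other.size_
            && std::memcmp(data_, other.data_, size_) == 0;
    }

private:
    const char* data_;   // points at the literal; not owned
    std::size_t size_;   // length excluding the terminating '\0'
};
```

The trade-off is the usual one for non-owning types: copying is two words, but lifetime management is back in the caller's hands.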

Cheers! --M
 
Ian Collins

jortizclaver said:
> I'm sure it works in that way with g++ implementation or other
> libraries but Sun's doesn't seem to be very good.

You are using a very old, unsupported version of the compiler.

Try something newer.
 
Default User

mlimber said:
> jortizclaver wrote:
>
> If speed is definitively the main issue, then yes you should use
> arrays/pointers. If correctness, usability, marketability, ease of
> maintenance, quality, etc. were major factors in your project, I would
> suggest you should use std::string throughout.

Often with real-time projects it isn't the speed of operations that's
the problem. The reason std::string and others can't be used (or have
to be used carefully) is that memory allocation during the program run is
often forbidden. Consistency of program cycles is more important than
speed. You have to know what the bounds of your program's run cycle
are; you can't have it vary much.

You can use standard containers if they're sized at start-up and no
operations are performed that will cause the size to shift. That means
avoid many of the cool operators and such.
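
As a sketch of that discipline, a buffer can be reserved at start-up and every later append checked against the reserved capacity. The helper name `append_within_capacity` is hypothetical. The internal check assumes the buffer does not share its storage with another string; on a copy-on-write library a shared buffer could still trigger an allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: appends text only if it fits in the capacity that
// was reserved at start-up; otherwise it refuses instead of reallocating.
bool append_within_capacity(std::string& buffer, const std::string& text)
{
    if (text.size() > buffer.capacity() - buffer.size())
        return false;                      // would have to grow: reject
    const std::size_t before = buffer.capacity();
    buffer += text;
    assert(buffer.capacity() == before);   // assumes unshared storage
    return true;
}
```

At start-up one would call something like `buffer.reserve(max_message_size)` once (with `max_message_size` a made-up bound for your application) and then use only the checked append in the cyclic part of the program.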




Brian
 
