Marcin Kalicinski
I recently ran a simple measurement of std::string performance in C++,
and I'm shocked at how bad it is. C++ is first and foremost meant to be a
performance language, with near-zero overhead. This holds quite well with
regard to the language itself, but the overheads of the standard library are
absolutely abysmal. And string is one of the most frequently used data types.
The test program creates 10 million strings and puts them in a container
(see source code #1, #2 below).
Results:
C++ (VS2005, -O2): 7.265s
C# (VS2005): 0.56s
C# string is 13 times faster than C++ std::string. Horrible.
There's a crucial difference between C++ and C# strings - C# strings are
immutable. Does this make a difference? To test it, I created the simplest
and fastest possible C++ string implementation (see code #3 below). It is
absolutely minimalistic (it only handles construction). It does not make any
memory allocations; all the memory is obtained from a static pool by simply
incrementing a pointer. There's no deallocation scheme of any sort
implemented - adding one would certainly slow things down further. I don't
think it is possible to create a faster implementation of string in C++ than
#3. _Any_ real-life implementation will be slower.
The results when std::string was replaced with my String?
C++ (VS2005, -O2): 0.64s - still slower than 0.56s of C#
C# string is still 14% faster than the fastest theoretically imaginable C++
string. What?! How did they do it?
One reason may be that a C++ string is constructed from a const char *
pointer. This is a fatal C/C++ flaw - once a literal such as "foo" has been
passed as a pointer, its size is unknown, even though it was given at compile
time. To find out the size, the string must be scanned at run time for the
0 character. C# does not have this limitation. The compiler can place the
string directly where it's needed, and also store its length in a member
variable, all at compile time.
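The compile-time length described above can be approximated in C++ too. What
follows is only a minimal sketch, assuming a hypothetical LiteralView type and
a hypothetical scanned_length helper (neither is part of the test programs):
a constructor template that takes a reference to a char array lets the
compiler deduce the array size, so a literal needs no run-time scan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical illustration type; not part of the original test programs.
struct LiteralView
{
    // Binds to a char array such as "foo". N counts the terminating 0,
    // so the length is N - 1 and is known at compile time.
    template <std::size_t N>
    LiteralView(const char (&s)[N]) : data(s), length(N - 1)
    {
        assert(s[N - 1] == '\0');  // guard against unterminated arrays
    }

    const char *data;    // points at the literal itself, no copy is made
    std::size_t length;  // deduced from the array type, not scanned
};

// For comparison: the run-time scan that a plain const char * forces.
inline std::size_t scanned_length(const char *s)
{
    return std::strlen(s);
}
```

With this, LiteralView("foo").length is 3 without any scan, while
scanned_length("foo") has to walk the characters. One caveat: the constructor
also binds to any other char array, for example a char buf[100] holding a
shorter string, where N - 1 is not the string length.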
Any comments?
Test programs for reference:
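// #1: C++
#include <ctime>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> v;

int main()
{
    std::clock_t t1 = std::clock();   // CPU time before the loop
    for (int i = 0; i < 10000000; ++i)
        v.push_back("foo");           // builds a std::string from the literal
    std::clock_t t2 = std::clock();
    std::cout << double(t2 - t1) / CLOCKS_PER_SEC;   // elapsed seconds
}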
// #2: C#
using System;
using System.Collections.Generic;

class Test
{
    public static void Main()
    {
        DateTime t1 = DateTime.Now;
        List<string> v = new List<string>();
        for (int i = 0; i < 10000000; ++i)
            v.Add("foo");             // stores a reference to the literal
        DateTime t2 = DateTime.Now;
        Console.WriteLine(t2 - t1);   // elapsed time as a TimeSpan
    }
}
// #3: std::string replacement used for testing: the fastest theoretically
// possible C++ string implementation, using an "infinite" memory pool (with
// no deallocation)
struct String
{
    String(const char *s = "")
    {
        str = ptr;
        do  // Copy the string, including its terminating 0, into the pool
        {
            *ptr = *s;
            ++ptr;
        } while (*s++);
    }
    char *str;                      // Pointer to this string's data in the pool
    static char buffer[100000000];  // Global pool
    static char *ptr;               // Global free-memory pointer in the pool
};

// Static members need out-of-class definitions
char String::buffer[100000000];
char *String::ptr = String::buffer;
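The program for the String run is not shown in this post; presumably it is #1
with String substituted for std::string. Here is a minimal sketch under that
assumption, using a hypothetical PoolString stand-in with a much smaller pool
and a hypothetical time_fill helper, so that it is self-contained:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ctime>
#include <vector>

// Hypothetical stand-in for String from #3, with a 1 MiB pool.
struct PoolString
{
    PoolString(const char *s = "")
    {
        str = next;
        do { *next++ = *s; } while (*s++);  // copies the terminating 0 too
    }
    const char *str;             // this string's data inside the pool
    static char pool[1 << 20];   // small pool, enough for a short run
    static char *next;           // next free byte in the pool
};
char PoolString::pool[1 << 20];
char *PoolString::next = PoolString::pool;

// Appends n copies of "foo" to v and returns the elapsed CPU seconds.
inline double time_fill(std::vector<PoolString> &v, std::size_t n)
{
    // Each "foo" takes 4 bytes of pool; refuse runs that would overflow it.
    assert(n * 4 <= std::size_t(PoolString::pool + sizeof(PoolString::pool)
                                - PoolString::next));
    std::clock_t t1 = std::clock();
    for (std::size_t i = 0; i < n; ++i)
        v.push_back("foo");
    std::clock_t t2 = std::clock();
    return double(t2 - t1) / CLOCKS_PER_SEC;
}
```

Calling time_fill(v, 100000) on an empty std::vector<PoolString> fits in the
1 MiB pool. The 10 million strings of the original run would need a pool of
at least 40 MB.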