* SG:
In Java it is the case (just tested on Sun's JVM/Compiler 1.6.0_10) by
wich I mean
a = a + ".";
in a loop is horribly slow. You are supposed to use a
java.lang.StringBuilder for this.
I'm sorry but that is a misunderstanding and /very/ different from the example I
commented on.
It is however somewhat related, and using infix '+' is a very common newbie Bad
Coding Style (also reportedly employed by MS!
), so I'll comment on it.
a = a + ".";
is of necessity slow unless you have a supersmart optimizer, because it
constructs a separate new string instance. It therefore generally leads to
O(n^2) time when it's repeated in a loop. That is, the total time is
proportional to the /square/ of the final number of characters.
a += ".";
is on the other hand the potentially fastest way to do concatenation, and with
any reasonable implementation of '+=' yields O(n) time.
*However*, with Java and C# it seems (as you found, and as also I found now when
repeating your testing) that '+=' is really ineffecient.
I.e., at least with the used compilers it seems that those languages have really
lousy '+=' operator implementations.
The difference between infix '+' and the '+=' update operator is what the 'a'
object can do, what knowledge it has of what's going on. With '+=' it only has
to make itself single reference (it it isn't already) and then append in its own
buffer. Which buffer, for a reasonable implementation, it then doubles in size
as necessary to achieve amortized O(n) behavior.
I'm not familiar with C#/.NET and it looks like there might be
compiler/VM magic involved w.r.t. the string class. So, yes, I can
imagine that in the .NET world string's "+=" isn't as bad as Java's
version.
Have you really tested Java's version of '+=' or have you tested infix '+'?
Anyways, my own testing, shown below, yields that unexpectedly bad result.
But the compiler and VM I used are old.
Still, what's the purpose of StringBuilder in C# if it wasn't
for speeding up string assembly.
I don't know what you mean by "conversion call" in this concext
Preparatory conversion from original string to string buffer, final conversion
from string buffer to string.
but ... Yes, I can imagine an implementation where string objects
share character buffers and only manage their own start/end pointers.
Yes, just about.
So, if there's some yet unused and big enough room left in that buffer
there's no need to allocate a new buffer for concatenation. But you
might need to do some locking/synchronization.
Only for the reference counting.
In Java there is also a String member function "concat" which could do
what I described above. Just for kicks and giggles I wrote a simple
test in Java:
1: String a = "";
for (int k=0; k<0x10000; ++k) {
a = a + ".";
}
2: String a = "";
for (int k=0; k<0x10000; ++k) {
a = a.concat(".");
}
3: StringBuilder sb = new StringBuilder();
for (int k=0; k<0x10000; ++k) {
sb.append(".");
}
String a = sb.toString();
Test | Runtime
-----+---------------
1 | 10.369 seconds
2 | 2.624 seconds
3 | 0.076 seconds
#3 looks like C# not Java, and you're not testing '+='.
But I'm surprised at the results, which I've duplicated for both Java and C#,
and in particular for '+='.
Even though my compilers are old, it seems that both Java and C# have really
inefficient '+=' implementations.
As far as I know java.lang.StringBuilder doesn't do any kind of
locking/synchronization which is probably one reason it is so fast.
Alf, care to provide some C# test results just for the heck of it?
OK, but note that these are OLD compilers and VMs.
<versions>
C:\temp> java -version
java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
Java HotSpot(TM) Client VM (build 10.0-b19, mixed mode, sharing)
C:\temp> csc
Microsoft (R) Visual C# 2005 Compiler version 8.00.50727.1433
for Microsoft (R) Windows (R) 2005 Framework version 2.0.50727
Copyright (C) Microsoft Corporation 2001-2005. All rights reserved.
fatal error CS2008: No inputs specified
C:\temp> g++ --version
g++ (GCC) 3.4.5 (mingw-vista special r3)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
C:\temp>
</version>
<code language="Java">
import java.util.Date;
import java.text.DecimalFormat;
class Time
{
private long myMilliSecs;
public Time()
{
myMilliSecs = new Date().getTime();
}
public long msecs() { return myMilliSecs; }
public double secs() { return myMilliSecs/1000.0; }
}
class ConcatTest
{
static final long n = 0x8000;
static void doInfixAdd()
{
String a = "";
for( int i = 1; i <= n; ++i )
{
a = a + ".";
}
}
static void doConcatAdd()
{
String a = "";
for( int i = 1; i <= n; ++i )
{
a = a.concat( "." );
}
}
static void doBuilder()
{
StringBuffer sb = new StringBuffer();
for( int i = 1; i <= n; ++i )
{
sb.append(".");
}
String a = sb.toString();
}
static void doUpdateAdd()
{
String a = "";
for( int i = 1; i <= n; ++i )
{
a += ".";
}
}
static void printResult( String s, Time start, Time end )
{
String num = new DecimalFormat( "00.00" ).format( end.secs() -
start.secs() );
System.out.println( s + ": " + num );
}
public static void main( String[] args )
{
Time startTime;
startTime = new Time();
doInfixAdd();
printResult( "Infix '+' ", startTime, new Time() );
startTime = new Time();
doConcatAdd();
printResult( "concat ", startTime, new Time() );
startTime = new Time();
doBuilder();
printResult( "builder ", startTime, new Time() );
startTime = new Time();
doUpdateAdd();
printResult( "'+=' operator", startTime, new Time() );
}
}
</code>
<results language="Java">
C:\temp> java ConcatTest
Infix '+' : 12,33
concat : 02,84
builder : 00,02
'+=' operator: 11,83
C:\temp> java ConcatTest
Infix '+' : 11,45
concat : 02,88
builder : 00,00
'+=' operator: 11,73
C:\temp> java ConcatTest
Infix '+' : 11,40
concat : 02,85
builder : 00,00
'+=' operator: 11,42
C:\temp>
</results>
<code language="C#">
using DateTime = System.DateTime;
using TimeSpan = System.TimeSpan;
using StringBuilder = System.Text.StringBuilder;
class ConcatTest
{
private const long n = 0x8000;
static void doInfixAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a = a + ".";
}
}
static void doConcatAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a = string.Concat( a, "." );
}
}
static void doBuilder()
{
StringBuilder sb = new StringBuilder();
for( int i = 1; i <= n; ++i )
{
sb.Append( "." );
}
string a = sb.ToString();
}
static void doUpdateAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a += ".";
}
}
static void printResult( string s, DateTime start, DateTime end )
{
string num = string.Format( "{0:00.00}", (end -
start).Milliseconds/1000.0 );
System.Console.WriteLine( s + ": " + num );
}
static void Main()
{
DateTime startTime;
startTime = DateTime.Now;
doInfixAdd();
printResult( "Infix '+' ", startTime, DateTime.Now );
startTime = DateTime.Now;
doConcatAdd();
printResult( "concat ", startTime, DateTime.Now );
startTime = DateTime.Now;
doBuilder();
printResult( "builder ", startTime, DateTime.Now );
startTime = DateTime.Now;
doUpdateAdd();
printResult( "'+=' operator", startTime, DateTime.Now );
}
}
</code>
<results language="C#">
C:\temp> concat
Infix '+' : 00,46
concat : 00,44
builder : 00,09
'+=' operator: 00,38
C:\temp> concat
Infix '+' : 00,43
concat : 00,40
builder : 00,00
'+=' operator: 00,74
C:\temp> concat
Infix '+' : 00,40
concat : 00,38
builder : 00,00
'+=' operator: 00,49
C:\temp>
</concat>
<code language="C++">
#include <iostream>
#include <iomanip> // std::fixed
#include <string>
#include <time.h> // clock
using namespace std;
long const n = 0x4000;
class TimeSpan
{
private:
clock_t myTicks;
public:
TimeSpan( clock_t ticks ): myTicks( ticks ) {}
double secs() const { return double(myTicks)/CLOCKS_PER_SEC; }
};
class Time
{
private:
clock_t myTicks;
public:
Time(): myTicks( clock() ) {}
TimeSpan operator-( Time const& rhs ) const { return myTicks - rhs.myTicks; }
};
void doInfixAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a = a + ".";
}
}
void doConcatAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a = a.append( "." );
}
}
// Not applicable.
//void doBuilder() {}
void doUpdateAdd()
{
string a = "";
for( int i = 1; i <= n; ++i )
{
a += ".";
}
}
void printResult( string const& s, Time const& start, Time const& end )
{
cout << fixed << setprecision( 4 );
cout << s << ": " << (end - start).secs() << endl;
}
int main()
{
Time startTime;
startTime = Time();
doInfixAdd();
printResult( "Infix '+' ", startTime, Time() );
startTime = Time();
doConcatAdd();
printResult( "concat ", startTime, Time() );
startTime = Time();
doUpdateAdd();
printResult( "'+=' operator", startTime, Time() );
}
</code>
<results language="C++">
C:\temp> a
Infix '+' : 0.0940
concat : 0.0000
'+=' operator: 0.0000
C:\temp> a
Infix '+' : 0.1560
concat : 0.0000
'+=' operator: 0.0000
C:\temp> a
Infix '+' : 0.0930
concat : 0.0000
'+=' operator: 0.0000
C:\temp>
</results>
So, in summary, at least for C++ programming it is, as I noted earlier, a really
good idea to use '+=' (or equivalently 'append') instead of infix '+', and a
really bad idea to do the opposite. And it is surprising that the Java and C#
compilers don't implement '+=' efficiently; it's easy to do. I thought they did.
Cheers & hth.,
- Alf