StringBuffer/StringBuilder efficiency

L

Lew

Lew quoted or indirectly quoted someone who said :
Roedy said:
I think you will find it optimises further to:

String text = "Hello World";

In other words it just moves a reference into the literal pool to the
variable 'text'.

Are HotSpot optimizations allowed to break the semantics of the code like
that? The variable 'text' is not defined as a 'String'.

Or are you assuming that the snippet is immediately followed by a 'toString()'
call, and that the literal reference will be assigned to some other variable
that was not mentioned?
 
L

Lew

Roedy said:
You might notice that JSP uses neither a StringBuffer or
StringBuilder. It uses some sort of stream, presumably a
StringWriter.

Has anyone looked at the code or done some experiments to decide when
to use StringWriter vs StringBuilder?

StringWriter uses StringBuffer (not StringBuilder) internally.
 
R

Roedy Green

StringWriter uses StringBuffer (not StringBuilder) internally.

A better StringWriter implementation might store an arrayList of
char[]s, rather than doubling and copying. When you do the final
"toString" equivalent, it could build the string the correct length
copy in the pieces.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Don’t worry about people stealing an idea; if it’s original, you’ll have to shove it down their throats."
~ Howard Aiken (born: 1900-03-08 died: 1973-03-14 at age: 73)
 
R

Roedy Green

Are HotSpot optimizations allowed to break the semantics of the code like
that? The variable 'text' is not defined as a 'String'.


Oops. I was unclear on the original code.

final String greeting = "Happy" + " " + "birthday.";

will collapse to:

final String greeting = "Happy birthday.";

--
Roedy Green Canadian Mind Products
http://mindprod.com

"Don’t worry about people stealing an idea; if it’s original, you’ll have to shove it down their throats."
~ Howard Aiken (born: 1900-03-08 died: 1973-03-14 at age: 73)
 
L

Lew

Lew quoted or indirectly quoted someone who said :
Roedy said:
Oops. I was unclear on the original code.

final String greeting = "Happy" + " " + "birthday.";

will collapse to:

final String greeting = "Happy birthday.";

Certainly, as required by the Java Language Specification.
 
D

Dave Searles

Peter said:
StringWriter uses StringBuffer (not StringBuilder) internally.

A better StringWriter implementation might store an arrayList of
char[]s, rather than doubling and copying.

Do you mean the "ArrayList" class? How do you think that class is
implemented? The JDK documentation doesn't promise a particular
implementation, but in reality, it's basically the same thing
StringBuffer and StringBuilder do: allocate new storage, copy the
existing data over.

How would using ArrayList be "better"?

The idea is instead of periodically growing a char[] array, when a
char[] array fills it's added to an ArrayList of char[] arrays. That
will need growing much less frequently, and at the end a single
right-size char[] array is allocated to copy everything into.
 
R

Roedy Green

Do you mean the "ArrayList" class? How do you think that class is
implemented? The JDK documentation doesn't promise a particular
implementation, but in reality, it's basically the same thing StringBuffer
and StringBuilder do: allocate new storage, copy the existing data over.

I am thinking of an ArrayList<char[]>. As you run out of space to
append chars, you create another char[8096], and tack it on the end of
the ArrayList. If the ArrayList overflowed, the usual doubling the
size of the ArrayList and copy of the REFERENCES happens, not copying
the individual chars themselves, a relatively quick operation in
comparison with what a StringBuilder does allocating an new contiguous
char[] for all the chars, and copying over all the individual
characters. The other advantage of proposed scheme is it is easier to
allocate 10 x 8096 char buffers than one 80960 one. Further, with my
proposed scheme, with a fat stream, you don't waste large amounts of
RAM from doubling.

I would hope I could create the final string allocating no more than
one buffer the final size. The methods I would need might be not be
accessible.

My proposal is fairly simple. The problem would be persuading JSP to
use it if I wrote a StringWriter replacement class.


--
Roedy Green Canadian Mind Products
http://mindprod.com

"Don’t worry about people stealing an idea; if it’s original, you’ll have to shove it down their throats."
~ Howard Aiken (born: 1900-03-08 died: 1973-03-14 at age: 73)
 
R

Roedy Green

Okay, I understand your proposal now. I'm not sure it would be an
universal improvement; you simply trade some performance cost in one place
(buffer expansion) for some other performance cost in another place
(conversion to String). Which winds up being better would be highly usage
dependent. But yes, at least it would be different. :)

If the stream were < 8096 bytes long, they would be almost identical.

If the stream were 300,000 bytes, with StringBuilder you would start
with a 16 byte buffer, and double/copy 15 times ending up with a
buffer 524,288 bytes long. You end up copying 16 + 32 + ... + 262,144
+ 300,000 = 824,272 bytes copied.

With my proposed scheme you would allocate 38x8096 = 307,648 worth of
buffers. With my proposed scheme you end up copying 300,000 bytes.


// calc overhead of StringBuilder in char copies for 300,000
public class Overhead
{
private static final int size = 300000;
public static void main (String[] args )
{
int sum = 0;
for ( int inc = 16; inc <= size; inc *= 2 )
{
sum += inc ;
}
sum += size;
System.out.println( sum );
}
}

--
Roedy Green Canadian Mind Products
http://mindprod.com

"Don’t worry about people stealing an idea; if it’s original, you’ll have to shove it down their throats."
~ Howard Aiken (born: 1900-03-08 died: 1973-03-14 at age: 73)
 
D

Dave Searles

Roedy said:
If the stream were < 8096 bytes long, they would be almost identical.

If the stream were 300,000 bytes, with StringBuilder you would start
with a 16 byte buffer, and double/copy 15 times ending up with a
buffer 524,288 bytes long. You end up copying 16 + 32 + ... + 262,144
+ 300,000 = 824,272 bytes copied.

With my proposed scheme you would allocate 38x8096 = 307,648 worth of
buffers. With my proposed scheme you end up copying 300,000 bytes.

1,648,544 and 600,000 with 16-bit wide chars.

Add another 192 bytes (384 on 64-bit hardware) for two expansions of the
ArrayList.
 
T

Tom Anderson

Do you mean the "ArrayList" class? How do you think that class is
implemented? The JDK documentation doesn't promise a particular
implementation, but in reality, it's basically the same thing
StringBuffer and StringBuilder do: allocate new storage, copy the
existing data over.

I am thinking of an ArrayList<char[]>. As you run out of space to
append chars, you create another char[8096], and tack it on the end of
the ArrayList.

Ah, i thought you meant that the ArrayList would hold all the individual
things appended. So doing:

RoedyBuffer buf = new RoedyBuffer();
buf.append("Hello ");
buf.append(someExpression);
buf.append(" world!");

Would leave you with an ArrayList like:

["Hello ", someExpression.toString(), " world!"]

Building this list would involve no copying of characters at all, beyond
that involved in the toString calls. The only copying done would be at the
end, where the number of characters copied would be exactly the length of
the final string. It doesn't get any better than that!

The list would of course be bigger than in your scheme, but i suspect
there would be a net saving. Plus, it avoids wasting space when the final
string is small.

LinkedList might be better than ArrayList, as it doesn't do any copying,
although it uses about five times more memory per element than the
ArrayList. You could use a BlockList (which you'd have to write - a
linked-list of fixed-size blocks), which would give you memory efficiency
and no copying, with (1) appending and O(n) iteration.

Oh, and you could make it a List<CharSequence>, and only toString things
if they aren't instanceof CharSequence. The main effect of that is that
you can append things like StringBuffers and CharBuffers directly, without
having to toString them.

It would be pretty straightforward for us to all write implementations of
our ideas behind a common interface, and also wrap StringBuilder with it,
then do some micro benchmarking. Anyone interested? I might have a crack
at this on sunday if i'm bored!

tom

--
It's a surprising finding, but that's science all over: the results
are often counterintuitive. And that's exactly why you do scientific
research, to check your assumptions. Otherwise it wouldn't be called
"science", it would be called "assuming", or "guessing", or "making it
up as you go along". -- Ben Goldacre
 
R

Roedy Green

Except for the additional memory overhead of allocating an overly-large
buffer in the List<char[]> implementation.

The main use of StringWriters is in JSP. Your final stream is the
size of a typical generated JSP page in chars. You could implement
this with any size buffer. I picked 8096 because that is pretty much
the minimum size of a JSP web page.

For smaller Strings or when you have an accurate idea of the final
size, you would use a StringBuilder.

Do StringWriters have to be thread safe?


--
Roedy Green Canadian Mind Products
http://mindprod.com

"Civilisation advances by extending the number of important operations which we can perform without thinking about them."
~ Alfred North Whitehead (born: 1861-02-15 died: 1947-12-30 at age: 86)
 
L

Lew

Roedy said:
Do StringWriters have to be thread safe?

I don't know about have to be, but they are.

The fact that Writer uses a lock in all its implemented methods is a rather
telling hint.
 
R

Roedy Green

RoedyBuffer buf = new RoedyBuffer();
buf.append("Hello ");
buf.append(someExpression);
buf.append(" world!");

Hmm. That sounds a bit like LazyString.

See http://mindprod.com/project/lazystring.html

It would be interesting to see these various ideas in a bakeoff to
find out which algorithms work best in which circumstances. One factor
is, how accurate a number do you have for the final size of the string
ahead of time. Another is how long is the final string.

This is worthwhile sorting out. My static macros, which are a poor
man's JSP, speeded up 10% when I optimised StringBuilder sizes. Not
only do you save copies, you save littering the heap with discarded
buffers which will trigger more frequent GC passes.

Another StringBuilder optimisation that would be useful would be if
the estimate of StringBuilder size were accurate to say with 10 bytes,
then the buffer could be converted to a String without copying it.

See http://mindprod.com/jgloss/concatenation.html#ALTERNATES for a
number of alternative ways to concatenate.
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Civilisation advances by extending the number of important operations which we can perform without thinking about them."
~ Alfred North Whitehead (born: 1861-02-15 died: 1947-12-30 at age: 86)
 
T

Tom Anderson

Tom Anderson wrote:
...
...

Doing that would need to be very carefully documented, and lead to
inconsistent behavior. Appending something that is not a CharSequence
would capture its toString() result as of the time of the append.
Appending a CharSequence would capture its toString() as of the time of
extraction of the concatenated string.

Ah yes, very good point!

If this was python, i'd think about modifying appended StringBuilders at
runtime to intercept any attempts to modify them, to run a hook which
would copy the original contents to a string for appending purposes before
removing the interception code and carrying on with the append, but
fortunately, such madness is not possible in java.

tom
 
T

Tom Anderson

Roedy Green wrote:
...
Another StringBuilder optimisation that would be useful would be if
the estimate of StringBuilder size were accurate to say with 10 bytes,
then the buffer could be converted to a String without copying it.
...

This would take some care, given the mutable nature of StringBuilder,
but would be possible, given the fact that String and StringBuilder are
in the same package.

One possible design:

1. Add to String a package-only constructor that sets the char[],
length, and offset fields directly.

2. Modify StringBuilder:

Add a boolean field sharedArray, initially false.

In toString, if size is accurate enough, use the new String constructor
and set sharedArray=true.

In state changing method calls that use the old value, such as append,
with sharedArray true, replace the reference to the shared char[] with a
reference to a copy of it and set sharedArray=false.

Actually, in the specific case of append, you can carry on using the
shared array - if the String only cares about characters 1-10 of the
buffer, then appending more stuff in positions 11-20 is fine.

It's delete, insert, replace, reverse, and setCharAt that you need to
worry about. And perhaps setLength.

Perhaps instead of having a boolean sharedArray, you could have an int
sharedCharLimit, being an index into the array beyond which there are no
shared characters, initially zero. Modifications to the buffer after
sharedCharLimit wouldn't trigger copying of the buffer.
Of course, this is mainly precautionary. Often, a toString() call is the
last event in the life of a StringBuilder before unreachability.

Almost always, i suspect. Certainly a common case that looks worth
optimising for.

tom
 
T

Tom Anderson

It would be pretty straightforward for us to all write implementations
of our ideas behind a common interface, and also wrap StringBuilder with
it, then do some micro benchmarking. Anyone interested? I might have a
crack at this on sunday if i'm bored!

Eh, coding was more appealing than cleaning the kitchen.

http://urchin.earth.li/~twic/Code/StringAccumulator/

Now, who wants to write a benchmark?

I've implemented Roedy's idea using StringBuilders rather than char[]s
(which still have a fixed size), as it means i don't have to write all the
bookkeeping code myself. I doubt there's a significant performance impact.
Anyone who disagrees is free to write a different implementation!

tom
 
W

Wojtek

bugbear wrote :
I disagree (although not strongly). Maintaining code
is hard enough that I will ALWAYS err on the side
of readability and maintainability.

I learned a LONG time ago not to write "cute" code. However I do keep
performance in mind as I type. Things like balancing object storage vs
retrieval performance to a "get" call which returns a derived value.
I save performance oriented "stuff" for where it's needed,
as indicated by performance testing/profiling.

If you DON'T find maintaining code hard - you're a genius,
and don't need any advice from me.

Ha!
 
W

Wojtek

Leif Roar Moldskred wrote :
It probably would be a good idea for Sun to add append methods with
variable length argument lists to the StringBuilder, though. That
would allow code like this:

StringBuilder strBld = new StringBuilder( );
strBld.append( "Timestamp: ", data.getTimestamp( ), " " );
strBld.append( "Temprature: ", data.getTemprature( ), " ")
for( int i 0 ; i < 10; ++i ) {
strBld.append( "T[", i, "]: ", data.getT( i ), " " );
}

This I would like. Maybe it could be injected into the next language
spec?
 
L

Lew

Leif Roar Moldskred wrote :
It probably would be a good idea for Sun to add append methods with
variable length argument lists to the StringBuilder, though. That
would allow code like this:
  StringBuilder strBld = new StringBuilder( );
  strBld.append( "Timestamp: ", data.getTimestamp( ), " " );
  strBld.append( "Temprature: ", data.getTemprature( ), " ")
  for( int i 0 ; i < 10; ++i ) {
    strBld.append( "T[", i, "]: ", data.getT( i ), " " );
  }
This I would like. Maybe it could be injected into the next language
spec?

Why would it require a language change?

All you have to do is add an overload to the class.

They'd have to resolve the conflict with the current overload for such
a call, 'append(Object)'. Perhaps that's an acceptable loss of
compatibility.
 
M

markspace

Wojtek said:
Leif Roar Moldskred wrote :
This I would like. Maybe it could be injected into the next language spec?


Project Coin was a Sun sponsored effort to request small changes to the
Java language from the Java community. It was pretty interesting to
read through their discussions. One thing they did mention is that
their normal process was to scan through the bug-base and look for bugs
marked "RFE" - Request For Enhancement. That is, improvements, not bugs.

If you have the time, you should go to their bug database and add an RFE
for a varargs on StringBuilder. It's a good idea.

<http://blogs.sun.com/darcy/entry/project_coin>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top