StringBuffer/StringBuilder efficiency

W

Wojtek

markspace wrote :
Project Coin was a Sun sponsored effort to request small changes to the Java
language from the Java community. It was pretty interesting to read through
their discussions. One thing they did mention is that their normal process
was to scan through the bug-base and look for bugs marked "RFE" - Request For
Enhancement. That is, improvements, not bugs.

If you have the time, you should go to their bug database and add an RFE for
a varargs on StringBuilder. It's a good idea.

<http://blogs.sun.com/darcy/entry/project_coin>

I would but I would need to set up yet another account in yet another
web site. I lose track of where I have accounts.

Could some nice person maybe add a comment and point to this thread?

Thanks
 
R

Roedy Green

Almost always, i suspect. Certainly a common case that looks worth
optimising for.

In the ordinary case, the only methods of StringBuilder you use are
append and toString. Consider that server side code is mostly just
appending strings to put together the response. I think you might hit
20% improvement over what is typically done now with something that
avoids allocating RAM needlessly and copying needlessly. Multiply
that savings by all the Java servers out in the world, and you have
something worth doing.

Reducing CPU cycles has an additional payoff since these servers
typically use less AC power when not doing useful work. The cost of
power is becoming increasingly important in the overall economics of
servers.
--
Roedy Green Canadian Mind Products
http://mindprod.com

On two occasions I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
~ Charles Babbage (born: 1791-12-26 died: 1871-10-18 at age: 79)
 
L

Lew

markspace wrote :
I would but I would need to set up yet another account in yet another
web site. I lose track of where I have accounts.

Could some nice person maybe add a comment and point to this thread?

Oh, so you care passionately about the issue until you actually have to do
something, then it's someone else's job, eh?

I'm glad you're not asking for another method overload to
StringBuilder#append(), and I could go another hundred years without it.
 
R

Roedy Green

I've implemented Roedy's idea using StringBuilders rather than char[]s
(which still have a fixed size), as it means i don't have to write all the
bookkeeping code myself. I doubt there's a significant performance impact.
Anyone who disagrees is free to write a different implementation!

Here is one implementation of a Fast Concatenate to replace
StringBuilder. It needs two speedups, I don't know are possible
without some sort of hack. It would be nice if String had a String...
constructor to avoid the creation of two buffers and an extra copy.

/*
* @(#)FastCat.java
*
* Summary: Stripped down, fast replacement for StringBuilder and
StringWriter to build concatentated Strings.
*
* Copyright: (c) 2009 Roedy Green, Canadian Mind Products,
http://mindprod.com
*
* Licence: This software may be copied and used freely for any
purpose but military.
* http://mindprod.com/contact/nonmil.html
*
* Requires: JDK 1.6+
*
* Created with: IntelliJ IDEA IDE.
*
* Version History:
* 1.0 2009-09-29 - initial release

*/
package com.mindprod.fastcat;

import java.io.File;
import java.util.ArrayList;

/**
* Stripped down, fast replacement for StringBuilder and StringWriter
to build concatentated Strings.
*
* @author Roedy Green, Canadian Mind Products
* 1.0 2009-09-29 - initial release
* @since 2009-09-29
*/

public class FastCat
{
// ------------------------------ FIELDS
------------------------------

/**
* private collection of pieces we will later glue together once
we know the size of the final String
*/
final ArrayList<String> pieces;

private int length = 0;

// -------------------------- PUBLIC INSTANCE METHODS
--------------------------

/**
* no-arg constructor
*/
public FastCat()
{
pieces = new ArrayList<String>( 20 );
}

/**
* constructor
*
* @param estNumberOfPieces estimated number of chunks you will
concatenate. It is better to estimate slightly high than slightly
low.
*/
public FastCat( int estNumberOfPieces )
{
pieces = new ArrayList<String>( estNumberOfPieces );
}

/**
* append String
*
* @param s String to append
*/
public void append( String s )
{
length += s.length();
pieces.add( s );
}

/**
* append arbitrary number of strings
*
* @param ss comma-separated list of Strings to append
*/
public void append( String... ss )
{
for ( String s : ss )
{
length += s.length();
pieces.add( s );
}
}

/**
* append Object
*
* @param o Object to append. toString is called to acquire a
String to concatenate.
*/
public void append( Object o )
{
final String s = o.toString();
length += s.length();
pieces.add( s );
}

/**
* append arbitrary number of Objects
*
* @param ,, comma-separated list of Objects to to append.
toString is called to acquire a String to concatenate.
*/
public void append( Object... oo )
{
for ( Object o : oo )
{
final String s = o.toString();
length += s.length();
pieces.add( s );
}
}

/**
* empty the concatenated String being created
*/
public void clear()
{
pieces.clear();
}

/**
* Get the concatenation of all the strings appended so far
*/
public String toString()
{
int offset = 0;
final char[] buffer = new char[length];
for ( String p : pieces )
{
for ( int i = 0; i < p.length(); i++ )
{
// copy over that once piece, char by char. Can this
be improved?
buffer[ offset++ ] = p.charAt( i );
}
}
return new String( buffer ); // Would like some way to just
hand buffer over to String to avoid copy.
}

// --------------------------- main() method
---------------------------

/**
* test harness
*
* @param args not used.
*/
public static void main( String[] args )
{
FastCat f = new FastCat( 7 );
f.append( "Hello" );
f.append( " " );
f.append( "World. " );
f.append( new File( "temp.txt" ) );
f.append( " " ,"abc", "def" );
System.out.println( f.toString() );
// prints Hello World. temp.txt abcdef
}
}
--
Roedy Green Canadian Mind Products
http://mindprod.com

On two occasions I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
~ Charles Babbage (born: 1791-12-26 died: 1871-10-18 at age: 79)
 
R

Roedy Green

/**
* Get the concatenation of all the strings appended so far
*/
public String toString()
{
int offset = 0;
final char[] buffer = new char[length];
for ( String p : pieces )
{
for ( int i = 0; i < p.length(); i++ )
{
// copy over that once piece, char by char. Can this
be improved?
buffer[ offset++ ] = p.charAt( i );
}
}
return new String( buffer ); // Would like some way to just
hand buffer over to String to avoid copy.
}

Here is a slight optimisation that avoids the charAt loop.


/**
* Get the concatenation of all the strings appended so far
*/
public String toString()
{
int offset = 0;
final char[] buffer = new char[length];
for ( String p : pieces )
{
// copy chars from String into our accumulating char[]
buffer.
p.getChars( 0, p.length(), buffer, offset );
offset += p.length();
}
return new String( buffer ); // Would like some way to just
hand buffer over to String to avoid copy.
}
--
Roedy Green Canadian Mind Products
http://mindprod.com

On two occasions I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
~ Charles Babbage (born: 1791-12-26 died: 1871-10-18 at age: 79)
 
W

Wojtek

Lew wrote :
markspace wrote :


Oh, so you care passionately about the issue until you actually have to do
something, then it's someone else's job, eh?

"care passionately about the issue"? Well I am interested enough to be
curious about the issue, but I am hardly going to go into a panic if it
does not get implemented.
then it's someone else's job

No, that is not what I said. I have scattered personas on various sites
which I actually USE. Setting up a persona on a site which I am
unlikely to ever visit again...

OTOH, for instance, I do have a persona on the Eclipse bug system where
I have reported various issues over the years.

We have this system called news groups (NNTP). It works very well.
There are many established readers/servers for it.

And yet we have scattered islands of knowledge, each one wanting
personal information from me. If Sun had a newsgroup, I would certainly
have made a posting there.
 
D

Dave Searles

Roedy said:
for ( int i = 0; i < p.length(); i++ )
{
// copy over that once piece, char by char. Can this
be improved?
buffer[ offset++ ] = p.charAt( i );
}
}

System.arraycopy perhaps?
 
R

Roedy Green

for ( int i = 0; i < p.length(); i++ )
{
// copy over that once piece, char by char. Can this
be improved?
buffer[ offset++ ] = p.charAt( i );
}
}

System.arraycopy perhaps?

The problem is you need to get at the char[] inside the String to use
System.arrayCopy. Happily String has a getChars method that does just
that. I used it in my improved version of that method.

I am still chewing on how to avoid the copy involved in new String(
char[] ). I vaguely recall a long time ago seeing some Sun code that
did that. I looked over the String methods, but did not see anything.

If Sun implemented a new String( String... ) method, then the double
copy and double buffer allocation could be avoided, providing a much
more efficient way to concatenate Strings. It is a very simple
method.

--
Roedy Green Canadian Mind Products
http://mindprod.com

On two occasions I have been asked [by members of Parliament], "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
~ Charles Babbage (born: 1791-12-26 died: 1871-10-18 at age: 79)
 
W

Wojtek

Roedy Green wrote :
final ArrayList<String> pieces;

Roedy, you already have this built.

See if making this change speeds it up:
final ArrayList<StringBuilder> pieces;

Then for the toString, create a new StringBuilder but with the length
set to the total of all the other StringBuilder lengths.

It might be that there are private methods/calls which do a fast char
copy if a StringBuilder is appended.
 
R

Roedy Green

It might be that there are private methods/calls which do a fast char
copy if a StringBuilder is appended.

I have that already, with String.getChars that calls System.arrayCopy.

The tricky part is the new String( char[] ).

Ideally I would like to create a new String that just pointed to the
char[], and dropped all other reference to it rather than allocating
yet another buffer and copying.

It is seemingly impossible because String can't allow anyone else to
have a pointer to the char[], or it would break the immutability of
String.

That implies to me that String must do the joining so that it
allocates the private buffer of precisely the right size needed.

I think my next step is to write the code for a new String constructor
new String( String... )
and see if I can sell Sun on including it, and persuading Javac to use
that method to implement the + operator.


--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

I think my next step is to write the code for a new String constructor
new String( String... )
and see if I can sell Sun on including it, and persuading Javac to use
that method to implement the + operator.


/*
* @(#)DummyString.java
*
* Summary: Demonstrate how a new efficient concatenating String
constructor would work.
*
* Copyright: (c) 2009 Roedy Green, Canadian Mind Products,
http://mindprod.com
*
* Licence: This software may be copied and used freely for any
purpose but military.
* http://mindprod.com/contact/nonmil.html
*
* Requires: JDK 1.6+
*
* Created with: IntelliJ IDEA IDE.
*
* Version History:
* 1.0 2009-09-29 - initial release

*/
package com.mindprod.fastcat;

/**
* Demonstrate how a new efficient concatenating String constructor
would work.
* Constructor permits more efficient concatenation than StringBuffer
or StringBuilder.
* It uses no intermediate buffers and no intermediate character
copying.
* It might be used to more efficiently implement the + concatenation
operator and StringWriter.
* Why bother? Webservers do almost nothing but concatenate Strings
and garbage collection.
*
* @author Roedy Green, Canadian Mind Products
* 1.0 2009-09-30 - initial release
* @since 2009-09-30
*/
public class DummyString
{
// ------------------------------ FIELDS
------------------------------

/**
* The value is used for character storage.
*/
private final char value[];

/**
* The count is the number of characters in the String.
*/
private final int count;

/**
* The offset is the first index of the storage that is used.
*/
private final int offset;

// -------------------------- PUBLIC INSTANCE METHODS
--------------------------

/**
* Initializes a newly created {@code String} object so that it
represents
* an empty character sequence. Note that use of this constructor
is
* unnecessary since Strings are immutable.
*/
public DummyString()
{
this.offset = 0;
this.count = 0;
this.value = new char[0];
}

/**
* concatenating String constructor
*
* @param pieces vararg of Strings to be concatenated to create
this new String.
*/
public DummyString( String... pieces )
{
offset = 0;
int joinedLength = 0;
/** find out how big a buffer we need to contain all the
joined Strings */
for ( String piece : pieces )
{
joinedLength += piece.length();
}

// allocate buffer for joined pieces. This becomes the new
String.
count = joinedLength;
value = new char[joinedLength];
// copy over all the pieces into the slot in the joined buffer
int position = 0;
for ( String piece : pieces )
{
final int length = piece.length();
// copy piece into buffer with System.arraycopy
piece.getChars( 0, length, value, position );
position += length;
}
}
}
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

*
* Summary: Demonstrate how a new efficient concatenating String
constructor would work.


There are over 225 million websites, many with multiple servers (and
some on shared servers). Imagine the aggregate electric bill for the
computers and cooling and the hardware bill. If you did something to
improve the efficiency of them by even 1%, you would have more than
justified your existence in terms of reducing your energy and green
house gas emissions footprint.


--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

There is a String(int offset, int count, char[] value) constructor in
Sun's code, which references the array without copying it. But, of
course, that constructor is not public (it has package visibility,
thus can be seen from other classes in java.lang).

That is probably what it was. I recall digging into how substring
worked, with its pinning, which might well have used that constructor.
It is also possible StringBuilder or StringBuffer could use it in some
circumstances.

For now, I think I will flip my code to FastCat, and sacrifice a small
reptile and wait for Sun to hit them selves on the brow, and say "Why
didn't we think of the concatenating String constructor ourselves" and
add the DummyString constructor implementation to the next release.

Then I can think about what a hotspot could do with calls to it and
start jittering Sun and Jet.

--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
D

Dave Searles

Patricia said:
Roedy Green wrote:
...
/**
* Get the concatenation of all the strings appended so far
*/
public String toString()
{
int offset = 0;
final char[] buffer = new char[length];
for ( String p : pieces )
{
for ( int i = 0; i < p.length(); i++ )
{
// copy over that once piece, char by char. Can this
be improved?
buffer[ offset++ ] = p.charAt( i );
}
}
...

For long pieces, I would try the libc memcpy.

This is probably what System.arraycopy does.
 
R

Roedy Green

StringBuilder could maintain an
internal list (not necessarily List) of the appended Strings, and
postpone building the char[] until it was actually needed

that is what the FastCat and DummyString code I posted does.

See

https://wush.net/websvn/mindprod/fi...dprod&path=/com/mindprod/fastcat/FastCat.java

https://wush.net/websvn/mindprod/fi...d&path=/com/mindprod/fastcat/DummyString.java

I have submitted an RFE to Sun to add a concatenating String...
constructor to do this, thus eliminating all intermediate buffers and
intermediate copying.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

nd as someone else mentioned, the new method would change
the behavior of today's StringBuilder:

StringBuilder buff = new StringBuilder();
buff.append( new String[] { "Hello, ", "world!" } );
System.out.println(buff);

I don't follow. The API has an extra method, but I can't see how the
result would be any different from traditional StringBuilder or
StringBuffer.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

This is probably what System.arraycopy does.

System.arraycopy is a native method.

I would guess for long strings it tries to do the bulk of the copying
with maximally aligned chunks. One nice side effect of Java is every
object is likely paragraph aligned naturally, which makes copying
naturally efficient.

The Pentium has a:
REP MOVSD to move 32 bits at a pop.
REP MOVSW to move 16 bits at a pop
REP MOVSB to move 8 bits at a pop.

The last time I counted cycles, these instructions were not well
implemented, and it was faster to code them with more primitive
instructions. I don't know about the most recent generation of CPUs.

I presume the AMD 64 bit instruction set has something similar to copy
64 or more bits at pop. You don't have to copy char[] a char a time.

Circa 1984 I wrote such copy routines for the 8086 that copied the
bulk of a string as aligned character pairs, then tidied up the head
and tail moving individual characters as part of the BBL Forth
runtime.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
R

Roedy Green

Interesting number: That's a Web site for every thirty people on
the planet, or about one for every 7.5 Internet-connected people.

<http://en.wikipedia.org/wiki/World_population>
<http://www.internetworldstats.com/stats.htm>

How reliable is that "225 million" figure?

The figure came from Netcraft, which are the people I find used as the
reference at the various sites that answer the question "how many
websites are there?" in a search engine.

Perhaps they use a very broad definition, so anyone with facebook
page, blog etc. is consider to have a website.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 
T

Tom Anderson

In the ordinary case, the only methods of StringBuilder you use are
append and toString. Consider that server side code is mostly just
appending strings to put together the response.

Shhhh! I won't be able to charge half what i do if people knew that's all
it was!

tom
 
R

Roedy Green

It depends on how append handles the String[]. If it were to just copy
the reference, the result would depend on subsequent changes to the
array. However, I would expect it to append the contents of the String[]
to its own list of fragments.

Oh heavens yes. It would have to copy the references in the ...
array onto the end of its own linear list of pieces. It would not
have to copy the individual characters since they are immutable.
--
Roedy Green Canadian Mind Products
http://mindprod.com

There is one brain organ that is optimised for understanding and articulating logical processes, and that is the outer layer of the brain, called the cerebral cortex. Unlike the rest of the brain, this relatively recent evolutionary development is rather flat, only about 0.32 cm (0.12 in) thick, and includes a mere 6 million neurons. This elaborately folded organ provides us with what little competence we do possess for understanding what we do and who we do it.
~ Ray Kurzweil (born: 1948-02-12 age: 61)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top