String.substring in JDK 1.7.0_6+

J

jlp

The String class was modified in JDK 1.7.0_6.
String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)

All is well explained at :
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
more than 1000 times faster ! ( Ok! for this stupid test ;-) )

I don't think it is a good improvement ! Uses less memory, but you
retrieve it, when the object is garbaged
What do you think about this ?
 
M

markspace



That's a better test, using production code. However, note that it's
scoped to only one small portion of the code, only while loading a lot
of small scripts.

It's really common for code that was once "working" to suddenly develop
undesirable characteristics as it's exposed to new input or new
environments or new anything. It's just something that happens and part
of the normal maintenance of code.

What do I think of it? It's normal.
 
J

Jan Burse

There seems to be more going on with Strings:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6962931

But I didn't find the original change request (CR)
or request for enhancement concerning the copy
semantics. There was some estimate on applications
impact and allotment for redesigning applications.

I didn't know about hash32 thing in the link below.
Your micro benchmark doesn't test hashCode, does it?
 
K

Kevin McMurtrie

jlp said:
The String class was modified in JDK 1.7.0_6.
String.substring that was 0(1) before JDK 1.7.0_6, now becomes O(n)

All is well explained at :
http://java-performance.info/changes-to-string-java-1-7-0_06/

I wrote a small test:
https://gist.github.com/4692960

java -Xms128M -Xmx128M teststring.Main 100000 1000000

On my desktop:
jdk 1.7.0_11 => 33 seconds / 252 KBytes Memory
jdk 1.6.0_38 => 25 milliseconds / 782 KBytes Memory
more than 1000 times faster ! ( Ok! for this stupid test ;-) )

I don't think it is a good improvement ! Uses less memory, but you
retrieve it, when the object is garbaged
What do you think about this ?

It's an unbelievable change. Buffer sharing in Java 1-6 had the simple
workaround of calling the String(String) constructor. There's no
workaround for Strings getting much slower in Java 7+.

What Oracle did in Java 7 would only make sense if CharSequence had
buffer sharing and better support. I just checked Java 7, and
CharSequence looks more useless than ever. String.subSequence()
allocates a new char[] and the usual parsers (Integer, Long, Float)
still only accept a String. Slow Strings it is.


Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5.
The reason was so that AbstractStringBuilder class could support the
implementations for both StringBuffer and StringBuilder. Had they kept
the implementations split, we could still have a very fast
StringBuffer.toString(). As a final F-U, none of classes can be
extended even through there's no buffer sharing that can be hacked.
 
M

markspace

and the usual parsers (Integer, Long, Float)
still only accept a String. Slow Strings it is.

This I agree is a bit of a bummer, it would be useful for the parsers to
take CharSequence for flexibility. Integers aren't hard to parse but
floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and
ReadableByteChannel in its constructors.
Side rant:
Sun broke buffer sharing between StringBuffer and String back in Java 5.

Probably because Strings needed to be immutable and there's no way to do
that when sharing a mutable buffer.
we could still have a very fast
StringBuffer.toString().

Nope, see above.
As a final F-U, none of classes can be
extended even through there's no buffer sharing that can be hacked.

Probably because they don't want you doing stupid broken things, like
trying to share buffers between immutable and mutable objects.

I still agree that CharSequence could be made more useful though, that's
a good idea. Hmmm.
 
K

Kevin McMurtrie

markspace said:
This I agree is a bit of a bummer, it would be useful for the parsers to
take CharSequence for flexibility. Integers aren't hard to parse but
floats and doubles are non-trivial.

Note however that Scanner accepts both Readable (a Reader) and
ReadableByteChannel in its constructors.


Probably because Strings needed to be immutable and there's no way to do
that when sharing a mutable buffer.

The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String. At worst it was the same speed as
today's StringBuilder. For the common case of being a single-use
object, it was much faster.
 
M

markspace

The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String.


Other than calling append(), you mean? Maybe StringBuffer also cleared
its buffer so it couldn't be reused (although I don't see that in the
docs), however I'd bet that current implementations rely on JIT compiler
to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might
be hard to do on small devices.
 
K

Kevin McMurtrie

markspace said:
The original StringBuffer class was synchronized, final, and protected
its internal char[]. There was no way to trick it into altering the
buffer after turning it into String.


Other than calling append(), you mean? Maybe StringBuffer also cleared
its buffer so it couldn't be reused (although I don't see that in the
docs), however I'd bet that current implementations rely on JIT compiler
to optimize away unneeded buffer copies.

I'm thinking this does mobile a disservice, however, because JIT might
be hard to do on small devices.

The original StringBuffer went to copy-on-write mode after calling
toString(). You can go read the old code for yourself. There was no
JVM trick involved.

Some future JVMs do have JIT tricks to improve String performance. It's
not clear how that would perform or what the side effects would be. One
proposed trick was to make the String(String) constructor a no-op. That
could have disastrous consequences. For example, the code below uses
specific object references as signals. If the String constructor was a
no-op, the signal references would be interned constants that are not
unique.

static final String eofMarker = new String("EOF");
static final String flushMarker = new String("Flush");
final ArrayBlockingQueue<String> queue= new
ArrayBlockingQueue<String>(1000);

void processQueue() throws InterruptedException
{
String str;
while ((str= queue.take()) != eofMarker)
{
if (str == flushMarker)
{
//Flush

}
else
{
//Process string

}
}
}
 
M

markspace

Some future JVMs do have JIT tricks to improve String performance. It's
not clear how that would perform or what the side effects would be. One

The main think I'd like to see as a "trick" would be to spot when an
array is not accessed after a copy, thus negating the need for a copy.

String constructor:

public String( char[] chars ) {
this.buffer = Arrays.copyOf( chars, chars.length );
}

Usage:

public String someMethod() {
char[] myBuff = ... // a local variable
return new String( myBuff );
}

Spotting that the copy isn't needed because myBuff is local and can't be
accessed after the return is one obvious optimization.

If this type of analysis is very hard, I can see the original
implementation of StringBuilder would be advantageous. OTOH, it doesn't
look hard, and I'd bet there's a lot of situations where checking a
"copy-on-write bit" is a bigger performance hit.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,825
Latest member
VernonQuy6

Latest Threads

Top