Serialization Slow?

Jason Cavett

I'm doing some serialization of data (as a result of a TransferHandler
- basically, I'm doing a copy and paste) and I am noticing that it
becomes increasingly slow as I copy more and more data.

One piece of data that I copy is roughly 3MB of data. When I paste,
however, it takes relatively long (it's noticeable to the user, that's
for sure).

Any suggestions on how I might speed this up? I realize this is a
pretty generic question and if you need more data, I'd be happy to
provide it.


Thanks for any help.
 
Tom Anderson

Firstly, i take it you're copying and pasting between applications, right? If
not, i think there's a way to do this without serialization at all.
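
One way this could look, as a rough sketch and assuming both the copy and the paste happen inside a single JVM: java.awt.datatransfer.DataFlavor defines javaJVMLocalObjectMimeType for handing an object reference across a Transferable without serializing it. The class names LocalObjectTransferable and LocalTransferDemo below are made up for illustration and are not from the original poster's code.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Transferable that hands over a live reference, not a serialized copy. */
class LocalObjectTransferable implements Transferable {
    private final Object payload;
    private final DataFlavor flavor;

    LocalObjectTransferable(Object payload) {
        this.payload = payload;
        try {
            // The JVM-local MIME type marks data that never leaves this JVM.
            String mime = DataFlavor.javaJVMLocalObjectMimeType
                    + ";class=" + payload.getClass().getName();
            this.flavor = new DataFlavor(mime, "Local object",
                    payload.getClass().getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public DataFlavor[] getTransferDataFlavors() {
        return new DataFlavor[] { flavor };
    }

    @Override
    public boolean isDataFlavorSupported(DataFlavor candidate) {
        return flavor.equals(candidate);
    }

    @Override
    public Object getTransferData(DataFlavor requested) throws UnsupportedFlavorException {
        if (!isDataFlavorSupported(requested)) {
            throw new UnsupportedFlavorException(requested);
        }
        return payload; // the same instance that was handed in, no copying
    }
}

class LocalTransferDemo {
    /** Returns true if the "pasted" object is the very instance that was "copied". */
    static boolean pasteReturnsSameInstance() {
        List<String> data = new ArrayList<>();
        data.add("example");
        LocalObjectTransferable transferable = new LocalObjectTransferable(data);
        try {
            DataFlavor flavor = transferable.getTransferDataFlavors()[0];
            return transferable.getTransferData(flavor) == data;
        } catch (UnsupportedFlavorException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the receiver gets the original object rather than a copy, the paste side would have to clone it itself if it needs an independent instance.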

If you do need to serialize, then you can look for ways to speed up
serialization. Default serialization does have a reputation for being a
bit slow - the problem is that it has to deal correctly with any
imaginable object structure, so it has to be very generic and quite
careful in how it does things. If you don't need everything it does you
can go faster.

The first thing you can do is mark as transient any fields you don't need
to be transported - anything that can be recomputed on the other side,
like caches. That cuts down the amount of stuff serialized, and the time
taken to do it. There might not be anything you can do this to, though.
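
A small sketch of the transient idea, using an invented Note class whose word list is a cache derived from its text. The class and method names are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical class: the text is real state, the word array is a cache derived from it. */
class Note implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String text;
    // transient: skipped by default serialization, so it is null after deserialization
    private transient String[] words;

    Note(String text) {
        this.text = text;
    }

    String[] words() {
        if (words == null) {
            words = text.split("\\s+"); // rebuild the cache lazily
        }
        return words;
    }

    boolean cachePopulated() {
        return words != null;
    }
}

class TransientDemo {
    /** Serializes the note to memory and reads it straight back. */
    static Note roundTrip(Note note) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(note);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Note) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the copy arrives without its cache and rebuilds it on the first call to words().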

Something i've heard of, but never tried, and am slightly skeptical about,
is declaring your persistent fields. In a class which will be serialized,
declare:

import java.io.ObjectStreamField;

private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("firstField", FirstFieldType.class),
        new ObjectStreamField("secondField", SecondFieldType.class),
        // etc
};

The serialization mechanism can use this array to guide its operation,
rather than reflection, and apparently this is faster. I'm dubious about
this: the thing is, you could fairly easily write a helper function which
would take a class and generate a serialPersistentFields array for it using
reflection, and then call that to initialise your static
serialPersistentFields variable. It would add a bit of startup time, but
completely automate the process, and allegedly increase performance. But
if this is so, then why doesn't the serialization mechanism do this
itself?

Note that you can also use the ObjectStreamField objects to indicate that
a field is unshared, ie the object referenced is never pointed to from
elsewhere (or if it is, it doesn't matter), which makes serializing it
marginally faster.
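
Here is a minimal sketch of both ideas together. The field the serialization mechanism looks for is named serialPersistentFields and has to be private, static and final; the three-argument ObjectStreamField constructor takes the unshared flag. The Account class and its fields are invented for illustration, and the sketch makes no claim about how much time this saves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical class that lists its serializable fields explicitly. */
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only the fields named here are written; names and types must match the real fields.
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("owner", String.class),
        // third argument true = unshared: written without tracking other references to it
        new ObjectStreamField("balances", int[].class, true),
    };

    private final String owner;
    private final int[] balances;
    private String scratch = "temp"; // not listed above, so it is not serialized

    Account(String owner, int[] balances) {
        this.owner = owner;
        this.balances = balances;
    }

    String owner() { return owner; }
    int[] balances() { return balances; }
    String scratch() { return scratch; }
}

class PersistentFieldsDemo {
    /** Serializes the account to memory and reads it straight back. */
    static Account roundTrip(Account account) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(account);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a field left out of the array behaves like a transient one: scratch comes back as null in the copy.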

Your next option is to implement Externalizable instead of Serializable,
and then write code to handle loading and storing your objects yourself.
This is tedious, but is pretty definitively the fastest way of doing it.
Read all about it:

http://java.sun.com/developer/TechTips/2000/tt0425.html#tip1

Another tack would be to use a third-party serialization implementation
that claims to be faster than Sun's:

http://www.jboss.org/serialization/
http://jserial.sourceforge.net/

JBoss's even claims to be faster than Externalizable.

tom
 
Jason Cavett

Excellent information. Thank you.

To provide a little more specifics...

I am overriding the method:

    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException;

from the Serializable interface in a class that is (ultimately) being
serialized during cut/copy/paste. However, in my readObject
implementation (see below), it is taking a ton of time during pasting
(based on profiling that I have done). That's why this issue came up
in the first place.

    private void readObject(ObjectInputStream in) throws IOException,
            ClassNotFoundException {
        in.defaultReadObject();
        this.open = false;
        this.attributeLock = new Object();
        this.childrenLock = new Object();
        this.parentLock = new Object();
    }

What I'm not sure of, is how I can change this to use the techniques
you described above. I'll keep working at it, but if you can point me
in the right direction here, I'd appreciate it.

Thanks again.
 
Roedy Green

Any suggestions on how I might speed this up. I realize this is a
pretty generic question and if you need more data, I'd be happy to
provide it.

1. use DataOutputStream instead.

2. use reset. Part of the problem is on both write and read, you need
to maintain in RAM a huge amount of history information about
previously written/read objects. You can reset to throw it away when
it gets so big there is little RAM left for anything else. I have
found that apps died altogether when the stream got too long, starved
to death of RAM. The problem with reset is you then get multiple
copies of class descriptors, and duplicate restored objects.
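
A rough sketch of the reset idea and its drawback, using only ObjectOutputStream.reset() from the standard library. The class ResetDemo and its methods are invented for illustration, and the batch size is an arbitrary assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ResetDemo {
    /** Writes each item, discarding the stream's record of earlier objects every resetEvery items. */
    static byte[] writeWithPeriodicReset(List<?> items, int resetEvery) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                int written = 0;
                for (Object item : items) {
                    out.writeObject(item);
                    if (++written % resetEvery == 0) {
                        out.reset(); // the reader drops its history at the same point
                    }
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back count objects; reset markers in the stream are handled by readObject itself. */
    static List<Object> readAll(byte[] data, int count) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            List<Object> result = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                result.add(in.readObject());
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes one array twice and reports whether the reader gets a single instance back. */
    static boolean readsBackSameInstance(boolean resetBetweenWrites) {
        int[] shared = {1, 2, 3};
        List<Object> items = new ArrayList<>();
        items.add(shared);
        items.add(shared);
        // resetEvery = 1 resets after every write; 3 never triggers for two items
        byte[] data = writeWithPeriodicReset(items, resetBetweenWrites ? 1 : 3);
        List<Object> back = readAll(data, 2);
        return back.get(0) == back.get(1);
    }
}
```

Without a reset the second write is only a back-reference, so the reader sees one array twice; with a reset in between it gets two separate copies, which is the duplication mentioned above.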
--
Roedy Green Canadian Mind Products
http://mindprod.com
Your old road is
Rapidly agin'.
Please get out of the new one
If you can't lend your hand
For the times they are a-changin'.
 
Tom Anderson

Since you're doing defaultReadObject, all the normal serialization stuff
is happening, at normal serialization speed (as you're doubtless aware).

I'd start by trying the JBoss serialization thing, which doesn't involve
changes to your code, just a different stream.

If you're going to do externalization, you have to write readExternal
and writeExternal methods which read and write your fields from and to
the stream, as described in the docs:

http://java.sun.com/javase/6/docs/api/java/io/Externalizable.html

Basically, if you have a class that looks like:

    class MyClass implements Externalizable {
        private int foo;
        private String bar;
        private HisClass baz;
    }

Then your methods look like:

    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        foo = in.readInt();
        bar = in.readUTF();
        baz = (HisClass) in.readObject();
    }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(foo);
        out.writeUTF(bar);
        out.writeObject(baz);
    }

Although you can (and perhaps should) use read/writeObject for strings
instead of read/writeUTF.
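
For a complete picture, here is a self-contained sketch of an Externalizable class that uses writeObject for its string. One reason to prefer that: writeUTF throws a NullPointerException for a null string and cannot write strings whose encoded form exceeds 65535 bytes, while writeObject handles both. Widget and Part are invented names standing in for MyClass and HisClass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical child object, left on default serialization. */
class Part implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;
    Part(String label) { this.label = label; }
}

/** Hypothetical class that reads and writes its own fields. */
class Widget implements Externalizable {
    int count;
    String name; // may be null
    Part part;

    public Widget() { } // required: deserialization calls the public no-arg constructor

    Widget(int count, String name, Part part) {
        this.count = count;
        this.name = name;
        this.part = part;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(count);
        out.writeObject(name); // copes with null, unlike writeUTF
        out.writeObject(part);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        // fields must be read in exactly the order they were written
        count = in.readInt();
        name = (String) in.readObject();
        part = (Part) in.readObject();
    }
}

class ExternalizableDemo {
    /** Serializes the widget to memory and reads it straight back. */
    static Widget roundTrip(Widget widget) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(widget);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Widget) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```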

You can take it a step further and take care of writing child objects
manually if you like. For example, if i had:

    class Dictionary {
        private SortedMap<String, List<String>> words;
    }

I might write this:

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(words.size());
        for (Map.Entry<String, List<String>> entry : words.entrySet()) {
            String word = entry.getKey();
            List<String> definitions = entry.getValue();
            out.writeUTF(word);
            out.writeInt(definitions.size());
            for (String definition : definitions) out.writeUTF(definition);
        }
    }

    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        words = new TreeMap<String, List<String>>();
        int numWords = in.readInt();
        for (int i = 0; i < numWords; ++i) {
            String word = in.readUTF();
            int numDefinitions = in.readInt();
            List<String> definitions = new ArrayList<String>(numDefinitions);
            words.put(word, definitions);
            // loop over this word's definitions, not over the word count
            for (int j = 0; j < numDefinitions; ++j) definitions.add(in.readUTF());
        }
    }

It's a pain in the balls to have to write lots of methods like this, but
it saves space and potentially a *lot* of time, so it can make a huge
difference even if you only apply it to a few classes.
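
To check the space claim on your own data, something like the following sketch could be used. It serializes the same map once through default serialization and once through hand-written externalization and returns both byte counts. All class names here are invented, the sample data is arbitrary, and it measures size only, not time.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical dictionary left entirely to default serialization. */
class DefaultDictionary implements Serializable {
    private static final long serialVersionUID = 1L;
    final TreeMap<String, List<String>> words = new TreeMap<>();
}

/** Hypothetical dictionary that writes its entries by hand. */
class ManualDictionary implements Externalizable {
    TreeMap<String, List<String>> words = new TreeMap<>();

    public ManualDictionary() { } // required for Externalizable

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(words.size());
        for (Map.Entry<String, List<String>> entry : words.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeInt(entry.getValue().size());
            for (String definition : entry.getValue()) out.writeUTF(definition);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        words = new TreeMap<>();
        int numWords = in.readInt();
        for (int i = 0; i < numWords; ++i) {
            String word = in.readUTF();
            int numDefinitions = in.readInt();
            List<String> definitions = new ArrayList<>(numDefinitions);
            for (int j = 0; j < numDefinitions; ++j) definitions.add(in.readUTF());
            words.put(word, definitions);
        }
    }
}

class SizeDemo {
    static int serializedSize(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void fill(Map<String, List<String>> words, int entries) {
        for (int i = 0; i < entries; i++) {
            words.put("word" + i, new ArrayList<>(
                    Arrays.asList("first meaning " + i, "second meaning " + i)));
        }
    }

    /** Returns {default size, manual size} in bytes for the same generated content. */
    static int[] compare(int entries) {
        DefaultDictionary plain = new DefaultDictionary();
        ManualDictionary manual = new ManualDictionary();
        fill(plain.words, entries);
        fill(manual.words, entries);
        return new int[] { serializedSize(plain), serializedSize(manual) };
    }
}
```

Calling SizeDemo.compare(1000) and printing the two numbers gives a quick feel for how much per-object overhead the hand-written format avoids for this kind of structure.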

tom
 
