writeObject too slow for big object, any idea?


Kaidi

Hi,
Anyone know how to write/read objects to/from file faster than Java's
writeObject?

My program needs to write an ArrayList to a file, which I found is very, very slow and memory-consuming using writeObject.
The ArrayList has about 5000 elements, each of which is another ArrayList of about 3500 elements, and each element of that is a Float. (So it is a nested ArrayList, with Float as the innermost object.)

So I would expect its size to be around 5000*3500*4/1000000 = 70M bytes (4 bytes per float), plus the overhead of the classes/objects.

However, I tried to write it out with this code:

File thefile = new File(outfile);
FileOutputStream fout = new FileOutputStream(thefile);
ObjectOutputStream oos = new ObjectOutputStream(fout);
oos.writeObject(vectorDB); // vectorDB is the ArrayList mentioned above.
oos.close(); // also closes fout
fout.close();

With JBuilder Personal 9 on a P4 2.2GHz PC with 512MB of memory, running with the parameter "-Xmx512m", the above code throws an OutOfMemoryError and the program quits.
In Windows XP's Task Manager I noticed a peak memory consumption of more than 900MB (virtual, of course), with the CPU also near 100%.

So, any ideas? Why is it so slow and memory-hungry for only about 70MB of data?
And is there any way to deal with it?
Thanks a lot.
Thanks a lot.
 

Chris Smith

Kaidi said:
My program needs to write an ArrayList to a file, which I found is very, very slow and memory-consuming using writeObject.
The ArrayList has about 5000 elements, each of which is another ArrayList of about 3500 elements, and each element of that is a Float. (So it is a nested ArrayList, with Float as the innermost object.)

Here's the problem. Each time you write an object to an
ObjectOutputStream, the stream remembers it so that if it's rewritten,
only a reference will be written instead of rewriting the whole
object... but you've got 17.5 million Float objects, which is causing
that table to get very large. You could probably get some improvement
by simply writing in pieces and calling the stream's reset() method
after each object written.
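A minimal sketch of the reset() idea, assuming the data is a List<List<Float>> and using generics (which postdate JBuilder 9); ResetWriter, writeRows and readRows are hypothetical names. Each inner list goes out in its own writeObject call, and reset() afterwards empties the stream's table of already-written objects, so only one row's Floats are remembered at a time. The reader's table is cleared at the same points. Two costs remain: the class descriptors for ArrayList and Float are re-sent after every reset, and each Float is still written as a full object, so the file stays large.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating ObjectOutputStream.reset() between writes.
class ResetWriter {

    // Writes the row count, then each inner list as a separate object.
    // reset() after each row clears the stream's handle table, so it never
    // has to track all 17.5 million Float objects at once.
    static void writeRows(List<List<Float>> rows, OutputStream out) throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(out));
        oos.writeInt(rows.size());
        for (List<Float> row : rows) {
            oos.writeObject(new ArrayList<Float>(row)); // copy into a known Serializable type
            oos.reset();
        }
        oos.flush(); // the caller still owns (and closes) 'out'
    }

    // Reads the rows back in the order they were written.
    @SuppressWarnings("unchecked")
    static List<List<Float>> readRows(InputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(in));
        int count = ois.readInt();
        List<List<Float>> rows = new ArrayList<List<Float>>(count);
        for (int i = 0; i < count; i++) {
            rows.add((List<Float>) ois.readObject());
        }
        return rows;
    }
}
```

Passing a FileOutputStream/FileInputStream instead of an in-memory stream writes and reads the same data on disk.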

A more complete solution with far better efficiency could probably be
built by using DataOutputStream and designing your own format to meet
your needs.
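One possible shape for such a format, sketched under the assumption that the data has already been moved into a float[][] (FloatMatrixIO and its layout are hypothetical): a row count, then for each row its length followed by the raw values. For 5000 x 3500 values that is 17.5 million x 4 bytes, about 70MB, plus 4 bytes of length header per row.

```java
import java.io.*;

// Hypothetical format: [int rowCount], then per row [int length][length floats].
class FloatMatrixIO {

    static void write(float[][] data, OutputStream out) throws IOException {
        // Buffering matters: without it every writeFloat turns into one or
        // more small writes on the underlying stream.
        DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(out));
        dos.writeInt(data.length);
        for (float[] row : data) {
            dos.writeInt(row.length);
            for (float v : row) {
                dos.writeFloat(v);
            }
        }
        dos.flush(); // the caller still owns (and closes) 'out'
    }

    static float[][] read(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(new BufferedInputStream(in));
        int rows = dis.readInt();
        float[][] data = new float[rows][];
        for (int i = 0; i < rows; i++) {
            float[] row = new float[dis.readInt()];
            for (int j = 0; j < row.length; j++) {
                row[j] = dis.readFloat();
            }
            data[i] = row;
        }
        return data;
    }
}
```

Storing each row's length makes the format tolerate ragged rows; if every row is known to have the same length, a single length in the header would do.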

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 

Chris Uppal

Kaidi said:
The ArrayList has about 5000 elements, each of which is another ArrayList of about 3500 elements, and each element of that is a Float. (So it is a nested ArrayList, with Float as the innermost object.)

Hmm....

So you have about 5000 * 3500 Float objects. Each of these is the internal
float (4 bytes) plus whatever overhead your JVM uses for the object header.
Assume that's 8 bytes (it could be more). Add in 4 bytes (assuming a 32bit
JVM) for the "pointer" to the Float in the ArrayList (the slot in its internal
Object[] array), and you are talking nearly 300M.

Even before you try to serialise it.
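Chris's estimate written out, using his assumptions (4-byte float payload, 8-byte object header, 4-byte reference per ArrayList slot on a 32-bit JVM). The last line additionally assumes, as many JVMs do, that objects are padded to a multiple of 8 bytes, which rounds each 12-byte Float up to 16; that is one way the real figure "could be more".

```latex
\[
\begin{aligned}
N &= 5000 \times 3500 = 1.75 \times 10^{7}\ \text{Float objects} \\
N \times (4 + 8 + 4)\ \text{bytes} &= 2.8 \times 10^{8}\ \text{bytes} \approx 280\ \text{MB} \\
N \times (16 + 4)\ \text{bytes} &= 3.5 \times 10^{8}\ \text{bytes} \approx 350\ \text{MB}
\end{aligned}
\]
```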

I strongly suggest that you store your data as float[] (or double[]) arrays
internally; that would cut your space use down to about 70M (about 140M with double[]).
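A sketch of converting the nested lists into primitive arrays, assuming a List<List<Float>> input (ToPrimitive and its methods are hypothetical names). Once the data is a float[][], even plain writeObject becomes reasonable: the stream only tracks about 5000 row arrays instead of 17.5 million Floats, and each primitive array is written as a packed run of values. With double[] (which Kaidi later switched to) the same code works with double in place of float.

```java
import java.io.*;
import java.util.List;

// Hypothetical helper: copies a List<List<Float>> into primitive arrays.
class ToPrimitive {

    static float[][] toArrays(List<List<Float>> rows) {
        float[][] out = new float[rows.size()][];
        int i = 0;
        for (List<Float> row : rows) {
            float[] values = new float[row.size()];
            int j = 0;
            for (Float v : row) {
                values[j++] = v; // unboxing; a null element would throw NullPointerException
            }
            out[i++] = values;
        }
        return out;
    }

    // A float[][] is only rows.length + 1 objects, so default serialization
    // keeps a tiny handle table and writes each row's floats as one block.
    static byte[] serialize(float[][] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(data);
        oos.close();
        return bytes.toByteArray();
    }
}
```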

-- chris
 

Kaidi

Thanks for the advice.
I now write my data using writeFloat/writeDouble/writeInt, which
takes only seconds: about 10000 times faster than writeObject!
(PS: I changed float to double to get more precision in my program)

// Write the number of rows, then each row's length followed by its values.
int i = vectorDB.size();
oos.writeInt(i);
for (int count_i = 0; count_i < i; count_i++)
{
    ArrayList tarray = (ArrayList) vectorDB.get(count_i);
    int j = tarray.size();
    oos.writeInt(j);
    for (int count_j = 0; count_j < j; count_j++)
    {
        oos.writeDouble(((Double) tarray.get(count_j)).doubleValue());
    }
}

I really wonder why they don't make Java's writeObject (which
is, of course, easier for beginners to use) better.
Or maybe that function is just for beginners to play with? :)
 

Roedy Green

The ArrayList has about 5000 elements, each of which is another ArrayList of about 3500 elements, and each element of that is a Float. (So it is a nested ArrayList, with Float as the innermost object.)

I would change that Float to a float. That should both speed it up
considerably and make it more compact.
 

Robert Olofsson

: Thanks for the advice.
: I now write my data using writeFloat/writeDouble/writeInt, which
: takes only seconds: about 10000 times faster than writeObject!
: ...
: I really wonder why they don't make Java's writeObject (which
: is, of course, easier for beginners to use) better.
: Or maybe that function is just for beginners to play with? :)

Define better. Serialization is very easy to use and is acceptably
fast in most cases. You have large amounts of _simple_ data, so
writeObject may not be what you want. But for sending a few complex
objects over the wire, can you beat the simplicity of serialization?
For example, with serialization you can write a Map<Tree<List>>, where
the list holds references to the outer map, with one line of
code. That is something that I find quite powerful.
Now, if all you have is a byte[], it is hard to beat the speed of
stream.write(byte[]), and serialization is not the right tool.
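A small illustration of the one-line graph write, simplified from Robert's Map<Tree<List>> to a Map<String, List<Object>> since Tree is not a standard class; CycleDemo and its methods are hypothetical. After the round trip, the list's back-reference points at the deserialized map itself, not at a second copy.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: a map whose value list refers back to the map itself.
class CycleDemo {

    static Map<String, List<Object>> buildCyclicMap() {
        Map<String, List<Object>> map = new HashMap<String, List<Object>>();
        List<Object> list = new ArrayList<Object>();
        list.add("leaf");
        list.add(map); // back-reference to the outer map
        map.put("key", list);
        return map;
    }

    // Serializes and deserializes the whole reachable object graph.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(o); // one call writes every reachable object, once each
        oos.close();
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return ois.readObject();
    }
}
```

Avoid printing or hashing such a structure: toString() and hashCode() on a map that indirectly contains itself recurse without end.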

Serialization is also quite hard to use in that it is easy to get a
lot of data you don't want in the stream. It is very easy to write a
container and find out that a whole GUI was also written to the
stream. The other strange thing that may happen: if you have a
long-lived stream and you write object A, write some other objects,
change A, and then write A again, you will find when you read the
stream back that both A's are the same object and look like the first.
Serialization must handle circular references, and this can make it
hard to use...
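A sketch reproducing that pitfall with a hypothetical Serializable class, MutableBox (StaleWriteDemo is also a hypothetical name). Without reset() the second writeObject emits only a back-reference, so the reader gets the same object twice, holding the first value; calling reset() between the writes (the method Chris Smith mentioned earlier in the thread) makes the stream write the object's current state again.

```java
import java.io.*;

// Hypothetical mutable class used to show the "second write of A" pitfall.
class MutableBox implements Serializable {
    private static final long serialVersionUID = 1L;
    int value;
}

class StaleWriteDemo {

    // Writes the same object twice, changing it in between, and reads both back.
    static MutableBox[] writeTwice(boolean resetBetween) throws IOException, ClassNotFoundException {
        MutableBox a = new MutableBox();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        a.value = 1;
        oos.writeObject(a);
        a.value = 2;
        if (resetBetween) {
            oos.reset(); // forget earlier objects, so 'a' is written in full again
        }
        oos.writeObject(a); // without reset(): only a back-reference to the first copy
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return new MutableBox[] { (MutableBox) ois.readObject(), (MutableBox) ois.readObject() };
    }
}
```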

You have to remember that serialization does provide two hooks for
controlling the generated stream: writeObject (a private method on a
Serializable class) and writeExternal (from the Externalizable
interface). If you use these methods correctly you can get a good
speedup over the default, but you pay a price in complexity.
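A sketch of the writeExternal route, assuming the data is held as a float[][]; ExternalRows is a hypothetical class name. Implementing Externalizable lets the class write its contents with plain writeInt/writeFloat calls, so callers can keep using a single oos.writeObject(...) while the stream records only one object. The complexity Robert mentions shows up directly: writeExternal and readExternal must be kept in sync by hand, and a public no-arg constructor is required.

```java
import java.io.*;

// Hypothetical wrapper that takes over its own stream format via Externalizable.
class ExternalRows implements Externalizable {
    private float[][] rows;

    public ExternalRows() { } // required: used by deserialization before readExternal

    public ExternalRows(float[][] rows) { this.rows = rows; }

    public float[][] rows() { return rows; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(rows.length);
        for (float[] row : rows) {
            out.writeInt(row.length);
            for (float v : row) {
                out.writeFloat(v); // raw 4-byte values, no per-element objects
            }
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        rows = new float[n][];
        for (int i = 0; i < n; i++) {
            float[] row = new float[in.readInt()];
            for (int j = 0; j < row.length; j++) {
                row[j] = in.readFloat();
            }
            rows[i] = row;
        }
    }
}
```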

All in all, serialization is nice for some tasks, but not for all.

Happy hacking.
/robo
 

Andrew Thompson

<snip some good stuff>
....
| Serialization is also quite hard to use in that it is easy to get a
| lot of data you don't want in the stream.

I would state that somewhat differently:
"Serialization is so easy to use that it
encourages laziness."

| ...It is very easy to write a
| container and find out that a whole GUI was also written to the
| stream.

Writing a container or GUI component?

That _is_ laziness. I could not believe the
first code I saw where someone did that.

Extract the relevant data from the GUI
(location, size, name of current file etc.)
and write only that!

[ Is there some esoteric app I missed,
where it actually makes sense to serialize
the entire GUI? ]
 

Robert Olofsson

: | ...It is very easy to write a
: | container and find out that a whole GUI was also written to the
: | stream.

: Writing a container or GUI component?

More like trying to write a small container, but some object in the
container has a reference to a button, and that button has a
reference to the frame. So you accidentally get the whole GUI.

It is easy to do. I'm not sure I would call it laziness, but using
serialization correctly requires some thought.
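A minimal sketch of the usual fix, marking the reference to the GUI transient; DrawingItem, FakeButton and TransientDemo are hypothetical stand-ins. Real AWT/Swing components are Serializable, which is exactly why a whole GUI can come along silently; FakeButton is deliberately not Serializable, so without the transient keyword writeObject would fail with NotSerializableException. The transient field is skipped on write and comes back as null, so the program has to reattach it after reading.

```java
import java.io.*;

// Hypothetical GUI stand-in; deliberately NOT Serializable.
class FakeButton {
    final String label;
    FakeButton(String label) { this.label = label; }
}

// Hypothetical drawing item that holds a reference back into the GUI.
class DrawingItem implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    // transient: skipped by default serialization, so the GUI object (and
    // everything reachable from it) is not written to the stream.
    transient FakeButton owner;

    DrawingItem(String name, FakeButton owner) {
        this.name = name;
        this.owner = owner;
    }
}

class TransientDemo {
    static DrawingItem roundTrip(DrawingItem item) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(item);
        oos.close();
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return (DrawingItem) ois.readObject(); // 'owner' is null here
    }
}
```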

: [ Is there some esoteric app I missed,
: where it actually makes sense to serialize
: the entire GUI? ]

I actually did write a small drawing program where I serialized the
canvas to disk. It was a quick and easy school assignment.
The things painted were components in the canvas; some had actions
bound to a click (like the tool to draw a line), some were only graphics
(like the GIF image).
I had to make sure the canvas did not serialize the parent reference...

Serializing a GUI is not something I recommend, but it can work in some
test/example code.

/robo
 
