Low-latency alternative to Java Object Serialization


Giovanni Azua

Hi again :)

I have this lightweight client-server framework based on blocking I/O using classic
java.net.* sockets (I must develop it myself for a grad course project). I pass
data over the sockets using Serialization, i.e.
ObjectOutputStream#writeObject(...) and ObjectInputStream#readObject(...). I
was wondering if anyone can recommend a Serialization framework that would
outperform the vanilla Java default Serialization?

Three years ago I worked for a "high-frequency trading" company and they
avoided default Java Serialization like "the devil to the cross" (a Spanish
idiom, btw :)) because of its latency. However, I must say that their
remoting framework dated back to the Java stone age, and my guess is that
default Serialization must have improved over the years; I don't have hard
numbers to judge, though. I remember the JBoss middleware implementation having
its own Serialization framework for this very reason ... I have to check
that too.
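Since there are no hard numbers yet, a crude way to get some is to time round trips of a representative message. The sketch below assumes a hypothetical `Quote` message class and uses in-memory byte streams in place of `socket.getOutputStream()`/`socket.getInputStream()`. The timings are only indicative; a harness such as JMH would give more trustworthy figures.

```java
import java.io.*;

// Crude timing sketch for default Java serialization round trips.
// Quote is a hypothetical message type; this is not a JMH-grade benchmark.
class SerializationTiming {

    // Hypothetical message, standing in for whatever the framework sends.
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final double price;
        final long quantity;

        Quote(String symbol, double price, long quantity) {
            this.symbol = symbol;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Fresh stream per message: the header and class descriptor are re-sent
    // every time, so this overstates the cost on one long-lived socket stream.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Quote q = new Quote("ACME", 12.5, 100);
        int rounds = 20_000;
        for (int i = 0; i < rounds; i++) {  // warm-up so the JIT compiles the hot path
            deserialize(serialize(q));
        }
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize(serialize(q));
        }
        double micros = (System.nanoTime() - start) / 1_000.0 / rounds;
        System.out.printf("%d bytes per message, %.2f us per round trip%n",
                serialize(q).length, micros);
    }
}
```

Running the same loop against each candidate format gives a like-for-like comparison on the actual hardware.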

Can anyone advise what would be better than Java Serialization without
requiring an unreasonably heavy dependency footprint?

Many thanks in advance,
Best regards,
Giovanni
 

Lew


Side bar: What exactly do you mean by "latency" here?

Serialization assumes no knowledge on the restoring end about the structures to restore, so all knowledge has to reside in the serialization format.

Circular dependencies, inheritance chains, the whole megillah has to be encoded into the serialized stream.

Serialization is designed to store and restore object graphs, not the data in them.
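To see the metadata Lew describes, the sketch below serializes a hypothetical two-field `Position` class and checks that the class name and the field names travel in the stream alongside the 16 bytes of actual data.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Sketch: a default serialization stream carries class metadata (class name,
// field names and types) in addition to the field values themselves.
class StreamMetadata {

    // Hypothetical data class: two doubles, i.e. 16 bytes of actual data.
    static final class Position implements Serializable {
        private static final long serialVersionUID = 1L;
        final double latitude;
        final double longitude;

        Position(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new Position(47.37, 8.54));
        // ISO-8859-1 maps each byte to one char, so ASCII names appear verbatim.
        String text = new String(stream, StandardCharsets.ISO_8859_1);
        System.out.println("data bytes:   " + 2 * Double.BYTES);
        System.out.println("stream bytes: " + stream.length);
        System.out.println("contains class name:  " + text.contains(Position.class.getName()));
        System.out.println("contains field names: "
                + (text.contains("latitude") && text.contains("longitude")));
    }
}
```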

Take a page from web services and create an XML schema to represent the *data* you wish to transfer. This assumes knowledge on both ends of the structures used to hold the data, unlike object serialization, hence much less information must flow between the participants.

Use JAXB to generate the classes used to process that schema and incorporate those classes into the protocol at both ends.

Fast, standard and fairly low effort and low maintenance, assuming you have version control and continuous integration (CI).

By "fast" I mean both to develop and to operate.

You will write custom code to jam the data into your JAXB-generated structures and retrieve them therefrom.

But you will be transmitting data via a format that omits the object graph overhead and focuses on just the data to share. The object-graph knowledge is coded into the application and need not be transferred.

XML is awesome for this kind of task.
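JAXB was removed from the JDK in Java 11, so this sketch swaps in the JDK's built-in StAX API (`javax.xml.stream`) to show the same idea: only the data goes on the wire, and both ends hard-code its structure. The `Quote` class and the element names are hypothetical; with JAXB the mapping classes would be generated from the schema rather than written by hand.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Sketch: send only the data as XML; both ends know the structure.
// StAX stands in for JAXB here; element names are hypothetical.
class XmlMessage {

    static final class Quote {
        final String symbol;
        final double price;
        final long quantity;

        Quote(String symbol, double price, long quantity) {
            this.symbol = symbol;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Produces <quote symbol="..."><price>...</price><quantity>...</quantity></quote>
    static String toXml(Quote q) {
        StringWriter sw = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            w.writeStartElement("quote");
            w.writeAttribute("symbol", q.symbol);
            w.writeStartElement("price");
            w.writeCharacters(Double.toString(q.price));
            w.writeEndElement();
            w.writeStartElement("quantity");
            w.writeCharacters(Long.toString(q.quantity));
            w.writeEndElement();
            w.writeEndElement();  // </quote>
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return sw.toString();
    }

    // The receiving side relies on knowing the element names in advance.
    static Quote fromXml(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            String symbol = null;
            double price = 0;
            long quantity = 0;
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = r.getLocalName();
                if (name.equals("quote")) {
                    symbol = r.getAttributeValue(null, "symbol");
                } else if (name.equals("price")) {
                    price = Double.parseDouble(r.getElementText());
                } else if (name.equals("quantity")) {
                    quantity = Long.parseLong(r.getElementText());
                }
            }
            r.close();
            return new Quote(symbol, price, quantity);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = toXml(new Quote("ACME", 12.5, 100));
        Quote back = fromXml(xml);
        System.out.println(xml);
        System.out.println(back.symbol + " " + back.price + " " + back.quantity);
    }
}
```

The `toXml`/`fromXml` pair is the "custom code to jam the data" in and out; schema validation is left out of this sketch.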
 

markspace

Giovanni said:
Three years ago I worked for a "high frequency trading" company and they
avoided default Java Serialization like "the devil to the cross"


Just because "avoid serialization" was a requirement for your previous
work, doesn't mean that it should be a requirement for every project
after that.

Frequently, the low developer cost of Java serialization overrides all
other concerns. The increase in CPU costs and network bandwidth it
requires is very cheap. DO NOT work around Java serialization unless
you are sure you need to, i.e., after careful analysis (and profiling)
of a working app or prototype.

If you do need to work around Java serialization, look at the
Externalizable interface.
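A minimal `Externalizable` sketch, assuming a hypothetical `Quote` message. The class writes and reads its own fields in a fixed order, so the stream still names the class but carries no per-field descriptors.

```java
import java.io.*;

// Sketch: with Externalizable the class writes its own fields, so the stream
// carries the class name but no per-field descriptors. Quote is hypothetical.
class ExternalizableDemo {

    // Externalizable classes need a public no-arg constructor for deserialization.
    public static final class Quote implements Externalizable {
        private String symbol;
        private double price;
        private long quantity;

        public Quote() { }

        public Quote(String symbol, double price, long quantity) {
            this.symbol = symbol;
            this.price = price;
            this.quantity = quantity;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(symbol);      // the field order here...
            out.writeDouble(price);
            out.writeLong(quantity);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            symbol = in.readUTF();     // ...must match the order here
            price = in.readDouble();
            quantity = in.readLong();
        }

        String symbol() { return symbol; }
        double price() { return price; }
        long quantity() { return quantity; }
    }

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Quote back = (Quote) deserialize(serialize(new Quote("ACME", 12.5, 100)));
        System.out.println(back.symbol() + " " + back.price() + " " + back.quantity());
    }
}
```

The trade-off is that adding, removing or reordering fields is now entirely the class's own responsibility.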

http://java.sun.com/developer/technicalArticles/Programming/serialization/

Note the sections on "gotchas" in that article, especially the ones on
caching and on performance.
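The caching gotcha bites with a long-lived socket stream: `ObjectOutputStream` remembers every object it has written and sends only a back-reference when the same instance is written again, even if it was mutated in between. The hypothetical `Counter` class below shows the effect and the `reset()` fix.

```java
import java.io.*;

// Sketch of the ObjectOutputStream caching gotcha. Counter is hypothetical.
class StreamCaching {

    static final class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes the same instance twice on one stream, mutating it in between,
    // and returns the values the reader sees for the two reads.
    static int[] sendTwice(boolean resetBetween) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            Counter c = new Counter();
            c.value = 1;
            out.writeObject(c);
            c.value = 2;             // mutate the already-sent object
            if (resetBetween) {
                out.reset();         // forget which objects were already written
            }
            out.writeObject(c);      // without reset: only a back-reference is sent
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Counter first = (Counter) in.readObject();
            Counter second = (Counter) in.readObject();
            return new int[] {first.value, second.value};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] stale = sendTwice(false);
        int[] fresh = sendTwice(true);
        System.out.println("no reset:   " + stale[0] + ", " + stale[1]);
        System.out.println("with reset: " + fresh[0] + ", " + fresh[1]);
    }
}
```

`ObjectOutputStream#writeUnshared` is another way to force a fresh copy. The stream also keeps every written object reachable until `reset()`, so a never-reset stream can hold on to a growing amount of memory.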

Totally rolling your own protocol is possible too if you need the utmost
performance. 'Tain't hard. 'Tain't easy either. DataInputStream and
DataOutputStream are a good compromise between higher-level serialization
and raw sockets.

<http://download.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html>
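A hand-rolled `DataOutputStream`/`DataInputStream` sketch, assuming a hypothetical `Quote` record made of a string, a double and a long. Nothing on the wire describes the layout, so both ends must agree on field order and types.

```java
import java.io.*;

// Sketch of a hand-rolled binary format: both ends agree on field order and
// types, and nothing on the wire describes them. Quote is hypothetical.
class DataStreamQuote {

    static final class Quote {
        final String symbol;
        final double price;
        final long quantity;

        Quote(String symbol, double price, long quantity) {
            this.symbol = symbol;
            this.price = price;
            this.quantity = quantity;
        }
    }

    static void write(DataOutputStream out, Quote q) throws IOException {
        out.writeUTF(q.symbol);     // 2-byte length + modified UTF-8 bytes
        out.writeDouble(q.price);   // 8 bytes, big-endian
        out.writeLong(q.quantity);  // 8 bytes, big-endian
    }

    static Quote read(DataInputStream in) throws IOException {
        String symbol = in.readUTF();     // read back in exactly the same order
        double price = in.readDouble();
        long quantity = in.readLong();
        return new Quote(symbol, price, quantity);
    }

    static byte[] encode(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(out, q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Quote decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(new Quote("ACME", 12.5, 100));
        Quote back = decode(wire);
        System.out.println(wire.length + " bytes: " + back.symbol + " " + back.price + " " + back.quantity);
    }
}
```

Over a socket, the streams would typically be wrapped as `new DataOutputStream(new BufferedOutputStream(socket.getOutputStream()))`, with a `flush()` after each message.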
 

Robert Klemme

Lew said:
Take a page from web services and create an XML schema to represent the *data* you wish to transfer. [...]
XML is awesome for this kind of task.

http://www.json.org/ might also be a good alternative which, depending
on the format etc., can be less verbose. See http://json.org/example.html
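For a rough feel of the size difference, the sketch below renders the same hypothetical quote as compact XML and compact JSON and compares lengths. It concatenates strings directly, which is only safe for values that need no escaping; a real implementation would use a proper XML writer or JSON library.

```java
// Sketch: the same hypothetical record as compact XML and compact JSON.
// Plain string concatenation, so values must not need escaping.
class XmlVsJson {

    static String asXml(String symbol, double price, long quantity) {
        return "<quote><symbol>" + symbol + "</symbol><price>" + price
                + "</price><quantity>" + quantity + "</quantity></quote>";
    }

    static String asJson(String symbol, double price, long quantity) {
        return "{\"symbol\":\"" + symbol + "\",\"price\":" + price
                + ",\"quantity\":" + quantity + "}";
    }

    public static void main(String[] args) {
        String xml = asXml("ACME", 12.5, 100);
        String json = asJson("ACME", 12.5, 100);
        System.out.println(xml.length() + " chars: " + xml);
        System.out.println(json.length() + " chars: " + json);
    }
}
```

Most of the gap comes from XML repeating each element name in the closing tag; using attributes instead of child elements would narrow it.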

Kind regards

robert
 

Robert Klemme

Another poster said:
JSON is convenient for JavaScript heads, it is not human readable,
this is one reason why XML exists in the first place.

I am not sure why you say JSON is not human readable while XML is.
Remember: for network transfer you would use the most compact format
of either, which means that for XML you would not have line breaks and
indentation. I'd say an XML document on one line with a reasonably complex
structure is not human readable.

Another poster said:
JSON was a mistake, instead of coming up with an arcane hacked syntax
to replace XML; JavaScript should have been improved to handle XML.

That sounds like opinion to me. Can you provide any real arguments why
XML should be chosen as a data transfer format over JSON?

XML does have some overhead and often uses more bytes to represent the
same structure.

There's also an interesting discussion at stackoverflow:
http://stackoverflow.com/questions/2636245/choosing-between-json-and-xml#2636380

Kind regards

robert
 

Robert Klemme


Btw, there is a completely different option not mentioned so far: CORBA
with IIOP, which was specifically designed for remote communication. Of
course this would mean that you would have to replace your complete
communication layer, but I wanted to mention it because I believe CORBA
is used too rarely because it somehow seems out of fashion. But if you
look at network bandwidth used, I believe CORBA is a pretty good
contender compared to SOAP, for example.

Kind regards

robert
 

Tom Anderson

Giovanni said:
I was wondering if anyone can recommend a Serialization framework that
would outperform the vanilla Java default Serialization?

Swords not words:

https://github.com/eishay/jvm-serializers/wiki/

I sent them a patch to add JBoss Serialization a while ago, but they
haven't taken it. I should try again now that the project is on GitHub.

Giovanni said:
I remember JBoss Middleware implementation having some Serialization
framework for this very same reason ... have to check that too.

It's pretty good. More or less plug-compatible with JDK serialization at
the API level (as in, it doesn't need schema generation or weird
interfaces or anything), and much faster. From what i remember of my
benchmarks, it was faster than any of the textual formats, and only a bit
slower than the schema-based binary formats like Protocol Buffers.

tom
 

Tom Anderson

Lew said:
Take a page from web services and create an XML schema to represent the
*data* you wish to transfer. [...]

Use JAXB to generate the classes used to process that schema and
incorporate those classes into the protocol at both ends.

Fast, standard and fairly low effort and low maintenance, assuming you
have version control and continuous integration (CI).

By "fast" I mean both to develop and to operate.

Interesting. I do not believe this to be true. Specifically, i believe
that: (a) developing an XML-based transfer format using JAXB will take
considerably more effort than using standard serialization, or an equally
convenient library such as JBoss Serialization, although still not a large
amount of effort, certainly; (b) the data will be larger than
with standard serialization (because the "object graph overhead" is not
actually that large, and XML is much less space-efficient than
serialization's binary format); and (c) the speed of operation, even
assuming an infinitely fast network, will be lower.

One get-out clause: for very short streams (one or a few objects), XML
might beat standard serialization for space and speed. Standard
serialization does have some per-class overhead, which is
disproportionately expensive for short streams.
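The per-class overhead can be observed by writing one object versus many distinct objects of the same class on a single stream. The class descriptor is written once per stream and later objects refer back to it, so the average cost per object drops as the stream grows. `Point` is a hypothetical class.

```java
import java.io.*;

// Sketch: the class descriptor is written once per stream, so its cost is
// amortized over many objects but dominates a stream holding just one.
class PerClassOverhead {

    // Hypothetical small class: 8 bytes of field data per instance.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Total stream size for n distinct Points written on one ObjectOutputStream.
    static int streamSize(int n) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeObject(new Point(i, i));  // new instance each time, so no back-references
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int one = streamSize(1);
        int many = streamSize(1_000);
        System.out.println("1 object:     " + one + " bytes");
        System.out.println("1000 objects: " + many + " bytes, "
                + (many / 1_000.0) + " bytes per object on average");
        System.out.println("each extra object: " + (streamSize(2) - one) + " bytes");
    }
}
```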

tom
 

Roedy Green

Another poster said:
JSON is convenient for JavaScript heads, it is not human readable,

JSON reads much like Java source code. I find it easier to
understand than XML even though I know XML much better.

You might have been looking at some compressed JSON, or encrypted SSL
traffic. XML would be just as inscrutable if you compressed it. It
compresses well. (This is not a compliment).
--
Roedy Green Canadian Mind Products
http://mindprod.com
It should not be considered an error when the user starts something
already started or stops something already stopped. This applies
to browsers, services, editors... It is inexcusable to
punish the user by requiring some elaborate sequence to atone,
e.g. open the task editor, find and kill some processes.
 

Roedy Green

Tom said:
Specifically, i believe
that: (a) developing an XML-based transfer format using JAXB will take
considerably more effort than using standard serialization

Serialisation handles complex data structures, even loops. XML is
limited to trees.
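A sketch of the first point, assuming a hypothetical `Node` class: two nodes that point at each other survive a serialization round trip with the cycle intact. A tree-shaped XML format would need explicit identifiers to express the back edge.

```java
import java.io.*;

// Sketch: default serialization preserves shared references and cycles.
class CyclicGraph {

    // Hypothetical linked node; 'next' may point back to an earlier node.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Deep copy via a serialization round trip.
    static Node copy(Node root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Node) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;  // cycle: a -> b -> a
        Node c = copy(a);
        System.out.println("new object: " + (c != a));
        System.out.println("cycle kept: " + (c.next.next == c));
    }
}
```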

Serialisation handles any imaginable data type without extra work. XML
requires inventing an external character representation and a way of
converting to chars and back.

Serialisation is hard to upgrade. XML is easy. Serialisation pretty
much requires everyone to stay in sync with identical software. XML
allows clients with out-of-date software, software in other languages,
or even no software at all.
 

Roedy Green

markspace said:
The increase in CPU costs and network bandwidth it
requires is very cheap.

I did a system for monitoring security cameras. The boss said
efficiency in transport was the #1 priority because it limited how
many cameras could be monitored at a remote site.

I did it by defining a number of binary records and writing a method
to read/write each type with DataInputStream/DataOutputStream. It is
conceptually simple (COBOL-think) and had almost no overhead. I could have
written a program to generate the Java read/write methods for each record
type from a data description, but the formats were stable enough that I
never bothered. There were heartbeat packets in times of no traffic to let
each side know whether the other was still live.
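A sketch of the tagged-record style described above, assuming hypothetical record types: each record starts with a one-byte tag, and the reader dispatches on it. The layouts are illustrative only, not the actual camera protocol.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Sketch of tagged binary records over Data streams. Record types and
// layouts are hypothetical, for illustration only.
class TaggedRecords {

    static final int HEARTBEAT = 0;   // no payload: "I'm still alive"
    static final int FRAME_INFO = 1;  // camera id, timestamp, frame size

    static void writeHeartbeat(DataOutputStream out) throws IOException {
        out.writeByte(HEARTBEAT);
    }

    static void writeFrameInfo(DataOutputStream out, int cameraId, long timestamp,
                               int frameBytes) throws IOException {
        out.writeByte(FRAME_INFO);
        out.writeInt(cameraId);
        out.writeLong(timestamp);
        out.writeInt(frameBytes);
    }

    // Reads records until end of stream, dispatching on the leading tag byte.
    static List<String> readAll(DataInputStream in) throws IOException {
        List<String> records = new ArrayList<>();
        int tag;
        while ((tag = in.read()) != -1) {  // read() returns -1 at end of stream
            switch (tag) {
                case HEARTBEAT:
                    records.add("heartbeat");
                    break;
                case FRAME_INFO: {
                    int cameraId = in.readInt();
                    long timestamp = in.readLong();
                    int frameBytes = in.readInt();
                    records.add("frame camera=" + cameraId + " t=" + timestamp
                            + " bytes=" + frameBytes);
                    break;
                }
                default:
                    throw new IOException("unknown record tag " + tag);
            }
        }
        return records;
    }

    // Writes a frame record followed by a heartbeat, then reads them back.
    static List<String> demo() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeFrameInfo(out, 7, 1_000L, 4_096);
            writeHeartbeat(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return readAll(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

On the receiving side, `Socket.setSoTimeout` can turn a missing heartbeat into a `SocketTimeoutException`, which is one way to notice that the peer has gone quiet.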

 

Lew

Tom said:
Interesting. I do not believe this to be true. Specifically, i believe
that: (a) developing an XML-based transfer format using JAXB will take
considerably more effort than using standard serialization, or an equally
convenient library such as JBoss Serialization, although still not a large
amount of effort, certainly; (b) the data will be larger than
with standard serialization (because the "object graph overhead" is not
actually that large, and XML is much less space-efficient than
serialization's binary format); and (c) the speed of operation, even
assuming an infinitely fast network, will be lower.

One get-out clause: for very short streams (one or a few objects), XML
might beat standard serialization for space and speed. Standard
serialization does have some per-class overhead, which is
disproportionately expensive for short streams.

Well, I haven't measured, but let's do a little gedankenexperiment.

Fast to develop - serialization is actually tricky to do right. You can use the absolute defaults, but the world is littered with projects that had maintenance issues because serialization was done simple-mindedly. /Effective Java/ devotes an entire chapter to the topic. JAXB solutions, and I've made several, are very straightforward. Most of the effort goes into schema design, which is parallel to modeling so not even an overhead. I do think JAXB wins, but on balance I'd assess that a competent programmer could do either one well with more or less similar effort.

Fast to perform - XML is fast enough. Compressed, its bandwidth is not egregious. Overall I/O considerations should dominate, but I'll take a slight loss for the safety benefits of JAXB.
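To put a number on "compressed", the sketch below gzips a hypothetical batch of XML quotes with `java.util.zip`. The batch is artificially repetitive, so real messages may compress less well.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: gzip a batch of compact XML to see how much the markup costs on the wire.
class GzipXml {

    // Hypothetical batch of n quotes as a single compact XML document.
    static String batch(int n) {
        StringBuilder sb = new StringBuilder("<quotes>");
        for (int i = 0; i < n; i++) {
            sb.append("<quote><symbol>SYM").append(i % 10)
              .append("</symbol><price>").append(10 + i % 7)
              .append("</price><quantity>").append(100 * (i % 5))
              .append("</quantity></quote>");
        }
        return sb.append("</quotes>").toString();
    }

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();  // read after close, so the gzip trailer is included
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();  // InputStream.readAllBytes needs Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = batch(200).getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(raw);
        System.out.println("raw: " + raw.length + " bytes, gzipped: " + packed.length + " bytes");
    }
}
```

Compression trades CPU time on both ends for bandwidth, which pays off more on a slow WAN link than on a fast LAN.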
 

Arne Vajhøj

Robert said:
I am not sure why you say JSON is not human readable while XML is.
Remember: for network transfer you would use the most compact format
of either, which means that for XML you would not have line breaks and
indentation. I'd say an XML document on one line with a reasonably complex
structure is not human readable.

That is true for a WAN.

On a LAN I doubt that the difference between JSON and XML
will be noticeable (especially if it is gzipped on the wire).

But the internet is usually a lot slower than gigabit inside the
data center.

Robert said:
That sounds like opinion to me. Can you provide any real arguments why
XML should be chosen as a data transfer format over JSON?

Unless limited bandwidth or ease of use in JavaScript is important,
XML does seem better.

XML schemas, namespaces etc. make it a lot more type-safe
and reusable among apps.

But limited bandwidth and ease of use in JavaScript applies
to all modern web apps and all smartphone apps, so that is
a very big chunk of development today.

Arne
 

Arne Vajhøj

Another poster said:
JSON is convenient for JavaScript heads, it is not human readable,
this is one reason why XML exists in the first place.

JSON is often more readable than XML for humans. You have the name
and the value, and that is what you need in most cases. XML with
heavy use of namespaces provides a lot of information for the
programs reading it, but for the human mind all that stuff is
more of a distraction from the essentials.

Arne
 
