Binary and text file performance

  • Thread starter Michael Preminger

Michael Preminger

In association with a Programming course, I need to explore the
performance differences between using binary
files (serialized object representation saved to a file) and text files
(text representations of the same object).

I have made a simple experiment:

I created a class Person with some simple attributes (code given
below). I gave it a text representation (toString()) that covers
everything needed for a unique reconstruction of an object.

I compare the following scenarios:
1. Saving 20000 replicas of the text representation of a Person object
into a text file, then reading them back and constructing a Person object
from the text representation 20000 times.

2. Saving the same number of replicas of the serializable Person object
into a binary file and reconstructing the object the same number of times
from that representation.

I measure the elapsed time of each action (calling
System.currentTimeMillis() before and after and subtracting).

What I see, to my surprise, is that the text representation is more
expensive in terms of space but much, much faster to process than the
binary representation. This must mean that serialization /
deserialization of objects in a binary representation is slower than
constructing objects. My saving of space is due to the fact that I am
using the same class, and Java therefore only encodes the class once
into the file.
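
A caveat on the space saving: besides writing the class description once, ObjectOutputStream also remembers every object it has already written, so writing the same instance again stores only a short back-reference. The sketch below illustrates this with an in-memory stream. The class `BackReferenceDemo` and the stand-in class `Item` are hypothetical and not part of the original experiment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class, not part of the original experiment.
class BackReferenceDemo {

    // Small stand-in for the thread's Person class.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int number;

        Item(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    // Serializes 'count' objects into memory and returns the stream bytes.
    // If sameInstance is true, one object is written again and again;
    // otherwise a fresh object is created for every write.
    static byte[] write(int count, boolean sameInstance) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Item shared = new Item("name0", 0);
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                oos.writeObject(sameInstance ? shared : new Item("name" + i, i));
            }
        }
        return bytes.toByteArray();
    }

    // Reads the first two objects back and reports whether both reads
    // returned the very same instance (the case for back-references).
    static boolean firstTwoAreSameInstance(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            Object a = ois.readObject();
            Object b = ois.readObject();
            return a == b;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] same = write(1000, true);
        byte[] distinct = write(1000, false);
        System.out.println("same instance x1000:      " + same.length + " bytes");
        System.out.println("distinct instances x1000: " + distinct.length + " bytes");
        System.out.println("same instance read back twice: "
                + firstTwoAreSameInstance(same));
    }
}
```

If every write is meant to store a full object, one option is to create a distinct object per iteration, as the second variant above does; calling ObjectOutputStream.reset() between writes is another.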

My questions are:

1. Is my experiment (see code pieces below) valid and general enough to
compare the performances?


2. Are there cases, similar in spirit to the one above
where a binary representation would outperform a text representation
also in processing time?

(The complete code can be downloaded from here:
http://bibin.hio.no/vu/programmering/uke_19/TestLagring_1.java)

----------------------------------------------------------
//Person.java - A class that models a
// generic person
package no.hio.bibin.michaelp.person;

public class Person implements java.io.Serializable
{
    //fields (the class's properties)
    private String fornavn;//first name
    private String etternavn;//last name
    private String personnr;//id-number
    private boolean kvinne;//gender (true for woman)

    //accessor methods
    public String getFornavn(){
        return fornavn;
    }
    public void setFornavn(String fNavn){
        fornavn=fNavn;
    }
    //text representation: last name, first name, id-number, gender
    public String toString(){
        String ret= "";
        /*if (kvinne){
        ret+="Fr. ";
        }else{
        ret+="Herr ";
        }*/
        String tf=kvinne?"true":"false";
        ret+=etternavn+" "+fornavn+" "+personnr+" "+tf;
        return ret;
    }
    public String getEtternavn(){
        return etternavn;
    }
    public void setEtternavn(String eNavn){
        etternavn=eNavn;
    }
    public String getPersonnr(){
        return personnr;
    }
    public void setPersonnr(String pNr){
        personnr=pNr;
    }
    public boolean getKvinne(){
        return kvinne;
    }
    public void setKvinne(boolean kv){
        kvinne=kv;
    }
    /*public Person(){}*/
    /* public String toString(){

    }*/
    //workload (always 0 for a generic person)
    public double arbeidsLast(){
        return 0;
    }

    public Person(String fNavn, String eNavn, String pNr, boolean kvn){
        fornavn=fNavn;
        etternavn=eNavn;
        personnr=pNr;
        kvinne=kvn;
    }
    //date of birth: the first six characters of the id-number
    public String getFodselsdato(){
        return personnr.substring(0,6);
    }
}
-----------------------------------------------------------------------

//Saving the binary representation
//(oos, bw, br, ois, p, filebin and filetext are presumably static fields
//declared in the complete source; they are not shown in these excerpts)

public static void lagreBin(int antallGanger) throws IOException{
    oos=new ObjectOutputStream(new FileOutputStream(filebin));
    for (int i=0;i<antallGanger;i++){
        oos.writeObject(p);
    }
    oos.close();
}

//saving text representation

public static void lagreString(int antallGanger) throws IOException{
    bw=new BufferedWriter(new FileWriter(filetext));
    for (int i=0;i<antallGanger;i++){
        bw.write(p.toString());
        bw.newLine();
    }
    bw.close();
}

//reconstructing from text representation

public static void lesInnString(int antallGanger) throws IOException{
    br=new BufferedReader(new FileReader(filetext));
    for (int i=0;i<antallGanger;i++){
        String ps=br.readLine();
        String[] sa=ps.split(" ");
        //toString() writes: etternavn fornavn personnr kvinne
        boolean kv=sa[3].equals("true");
        Person p=new Person(sa[1], sa[0], sa[2], kv );
    }
    br.close();
}

//reconstructing from binary representation

public static void lesInnBin(int antallGanger) throws IOException,
        ClassNotFoundException{
    ois=new ObjectInputStream(new FileInputStream(filebin));
    for (int i=0;i<antallGanger;i++){
        Person p= (Person) ois.readObject();
    }
    ois.close();
}


Michael Preminger
Høgskolen i Oslo
p48/R507
22 45 27 78
 

Skip

Object{In|Out}putStream uses reflection to (de)serialize objects.

You can design your own binary format by implementing the
java.io.Externalizable interface instead of java.io.Serializable, still
in combination with Object{In|Out}putStreams.

This should give you a nice speed boost over both Serializable *and* plain
text.

Example:

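import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Person implements java.io.Externalizable
{
    // all your class contents
    // ...
    String name;
    byte age;

    // Externalizable requires a public no-arg constructor
    public Person() {}

    // reads the fields in the same order they were written
    public void readExternal(ObjectInput in) throws IOException
    {
        name = in.readUTF();
        age = in.readByte();
    }

    public void writeExternal(ObjectOutput out) throws IOException
    {
        out.writeUTF(name);
        out.writeByte(age);
    }
}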

//

Person p = new Person();
new ObjectOutputStream(...).writeObject(p); // uses your binary format
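
Applied to the four fields of the Person class from the first post, a round trip might look like the minimal sketch below. The names `ExternalizableDemo`, `ExtPerson` and `roundTrip` are hypothetical, and the stream is kept in memory instead of a file to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical demo class; field names follow the Person class of the first post.
class ExternalizableDemo {

    public static class ExtPerson implements Externalizable {
        String fornavn;    // first name
        String etternavn;  // last name
        String personnr;   // id-number
        boolean kvinne;    // gender (true for woman)

        // Required by Externalizable: the object is created through this
        // constructor and then filled in by readExternal().
        public ExtPerson() {}

        public ExtPerson(String fornavn, String etternavn,
                         String personnr, boolean kvinne) {
            this.fornavn = fornavn;
            this.etternavn = etternavn;
            this.personnr = personnr;
            this.kvinne = kvinne;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(fornavn);
            out.writeUTF(etternavn);
            out.writeUTF(personnr);
            out.writeBoolean(kvinne);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must read in exactly the order used by writeExternal().
            fornavn = in.readUTF();
            etternavn = in.readUTF();
            personnr = in.readUTF();
            kvinne = in.readBoolean();
        }
    }

    // Writes the object to an in-memory stream and reads it back.
    static ExtPerson roundTrip(ExtPerson p)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(p);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ExtPerson) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExtPerson copy = roundTrip(
                new ExtPerson("Kari", "Nordmann", "01017012345", true));
        System.out.println(copy.etternavn + " " + copy.fornavn + " "
                + copy.personnr + " " + copy.kvinne);
    }
}
```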
 
Daniel Sjöblom

Michael said:
My questions are:

1. Is my experiment (see code pieces below) valid and general enough to
compare the performances?

No. It is a comparison of two entirely different things. You have
constructed a text format that is specifically tailored towards the
class you constructed and are comparing it with a general binary format
capable of representing any information in a java class. It should be no
surprise that the custom format is faster, whether it is binary or not.

A meaningful comparison would be between a custom binary format and a
custom text format, or a comparison between a general binary format and
a general text format. For instance you could compare XDR (external data
representation) and an XML representation of similar capabilities (I'm
2. Are there cases, similar in spirit to the one above
where a binary representation would outperform a text representation
also in processing time?

As I said above, you cannot draw any kind of meaningful conclusions from
the experiment you did.
 

Ross Bamford

My questions are:

1. Is my experiment (see code pieces below) valid and general enough to
compare the performances?

You are comparing the performance of two different things. Your text
format is handled by your own (tight) code that creates one specific
class and sets its properties, whereas the serialization mechanism does
something similar, but only *after* a lot of other work: finding the
appropriate Class definition among those available, checking the serial
versions, getting all your streams ready, and so on.

You could make your test a *little* more fair (on the surface) by using
the XMLEncoder / XMLDecoder classes to serialize to XML as well as
binary, but even then you aren't really testing the difference between
text and binary; you are mostly testing the Java serialization
machinery (plus all the overhead of XML parsing). I don't recommend this
at all.

2. Are there cases, similar in spirit to the one above
where a binary representation would outperform a text representation
also in processing time?

What I would do instead is to have your object type, and create two
files, into one of which you write a textual representation of your
object's fields (use a Writer), and the other you write a binary
representation (use a DataOutputStream or similar).

You'll notice more performance difference if you have numeric primitive
properties, since your binary version can simply read the byte(s),
whereas your text representation will have to Integer.parseInt() them.
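
A minimal sketch of that comparison for int values, assuming in-memory streams instead of files so it runs on its own. The class `TextVsBinaryInts` and its method names are hypothetical; wrap the calls in your own timing code to compare the two paths.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same int values stored as text and as binary.
class TextVsBinaryInts {

    // Text format: one decimal number per line.
    static byte[] writeText(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (BufferedWriter w = new BufferedWriter(
                new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            for (int v : values) {
                w.write(Integer.toString(v));
                w.newLine();
            }
        }
        return bytes.toByteArray();
    }

    // Reading text means parsing every line with Integer.parseInt().
    static int[] readText(byte[] data, int count) throws IOException {
        int[] result = new int[count];
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            for (int i = 0; i < count; i++) {
                result[i] = Integer.parseInt(r.readLine());
            }
        }
        return result;
    }

    // Binary format: four bytes per int, written by DataOutputStream.
    static byte[] writeBinary(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(bytes))) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    // Reading binary just reassembles the four bytes of each int.
    static int[] readBinary(byte[] data, int count) throws IOException {
        int[] result = new int[count];
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = in.readInt();
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        int[] values = new int[20000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 1000;
        }
        System.out.println("text bytes:   " + writeText(values).length);
        System.out.println("binary bytes: " + writeBinary(values).length);
    }
}
```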

You still aren't guaranteed to get the results you expect, since modern
systems (and JVMs) have all kinds of clever caching and other tricks
that can always throw a spanner in the works.
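
One common way to reduce (not eliminate) those effects is to run the measured code a few times before timing it and to take the best of several timed runs. A rough sketch, using System.nanoTime(); the names `SimpleTimer`, `Task` and `bestMillis` are hypothetical, and a dedicated harness such as JMH is the more rigorous option.

```java
// Hypothetical timing helper: warm-up runs first, then best of several timed runs.
class SimpleTimer {

    // Like Runnable, but allowed to throw (I/O code usually does).
    interface Task {
        void run() throws Exception;
    }

    // Runs the task 'warmups' times untimed, then 'runs' times timed,
    // and returns the shortest timed run in milliseconds.
    // 'runs' is assumed to be at least 1.
    static double bestMillis(Task task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best / 1_000_000.0;
    }

    public static void main(String[] args) throws Exception {
        // Example workload: format and parse 20000 numbers.
        Task parseTask = () -> {
            long sum = 0;
            for (int i = 0; i < 20000; i++) {
                sum += Integer.parseInt(Integer.toString(i));
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected sum");
            }
        };
        System.out.println("best of 10 runs: "
                + bestMillis(parseTask, 5, 10) + " ms");
    }
}
```

File-system caching still applies: the second read of a file is often served from memory, so the order in which the text and binary variants are run can matter.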

Cheers,
Ross
 
