Speeding up 1.1.8 accessing a JarFile using ZipFile()

No6

Hi,

I have an application that runs under Java 1.1.8 thru 1.4. All is well except
for the slow performance of the 1.1.8 environments getting an entry from a JAR
file.

It appears that under 118 the call to ZipFile() is taking a lot more time than
under 1.4. I am using no command-line arguments to java when running.

Is this a known issue with 118?

Has anyone got any ideas on how to speed this up that are workable in Java 118
thru 1.4?

Thanks

IAP

----

import java.io.*;
import java.util.*;
import java.net.*;
import java.util.zip.*;

// The JAR classes were not available in 118 (Apple MAC OS 9 limit)
//
//import java.util.jar.*;

public class JarTest
{
    public static void main(String[] args) throws IOException
    {
        String jarFile = "./Forum04.jar";
        String fileName = "Forum04/101895.htm";

        method1(jarFile, fileName);
    } // main

    private static void method1(String jar, String file) throws IOException
    {
        ZipFile zipFile = null;
        ZipEntry zipEntry = null;

        long start = System.currentTimeMillis();
        zipFile = new ZipFile(jar);
        long end = System.currentTimeMillis();
        System.out.println("IAP M1ZipFile = " + (int)(end - start) + " ms");

        start = System.currentTimeMillis();
        zipEntry = zipFile.getEntry(file);
        end = System.currentTimeMillis();
        System.out.println("IAP M1ZipFile.getEntry = " +
            (int)(end - start) + " ms");
        zipFile.close();
    }
}

----

java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)
IAP M1ZipFile = 6 ms
IAP M1ZipFile.getEntry = 0 ms

java version "1.1.8"
IAP M1ZipFile = 886 ms
IAP M1ZipFile.getEntry = 1 ms
 
Andrew Thompson

I have an application that runs under Java 1.1.8 thru 1.4. All is well except
for the slow performance of the 1.1.8 environments getting an entry from a JAR
file...

How are you doing that?

There are different ways to retrieve Jar
and Zip entries, and they can vary vastly
in performance.
 
No6

How are you doing that?

There are different ways to retrieve Jar
and Zip entries, and they can vary vastly
in performance.

I included sample code and run times in my original post; see the JarTest
listing and the 1.4.1_01 and 1.1.8 timings above.
 
Andrew Thompson

....
I included sample code and run times in my original post. Here it is again.

Well what a dill, I completely missed it, sorry.
zipFile = new ZipFile(jar);
long end = System.currentTimeMillis();
System.out.println("IAP M1ZipFile = " + (int)(end - start) + " ms");

start = System.currentTimeMillis();
zipEntry = zipFile.getEntry(file);
end = System.currentTimeMillis();
System.out.println("IAP M1ZipFile.getEntry = " +
(int)(end - start) + " ms");
zipFile.close();
}
}

You are using the 'safe' form of enumerating the entries
of the Zip file as Chris Uppal describes here..
<http://google.com/groups?thl=943029...75689,942667204,942658182,942639231,942383598>

There is an alternate way to get them, described
earlier in the thread, that may be much quicker.

In a nutshell, you get the enumeration of the
Zip entries and request each zip'd file separately,
using the ZipEntry's from the enumeration.

There is a lot more detail sprinkled
throughout that thread.
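
The enumeration approach described above can be sketched roughly as follows. This is only an illustration, not code from the linked thread: the class name EnumerationLookup and its methods are made up for the example, and it sticks to raw java.util.zip and Enumeration calls in the hope of staying close to what 1.1.8 offers. It still constructs a ZipFile, so whether it beats getEntry() on 118 would need measuring.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class; the name and methods are not from the thread.
class EnumerationLookup
{
    // Walks ZipFile.entries() and returns the bytes of the first entry
    // whose name matches, or null when no entry has that name.
    static byte[] read(String jar, String name) throws IOException
    {
        ZipFile zipFile = new ZipFile(jar);
        try
        {
            Enumeration entries = zipFile.entries();
            while (entries.hasMoreElements())
            {
                ZipEntry entry = (ZipEntry) entries.nextElement();
                if (entry.getName().equals(name))
                {
                    // Request the data using the ZipEntry from the enumeration.
                    InputStream in = zipFile.getInputStream(entry);
                    try
                    {
                        return readAll(in);
                    }
                    finally
                    {
                        in.close();
                    }
                }
            }
            return null;
        }
        finally
        {
            zipFile.close();
        }
    }

    // Drains the stream into a byte array.
    private static byte[] readAll(InputStream in) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int got;
        while ((got = in.read(buffer)) != -1)
        {
            out.write(buffer, 0, got);
        }
        return out.toByteArray();
    }
}
```

A call such as EnumerationLookup.read("./Forum04.jar", "Forum04/101895.htm") would return the post's bytes, or null if the entry is missing.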

HTH

(let us know how you go)
 
No6

Andrew Thompson said:
In a nutshell, you get the enumeration of the
Zip entries and request each zip'd file separately,
using the ZipEntry's from the enumeration.

Thanks for the reply and link to the thread. I implemented the code below.

----

private static void method2(String jar, String file) throws IOException
{
    long start = System.currentTimeMillis();
    ZipInputStream zis = new ZipInputStream(
        new BufferedInputStream(
            new FileInputStream(jar)));
    long end = System.currentTimeMillis();
    System.out.println("IAP M2ZipInputStream = " + (int)(end - start) + " ms");

    ZipEntry entry;
    byte[] buffer = new byte[1024];

    long end1 = 0;
    long start1 = System.currentTimeMillis();
    int got;
    while ((entry = zis.getNextEntry()) != null)
    {
        //System.out.println("IAP - " + entry.getName());
        if (entry.getName().equals(file))
        {
            System.out.println("IAP - " + entry.getName());
            end1 = System.currentTimeMillis();
            continue;
        }
    } // while()
    zis.close();

    System.out.println("IAP M2Search = " + (int)(end1 - start1) + " ms");
    System.out.println("IAP M2Total = " + (int)(end1 - start) + " ms");
} // method2
--------------

java is /opt/j2sdk1.4.1_01/bin//java
java version "1.4.1_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01-b01)
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)
IAP M2ZipInputStream = 5 ms
IAP - Forum04/101895.htm
IAP M2Search = 1652 ms
IAP M2Total = 1662 ms

---

java is /app/home/newMwr/MwrCdr/CdrTools/jdk1.1.8/bin/java
java version "1.1.8"
IAP M2ZipInputStream = 11 ms
IAP - Forum04/101895.htm
IAP M2Search = 1348 ms
IAP M2Total = 1361 ms

------------

The run time is now more consistent between 118 and 141, but is actually worse
than using ZipFile(). The additional time spent iterating through the entries
kills me.

The JAR file I am opening is around 4M in size and contains around 25,000
entries. The file I am searching for is one of the last in the jar file
according to 'jar -tvf'.

IAP
 
No6

Thanks for the reply and link to the thread. I implemented the below code

Just revisited the code and added a more suitable exit condition for the while()
loop :)

It hasn't helped with the run times much as the file was very near the end of
the jar file anyway. The code with the bogus 'continue;' was doing the
worst-case scenario of accessing the last file in the jar file.
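
The revised loop was not posted, so the following is only a guess at what an early exit could look like: the scan returns as soon as the name matches instead of reading on to the end of the stream. StreamScan and scan() are hypothetical names used for illustration.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of a ZipInputStream search that stops at the match.
class StreamScan
{
    // Returns how many entries were read up to and including the match,
    // or -1 if the whole stream was read without finding the name.
    static int scan(String jar, String file) throws IOException
    {
        ZipInputStream zis = new ZipInputStream(
            new BufferedInputStream(
                new FileInputStream(jar)));
        try
        {
            int count = 0;
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null)
            {
                count++;
                if (entry.getName().equals(file))
                {
                    return count; // stop here rather than 'continue'
                }
            }
            return -1;
        }
        finally
        {
            zis.close();
        }
    }
}
```

As noted above, an early exit only helps when the wanted entry sits near the front of the jar; for an entry near the end the scan still reads almost everything.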

IAP
 
Andrew Thompson

On 19 Aug 2004 16:48:38 -0700, No6 wrote:

I had been meaning to ask how many hundreds/thousands of
entries you were retrieving, since (even a humungous)
886 millisec is not too bad a hit for the user if you
can time it correctly and not do it too often..
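
One way to "not do it too often" might be to open each jar once and reuse the ZipFile for every later lookup, so the constructor cost is paid per archive rather than per entry. A minimal sketch, assuming the application can afford to keep the archives open; ZipFileCache and its methods are hypothetical names, and the raw Hashtable is there only to stay close to the 1.1-era collections.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.zip.ZipFile;

// Hypothetical cache of opened archives, keyed by jar path.
class ZipFileCache
{
    private final Hashtable open = new Hashtable();

    // Opens the archive on first request and hands back the same
    // ZipFile on every later request for the same path.
    synchronized ZipFile get(String jar) throws IOException
    {
        ZipFile zipFile = (ZipFile) open.get(jar);
        if (zipFile == null)
        {
            zipFile = new ZipFile(jar);
            open.put(jar, zipFile);
        }
        return zipFile;
    }

    // Closes every cached archive, for example at application shutdown.
    synchronized void closeAll() throws IOException
    {
        Enumeration files = open.elements();
        while (files.hasMoreElements())
        {
            ((ZipFile) files.nextElement()).close();
        }
        open.clear();
    }
}
```

With this in place, cache.get(jar).getEntry(file) would only hit the slow constructor the first time a given jar is requested.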

In fact, I had been meaning to ask more of the plan
of what you wish to achieve, since there are often
better ways to approach particularly confounding problems.

But..

The JAR file I am opening is around 4M in size and contains around 25,000
entries.

That's a few..

The file I am searching for is one

One! It is just one single file that
you require from this archive?!?

What is the rest of it for?

...of the last in the jar file
according to 'jar -tvf'.

OK.. I want to know the answers to the above
questions (more for curiosity than anything)
but let us assume for a moment it *is* a good
idea to distribute a 4Meg Zip for the sake of
a single file.

Here is how I might approach it.

First, try this..

Find how long it takes to retrieve the
single entry, ONCE YOU HAVE THE ZipEntry..
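
A rough way to take that measurement, in the same style as the timing code earlier in the thread, could look like this. EntryReadTimer is a hypothetical name; the sketch assumes the ZipFile is already open and the ZipEntry has already been obtained, so only the read itself is timed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical timing helper; measures only the read of one entry.
class EntryReadTimer
{
    // Reads the entry to the end, prints the elapsed time,
    // and returns the number of uncompressed bytes read.
    static int timeRead(ZipFile zipFile, ZipEntry entry) throws IOException
    {
        byte[] buffer = new byte[1024];
        int total = 0;

        long start = System.currentTimeMillis();
        InputStream in = zipFile.getInputStream(entry);
        try
        {
            int got;
            while ((got = in.read(buffer)) != -1)
            {
                total += got;
            }
        }
        finally
        {
            in.close();
        }
        long end = System.currentTimeMillis();

        System.out.println("Entry read = " + (int)(end - start) + " ms");
        return total;
    }
}
```

For entries this small the result may well round to 0 ms with currentTimeMillis(), so repeating the read in a loop and averaging could give a more useful figure.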

If the ZipEntry retrieval itself is a
short time, consider this strategy.

*Once*
a) get the enumeration of Zip entries
b) iterate them till you find the correct one.
c) XML'ise that single entry.

Whenever reading..
a) read the XML'd entry back to a ZipEntry
b) retrieve the entry.

Make sense?
 
No6

Andrew Thompson said:
OK.. I want to know the answers to the above
questions (more for curiosity than anything)
but let us assume for a moment it *is* a good
idea to distribute a 4Meg Zip for the sake of
a single file.

Here is the method behind my madness.

I am providing an archive of an online web-forum. To date, there is somewhere in
the region of 200,000 posts/messages in the archive. I index the posts using
Lucene and then burn everything onto DVD-R and provide a multi-platform (Mac
OS9/OSX, Win32 and Unix) application.

I use jar files for several reasons.

#1 It removes the need for 200,000+ individual files on the DVD-R, which would be
slow to access. I split the files into batches of 25,000 and jar them up, to
help keep the size of the jar files reasonable.

#2 It allows me to compress data efficiently - compressing many files into one
Jar file offers better compression than compressing many individual files.

#3 It insulates me against the vagaries of filenaming conventions under the
different OSs.

#4 I can distribute new content fairly easily - new jar files of posts + Lucene
indexes.

Find how long it takes to retrieve the
single entry, ONCE YOU HAVE THE ZipEntry..

This would be quick, as most posts are short, less than 1K.

If the ZipEntry retrieval itself is a
short time, consider this strategy.

I'll try and implement that this weekend and see how it goes.

Thinking aloud - This problem really only affects the Mac OS 9 users, as they
are capped at Java 118. If they upgraded to OS X my problem would disappear. I
wonder if there is an unofficial port of Java 1.3.1 for OS 9 anywhere out there?

Thanks for the continued suggestions.

IAP
 
