bob said:
So, I wrote some code, but it is slow as molasses. Any easy ways to
speed this up? [...]
Scanner s = new Scanner(is);
[...]
After measuring, buffering, and considering double, as helpfully
suggested in adjacent answers, StreamTokenizer [1] may prove
measurably faster than Scanner [2]. One caveat: StreamTokenizer
can't parse scientific notation. Workarounds are possible [3,4],
although I haven't tested any.
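Focusing on just the parsing, the example below produces the
following results (elapsed nanoseconds per pass, as reported by
System.nanoTime(), for five passes over the same 10000 values):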
Token: 5165000
Scan: 681903000
Token: 427000
Scan: 185379000
Token: 878000
Scan: 63467000
Token: 398000
Scan: 63480000
Token: 570000
Scan: 62084000

import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Random;
import java.util.Scanner;

public class ScannerTest {

    private static final Random r = new Random();
    private static final int N = 10000;

    public static void main(String[] args) {
        // Build one string of N newline-separated Gaussian doubles.
        StringBuilder sb = new StringBuilder(N);
        for (int i = 0; i < N; i++) {
            sb.append(r.nextGaussian());
            sb.append('\n');
        }
        String s = sb.toString();
        // Time both parsers over the same input, five passes each.
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            tokenize(s);
            System.out.println("Token: " + (System.nanoTime() - start));
            start = System.nanoTime();
            scan(s);
            System.out.println("Scan: " + (System.nanoTime() - start));
        }
    }

    // Parse every number with StreamTokenizer; the value is discarded.
    private static void tokenize(String s) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        int token = 0;
        try {
            while ((token = st.nextToken()) == StreamTokenizer.TT_NUMBER) {
                double d = st.nval;
            }
        } catch (IOException e) {
            e.printStackTrace(System.err);
        }
    }

    // Parse every number with Scanner; the value is discarded.
    private static void scan(String s) {
        Scanner scanner = new Scanner(s);
        while (scanner.hasNextDouble()) {
            double d = scanner.nextDouble();
        }
        scanner.close();
    }
}
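
On the scientific-notation caveat: here is a minimal sketch of one
possible workaround. It is my own assumption, not taken from [3] or [4],
and it has not been timed against the figures above. The idea is to
switch off StreamTokenizer's built-in number parsing, read each
whitespace-separated token as a word, and hand it to
Double.parseDouble, which does accept forms like 1.5e3. The class name
SciTokenizer and the helper parseAll are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SciTokenizer {

    // Hypothetical helper: read whitespace-separated doubles, including
    // scientific notation, by treating every token as a word.
    static List<Double> parseAll(String s) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(s));
        st.resetSyntax();            // drop the default number parsing
        st.wordChars(33, 255);       // printable characters form words
        st.whitespaceChars(0, ' ');  // control characters and space separate them
        List<Double> result = new ArrayList<>();
        while (st.nextToken() == StreamTokenizer.TT_WORD) {
            // Throws NumberFormatException if a token is not a number.
            result.add(Double.parseDouble(st.sval));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parseAll("1.5e3 -2E-2\n7"));
    }
}
```

Since each value now goes through Double.parseDouble rather than
st.nval, some of StreamTokenizer's speed advantage may be lost, so
measure again before relying on it.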

[1] <http://download.oracle.com/javase/7/docs/api/java/io/StreamTokenizer.....>
[2] <http://stackoverflow.com/questions/2082174>
[3] <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4079180>
[4] <http://www.resplendent.com/StlFileParser.java>