It would be interesting to repeat this experiment with different
character encodings and see if using UTF-16 versus UTF-8 makes a
difference here.
With UTF-16 there are twice as many bytes to read, but the 50% magic
ratio still prevails.
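For readers who want to try this themselves, here is a minimal sketch
of how an aggregate buffer might be split between the two layers. It
is not the benchmark's actual code. The class name SplitBuffers and
its methods are hypothetical, and it assumes that "ratio" means the
fraction of the aggregate given to the BufferedInputStream, with the
remainder going to the BufferedReader at 2 bytes per char.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class SplitBuffers {
    /** Bytes for the BufferedInputStream: ratio * aggregate (assumed meaning of "ratio"). */
    static int streamBytes(int aggregateBytes, double ratio) {
        return Math.max(1, (int) Math.round(aggregateBytes * ratio));
    }

    /** Chars for the BufferedReader: the remaining bytes / 2, since a Java char is 2 bytes. */
    static int readerChars(int aggregateBytes, double ratio) {
        int remainingBytes = aggregateBytes - streamBytes(aggregateBytes, ratio);
        return Math.max(1, remainingBytes / 2);
    }

    /** Open a file for reading with the aggregate buffer split by ratio. */
    static BufferedReader open(String path, Charset charset,
                               int aggregateBytes, double ratio) throws IOException {
        BufferedInputStream in = new BufferedInputStream(
                new FileInputStream(path), streamBytes(aggregateBytes, ratio));
        return new BufferedReader(
                new InputStreamReader(in, charset), readerChars(aggregateBytes, ratio));
    }
}
```

With a 65,536 byte aggregate and a ratio of 0.50, this gives the
BufferedInputStream 32,768 bytes and the BufferedReader 16,384 chars.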
Best of five trials shown.
Using a random sample data file of 209,715,200 chars 419,430,402 bytes.
Using aggregate buffersize of 65,536 bytes.
Using charset UTF-16
BufferedReader backed with BufferedInputStream ratio 0.10 buffsize 65536 bytes 2.64 seconds
BufferedReader backed with BufferedInputStream ratio 0.20 buffsize 65536 bytes 2.63 seconds
BufferedReader backed with BufferedInputStream ratio 0.30 buffsize 65536 bytes 2.64 seconds
BufferedReader backed with BufferedInputStream ratio 0.40 buffsize 65536 bytes 2.64 seconds
BufferedReader backed with BufferedInputStream ratio 0.50 buffsize 65536 bytes 2.64 seconds <--
BufferedReader backed with BufferedInputStream ratio 0.60 buffsize 65536 bytes 2.68 seconds
BufferedReader backed with BufferedInputStream ratio 0.70 buffsize 65536 bytes 2.71 seconds
BufferedReader backed with BufferedInputStream ratio 0.80 buffsize 65536 bytes 2.79 seconds
BufferedReader backed with BufferedInputStream ratio 0.90 buffsize 65536 bytes 2.70 seconds
HunkIO 2.70 seconds
Using a random sample data file of 209,715,200 chars 209,715,200 bytes.
Using aggregate buffersize of 65,536 bytes.
Using charset UTF-8
HunkIO 0.73 seconds
BufferedReader backed with BufferedInputStream ratio 0.10 buffsize 65536 bytes 0.89 seconds
BufferedReader backed with BufferedInputStream ratio 0.20 buffsize 65536 bytes 0.88 seconds
BufferedReader backed with BufferedInputStream ratio 0.30 buffsize 65536 bytes 0.88 seconds
BufferedReader backed with BufferedInputStream ratio 0.40 buffsize 65536 bytes 0.89 seconds
BufferedReader backed with BufferedInputStream ratio 0.50 buffsize 65536 bytes 0.83 seconds <--
BufferedReader backed with BufferedInputStream ratio 0.60 buffsize 65536 bytes 0.91 seconds
BufferedReader backed with BufferedInputStream ratio 0.70 buffsize 65536 bytes 0.92 seconds
BufferedReader backed with BufferedInputStream ratio 0.80 buffsize 65536 bytes 0.96 seconds
BufferedReader backed with BufferedInputStream ratio 0.90 buffsize 65536 bytes 0.92 seconds
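Something is strange here. UTF-16 should be faster to convert to
Strings and, at worst, should take twice as long for physical I/O.
Yet the UTF-16 runs took roughly three times as long as the UTF-8
runs (2.64 versus 0.83 seconds at ratio 0.50).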
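One way to investigate would be to time the conversion to Strings on
its own, with the bytes already in memory so that physical I/O plays
no part. The following is only a rough sketch under that assumption.
The class DecodeOnly and its sample text are hypothetical stand-ins,
not the benchmark's code or data, and a microbenchmark this simple is
at the mercy of JIT warm-up, so treat its numbers as a hint at best.
It also shows where the extra 2 bytes in the UTF-16 file size come
from: Java's UTF-16 encoder writes a byte-order mark first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeOnly {
    /** Accumulates decoded lengths so the decode cannot be optimised away. */
    static long sink;

    /** Build n chars of plain ASCII text (a stand-in for the random sample file). */
    static String sample(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) ('a' + i % 26));
        }
        return sb.toString();
    }

    /** Nanoseconds taken to decode the bytes once, with no file I/O involved. */
    static long decodeNanos(byte[] bytes, Charset cs) {
        long start = System.nanoTime();
        String s = new String(bytes, cs);
        long elapsed = System.nanoTime() - start;
        sink += s.length();
        return elapsed;
    }

    public static void main(String[] args) {
        String text = sample(10_000_000);
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16}) {
            byte[] bytes = text.getBytes(cs);
            long best = Long.MAX_VALUE;
            // Best of five trials, as in the results above.
            for (int trial = 0; trial < 5; trial++) {
                best = Math.min(best, decodeNanos(bytes, cs));
            }
            System.out.printf("%s: %,d bytes, best of 5: %.3f ms%n",
                    cs, bytes.length, best / 1e6);
        }
    }
}
```

If decoding alone turns out to be much slower for UTF-16, the
discrepancy lies in the conversion rather than in the buffering.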