Perhaps it was your assertion that 64K individual writes of one byte would
proceed faster than a single write of 64K bytes that gave that impression.
Here is how I have rewritten the section. I hope this makes
everything clear:
Because of hard disk latency, when you do I/O, it will go faster if
you do it in a few big physical I/O chunks rather than a number of
small ones. If you wrote data one byte at a time, you would have to
wait for the disk arms to snap to the correct cylinder, and for the
platter to rotate round the correct spot every time you wrote a byte.
If you buffered at 64,000 characters, you would have to do this wait
only once every 64,000 characters. Mechanical motion is on the order
of 1000 times slower than electronics.
If you wrote a byte at a time, since the hardware works in 512-byte
sectors, the OS would need to read the sector, plop your byte into it
and write the entire sector back. This would take at least 2 disk
rotations, perhaps 3. Even if you wrote your data 512 bytes at a time,
when you went to write the next sector, its spot would have just
passed the head, so you would have to wait an entire rotation for its
spot to come round. If you wrote 131,072 bytes (still
less than 1 physical track) at a pop, you could do that all in one
rotation.
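Here is a rough sketch of that difference in Java, timing 64,000
one-byte unbuffered writes against a single 64,000-byte write. The
file names are placeholders, and your OS's write cache will soften the
blow, but each one-byte write still costs at least a trip into the OS,
so the gap is dramatic.

    import java.io.FileOutputStream;
    import java.io.IOException;

    // Minimal timing sketch: 64,000 one-byte writes vs one 64,000-byte write.
    // File names are placeholders; timings will vary with OS caching.
    public class OneByteVsOneChunk {
        public static void main( String[] args ) throws IOException {
            byte[] chunk = new byte[64000];

            long start = System.nanoTime();
            FileOutputStream slow = new FileOutputStream( "slow.dat" );
            for ( int i = 0; i < chunk.length; i++ ) {
                slow.write( chunk[i] );  // one OS call per byte
            }
            slow.close();
            long byteAtATime = System.nanoTime() - start;

            start = System.nanoTime();
            FileOutputStream fast = new FileOutputStream( "fast.dat" );
            fast.write( chunk );         // one OS call for the whole 64,000 bytes
            fast.close();
            long oneBigWrite = System.nanoTime() - start;

            System.out.println( "byte at a time: " + byteAtATime / 1000000 + " ms" );
            System.out.println( "one big write : " + oneBigWrite / 1000000 + " ms" );
        }
    }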
Ideally, if you have enough RAM, you do the I/O in one whacking huge
file-sized unbuffered chunk. Java has a number of classes that let you
process a file buffered in convenient small logical chunks, often line
by line. The buffered classes transparently handle the physical I/O in
bigger chunks, typically 4096 bytes. The classes store each large
chunk for physical I/O in a separate piece of RAM called a buffer.
Unless the buffer size for the physical I/O is at least twice as big
as the size of the logical chunks you process, there is not much point
in buffering. The extra buffer-copying overhead will just slow you
down.
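For example, here is a minimal sketch of the buffered, line-at-a-time
style. The file name is a placeholder; the second constructor
argument, if you give one, sets the buffer size in chars, otherwise
BufferedReader picks its own default.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Sketch: BufferedReader does the physical reads in big chunks while the
    // program sees one convenient logical chunk, a line, at a time.
    public class LineAtATime {
        public static void main( String[] args ) throws IOException {
            BufferedReader in = new BufferedReader( new FileReader( "big.txt" ), 65536 );
            String line;
            while ( ( line = in.readLine() ) != null ) {
                // process one line
            }
            in.close();
        }
    }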
The File I/O Amanuensis will teach you how to do I/O either buffered
or unbuffered. You can try it both ways, and see which works faster.
You can also experiment with buffer sizes. The bigger the buffer, the
fewer the physical I/Os you need to process the file. However, the
bigger the buffer, the more virtual RAM you will use, which may
trigger more swapping I/O. Further, there is not much point in having
a whacking big buffer for a tiny file. It will take only a few I/Os to
process the file anyway.
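A hedged sketch of such an experiment: read the same file a logical
byte at a time through buffers of various sizes and compare the
timings. The file name and the candidate sizes are placeholders, and
the OS file cache will flatter the later passes, so for honest numbers
run each size against a cold cache.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Sketch for experimenting with buffer sizes: read the same file a byte
    // at a time (logically) through buffers of various sizes and compare.
    public class BufferSizeExperiment {
        public static void main( String[] args ) throws IOException {
            int[] sizes = { 512, 4096, 65536, 262144 };
            for ( int size : sizes ) {
                long start = System.nanoTime();
                BufferedInputStream in = new BufferedInputStream(
                    new FileInputStream( "big.dat" ), size );
                while ( in.read() >= 0 ) {
                    // each logical read is one byte; physical reads are 'size' bytes
                }
                in.close();
                System.out.println( size + " bytes: "
                    + ( System.nanoTime() - start ) / 1000000 + " ms" );
            }
        }
    }

Typically the 512-byte pass is the slowest and the curve flattens out
well before the largest size, but the only way to know for your setup
is to run it.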
You will find that buffer sizes that are a power of two tend to work
faster than other sizes. This is because disk and RAM hardware are
designed around some magic sizes, typically 256, 512, 1024, 2048,
4096, 8192, 16,384, 32,768, 65,536, 131,072 and 262,144 bytes. Buffers
that are powers of two naturally do I/O in physical chunks that align
on power-of-two boundaries in the file. This too makes the I/O more
efficient because the hardware typically works in 512-byte sector
chunks. If you do unbuffered I/O, likewise try to start your I/Os on
boundaries that are even multiples of some power of two, the higher
the power of two the better, e.g. it is better to start I/O on
boundaries that are even multiples of 8192 rather than just 128.
Sometimes it pays to pad your fixed-length records up to the next
power of two. If you can help it, arrange your logical record size and
buffer size so that logical records are aligned so that they never (or
rarely) span two bufferfuls. It also helps to have your buffers
aligned on physical RAM addresses that are multiples of a power of
two, though you have no control over that in Java.
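If you want to pad records or round a buffer size up, a tiny helper
like this does the trick. The method name is mine, not anything from
the standard library.

    public class Pow2 {
        // round n up to the next power of two, e.g. 300 -> 512, 512 -> 512
        static int nextPowerOfTwo( int n ) {
            int p = 1;
            while ( p < n ) {
                p <<= 1;
            }
            return p;
        }

        public static void main( String[] args ) {
            System.out.println( nextPowerOfTwo( 300 ) );   // prints 512
        }
    }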
In the olden days, CØBØL programs used double buffering. They used two
or more buffers per file. The computer would read ahead filling
buffers while the program was busy processing one of the previous
buffers. Oddly, Java does not support this efficient serial processing
technique, though sometimes the operating system maintains its own
private set of read-ahead buffers behind the scenes. Unfortunately,
the OS's cascaded buffering is less efficient than using a single
layer. You have the overhead of copying plus the wasted RAM for the
buffers that are not actually used for physical I/O. Java never has
more than one buffer per file and hence cannot simultaneously process
and do physical I/O, unless of course it uses Threads. Even with
Threads, you can’t pull off double buffering with any ease.
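For the curious, here is roughly what do-it-yourself double buffering
with Threads would look like: a background thread reads ahead into 64K
chunks and hands them over a small queue. This is only a sketch, with
illustrative names and the error handling pared to the bone, which
rather proves the point about it not being easy.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Arrays;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Rough sketch of do-it-yourself double buffering with a Thread:
    // one thread reads ahead in 64K chunks while main processes the last one.
    // Class name, file name and chunk size are illustrative.
    public class ReadAhead {
        private static final byte[] EOF = new byte[0];   // sentinel for end of file

        public static void main( String[] args ) throws Exception {
            final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>( 2 );

            Thread reader = new Thread( new Runnable() {
                public void run() {
                    try {
                        InputStream in = new FileInputStream( "big.dat" );
                        byte[] buffer = new byte[65536];
                        int length;
                        while ( ( length = in.read( buffer ) ) >= 0 ) {
                            // hand a trimmed copy to the processing thread
                            queue.put( Arrays.copyOf( buffer, length ) );
                        }
                        in.close();
                        queue.put( EOF );
                    } catch ( IOException e ) {
                        e.printStackTrace();   // error handling simplified for brevity
                    } catch ( InterruptedException e ) {
                        Thread.currentThread().interrupt();
                    }
                }
            } );
            reader.start();

            byte[] chunk;
            while ( ( chunk = queue.take() ) != EOF ) {
                // process chunk while the reader thread fills the next one
            }
        }
    }

Note that if the reader thread dies on an IOException before posting
the sentinel, the main thread would wait forever; real code has to
handle that, which is part of why this is not easy.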
The term double buffering also refers to a technique of constructing
Images off screen then blasting them onscreen once they are complete,
as a way of creating smoother animation.
If you wrote 128K a byte at a time using a 64K buffer, there would be
only two physical 64K I/Os. This would be slightly slower than using
unbuffered I/O to write the entire 128K in one I/O because of the
extra physical I/O, the RAM overhead for the buffer and the CPU
overhead of copying the data to the buffer.
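A concrete sketch of that comparison, with placeholder file names: the
first half funnels 131,072 one-byte logical writes through a
65,536-byte BufferedOutputStream, which ends up as two physical
writes; the second half writes the same 131,072 bytes unbuffered in
one physical write.

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    // Sketch of the 128K example: byte-at-a-time writes through a 64K buffer
    // cost two physical writes plus copying; one unbuffered write costs one.
    public class BufferedVsOneShot {
        public static void main( String[] args ) throws IOException {
            byte[] data = new byte[131072];   // 128K of data

            // buffered: 131,072 logical writes, two 65,536-byte physical writes
            BufferedOutputStream buffered = new BufferedOutputStream(
                new FileOutputStream( "buffered.dat" ), 65536 );
            for ( int i = 0; i < data.length; i++ ) {
                buffered.write( data[i] );
            }
            buffered.close();   // close flushes the second bufferful

            // unbuffered: one 131,072-byte physical write
            FileOutputStream raw = new FileOutputStream( "oneshot.dat" );
            raw.write( data );
            raw.close();
        }
    }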
When To Buffer
To process a whole file at a time, read the entire file in one giant
unbuffered I/O.
If a file is too large to process all in RAM, read it buffered, and
process it a chunk, line, field or char at a time.
To copy files or download streams, use the FileTransfer class, which
does its reads unbuffered, a large chunk at a time.
If you need the readLine method, you must use buffering.
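Hedged sketches of the whole-file and copy cases follow; the buffered
line-at-a-time case was sketched earlier. The copy method is not
FileTransfer's actual code, just the same unbuffered large-chunk idea,
and the method names are mine.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Sketches of the whole-file and copy cases.
    public class WhenToBuffer {
        // read the entire file in one giant unbuffered I/O
        // (assumes the file fits in an int-sized array, i.e. under 2 GB)
        static byte[] slurp( File file ) throws IOException {
            byte[] whole = new byte[ (int) file.length() ];
            InputStream in = new FileInputStream( file );
            int offset = 0;
            while ( offset < whole.length ) {
                int got = in.read( whole, offset, whole.length - offset );
                if ( got < 0 ) break;   // file shrank underneath us
                offset += got;
            }
            in.close();
            return whole;
        }

        // copy a stream unbuffered, one large chunk at a time
        static void copy( InputStream in, OutputStream out ) throws IOException {
            byte[] chunk = new byte[65536];   // chunk size is an illustrative choice
            int length;
            while ( ( length = in.read( chunk ) ) >= 0 ) {
                out.write( chunk, 0, length );
            }
        }
    }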