Threads reading a file at the same time

Boris Punk

I'm not sure about this one. A basic IDE hard drive can't read from two
disk locations at the same time, can it? Modern SSDs may be able to.

The documentation for this lock says that multiple simultaneous reads are
OK, but only one write at a time:
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html

How can that be? Imagine Thread A calling seek on the disk, then Thread B
calling seek. Surely Thread A then reads from B's location?
 
Arne Vajhøj

Boris Punk said:
I'm not sure about this one. The basic IDE hard drive hasn't got the
capability to read from two disk locations at the same time has it? Modern
SSD drives may have.

This lock is stating that multiple reads are ok, but just one write at a
time is ok:
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html

How can that be? Imagine Thread A calling seek on the disk, then Thread B
calling seek. Thread A then reads from B's location surely?

The ReadWriteLock is not related to the actual disk IO - it only
coordinates between two or more threads.

Arne
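
To illustrate the point that the lock only coordinates threads, here is a minimal sketch that uses ReentrantReadWriteLock (the standard implementation of ReadWriteLock) with no file or disk involved at all. The class and method names are made up for this example; only the java.util.concurrent.locks calls are real API.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadWriteLockDemo {
    // Returns {secondReaderAdmitted, writerAdmitted}, observed from another
    // thread while the calling thread holds the read lock.
    static boolean[] probeWhileReading() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] result = new boolean[2];

        lock.readLock().lock();                 // the calling thread is now a reader
        try {
            Thread other = new Thread(() -> {
                // A second reader is admitted while the first still holds the lock.
                result[0] = lock.readLock().tryLock();
                if (result[0]) {
                    lock.readLock().unlock();
                }
                // A writer is refused as long as another thread holds the read lock.
                result[1] = lock.writeLock().tryLock();
                if (result[1]) {
                    lock.writeLock().unlock();
                }
            });
            other.start();
            other.join();                       // join() makes result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = probeWhileReading();
        System.out.println("second reader admitted: " + r[0]);
        System.out.println("writer admitted: " + r[1]);
    }
}
```

Nothing here touches a disk: the lock is purely bookkeeping between threads, and it only protects whatever the threads agree to access while holding it.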
 
Boris Punk

Arne Vajhøj said:
The ReadWriteLock is not related to the actual disk IO - it only
coordinates between two or more threads.

Arne

Fair enough. I had just read somewhere that someone tried using it for
disk IO.
 
Knute Johnson

Boris Punk said:
Fair enough. I had just read somewhere that someone tried using it for
disk IO.

That is a common usage for RWL. The reading threads can be blocked when
a write is about to occur. The actual read/write to the disk is another
matter altogether. Caching and other things can affect that as well.
 
Daniel Pitts

Knute Johnson said:
That is a common usage for RWL. The reading threads can be blocked when
a write is about to occur. The actual read/write to the disk is another
matter altogether. Caching and other things can affect that as well.

Perhaps it's a common usage for the general concept of a read/write lock,
but in Java, the ReadWriteLock wouldn't be useful for that, since it
doesn't prevent external processes from accessing the file.

One place where the Java ReadWriteLock is useful is a many-access,
few-write data structure (e.g., a configuration object that changes
rarely but is used in many places).
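
A rough sketch of such a many-access, few-write configuration object might look like the following. SharedConfig and its methods are hypothetical names chosen for illustration; the locking pattern (lock, try, finally unlock) is the one the ReadWriteLock documentation recommends.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedConfig {
    private final Map<String, String> values = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may be inside get() at the same time.
    String get(String key) {
        lock.readLock().lock();
        try {
            return values.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // put() waits until no reader or writer is active, then runs alone.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            values.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SharedConfig config = new SharedConfig();
        config.put("timeout", "30");
        System.out.println(config.get("timeout")); // prints 30
    }
}
```

Whether this beats a plain synchronized block or a ConcurrentHashMap depends on how heavily reads outnumber writes, so it is worth measuring before committing to it.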
 
Alan Gutierrez

Daniel said:
Perhaps it's a common usage for the general concept of a read/write lock,
but in Java, the ReadWriteLock wouldn't be useful for that, since it
doesn't prevent external processes from accessing the file.

One place where the Java ReadWriteLock is useful is a many-access,
few-write data structure (e.g., a configuration object that changes
rarely but is used in many places).

It is still useful for guarding a file within the process, though, if
the underlying file will only be read and written to by your one Java
process.
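
As a sketch of that idea, assuming the file really is touched only by this one JVM, a small wrapper can route every access through one shared lock. GuardedFile, readLines, appendLine and demo are invented names for this example; the file calls are standard java.nio.file.Files methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedFile {
    private final Path path;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    GuardedFile(Path path) {
        this.path = path;
    }

    // Any number of threads may read concurrently.
    List<String> readLines() {
        lock.readLock().lock();
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Appending excludes all readers and other writers in this process only.
    void appendLine(String line) {
        lock.writeLock().lock();
        try {
            Files.write(path, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Small self-contained demonstration on a temporary file.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("guarded", ".txt");
            try {
                GuardedFile file = new GuardedFile(tmp);
                file.appendLine("first");
                file.appendLine("second");
                return file.readLines();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first, second]
    }
}
```

The guard only works if every thread goes through the same GuardedFile instance; code that opens the path directly, or another process, bypasses it entirely.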
 
Daniel Pitts

Alan Gutierrez said:
It is still useful for guarding a file within the process, though, if
the underlying file will only be read and written to by your one Java
process.

That is true, but I wouldn't say it is a "common" usage for ReadWriteLock.
 
BGB / cr88192

Knute Johnson said:
That is a common usage for RWL. The reading threads can be blocked when a
write is about to occur. The actual read/write to the disk is another
matter altogether. Caching and other things can affect that as well.

a typical OS, such as Windows or Linux, will almost entirely abstract the
matter of (actual) disk IO from the app. so, when apps do IO, they are
usually doing it against structures within the OS, rather than anything near
the actual disk.

similarly, IO libraries (C's "stdio" system being an example) may in turn
cache data, to reduce the overhead of system calls (doing a system call for
every little read or write operation could become expensive).

so, for example:
Java app does IO;
request goes to the JVM, which may in turn pass IO requests to C-level
libraries on the OS on which it is running (glibc or MSVCRT, or maybe Win32
API calls?...), which may do their own buffering;
these in turn perform system calls, to pass these requests to the kernel;
the kernel may in turn do some of its own buffering (a request may be read
from a buffer, written to a dirty buffer, or queued to be serviced later);
at its leisure, the kernel may send its requests to a device-driver, which
may in turn redirect said requests to the actual HW;
in turn, the drive may itself do some of its own buffering (similar to the
OS kernel), writing disk-blocks when it has a chance, or reading disk blocks
and notifying the OS that it has done so.
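
The library-level buffering step in that chain can be observed from Java itself. The sketch below is only an illustration, with made-up class and method names: it writes five bytes through a BufferedOutputStream (Java's rough counterpart to stdio buffering) and checks the file size the OS reports before and after flush().

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferingDemo {
    // Returns {file size before flush, file size after flush}.
    static long[] sizesAroundFlush() {
        try {
            Path file = Files.createTempFile("buffering", ".txt");
            try (OutputStream out =
                     new BufferedOutputStream(new FileOutputStream(file.toFile()))) {
                // Five bytes fit easily in the stream's in-process buffer,
                // so no write reaches the OS yet.
                out.write("hello".getBytes(StandardCharsets.US_ASCII));
                long before = Files.size(file);

                // flush() hands the buffered bytes to the OS. The OS may still
                // hold them in its own cache before the drive sees them.
                out.flush();
                long after = Files.size(file);
                return new long[] {before, after};
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0] + " bytes");
        System.out.println("after flush: " + sizes[1] + " bytes");
    }
}
```

This shows only the topmost layer. Whether the bytes have reached the physical disk after flush() is a separate question that the kernel and the drive decide.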

note that threads at one level need not necessarily exist at another level;
the threads in Java code need not actually exist in the VM (a VM could
conceivably be single-threaded, and implement its own scheduler to simulate
multithreading in the running code);
the threads in the underlying app need not manifest in the actual OS (the
kernel could easily see only a single thread per core or similar, and from
its point of view, it simply passes control temporarily to the running app,
and when it gets control back, may in turn pass control to the next app, and
so on, and eventually get back around to the first app, ...);
the drive likely doesn't give a crap about threads, only that the commands
it receives make sense.

on many older systems, the OS provided only a single (naturally
single-threaded) process, and threading on such systems was often supplied
by library code (typically relying on some signal from the OS, such as a
timer, to facilitate context switching). also common was the app having to
perform some action (such as sleeping or checking an event queue), which
gave the OS or thread library a chance to switch threads (this was done on
Win 3.x, and in some MS-DOS threading libraries).


note that threading doesn't require any specific HW support, and so could be
done on an 8088 or similar if so desired (although care is needed, as the
ROM BIOS and MS-DOS are not generally thread-safe...).

truly parallel execution of threads in hardware was uncommon on desktop
machines until multi-processor and multi-core systems became widespread;
before that, threading was nearly all done by time-slicing in software.


granted, a typical OS such as Windows, is AFAIK multithreaded internally,
but these threads are unrelated to those existing in the app. (and,
actually, the kernel typically lives within its own, partially disjoint,
address space, which need not even necessarily be in the same operating mode
or using the same word-width as the currently running app, such as Windows
x64 running a 32-bit app, or Win32 running a 16-bit app).

or such...
 
Arne Vajhøj

BGB / cr88192 said:
a typical OS, such as Windows or Linux, will almost entirely abstract the
matter of (actual) disk IO from the app. so, when apps do IO, they are
usually doing it against structures within the OS, rather than anything near
the actual disk.

[...]

It is true that IO is usually split in many layers, but none of
them interacts with the ReadWriteLock class. That class is
for cooperative locking.

Arne
 
Lothar Kimmeringer

Boris said:
How can that be? Imagine Thread A calling seek on the disk, then Thread B
calling seek.

This scenario is quite common in multitasking operating
systems. If you open a file at the OS level, you get a so-called
file handle (a number, more or less). Every subsequent operation
(like a seek) needs this handle. That way the OS can tell the
two reads apart even if they happen to be on the same file.

Boris said:
Thread A then reads from B's location surely?

No. You wouldn't expect that to happen either when you receive
data from the same host over two different TCP connections,
even though they too go through the cable serially.
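
The per-handle position can be sketched in Java with two RandomAccessFile objects opened on the same file. The class and method names below are invented for the example, and both "threads" are simulated in one thread to keep the order of operations explicit.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TwoHandles {
    // Opens the same file twice, seeks each handle separately, then reads.
    static String readViaTwoHandles() {
        try {
            Path file = Files.createTempFile("handles", ".txt");
            try {
                Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile a = new RandomAccessFile(file.toFile(), "r");
                     RandomAccessFile b = new RandomAccessFile(file.toFile(), "r")) {
                    a.seek(2);                     // "Thread A" seeks on its handle
                    b.seek(7);                     // "Thread B" seeks on another handle
                    char fromA = (char) a.read();  // A still reads at offset 2
                    char fromB = (char) b.read();  // B reads at offset 7
                    return "" + fromA + fromB;
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readViaTwoHandles()); // prints 27
    }
}
```

B's seek does not disturb A because each handle carries its own file pointer. Two threads sharing a single RandomAccessFile would be a different matter and would need their own synchronization.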


Regards, Lothar
 
Olaf Klischat

Boris Punk said:
I'm not sure about this one. The basic IDE hard drive hasn't got the
capability to read from two disk locations at the same time has it? Modern
SSD drives may have.

This lock is stating that multiple reads are ok, but just one write at a
time is ok:
http://download.oracle.com/javase/6/docs/api/java/util/concurrent/locks/ReadWriteLock.html

How can that be? Imagine Thread A calling seek on the disk, then Thread B
calling seek. Thread A then reads from B's location surely?

As somebody else said, RWL is not directly related to file IO.

For accessing the same file from multiple threads, each thread would
normally create its own FileInputStream or FileOutputStream for that
file. The seek position is a property of such a stream, not of the file
itself, so each thread ends up having its own seek position and the
threads won't interfere (in fact, this will also allow you to have
multiple seek positions on the same file in just one thread -- just
create multiple streams as necessary).
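
A small sketch of that arrangement, with hypothetical class and method names: two threads each open their own FileInputStream on the same temporary file and read it byte by byte. If the position were shared, each thread would see only part of the data. With separate streams, both see all of it.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class PerThreadStreams {
    // Reads the whole file through a stream owned by the calling thread.
    static String readAll(Path file) {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new FileInputStream(file.toFile())) {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
                Thread.yield();            // encourage the two readers to interleave
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    static List<String> readConcurrently() {
        try {
            Path file = Files.createTempFile("streams", ".txt");
            try {
                Files.write(file, "abcdefghij".getBytes(StandardCharsets.US_ASCII));
                String[] results = new String[2];
                Thread t1 = new Thread(() -> results[0] = readAll(file));
                Thread t2 = new Thread(() -> results[1] = readAll(file));
                t1.start();
                t2.start();
                t1.join();                 // join() makes each result visible here
                t2.join();
                return Arrays.asList(results);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readConcurrently()); // prints [abcdefghij, abcdefghij]
    }
}
```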
 
