Absolutely! Now we only need to convince Tom to disclose the usage
pattern...
This all goes back to my first large Java project, writing a library
to read and write FITS format files. FITS is the standard format for
astronomical data and it has always allowed very large arrays and
tables. However, a dozen or so years ago when I wrote the package,
gigabyte files were very uncommon so that few if any users were
inconvenienced by the circumlocutions that were needed to handle very
large files. Now they are getting much more common and the limits s
on array sizes are beginning to pinch in a number of places: there are
files with tables or arrays with dimensions of more than 2^31 and
internal FITS structures that my implementation uses which need large
byte arrays to implement.
The library supports both streaming and random access to the internals
of the data structures. Different users may use both or either.
Inputs can be files, pipes, URLs or compressed files. So I need
something that's pretty flexible -- I may or may not be able to do
random access on the input.
I very much appreciate the comments that everyone has made. While I'd
hoped that I'd be able to piggy back on someone else's work, I think
the discussion has given me some ideas for building my own... Right
now I'm thinking of something like the following using Andreas' ideas
from above. In a second iteration this should be easily adaptable to
supporting backing files of some kind (perhaps using the input if
that's feasible or using some external storage if not). Many of you
suggested various ways of doing this kind of file backup and I suspect
I'll need to use more than one. Then BigList might allow only a
certain number of sublists to be in memory at a given time. That
would make it possible to scale the code to much larger sizes than are
available in memory, not just exceed the 2G array limit which was my
original goal.
Thanks to all,
Tom McGlynn
---- Code sketch follows, not intended for compilation ---
class<T> BigList {
final long PAGE_SIZE = 16Meg;
private List<SubList<T>> subLists;
BigList(long size) {
int listCount = size/PAGE_SIZE + 1;
subLists = new ArrayList<SubList>(listCount);
for (i=0; i<listCount; i += 1) {
subLists.put(i, new NullList());
}
}
void put(long index, T val) {
subLists.get((int)(index/PAGE_SIZE)).put((int)(index
%PAGE_SIZE), val);
}
T get(long index) {
subLists.get((int)(index/PAGE_SIZE)).get((int)(index
%PAGE_SIZE));
}
Interface SubList<T> {
T get(int i);
void put(int i, T val);
}
class NullList<T> implements SubList {
int index;
NullList(int index) {
this.index = index;
}
T get(int i) {
throw IllegalStateException("Can't do get before
put");
}
void put(int i, T val) {
// Replace the null list with a real list.
RealList<T> list = new RealList(index);
list.put(i,val);
subLists.put(index, list);
}
}
class RealList<T> implments SubList {
int index;
T[] data;
RealList(int index) {
this.index = index;
data = new T[PAGE_SIZE];
... get the data for this page ...
}
T get(int i) {
return data
;
}
void put(int i, T val) {
data = val;
}
}
}
}