[I've just seen this thread. Although it might be a bit late, let me
add a couple of clarifications]
Hi Maarten,
I respectfully disagree that HDF5 is not a DB. It's true that HDF5 is,
prima facie, not relational but rather hierarchical.
Yeah. This largely depends on what we understand by a DB. Lately,
RDBMSs are used everywhere, and we tend to believe that they are the
only entities that can truly be called DBs. However, in a less
restrictive view, even a text file can be considered a DB (in fact,
many DBs have been implemented using text files as a base). So, I
wouldn't say that HDF5 is not a DB, but just that it is not an RDBMS.
Hierarchical is truly a much more natural/elegant[1] design from my
perspective. HDF has always had meta-data capabilities, and with the
new 1.8beta version available, it is extending them with
'references/links' that allow for pure/partial relational datasets,
groups, and files, as well as for storing self-implemented indexing.
The C API is obviously much more low level, and PyTables does not yet
support these new features.
That's correct. And although it is quite possible that we, the PyTables
developers, will end up incorporating some relational features into
it, we also recognize that PyTables is not intended (and it was never
in our plans) to be a competitor of a pure RDBMS, but rather a
*teammate* (as is clearly stated on the www.pytables.org home page).
In our opinion, PyTables opens the door to a series of capabilities
that are not found in a typical RDBMS, like hierarchical
classification, multidimensional datasets, powerful table entities
that are able to deal with multidimensional columns or nested records,
but most especially, the ability to work with extremely large amounts
of data in a very easy way, without having to give up first-class
speed.
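
To make the features listed above a bit more concrete, here is a
minimal sketch. It assumes the PEP 8 style names of the PyTables 3.x
API (open_file, create_group, create_table), which differ from the
camelCase names of older releases. The file layout, the Reading
description, its column names and the helper functions build_demo and
warm_timestamps are all hypothetical, invented only for illustration.

```python
import os
import tempfile

import numpy as np
import tables


class Reading(tables.IsDescription):
    """Hypothetical record layout, invented for this illustration."""
    timestamp = tables.Int64Col()
    temperature = tables.Float64Col()
    # A multidimensional column: every row holds a 3-element vector.
    position = tables.Float64Col(shape=(3,))

    class sensor(tables.IsDescription):
        # A nested record: sub-columns grouped under "sensor".
        ident = tables.Int32Col()
        gain = tables.Float64Col()


def build_demo(path):
    """Create a small file with a group, an array and a table."""
    with tables.open_file(path, mode="w", title="Demo file") as h5:
        # Hierarchical classification: groups work like directories.
        run = h5.create_group("/", "run_001", "First run")
        # A multidimensional dataset stored as a leaf of that group.
        h5.create_array(run, "image", np.arange(12.0).reshape(3, 4))
        table = h5.create_table(run, "readings", Reading)
        row = table.row
        for i in range(5):
            row["timestamp"] = i
            row["temperature"] = 18.0 + i
            row["position"] = [i, 2.0 * i, 3.0 * i]
            # Nested columns are addressed with a slash-separated path.
            row["sensor/ident"] = 7
            row["sensor/gain"] = 1.5
            row.append()
        table.flush()


def warm_timestamps(path, threshold):
    """Return the timestamps of rows warmer than the threshold."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.run_001.readings
        # The condition string is evaluated by PyTables itself rather
        # than in a Python-level loop over the rows.
        hits = table.read_where("temperature > limit",
                                condvars={"limit": threshold})
        return [int(t) for t in hits["timestamp"]]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    build_demo(demo_path)
    print(warm_timestamps(demo_path, 20.0))
```

The query goes through read_where() so that the selection is done
inside PyTables instead of by iterating over every row in Python,
which is where most of the speed on large tables comes from.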
[1] Anything/everything that is physical/virtual, or can be conceived,
is hierarchical... if the system itself is not random/chaotic. That's a
lovely revelation I've had... EVERYTHING is hierarchical. If it has
context it has hierarchy.
While I agree that there is some truth in this statement, it is also
known that a lot of things in the universe (perhaps many more than we
think) fall squarely in the domain of the random/chaotic.
IMO, the wisest path is to recognize the strengths (and weaknesses) of
each approach and use whatever best fits your needs. If you need the
best of both, then go ahead and choose an RDBMS in combination with a
hierarchical DB, and use the powerful capabilities of Python to get
the most out of them.
Cheers,
Francesc Altet