I'm a newbie to RDF and have been facing a fundamental question as I read
more about RDF.
It seems like you have a fairly good grasp of things.
RDF positions itself away from plain XML
representations of data, saying XML is suited for representing data with
containment hierarchies, and where "order" is important, whereas RDF
has a flatter structure and represents only references among different
entities.
Yes. Think of RDF as a data model first, and the RDF/XML serialisation
syntax very much second.
That sounds just like what a relational database is supposed
to do, and those are criteria when deciding whether to use an XML DB
or a relational DB to store your data.
This very much _isn't_ what a relational DB is about.
A relational database (as they're currently implemented, not however
Codd might have seen them) imposes two constraints on the data:
- It's stored in rectangular row/column tables, and there are
facilities for joining across these tables to make similar row/column
views. There may be nulls, repetition and lack of normalisation in
these views.
- There is a pre-existing data model that must be configured before
data can be stored.
Both of these are pretty much contrary to RDF's graph model of data.
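
To make the contrast concrete, here is a minimal sketch using Python's
standard sqlite3 module on one side and a plain set of tuples standing
in for an RDF graph on the other. The table, column and property names
(person, homepage, ex:name and so on) are made up for illustration, and
a set of tuples is only a stand-in for a real triple store.

```python
import sqlite3

# Relational side: the schema has to be declared before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO person VALUES ('p1', 'Alice')")


def insert_with_unknown_column():
    """Try to store a fact that the declared schema did not anticipate."""
    try:
        conn.execute(
            "INSERT INTO person (id, name, homepage) "
            "VALUES ('p2', 'Bob', 'http://example.org/bob')"
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the row: the table has no 'homepage' column.
        return False


relational_ok = insert_with_unknown_column()

# Graph side: each statement is a (subject, predicate, object) triple.
# A new kind of statement needs no schema change, it is just another triple.
graph = set()
graph.add(("ex:p1", "ex:name", "Alice"))
graph.add(("ex:p2", "ex:name", "Bob"))
graph.add(("ex:p2", "ex:homepage", "http://example.org/bob"))
```

The relational insert fails until someone alters the table, while the
graph simply grows by one more statement.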
Hasn't it all gone quiet about XML DBs since 5 years ago?
IMHO, XML DBs are a bad idea. They're just not very useful. The XML
data model is tree structured and has poor (albeit some, using ID and
IDREF) facilities for graphs outside this. XML also depends on having
the whole universe represented in a single document and falls down
badly on trying to refer to entities outside this document.
IMHE, XML DBs were all too often the last gasp of vendors with tired
old non-relational hierarchical databases to sell. It's very easy to
produce an XML database, it's just hard to make a useful one.
The "real world" doesn't often have data in it that conforms to XML's
constraints, and this is really why XML databases aren't ever going to
become useful. "Real world" data is either highly constrained (we're
all told how our tax records will be held, and this is the same for all
of us) or data isn't constrained. The first of these is a good fit for
the RDBMS model, which is why commercial operations buy so many RDBMS.
They're powerful, cheap and appropriate.
The second case is closer to RDF. Data floats around as loosely
connected entities that we can just about fit onto a graph model. It's
also "semi-structured" data, which means we can usually describe its
structure when we see it, but we can't predict its structure _before_
we've seen it. A "database" that can store such information must
necessarily allow a highly dynamic data model.
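
As a rough illustration of "describe the structure when we see it",
the sketch below takes a handful of triples and works out, after the
fact, which properties each entity actually carries. The data and the
helper name observed_properties are hypothetical, and real RDF stores
do this kind of thing in far more sophisticated ways.

```python
from collections import defaultdict

# Hypothetical data: nothing told us in advance which properties
# a given subject would have.
triples = [
    ("ex:book1", "ex:title", "Weaving the Web"),
    ("ex:book1", "ex:author", "ex:timbl"),
    ("ex:timbl", "ex:name", "Tim Berners-Lee"),
    ("ex:photo7", "ex:depicts", "ex:timbl"),
    ("ex:photo7", "ex:takenOn", "2004-05-17"),
]


def observed_properties(triples):
    """Map each subject to the set of properties actually seen for it."""
    shape = defaultdict(set)
    for subject, predicate, _obj in triples:
        shape[subject].add(predicate)
    return dict(shape)


# The "schema" is discovered from the data rather than declared up front.
shapes = observed_properties(triples)
```

Here the structure is perfectly describable once the data has arrived,
but none of it had to be predicted beforehand.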
Where does RDF fit in, and how does it compare to relational databases?
A data model is not a query mechanism, nor an engine / process for
executing this. Therefore RDF is simply a different class of thing to
databases, let alone a relational database. You can compare an "RDF
database" to a relational database, but first you must imagine an RDF
database -- and this concrete product would itself be different from
the abstract notion of RDF as just a model.
For one thing, our "RDF database" must not only incorporate an RDF data
model and some execution processor, but it must also implement an RDF
query language so we can communicate with it. Which query language will
it use? There are a number of these, and the choice would itself
influence our RDF database -- so assuming that there's only one
possible "RDF database" would be like assuming that every relational
database must use SQL.
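
To show that the query mechanism is a separate layer sitting on top of
the data model, here is a toy triple-pattern matcher. It is not any
real RDF query language, just a sketch of the general idea of matching
patterns with variables against a graph; the "?x" variable convention
and the function names match_pattern and query are my own assumptions.

```python
def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple.

    Terms starting with '?' are variables. Returns the extended
    bindings on success, or None if the triple does not fit.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep any value it was already bound to.
            if new.get(term, value) != value:
                return None
            new[term] = value
        elif term != value:
            return None
    return new


def query(graph, patterns):
    """Return every set of variable bindings satisfying all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical graph: the model is just the triples, nothing more.
graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
]

# "Whom does alice know, and what are they called?"
results = query(graph, [
    ("ex:alice", "ex:knows", "?x"),
    ("?x", "ex:name", "?n"),
])
```

The same list of triples could be served by a completely different
matcher or language, which is why the choice of query language is a
design decision for the database rather than a property of RDF.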
I keep hearing that databases are not good for "semi-structured" data,
but am not yet able to understand how RDF addresses that. Mozilla for
example uses RDF for very structured (table of content) data.
Why should an ability to work with the difficult case of
"semi-structured" data make a tool inappropriate for the easier case of
structured data?
Secondly, "semi-structured" data is often highly structured, it's just
that we don't know what this structure will be until run-time, when we
see the data (and not always then). It certainly doesn't mean that the
data "has no structure", it just means that the structure isn't
available to us so easily, or so early on.
RDBMS can store some things, and they're powerful and efficient. The
things they can store represent most of the things we have wanted to
store so far.
RDFDBMS can store anything from our current IT worldview, but they're
complicated and problematic.
XMLDBMS store more sorts of thing than RDBMS, but not usefully so.
They're simpler than RDFDBMS, but they're still not as developed or
powerful as RDBMS.
What would be points of comparison where RDF is better suited to store
and query my data?
Look into current work, like SPARQL (and other work done by its
contributors).