That's a straw man argument. Copying an already built database to
another copy of the database won't take significantly longer than copying
an already built file. In fact, it's the same operation.
I don't understand what you're trying to get at.
Each bit of data follows a particular path through the system. Each
bit of data has its own requirements for availability and consistency.
No, relational DBs don't have the same performance characteristics as
other data systems because they do different things.
If you have data that fits a particular style well, then I suggest
using that style to manage that data.
Let's say I have data that needs to hang around for a little while and
then disappear into the archives. Let's say you hardly ever do random
access on this data because you always work with it serially or in
large batches. This is exactly like the recipient data for the email campaign.
Another straw man. I'm sure you can come up with many contrived
examples to show one particular operation being faster than another.
Benchmark writers (bad ones) do it all the time. I'm saying that in
normal, real-world situations, where you are collecting billions of data
points and need to actually use the data, a properly designed
database running on a good database engine will generally be better than
using flat files.
You're thinking in the general. Yes, an RDBMS does wonderful things in the
general case. However, in very specific circumstances, an RDBMS does a
whole lot worse.
Think of the work involved in sharding an RDBMS instance. You need to
properly implement two-phase commit above and beyond the normal work
involved. I haven't run into a multi-master replication system that is
trivial. When you find one, let me know, because I'm sure there are
caveats and corner cases that make things really hard to get right.
Compare this to simply distributing flat files to one of many
machines. It's a whole lot easier to manage and easier to understand,
explain, and implement.
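To make that concrete, here's a rough sketch of the flat-file side in plain
shell. The host names, target directory, and hash-the-filename scheme are all
made up for illustration, not a recipe:

    #!/bin/sh
    # Hypothetical pool of storage machines; any stable list will do.
    HOSTS="store01 store02 store03 store04"
    N=4
    for f in batch-*.dat; do
        # Hash the file name so the same file always lands on the same host.
        sum=$(printf '%s' "$f" | cksum | cut -d' ' -f1)
        idx=$(( sum % N + 1 ))
        host=$(echo $HOSTS | cut -d' ' -f"$idx")
        # Plain copy: no two-phase commit, no replication protocol.
        scp "$f" "$host:/data/incoming/"
    done

Losing a machine or rebalancing still takes work, but it's ordinary file
management, not a distributed commit protocol.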
You should use the right tool for the job. Sometimes the data doesn't
fit the profile of an RDBMS, or the RDBMS overhead makes managing the
data more difficult than it needs to be. In those cases, it makes a
whole lot of sense to try something else out.
Not sure what a "shadow page operation" is, but index operations are
only needed if you have to have fast access for reading the data back. If
it doesn't matter how long it takes to read the data back, then don't
index it. I have a hard time believing that anyone would want to save
billions of data points and not care how fast they can read selected
parts back or organize the data, though.
I don't care how the recipients for the email campaign were indexed. I
don't need an index because I don't do random accesses. I simply need
the list of people I am going to send the email campaign to, properly
filtered and de-duped, of course. This doesn't have to happen within
the database. There are wonderful tools like "sort" and "uniq" to do
this work for me, far faster than an RDBMS can do it. In fact, I don't
think you can come up with a faster solution than "sort" and "uniq".
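Concretely, the whole job is a short pipeline. The file names here are made
up for the example:

    # recipients.txt:   one address per line, unsorted, with duplicates
    # unsubscribed.txt: addresses that must be filtered out
    LC_ALL=C sort recipients.txt | uniq > recipients.sorted
    LC_ALL=C sort unsubscribed.txt > unsubscribed.sorted
    # comm -23 keeps lines that appear only in the first file,
    # i.e. recipients who are not on the unsubscribe list
    comm -23 recipients.sorted unsubscribed.sorted > send_list.txt

LC_ALL=C keeps both sorts in the same byte order, which comm requires.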
Not with the database engines that I use. Sure, speed and load are
connected whether you use databases or flat files, but a proper database
will scale up quite well.
I know for a fact that "sort" and "uniq" are far faster than any
RDBMS. The reason why is obvious: they stream through the data
sequentially, with no transaction log, locking, or index maintenance to
pay for.