> I've done this. Not millions, but certainly on the order of tens of
> thousands.
It's a scaling issue. Millions of INSERT or UPDATE requests can
take hours. That's when you need the speedup of bulk loading.
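Roughly, the difference looks like this. A sketch only, assuming the
pymysql driver, a made-up "prices" table, and a tab-separated dump
(the server also has to allow local_infile):

    # Sketch: bulk load vs. row-at-a-time, using the pymysql driver and a
    # hypothetical "prices" table.  Names and paths are made up.
    import pymysql

    conn = pymysql.connect(host="localhost", user="loader", password="x",
                           database="marketdata", local_infile=True)
    cur = conn.cursor()

    # Slow way: one round trip and one index update per row.
    # cur.execute("INSERT INTO prices (symbol, price) VALUES (%s, %s)", row)

    # Bulk way: the server reads the file itself and batches the work.
    cur.execute("""
        LOAD DATA LOCAL INFILE '/tmp/prices.tsv'
        INTO TABLE prices
        FIELDS TERMINATED BY '\\t'
        (symbol, price)
    """)
    conn.commit()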
> May not always be flexible enough.
True; you can't do anything with LOAD DATA INFILE but load data.
If you need selects or joins within inserts, you may have to do it
the long way.
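The long way is ordinary INSERTs, which you can at least batch, or an
INSERT ... SELECT when the new rows come from a join. A sketch, with
made-up table names, reusing the connection from the sketch above:

    # Sketch: the "long way" when the load needs a join.  "staging" and
    # "symbols" are made-up tables; conn/cur are from the sketch above.
    rows = [("IBM", 135.20), ("AAPL", 172.50)]
    cur.executemany(
        "INSERT INTO prices (symbol, price) VALUES (%s, %s)", rows)

    # Rows derived from other tables: keep the join on the server side.
    cur.execute("""
        INSERT INTO prices (symbol, price)
        SELECT s.symbol, s.price
        FROM staging AS s
        JOIN symbols AS k ON k.symbol = s.symbol
    """)
    conn.commit()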
> Why should that be? The database cannot read a text file any faster
> than I can.
Because the indices are updated in one big update, rather than
after each change to the database.
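You can get a similar effect with ordinary INSERTs, at least on MyISAM
tables, by turning key maintenance off around the load and rebuilding
the indexes in one pass afterwards. A sketch (InnoDB largely ignores
DISABLE KEYS):

    # Sketch: defer non-unique index maintenance during a bulk insert on
    # a MyISAM table, then rebuild the indexes in one pass.  Table name
    # is made up; InnoDB ignores DISABLE KEYS.
    cur.execute("ALTER TABLE prices DISABLE KEYS")
    cur.executemany(
        "INSERT INTO prices (symbol, price) VALUES (%s, %s)", rows)
    cur.execute("ALTER TABLE prices ENABLE KEYS")
    conn.commit()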
Also note that there are some issues with doing a huge volume of
updates in one MySQL InnoDB transaction. The system has to keep the
data needed to undo the updates, and there's a limit on the amount of
pending transaction history that can be stored.
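If you are stuck pushing the rows through ordinary transactions,
committing in chunks keeps the pending undo history bounded. A sketch,
with an arbitrary chunk size and the conn/cur from above:

    # Sketch: commit every few thousand rows so InnoDB's undo history
    # stays bounded.  Chunk size is a guess; tune it for your server.
    def insert_in_chunks(conn, cur, rows, chunk=10000):
        for i in range(0, len(rows), chunk):
            cur.executemany(
                "INSERT INTO prices (symbol, price) VALUES (%s, %s)",
                rows[i:i + chunk])
            conn.commit()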
It's common to load data into a new, empty table, then, once
the reload has succeeded, do a RENAME like CURRENT->OLD, NEW->CURRENT.
Rename of multiple tables is atomic and interlocked with other
operations, so you can replace an entire table on a live server.
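The swap itself is a single statement. Roughly, with made-up table
names:

    # Sketch of the reload-then-rename pattern.  The multi-table RENAME
    # is atomic, so other sessions see either the old table or the new
    # one, never a half-loaded table.
    cur.execute("CREATE TABLE prices_new LIKE prices")
    # ... LOAD DATA INFILE into prices_new, then sanity-check it ...
    cur.execute("RENAME TABLE prices TO prices_old, prices_new TO prices")
    cur.execute("DROP TABLE prices_old")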
I have some bulk databases which are updated from external
sources. The PhishTank database is updated with UPDATE statements
every three hours. But the Open Directory database is updated by
downloading a big gzipped file of XML, creating a new database
table, then renaming. That load takes hours, once a week.
(All this applies to MySQL, and to some extent, Postgres.
If you're using SQLite, it's different. But a million records
is big for SQLite, which is, after all, a "lite" database.
At 10,000 records, you don't have to worry about any of this
stuff. At 1,000,000 records, you do.)
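For SQLite, the nearest equivalent to bulk loading is executemany()
inside a single transaction, since committing per row is what kills
you. A sketch, with made-up names:

    # Sketch: SQLite bulk insert with the standard-library sqlite3
    # module, one transaction around the whole batch.  Names are made up.
    import sqlite3

    conn = sqlite3.connect("catalog.db")
    conn.execute("CREATE TABLE IF NOT EXISTS prices (symbol TEXT, price REAL)")
    rows = [("IBM", 135.20), ("AAPL", 172.50)]   # normally read from a file
    with conn:                    # commits once, at the end of the block
        conn.executemany(
            "INSERT INTO prices (symbol, price) VALUES (?, ?)", rows)
    conn.close()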
John Nagle