C
ccc31807
We have a csv source file of many thousands of records, with two
columns, the ID and a
status field. It has very recently come to my attention that
occasionally the status of a record will change, with the change being
significant enough that the record must be updated before the process
runs. The update files consist of a small subset, sometimes a very
small subset, of the records in the source file. (The update file has
a number of other fields that can change also, but I'm only concerned
with the status field.)
My first inclination is to open the update file, create a hash with
the ID as the key and the status as value, then open the source file,
read each line, update the line if it exists in the hash, and write
each line to a new output file. However, I can think of several
different ways to do this -- I just don't know which way would be
best. I don't particularly want to read every line and write every
line of a source file when only a few lines (if any) need to be
modified.
My second inclination would be to use a database and write an update
query for the records in the update file. But this seems a heavy
weight solution to a light weight problem -- I would only be using the
database to modify records, not to to any of the things we ordinarily
use databases for.
I've never had to do a small number of updates to a large file before,
and it seems too trivial a task to use a database for. Any suggestions
on a better way to do this?
Thanks, CC.
P.S. - The end product of this process is a data file with
approximately 20 fields, written as comma separated, double quote
delimited text, designed to be imported into Excel and Access by end
users in performance of their duties.
columns, the ID and a
status field. It has very recently come to my attention that
occasionally the status of a record will change, with the change being
significant enough that the record must be updated before the process
runs. The update files consist of a small subset, sometimes a very
small subset, of the records in the source file. (The update file has
a number of other fields that can change also, but I'm only concerned
with the status field.)
My first inclination is to open the update file, create a hash with
the ID as the key and the status as value, then open the source file,
read each line, update the line if it exists in the hash, and write
each line to a new output file. However, I can think of several
different ways to do this -- I just don't know which way would be
best. I don't particularly want to read every line and write every
line of a source file when only a few lines (if any) need to be
modified.
My second inclination would be to use a database and write an update
query for the records in the update file. But this seems a heavy
weight solution to a light weight problem -- I would only be using the
database to modify records, not to to any of the things we ordinarily
use databases for.
I've never had to do a small number of updates to a large file before,
and it seems too trivial a task to use a database for. Any suggestions
on a better way to do this?
Thanks, CC.
P.S. - The end product of this process is a data file with
approximately 20 fields, written as comma separated, double quote
delimited text, designed to be imported into Excel and Access by end
users in performance of their duties.