Chris Nethery
Hello everyone,
I have a challenging issue I need to overcome and was hoping I might gain
some insights from this group.
I am trying to speed up the process I am using, which is as follows:
1) I have roughly 700 files that are modified throughout the day by users,
within a separate application
2) As modifications are made to the files, I use a polling service and mimic
the lock-file strategy used by the separate software application
3) I generate a single 'load' file and bulk insert into a load table
4) I update/insert/delete from the load table
This is just too time-consuming, in my opinion.
At present, users of the separate application can run recalculation
functions that modify all 700 files at once, causing my code to take the
whole ball of wax, rather than just the data that has changed.
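One possible way to tell which files really changed after a recalculation is to compare content digests against a snapshot taken on the previous poll. This is only a sketch: the function names `file_digest` and `changed_files` and the in-memory `previous` dictionary are hypothetical, and it assumes a file is read only once the lock-file check says it is safe to do so.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(paths, previous):
    """Compare current digests with a previous {path: digest} snapshot.

    Returns (changed, current): the paths whose content differs from the
    snapshot (or that are new), and the new snapshot to keep for next time.
    """
    current = {str(p): file_digest(p) for p in paths}
    changed = [p for p, digest in current.items() if previous.get(p) != digest]
    return changed, current
```

A file that is rewritten with identical content keeps the same digest, so it would be skipped even though its modification time changed.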
What I would like to do is spawn separate processes and load only the delta
data. The data must be 100% reliable, so I'm leery of using something like
difflib. I also want to make sure that my code scales since the number of
files is ever-increasing.
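To make the "delta" idea concrete, here is a minimal sketch of a keyed, exact comparison between the previous and current contents of one file. It assumes something the description above does not state: that each line is one record and that a delimited field (here the first comma-separated field) uniquely identifies it. The names `index_records` and `record_delta` are hypothetical.

```python
def index_records(lines, delimiter=",", key_field=0):
    """Map each record's key to its full line (blank lines are skipped)."""
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key = line.split(delimiter)[key_field]
        records[key] = line
    return records


def record_delta(old_lines, new_lines, delimiter=",", key_field=0):
    """Return (inserts, updates, deletes) between two versions of a file.

    inserts and updates map key -> new line; deletes is a sorted list of keys.
    Records are compared by exact string equality, keyed on key_field.
    """
    old = index_records(old_lines, delimiter, key_field)
    new = index_records(new_lines, delimiter, key_field)
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    deletes = sorted(old.keys() - new.keys())
    return inserts, updates, deletes
```

The three results line up with the insert/update/delete step against the load table. Because each file is compared independently, the per-file work could in principle be distributed across processes, for example with `concurrent.futures.ProcessPoolExecutor`.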
I would be grateful for any feedback you could provide.
Thank you,
Chris Nethery