Paul Boddie
I have a program that is generated from a generic process. Its job is to
check whether records (replicated from another system) exist in a local
table and, if they don't, to add them. I have one of these programs for
every table in the database.
Some people have discussed whether you need one or many programs. In
practice, you're going to need to execute many "tasks" to process all
your tables, and potentially this could mean generating many SQL
statements, as I will describe below.
[...]
I am thinking that dynamically generating the programs to run might not be
such a good idea. It would be a shame to drop it, because the system needs
to be generic and it runs from an XML file, so the resulting code could be
pretty complex, and I am new to Python. The program did generate a .pyc
file, so it was able to compile.
Thoughts anyone?
This problem sounds as if it could be solved adequately using only the
database system, although I accept that sometimes some logic or
additional processing could be done in Python. If I were to receive
updates from a system, presumably as files, I'd want to bulk copy them
into the database and then perform the necessary inserts using SQL. If
the raw updates were not directly compatible with the database,
perhaps because data representations might differ, then I would want
to process them before doing the bulk copy.
In other words, the workflow for a single table would look like this:
1. Obtain update files.
2. Process files in order to make the data compatible with the
database.
3. Create temporary tables for the updates in the database.
4. Bulk copy the files into the temporary tables (using COPY or LOAD
DATA commands).
5. Do the necessary inserts (where you have to use either the non-standard
   INSERT OR UPDATE or an INSERT involving a join between the existing
   and temporary tables; see the sketch after this list).
6. Drop the temporary tables (if not done automatically).
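
To make step 5 concrete, here is a minimal Python sketch using an
in-memory SQLite database purely for illustration; the table and column
names ("customers", "id", "name") are made up, and on PostgreSQL or MySQL
the same INSERT ... SELECT pattern would follow the bulk copy into the
temporary table:

# Sketch of step 5 only: insert rows from the temporary "updates" table
# that are not already present in the local table. All names here are
# illustrative, not taken from the original poster's system.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing local table and a temporary table holding the replicated rows.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TEMPORARY TABLE customers_updates (id INTEGER, name TEXT)")

cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])
cur.executemany("INSERT INTO customers_updates VALUES (?, ?)",
                [(2, "Bob"), (3, "Carol")])

# Keep only the rows from the temporary table whose key is not already
# present locally, by joining on the key and selecting the unmatched rows.
cur.execute("""
    INSERT INTO customers (id, name)
    SELECT u.id, u.name
    FROM customers_updates AS u
    LEFT JOIN customers AS c ON c.id = u.id
    WHERE c.id IS NULL
""")
conn.commit()

print(cur.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Alice'), (2, 'Bob'), (3, 'Carol')]

The join keeps only the rows whose key is not already there, which is
exactly the "add it if it doesn't exist yet" behaviour you described.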
Now, it's almost certain that the details of the above workflow would
vary from table to table and from update to update. Even in the case
where you don't have to process the files, you still need to copy the
files into the database and to work with a different table each time.
This is where generating "something" for each table is unavoidable,
and one solution might be to have a generic template and to substitute
the table name and update filename into that template as you process
each table and its associated data. Whether you actually generate an
SQL command file for each table, or whether you have a program issue
the commands directly, is a matter of preference.
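
For example, here is a rough Python sketch of that templating idea; the
table names, key columns, file paths and the PostgreSQL-flavoured COPY
syntax are all just assumptions for illustration, and in your case the
per-table details would presumably come from the XML description:

# Generate the per-table SQL from one generic template by substituting
# the table name, key column and update filename. Everything listed in
# TABLES below is hypothetical.

from string import Template

TEMPLATE = Template("""
CREATE TEMPORARY TABLE ${table}_updates (LIKE ${table});
COPY ${table}_updates FROM '${filename}';
INSERT INTO ${table}
    SELECT u.* FROM ${table}_updates AS u
    LEFT JOIN ${table} AS t ON t.${key} = u.${key}
    WHERE t.${key} IS NULL;
DROP TABLE ${table}_updates;
""")

# One entry per table: (table name, key column, update file).
TABLES = [
    ("customers", "id", "/tmp/customers.dat"),
    ("orders", "order_id", "/tmp/orders.dat"),
]

for table, key, filename in TABLES:
    sql = TEMPLATE.substitute(table=table, key=key, filename=filename)
    # Either write the commands to a per-table SQL file...
    with open("%s_load.sql" % table, "w") as f:
        f.write(sql)
    # ...or hand the text straight to the database via a DB-API cursor.

Whether you then run those files with a command-line client or feed the
text to a DB-API cursor is, as I said, a matter of preference.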
Anyway, from what you've described, that's how I would approach this
problem. Certainly, it's a lot more straightforward than dealing with
object-relational abstractions in programs generated by other
programs.
Paul