Responding to Timasmith...
While I agree with the concept of having a stable set of interfaces
between the business layer and the data layer, I have found that generating
business objects which map directly to database tables works for me.
That will be true for CRUD/USER processing but it will tend to break
down for more complex applications...
The application needs to solve a particular problem in the most
efficient manner possible. That often requires abstractions and
relationships that are tailored to the problem in hand rather than the
database. For example, one may have to minimize searches or at least
the scope of searches.
The application needs to manage behavior as well as data and ensure
proper synchronization between them. To do that one often needs to
abstract the problem space differently than one would just to represent
the problem space's knowledge. For example, one may need to provide
subclassing just to accommodate different behaviors for tuples in a
single DB table.
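To make that subclassing point concrete, here is a minimal Java sketch, assuming
a hypothetical ACCOUNT table with a TYPE discriminator column (the class and
column names are invented for illustration):

    // Hypothetical example: rows of a single ACCOUNT table behave differently
    // depending on a TYPE column, so the application subclasses by behavior.
    abstract class Account {
        protected final String id;
        protected double balance;

        Account(String id, double balance) {
            this.id = id;
            this.balance = balance;
        }

        // Behavior that varies by account type, even though the data
        // all lives in one table.
        abstract double monthlyFee();
    }

    class CheckingAccount extends Account {
        CheckingAccount(String id, double balance) { super(id, balance); }
        double monthlyFee() { return balance < 1000.0 ? 12.0 : 0.0; }
    }

    class SavingsAccount extends Account {
        SavingsAccount(String id, double balance) { super(id, balance); }
        double monthlyFee() { return 0.0; }
    }

    class AccountFactory {
        // Maps the TYPE discriminator in a tuple to the appropriate subclass.
        static Account fromRow(String type, String id, double balance) {
            switch (type) {
                case "CHK": return new CheckingAccount(id, balance);
                case "SAV": return new SavingsAccount(id, balance);
                default: throw new IllegalArgumentException("Unknown type: " + type);
            }
        }
    }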
The application is likely to operate at a different level of abstraction
than the database. For example, the database is likely to
abstract a telephone number as a simple domain but a telemarketing
program may need to understand area and country codes.
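For instance, a small Java sketch of the application-side abstraction, assuming
the database stores the number as a single string column; the parsing format
here is simplified and purely illustrative:

    // Illustrative only: the database stores the number as one VARCHAR column,
    // but the telemarketing logic needs country and area codes as first-class data.
    final class PhoneNumber {
        private final String countryCode;
        private final String areaCode;
        private final String localNumber;

        private PhoneNumber(String countryCode, String areaCode, String localNumber) {
            this.countryCode = countryCode;
            this.areaCode = areaCode;
            this.localNumber = localNumber;
        }

        // Parses a simplified "+CC-AAA-NNNNNNN" form; real formats vary widely.
        // e.g. PhoneNumber.parse("+1-617-5551234").getAreaCode() returns "617".
        static PhoneNumber parse(String raw) {
            String[] parts = raw.split("-", 3);
            if (parts.length != 3 || !parts[0].startsWith("+")) {
                throw new IllegalArgumentException("Unexpected format: " + raw);
            }
            return new PhoneNumber(parts[0].substring(1), parts[1], parts[2]);
        }

        boolean isDomestic(String homeCountryCode) {
            return countryCode.equals(homeCountryCode);
        }

        String getAreaCode() { return areaCode; }
    }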
The application is likely to abstract invariants from the problem space
that are not relevant to a database. That may require different
abstractions but it is more likely to require additional configuration
data that would not normally be in the database. For example,
depreciation can be computed by a formula so the only data needed is the
base asset value. But one can eliminate the formula computation by
providing tables of fractions of the base asset value. Those fractions
would be defined once for each combination of formula and time periods
and need to be stored somewhere. It may or may not be appropriate to
put them in an enterprise database.
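A rough Java sketch of that kind of configuration data, with invented formula
keys and placeholder fractions, might look like this:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: instead of evaluating a depreciation formula each time,
    // the application looks up a precomputed fraction of the base asset value
    // for each (formula, period) combination. Where that table lives -- the
    // enterprise database or local configuration -- is a separate decision.
    class DepreciationTable {
        private final Map<String, double[]> fractionsByFormula = new HashMap<>();

        void define(String formula, double[] fractionPerPeriod) {
            fractionsByFormula.put(formula, fractionPerPeriod);
        }

        double depreciatedValue(String formula, int period, double baseAssetValue) {
            double[] fractions = fractionsByFormula.get(formula);
            return baseAssetValue * fractions[period];
        }
    }

    class DepreciationDemo {
        public static void main(String[] args) {
            DepreciationTable table = new DepreciationTable();
            // Placeholder values for a straight-line schedule over 5 periods.
            table.define("straight-line-5", new double[] {1.0, 0.8, 0.6, 0.4, 0.2});
            System.out.println(table.depreciatedValue("straight-line-5", 2, 10000.0)); // 6000.0
        }
    }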
The notions of identity and delegation are often quite different between
an application view and a database view. For example, an Account table
with a Balance attribute would be represented in a <reusable> GUI
subsystem as instances of a Window class and related Control classes.
The only Account/Balance semantics that would appear in the GUI view
would be text string values for titles and labels. The Window and
Control classes can be instantiated for any number of different tables
while the Account table has been split up into multiple classes.
Bottom line: once one is outside the realm of USER/CRUD processing, one
wants to solve the problem in hand first and then worry about how the
problem solution talks to the persistence mechanisms.
You see, in the past I worked on a large application where we rewrote a large
amount of functionality on top of a legacy database. The database
schema was horrible and we pulled our hair out dealing with it. In
that instance we had business objects representing the domain we were
modelling and a large amount of code which mapped the database to the
objects. Implementing the business layer was relatively easy since the
objects modelled the domain so well.
However we spent an inordinate amount of time mapping objects to the
data layer and again a huge amount of time troubleshooting bugs
introduced by this methodology. The business objects and database were
so radically different that there was far too much logic to make it work.
No tool could do it - it was all hand-written.
How much of that was because of a poor schema in the legacy system? How
much was due to poor encapsulation of the database mechanisms and
failure to capture RDM invariants? Generally it is not very difficult
to map data requests into a well-formed RDB. That's what interfaces
like SQL exist to do.
If the persistence access is properly encapsulated in a subsystem
dedicated to talking to persistence, one can provide a subsystem
interface that is tailored to the problem solution's needs ("Save this
pile of data I call 'X'" and "Give me back the pile of data I saved as
'X'"). Then it is not difficult to recast those requests in terms of
queries; one just needs a mapping between 'X' and tables/tuples.
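A bare-bones Java sketch of such a solution-oriented interface; the names and
the use of a plain map as the "pile of data" are assumptions for illustration,
not a prescription:

    import java.util.Map;

    // Illustrative interface: the problem solution asks for data in its own terms,
    // and the persistence subsystem maps 'X' onto whatever tables/tuples apply.
    interface PersistenceAccess {
        // "Save this pile of data I call 'X'"
        void save(String pileName, Map<String, Object> pile);

        // "Give me back the pile of data I saved as 'X'"
        Map<String, Object> fetch(String pileName);
    }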
In addition, if one abstracts the subsystem to capture the invariants of
the RDM, one has quite generic objects like Table, Tuple, Attribute
(rather than Account, Customer, etc.). Those essentially become holders
of identity strings that one plugs into SQL strings. That allows them
to be initialized at startup from external configuration data (possibly
directly from the RDB schema). So one only needs a mapping of identity
between the interface messages' data packets and the generic instances.
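Sketched in Java, such generic objects might look roughly like this (the class
shape is invented for illustration; the point is that nothing in it mentions
Account or Customer):

    import java.util.List;
    import java.util.stream.Collectors;

    // Illustrative generic class: it holds identity strings (table and column
    // names) loaded from configuration or the RDB schema at startup, and plugs
    // them into SQL text. No Account/Customer semantics appear anywhere.
    class Table {
        private final String name;
        private final List<String> attributes;

        Table(String name, List<String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        // Builds a parameterized INSERT for this table; values are bound later.
        String insertSql() {
            String columns = String.join(", ", attributes);
            String params = attributes.stream().map(a -> "?").collect(Collectors.joining(", "));
            return "INSERT INTO " + name + " (" + columns + ") VALUES (" + params + ")";
        }

        // Builds a SELECT keyed by an identity attribute supplied by the caller.
        String selectByKeySql(String keyAttribute) {
            return "SELECT " + String.join(", ", attributes)
                 + " FROM " + name + " WHERE " + keyAttribute + " = ?";
        }
    }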
If one captures RDM invariants in the persistence access subsystem, then
the subsystem becomes reusable across applications and databases. All
one needs is a local interface implementation (think: Facade pattern) to
talk to it and local configuration data for the database view.
Now I have an opportunity to start a large application from scratch, and
this time around I get to build a very well designed database with well
abstracted tables and carefully separated reference and activity tables.
I wrote a code generator which creates
a) The data layer - SQL insert/update/delete and the mapping from business
objects to prepared statements (a rough sketch of what such generated
mapping code might look like follows below)
b) Business objects with collections, abstract table models, code to
identify modified fields, etc.
c) EJB beans, etc...
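For illustration, a rough guess at the shape of such generated data-layer code,
using plain JDBC; the Customer object and column names are invented here:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Illustrative guess at generated data-layer code of this kind; the
    // Customer business object and column names are placeholders.
    class CustomerDao {
        private static final String INSERT_SQL =
            "INSERT INTO CUSTOMER (CUSTOMER_ID, NAME, ADDRESS) VALUES (?, ?, ?)";

        // Maps the business object's fields onto the prepared statement parameters.
        void insert(Connection connection, Customer customer) throws SQLException {
            try (PreparedStatement stmt = connection.prepareStatement(INSERT_SQL)) {
                stmt.setLong(1, customer.getId());
                stmt.setString(2, customer.getName());
                stmt.setString(3, customer.getAddress());
                stmt.executeUpdate();
            }
        }
    }

    class Customer {
        private final long id;
        private final String name;
        private final String address;

        Customer(long id, String name, String address) {
            this.id = id;
            this.name = name;
            this.address = address;
        }

        long getId() { return id; }
        String getName() { return name; }
        String getAddress() { return address; }
    }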
Some might say there is a downside in that if the database schema changes
(and it does), the objects change, affecting the business layer. This is
true, but I actually like it because it identifies the code affected by
the change.
But what if the problem solution's basic data needs are unchanged? By
tying business abstractions to the database schema you are /forcing/ the
solution to be touched when just the database changes. That is
generally not a Good Thing because it presents an opportunity to break
the existing problem solution. The problem solution should be
indifferent to whether the data is stored in an RDB, an OODB, flat
files, or on clay tablets.
If the application is properly partitioned, then the problem solution
talks to a generic interface that reflects its needs for data. So long
as the requirements for the problem solution do not change, that
interface does not change -- regardless of what happens to the database
schema. Any changes to the database schema are isolated to the
persistence access subsystem, whose job is to convert the solution view
in the interface to the DB view.
If the solution requirements change, then there may be different data
needs and that would be reflected in changes to the persistence
subsystem interface. Then one would have to modify the subsystem and/or
the database. But that is exactly the way it should be because the
database exists to serve the needs of the application, not the other way
around.
My main challenge now is to determine how to represent tables with
parent/child relationships - so far having the child as a collection
on the parent is working for the data layer - the only challenge is
giving the parent abstract table model the ability to add child
model columns as columns on the parent. I think I will be able to
implement that though.
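A minimal Java sketch of the parent-with-child-collection shape described
above; Order and OrderLine are placeholder names, not the actual schema:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative parent/child shape: the child rows are held as a collection
    // on the parent business object.
    class Order {
        private final long orderId;
        private final List<OrderLine> lines = new ArrayList<>();

        Order(long orderId) { this.orderId = orderId; }

        void addLine(OrderLine line) { lines.add(line); }

        List<OrderLine> getLines() { return lines; }
    }

    class OrderLine {
        private final long lineId;
        private final String item;
        private final int quantity;

        OrderLine(long lineId, String item, int quantity) {
            this.lineId = lineId;
            this.item = item;
            this.quantity = quantity;
        }
    }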
This is one of the problems of trying to map 1:1. Database tables in a
subclassing relationship can be and are instantiated separately, but in the
OO context a single object instance instantiates the entire tree. So
you always need some sort of conversion. It is better to isolate such
conversions in a systematic manner. The use of decoupling interfaces
will tend to make that easier on the solution side.
        [Customer]
        + name
        + address
             A
             |
     +-------+-------+
     |               |
[Individual]     [Corporate]
 + limit          + division
When the solution side needs to instantiate a corporate customer it asks
the persistence access for one by name and gets back {address,
division}. The solution then routinely instantiates a single instance
of [Corporate]. All the drudge work of forming a join query across two
tables is handled by the persistence subsystem.
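One hedged sketch of what the persistence subsystem might do behind that
request, using plain JDBC; the table and column names follow the diagram
above but are otherwise assumptions:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: the solution asks for a corporate customer by name and
    // receives a flat pile of data; the join across CUSTOMER and CORPORATE is the
    // persistence subsystem's problem, not the solution's.
    class CorporateCustomerQuery {
        private static final String SQL =
            "SELECT c.ADDRESS, k.DIVISION "
          + "FROM CUSTOMER c JOIN CORPORATE k ON k.CUSTOMER_ID = c.CUSTOMER_ID "
          + "WHERE c.NAME = ?";

        Map<String, Object> fetchByName(Connection connection, String name) throws SQLException {
            try (PreparedStatement stmt = connection.prepareStatement(SQL)) {
                stmt.setString(1, name);
                try (ResultSet rs = stmt.executeQuery()) {
                    Map<String, Object> pile = new HashMap<>();
                    if (rs.next()) {
                        pile.put("address", rs.getString("ADDRESS"));
                        pile.put("division", rs.getString("DIVISION"));
                    }
                    return pile;
                }
            }
        }
    }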
So what do you all think of this - has anyone done this for enterprise
applications - and are there any other challenges I should prepare myself
for? Perhaps performance, although I do use views where read
performance is needed and the odd update SQL statement for particular
functions. Given that I track which fields are changing on the models,
I could generate the SQL on the fly to reduce the number of fields
updated.
Generating SQL on the fly is a fine idea, but only in a context where
SQL is relevant and one can abstract the RDM problem space properly.
One of the goals of encapsulating persistence access is that one can
create SQL on the fly -- but in a generic and reusable fashion.
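As a small illustration of generic, on-the-fly SQL generation from a set of
modified fields (the builder shape is an assumption, not the poster's actual
generator):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Illustrative generic builder: given only the fields that actually changed,
    // produce a parameterized UPDATE touching just those columns. Table and key
    // names are supplied as identity strings, so nothing here is table-specific.
    class UpdateSqlBuilder {
        static String build(String table, String keyColumn, Map<String, Object> dirtyFields) {
            String assignments = dirtyFields.keySet().stream()
                .map(column -> column + " = ?")
                .collect(Collectors.joining(", "));
            return "UPDATE " + table + " SET " + assignments + " WHERE " + keyColumn + " = ?";
        }

        public static void main(String[] args) {
            Map<String, Object> dirty = new LinkedHashMap<>();
            dirty.put("BALANCE", 250.0);
            dirty.put("STATUS", "OPEN");
            // Prints: UPDATE ACCOUNT SET BALANCE = ?, STATUS = ? WHERE ACCOUNT_ID = ?
            System.out.println(build("ACCOUNT", "ACCOUNT_ID", dirty));
        }
    }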
Of course there is no free lunch. The price of isolating persistence
access is that one must encode and decode messages on each side of the
interface. So there is a trade-off between later maintenance effort and
current performance. For CRUD/USER applications that usually isn't
worthwhile because the application doesn't process the data in any
significant fashion; it just passes it in a pipeline between the DB and
the UI and performance is limited by DB access. Hence the popularity of
RAD IDEs that hide the grunt work.
But for larger, more complex applications one typically accesses the
data multiple times in various ways. Then confining the encode/decode
overhead to one-time reads and writes of the data is a price worth paying
for decoupling the complex solution from persistence issues.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog:
http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH