Database-driven, code-generated software architecture


timasmith

While I agree with the concept of having a stable set of interfaces
between the business layer and data layer, I have found that generating
business objects which directly map to database tables works for me.

You see, in the past I had a large application where we rewrote a large
amount of functionality on top of a legacy database. The database
schema was horrible and we pulled our hair out dealing with it. In
that instance we had business objects representing the domain we were
modelling and a large amount of code which mapped the database to the
objects. Implementing the business layer was relatively easy since the
objects modelled the domain so well.

However we spent an inordinate amount of time mapping objects to the
data layer, and again a huge amount of time troubleshooting bugs
introduced by this methodology. The business objects and database were
so radically different that there was far too much logic to make that work.
No tool could do it - it was hand-written.

Now I have an opportunity to start a large application from scratch, and
this time around I get to build a very well designed database with well
abstracted tables and carefully separated reference and activity tables.

I wrote a code generator which creates:
a) The data layer - SQL insert/update/delete, mapping from business
objects to prepared statements
b) Business objects with collections, abstract table models, code to
identify modified fields, etc.
c) EJB beans, etc.
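
For illustration, a minimal sketch of the kind of generated data-layer class described in (a); the OrderDataModel/OrderDataAdapter names and the ORDERS table and its columns are assumptions, not code from this post:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OrderDataModel {            // stand-in for a generated business object
    long orderId;
    long customerId;
    java.sql.Date orderDate;
}

public class OrderDataAdapter {   // stand-in for the generated data layer

    private static final String INSERT_SQL =
        "INSERT INTO ORDERS (ORDER_ID, CUSTOMER_ID, ORDER_DATE) VALUES (?, ?, ?)";

    /** Map the business object's fields onto prepared-statement parameters. */
    public void insert(Connection con, OrderDataModel order) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, order.orderId);
            ps.setLong(2, order.customerId);
            ps.setDate(3, order.orderDate);
            ps.executeUpdate();
        }
    }
}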

Some might say a downside is that if the database schema changes
(and it does), the objects change, affecting the business layer. This is
true, but I actually like it because it **identifies the code affected by
the change**.

My main challenge now is to determine how to represent tables with
parent/child relationships - so far having the child as a collection
for the parent is working for the data layer - the only challenge is
giving the parent abstract table model the ability to add child
model columns as columns on the parent. I think I will be able to
implement that though.

So what do you all think of this - has anyone done this for enterprise
applications - are there any other challenges I should prepare myself
for? Perhaps performance, although I do use views where read
performance is needed and the odd update SQL for particular
functions. Given that I track which fields are changing on the models,
I could generate the SQL on the fly to reduce the number of fields
updated.
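
As a rough sketch of that last idea, assuming the business object exposes its modified fields as a column-to-value map (the helper class and names below are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class DynamicUpdateBuilder {

    /** dirty: column name -> new value, as tracked by the business object. */
    public static int update(Connection con, String table, String keyColumn,
                             Object keyValue, Map<String, Object> dirty) throws SQLException {
        if (dirty.isEmpty()) {
            return 0;                       // nothing changed, skip the round trip
        }
        StringBuilder sql = new StringBuilder("UPDATE ").append(table).append(" SET ");
        int i = 0;
        for (String column : dirty.keySet()) {
            if (i++ > 0) sql.append(", ");
            sql.append(column).append(" = ?");
        }
        sql.append(" WHERE ").append(keyColumn).append(" = ?");

        try (PreparedStatement ps = con.prepareStatement(sql.toString())) {
            int idx = 1;
            for (Object value : dirty.values()) {
                ps.setObject(idx++, value);
            }
            ps.setObject(idx, keyValue);
            return ps.executeUpdate();
        }
    }
}

Passing a LinkedHashMap for the dirty map keeps the generated column order stable, which makes the emitted SQL easier to debug.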
 

Jon Martin Solaas

As long as changing the actual business logic is easier than re-mapping
the database tables <-> business data objects, I'd agree. But most of the
time re-mapping is trivial (sometimes still labour-intensive), and
business logic is not ...
 

Frans Bouma [C# MVP]

While I agree with the concept of having a stable set of interfaces
between the business layer and data layer, I have found that generating
business objects which directly map to database tables works for me.
[...]

Now I have an opportunity to start a large application from scratch,
and this time around I get to build a very well designed database
with well abstracted tables and carefully separated reference and
activity tables.

I wrote a code generator which creates:
a) The data layer - SQL insert/update/delete, mapping from business
objects to prepared statements
b) Business objects with collections, abstract table models, code to
identify modified fields, etc.
c) EJB beans, etc.

Some might say a downside is that if the database schema
changes (and it does), the objects change, affecting the business layer.
This is true, but I actually like it because it **identifies the code
affected by the change**.

As long as you reverse engineer the relational model to an internal
abstract model which is equal to an abstract model like what's used in
NIAM / ORM (http://www.orm.net , object role modelling), you're fine.
You see, the abstraction level at which your relational model lives
is the same abstraction level from which you would start when
designing your domain entities (thus with inheritance etc.).

My main challenge now is to determine how to represent tables with
parent/child relationships - so far having the child as a collection
for the parent is working for the data layer - the only challenge is
giving the parent abstract table model the ability to add child
model columns as columns on the parent. I think I will be able to
implement that though.

There's no such thing as 'parent / child'. There is such a thing as a
relationship between entities. The parent/child concept quickly gives
problems semantically, when you have an order entity and you add it to
myCustomer.Orders, which makes myCustomer the 'parent'. However, if I
add the same order entity to myEmployee.FiledOrders, myEmployee is also
the parent - confusing. So keep it at the level of 'relationships' and
with that, the different types of relationships: 1:1/1:n/m:1/m:n.
(where m:n can be seen as a view on 2 m:1 relationships from the
intermediate entity to 1 or 2 other entities)

What you can do is define the concept of 'field mapped onto a
relationship'. This then means that in the case of Customer - Order,
you have in Customer a field called 'Orders' and in Order a field
called 'Customer' which are respectively mapped onto Customer 1:n Order
and Order m:1 Customer.

You physically represent these fields with the construct related to
the relationship type (1:n -> collection, m:1 -> single object
reference) and your mapper core then should know how to deal with the
data, where to store it etc.
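
A minimal sketch of that idea using the Customer/Order example from this thread; the accessor names are illustrative:

import java.util.ArrayList;
import java.util.List;

class Customer {
    private final List<Order> orders = new ArrayList<>();   // Customer 1:n Order

    public List<Order> getOrders() { return orders; }

    public void addOrder(Order order) {
        orders.add(order);
        order.setCustomer(this);     // keep both ends of the relationship in sync
    }
}

class Order {
    private Customer customer;      // Order m:1 Customer

    public Customer getCustomer() { return customer; }
    void setCustomer(Customer customer) { this.customer = customer; }
}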

What you should drop is the pre-fab query generation. The thing is
that if I want to fetch:

SELECT C.*
FROM Customer C
WHERE CustomerID IN
(
    SELECT CustomerID FROM Order WHERE
    Month(OrderDate) = @month AND Year(OrderDate) = @year
)

I have a problem: pre-generated, per-table queries can't express that
kind of filter.

So what do you all think of this - has anyone done this for enterprise
applications - are there any other challenges I should prepare myself
for? Perhaps performance, although I do use views where read
performance is needed and the odd update SQL for particular
functions. Given that I track which fields are changing on the models,
I could generate the SQL on the fly to reduce the number of fields
updated.

I'd do that indeed. It can greatly enhance the update speeds in some
areas.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

H. S. Lahman

Responding to Timasmith...

While I agree with the concept of having a stable set of interfaces
between the business layer and data layer I have found that generating
business objects which directly map to database tables works for me.

That will be true for CRUD/USER processing but it will tend to break
down for more complex applications...

The application needs to solve a particular problem in the most
efficient manner possible. That often requires abstractions and
relationships that are tailored to the problem in hand rather than the
database. For example, one may have to minimize searches or at least
the scope of searches.

The application needs to manage behavior as well as data and ensure
proper synchronization between them. To do that one often needs to
abstract the problem space differently than one would just to represent
the problem space's knowledge. For example, one may need to provide
subclassing just to accommodate different behaviors for tuples in a
single DB table.

The application is likely to be abstracted at a different level of
abstraction than the database. For example, the database is likely to
abstract a telephone number as a simple domain but a telemarketing
program may need to understand area and country codes.

The application is likely to abstract invariants from the problem space
that are not relevant to a database. That may require different
abstractions but it is more likely to require additional configuration
data that would not normally be in the database. For example,
depreciation can be computed by a formula so the only data needed is the
base asset value. But one can eliminate the formula computation by
providing tables of fractions of the base asset value. Those fractions
would be defined once for each combination of formula and time periods
and need to be stored somewhere. It may or may not be appropriate to
put them in an enterprise database.

The notions of identity and delegation are often quite different between
an application view and a database view. For example, an Account table
with a Balance attribute would be represented in a <reusable> GUI
subsystem as instances of a Window class and related Control classes.
The only Account/Balance semantics that would appear in the GUI view
would be text string values for titles and labels. The Window and
Control classes can be instantiated for any number of different tables
while the Account table has been split up into multiple classes.

Bottom line: once one is outside the realm of USER/CRUD processing, one
wants to solve the problem in hand first and then worry about how the
problem solution talks to the persistence mechanisms.

You see, in the past I had a large application where we rewrote a large
amount of functionality on top of a legacy database. The database
schema was horrible and we pulled our hair out dealing with it. In
that instance we had business objects representing the domain we were
modelling and a large amount of code which mapped the database to the
objects. Implementing the business layer was relatively easy since the
objects modelled the domain so well.

However we spent an inordinate amount of time mapping objects to the
data layer, and again a huge amount of time troubleshooting bugs
introduced by this methodology. The business objects and database were
so radically different that there was far too much logic to make that work.
No tool could do it - it was hand-written.

How much of that was because of a poor schema in the legacy system? How
much was due to poor encapsulation of the database mechanisms and
failure to capture RDM invariants? Generally it is not very difficult
to map data requests into a well-formed RDB. That's what interfaces
like SQL exist to do.

If the persistence access is properly encapsulated in a subsystem
dedicated to talking to persistence, one can provide a subsystem
interface that is tailored to the problem solution's needs ("Save this
pile of data I call 'X'" and "Give me back the pile of data I saved as
'X'"). Then it is not difficult to recast those requests in terms of
queries; one just needs a mapping between 'X' and tables/tuples.
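
A rough sketch of what such a solution-facing interface could look like; the interface name and the Map-based signatures are assumptions, not a design from this post:

import java.util.Map;

public interface PersistenceAccess {

    /** "Save this pile of data I call 'X'." */
    void save(String pileName, Map<String, Object> data);

    /** "Give me back the pile of data I saved as 'X'." */
    Map<String, Object> load(String pileName);
}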

In addition, if one abstracts the subsystem to capture the invariants of
the RDM, one has quite generic objects like Table, Tuple, Attribute
(rather than Account, Customer, etc.). Those essentially become holders
of identity strings that one plugs into SQL strings. That allows them
to be initialized at startup from external configuration data (possibly
directly from the RDB schema). So one only needs a mapping of identity
between the interface messages' data packets and the generic instances.
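
For illustration, a tiny sketch of such a generic holder of identity strings; the class shape and the way it is initialized are assumptions:

import java.util.List;

class Table {
    final String name;                  // table identity string
    final List<String> attributes;      // column identity strings, e.g. loaded at startup

    Table(String name, List<String> attributes) {
        this.name = name;
        this.attributes = attributes;
    }

    /** Build a parameterised SELECT from nothing but identity strings. */
    String selectByKeySql(String keyAttribute) {
        return "SELECT " + String.join(", ", attributes)
             + " FROM " + name
             + " WHERE " + keyAttribute + " = ?";
    }
}

// e.g. new Table("ACCOUNT", List.of("ACCOUNT_ID", "BALANCE")).selectByKeySql("ACCOUNT_ID")
// yields: SELECT ACCOUNT_ID, BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = ?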

If one captures RDM invariants in the persistence access subsystem, then
the subsystem becomes reusable across applications and databases. All
one needs is a local interface implementation (think: Facade pattern) to
talk to it and local configuration data for the database view.

Now I have an opportunity to start a large application from scratch, and
this time around I get to build a very well designed database with well
abstracted tables and carefully separated reference and activity tables.

I wrote a code generator which creates:
a) The data layer - SQL insert/update/delete, mapping from business
objects to prepared statements
b) Business objects with collections, abstract table models, code to
identify modified fields, etc.
c) EJB beans, etc.

Some might say a downside is that if the database schema changes
(and it does), the objects change, affecting the business layer. This is
true, but I actually like it because it **identifies the code
affected by the change**.

But what if the problem solution's basic data needs are unchanged? By
tying business abstractions to the database schema you are /forcing/ the
solution to be touched when just the database changes. That is
generally not a Good Thing because it presents an opportunity to break
the existing problem solution. The problem solution should be
indifferent to whether the data is stored in an RDB, an OODB, flat
files, or on clay tablets.

If the application is properly partitioned, then the problem solution
talks to a generic interface that reflects its needs for data. So long
as the requirements for the problem solution do not change, that
interface does not change -- regardless of what happens to the database
schema. Any changes to the database schema are isolated to the
persistence access subsystem whose job is to convert the solution view
in the interface to the DB view.

If the solution requirements change, then there may be different data
needs and that would be reflected in changes to the persistence
subsystem interface. Then one would have to modify the subsystem and/or
the database. But that is exactly the way it should be because the
database exists to serve the needs of the application, not the other way
around.

My main challenge now is to determine how to represent tables with
parent/child relationships - so far having the child as a collection
for the parent is working for the data layer - the only challenge is
giving the parent abstract table model the ability to add child
model columns as columns on the parent. I think I will be able to
implement that though.

This is one of the problems of trying to map 1:1. Database tables in a
subclassing relationship can be and are instantiated separately, but in the
OO context a single object instance instantiates the entire tree. So
you always need some sort of conversion. It is better to isolate such
conversions in a systematic manner. The use of decoupling interfaces
will tend to make that easier on the solution side.

        [Customer]
        + name
        + address
             ^
             |
     +-------+-------+
     |               |
[Individual]    [Corporate]
+ limit         + division

When the solution side needs to instantiate a corporate customer it asks
the persistence access for one by name and gets back {address,
division}. The solution then routinely instantiates a single instance
of [Corporate]. All the drudge work of forming a join query across two
tables is handled by the persistence subsystem.
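
A small sketch of how that might look from the solution side, assuming a facade method that hides the join; all names here are illustrative:

class CorporateData {                  // just the data the solution asked for
    final String address;
    final String division;
    CorporateData(String address, String division) {
        this.address = address;
        this.division = division;
    }
}

class Corporate {                      // the single solution-side instance
    final String name;
    final String address;
    final String division;
    Corporate(String name, CorporateData data) {
        this.name = name;
        this.address = data.address;
        this.division = data.division;
    }
}

interface PersistenceFacade {
    CorporateData getCorporateCustomer(String name);   // join happens behind this call
}

class SolutionSide {
    Corporate loadCorporateCustomer(PersistenceFacade db, String name) {
        return new Corporate(name, db.getCorporateCustomer(name));
    }
}
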
So what do you all think of this - has anyone done this for enterprise
applications - are there any other challenges I should prepare myself
for? Perhaps performance, although I do use views where read
performance is needed and the odd update SQL for particular
functions. Given that I track which fields are changing on the models,
I could generate the SQL on the fly to reduce the number of fields
updated.

Generating SQL on the fly is a fine idea, but only in a context where
SQL is relevant and one can abstract the RDM problem space properly.
One of the goals of encapsulating persistence access is that one can
create SQL on the fly -- but in a generic and reusable fashion.

Of course there is no free lunch. The price of isolating persistence
access is that one must encode and decode messages on each side of the
interface. So there is a trade-off between later maintenance effort and
current performance. For CRUD/USER applications that usually isn't
worthwhile because the application doesn't process the data in any
significant fashion; it just passes it in a pipeline between the DB and
the UI and performance is limited by DB access. Hence the popularity of
RAD IDEs that hide the grunt work.

But for larger, more complex applications one typically accesses the
data multiple times in various ways. Then the isolation of
encode/decode to one-time read/write of the data is worthwhile when
compared to decoupling the complex solution from persistence issues.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
(e-mail address removed)
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
 

frebe73

The application needs to solve a particular problem in the most
efficient manner possible. That often requires abstractions and
relationships that are tailored to the problem in hand rather than the
database.

Why couldn't the database schema be tailored to the problem in hand?

For example, one may have to minimize searches or at least
the scope of searches.

Why?

The application needs to manage behavior as well as data and ensure
proper synchronization between them.

Doesn't an RDB have behavior?

The application is likely to be abstracted at a different level of
abstraction than the database. For example, the database is likely to
abstract a telephone number as a simple domain but a telemarketing
program may need to understand area and country codes.

What stops you from having two different columns or using a substring?

The notions of identity and delegation are often quite different between
an application view and a database view. For example, an Account table
with a Balance attribute would be represented in a <reusable> GUI
subsystem as instances of a Window class and related Control classes.
The only Account/Balance semantics that would appear in the GUI view
would be text string values for titles and labels. The Window and
Control classes can be instantiated for any number of different tables
while the Account table has been split up into multiple classes.

Are you talking about some kind of generic user interface here? Normally a
GUI is tailored to the business problem (or use case). If the
database schema is tailored to the business problem, what is the
problem with a hard-wired mapping between the schema and the GUI?

one wants to solve the problem in hand first and then worry about how the
problem solution talks to the persistence mechanisms.

Do you claim that an RDB is only about persistence? Have you been
reading comp.object the last few weeks?

If the persistence access is properly encapsulated in a subsystem
dedicated to talking to persistence, one can provide a subsystem
interface that is tailored to the problem solution's needs ("Save this
pile of data I call 'X'" and "Give me back the pile of data I saved as
'X'"). Then it is not difficult to recast those requests in terms of
queries; one just needs a mapping between 'X' and tables/tuples.

To do this, you obviously do not need an RDB. Queries are overkill in this
scenario; just use a simple directory service or a low-level file index
system.

In addition, if one abstracts the subsystem to capture the invariants of
the RDM, one has quite generic objects like Table, Tuple, Attribute
(rather than Account, Customer, etc.). Those essentially become holders
of identity strings that one plugs into SQL strings. That allows them
to be initialized at startup from external configuration data (possible
directly from the RDB schema). So one only needs a mapping of identity
between the interface messages' data packets and the generic instances.

Do you have any web links to detailed examples of this?

If one captures RDM invariants in the persistence access subsystem, then
the subsystem becomes reusable across applications and databases.

Before you said that the interface to the "persistence access
subsystem" should be "tailored to the specific problem at hand" for the
current application. How can such an interface be reusable across
different applications?

I assume that you already know that if you want your application to be
reusable across different SQL databases, there are much more efficient
methods. The only reason for encapsulation is if you want your
application to be reusable across different database paradigms
(relational, hierarchical, network, etc.).

By tying business abstractions to the database schema you are /forcing/ the
solution to be touched when just the database changes.

How can the database schema change if the business abstractions are not
changed?

The problem solution should be
indifferent to whether the data is stored in an RDB, and OODB, flat
files, or on clay tablets.

Why? There is a high cost associated with this. If you have an
implementation using flat files, why would you also make an
implementation using an RDB?

If the application is properly partitioned, then the problem solution
talks to a generic interface that reflects its needs for data.

This is a good description of SQL.

So long as the requirements for the problem solution do not change, that
interface does not change -- regardless of what happens to the database
schema.

The database schema doesn't have to change unless the requirements for
the problem solution change.
If the solution requirements change, then there may be different data
needs and that would be reflected in changes to the persistence
subsystem interface. Then one would have modify the subsystem and/or
the database. But that is exactly the way it should be because the
database exists to serve the needs of the application, not the other way
around.

Indeed.

This is one of the problems of trying to map 1:1. Database tables in a
subclassing relationship can and are instantiated separately but in the
OO context a single object instance instantiates the entire tree. So
you always need some sort of conversion. It is better to isolate such
conversions in a systematic manner. The use of decoupling interfaces
will tend to make that easier on the solution side.

        [Customer]
        + name
        + address
             ^
             |
     +-------+-------+
     |               |
[Individual]    [Corporate]
+ limit         + division

The corresponding tables would look like this:

customer(customerid, name, address)
individual_customer(customerid, limit)
corporate_customer(customerid, division)

When the solution side needs to instantiate a corporate customer it asks
the persistence access for one by name and gets back {address,
division}. The solution then routinely instantiates a single instance
of [Corporate]. All the drudge work of forming a join query across two
tables is handled by the persistence subsystem.

select address, division
from corporate_customer cc
join customer c on cc.customerid=c.customerid
where name=?

How can it possibly be simpler?

Of course there is no free lunch.

Indeed.

The price of isolating persistence
access is that one must encode and decode messages on each side of the
interface. So there is a trade-off between later maintenance effort and
current performance.

Did someone prove that later maintenance would be easier using a
persistence subsystem?

For CRUD/USER applications that usually isn't
worthwhile because the application doesn't process the data in any
significant fashion; it just passes it in a pipeline between the DB and
the UI and performance is limited by DB access.

But aren't a lot of applications 70% CRUD/USER and 30% more complex
features? How do we know which approach to use when we start designing
an application?

In your opinion, can we use a framework as described in this thread, or
can we only use RAD IDEs? Do you have any pointers to existing RAD IDE
products (other than MS Access)?

But for larger, more complex applications one typically accesses the
data multiple times in various ways.

Different SELECT statements?

Fredrik Bertilsson
http://moonbird.sourceforge.net
 

timasmith

There's no such thing as 'parent / child'. There is such a thing as a
relationship between entities. The parent/child concept quickly gives
problems semantically, when you have an order entity and you add it to
myCustomer.Orders, which makes myCustomer the 'parent'. However, if I
add the same order entity to myEmployee.FiledOrders, myEmployee is also
the parent - confusing. So keep it at the level of 'relationships' and
with that, the different types of relationships: 1:1/1:n/m:1/m:n.
(where m:n can be seen as a view on 2 m:1 relationships from the
intermediate entity to 1 or 2 other entities)

Just to clarify: when I refer to parent/child table relationships I do
not mean customer/order - rather, a relationship which is mapped to
objects which contain other objects.

So perhaps the Order has details stored in an ORDER_DETAILS table.
Calling out the relationship allows me to have a class

public class Order {
    private OrderDetailCollection orderDetails;
    ...
}

In the data layer I can optionally always populate my order if order
details exist. Also, when it comes to updating an order I can
cycle through the details and update those at the same time.

Assuming in the domain this model exists, this further simplifies
synchronizing the objects with the database using generated code.
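
For example, the generated synchronization code might cascade roughly like this sketch; the adapter and interface names are assumptions:

import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

interface OrderDetail {
    boolean isModified();
}

interface OrderDetailAdapter {
    void update(Connection con, OrderDetail detail) throws SQLException;
}

class OrderUpdateCascade {
    private final OrderDetailAdapter detailAdapter;

    OrderUpdateCascade(OrderDetailAdapter detailAdapter) {
        this.detailAdapter = detailAdapter;
    }

    /** After the order row is written, walk the child collection and update it too. */
    void updateDetails(Connection con, List<OrderDetail> details) throws SQLException {
        for (OrderDetail detail : details) {
            if (detail.isModified()) {        // only send changed children to the database
                detailAdapter.update(con, detail);
            }
        }
    }
}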

In my case the OrderCollection implements an AbstractTableModel which
eases displaying orders in a multi-column list. I would like to
specify the order details which are flattened into the list as well. I
think I can do it assuming there is a key specified that identifies which
detail is used in the list.
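
A minimal Swing sketch of that flattening, assuming each row object already carries the value pulled from the identified detail; all names are illustrative:

import javax.swing.table.AbstractTableModel;
import java.util.List;

class OrderRow {
    String orderNumber;
    String customerName;
    String keyDetailDescription;   // flattened from the chosen detail row
}

class OrderTableModel extends AbstractTableModel {
    private static final String[] COLUMNS = { "Order #", "Customer", "Key Detail" };
    private final List<OrderRow> rows;

    OrderTableModel(List<OrderRow> rows) { this.rows = rows; }

    @Override public int getRowCount()    { return rows.size(); }
    @Override public int getColumnCount() { return COLUMNS.length; }
    @Override public String getColumnName(int col) { return COLUMNS[col]; }

    @Override public Object getValueAt(int row, int col) {
        OrderRow r = rows.get(row);
        switch (col) {
            case 0:  return r.orderNumber;
            case 1:  return r.customerName;
            default: return r.keyDetailDescription;   // the child column on the parent model
        }
    }
}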
 

timasmith

Nice reply - you have given me food for thought. How about this:

Currently I have an object - say an OrderDataModel with fields matching
database columns. However, I extend OrderDataModel with OrderModel,
and all business logic takes place on OrderModel. Any business logic
is contained by the OrderModel - not by the DataModel, which is
recreated with each data model update. Adding fields will not really
affect anything - it is only if I break up a table that impact is felt.

However, if my real business logic - the algorithms such as depreciation,
which have little to do with persistence - lives in completely separate
objects which can be tested on their own and have no direct ties to the
data model, surely that solves the majority of the issues you have
described?



 

H. S. Lahman

Responding to Frebe73...

We have been around on these issues before. There has been far too much
of this stuff on comp.object recently so I am not going to feed the DBMS
trolls by responding.

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
(e-mail address removed)
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
 

H. S. Lahman

Responding to Timasmith...

Currently I have an object - say an OrderDataModel with fields matching
database columns. However, I extend OrderDataModel with OrderModel,
and all business logic takes place on OrderModel. Any business logic
is contained by the OrderModel - not by the DataModel, which is
recreated with each data model update. Adding fields will not really
affect anything - it is only if I break up a table that impact is felt.

Why do you need OrderDataModel at all? There is nothing to prevent
OrderModel from having both knowledge (presumably initialized from RDB
fields) and behavior (to solve the problem in hand).

My concern is with how the OrderModel gets instantiated. To do that one
needs data. For a new instance it may come from the UI. But more
commonly it will come from the DB. It is the mechanics of getting that
data that I am concerned about. For example, you might have a code
fragment that looked like:

DBAccess.getOrderInfo (orderNo, &customer, &itemCount, ...);
myOrderModel = new OrderModel (orderNo, customer, itemCount, ...);

where DBAccess is a GoF Facade pattern class that acts as interface to
the persistence access subsystem.

At this point it doesn't matter if OrderModel in the application maps
1:1 to an OrderModel table in the RDB or not. All the boilerplate of
query formation, dataset extraction, etc. has been delegated away
through the DBAccess interface. You don't even care if the data is in an
RDB, much less whether the legacy schema is well-formed.

At this level of solution abstraction you are only interested in what is
critical to solving this part of the problem in hand. Specifically: (A)
get some needed data from the data store and (B) instantiate an
OrderModel. The code fragment is very well focused on those two things
without any distracting detail about legacy schemas and their access
mechanisms.

However, if my real business logic - the algorithms such as depreciation,
which have little to do with persistence - lives in completely separate
objects which can be tested on their own and have no direct ties to the
data model, surely that solves the majority of the issues you have
described?

There are some practical reasons why testing will be easier with my
solution. That is, building a test harness infrastructure for just
DBAccess Facade will be easier than providing such infrastructure for a
bunch of OrderDataModel objects (or their creators).

However, the big reason is related to maintenance when the schema
changes. Where are those changes? In my solution they are all in the
DB access subsystem. The interface to that subsystem does not change
and the solution logic itself is not touched in any way. Therefore,
once testing can demonstrate that the DBAccess subsystem gets the same
data from the new DB for the interface (which you can demonstrate with a
regression test of the subsystem), you can be absolutely sure that the
solution business logic still works correctly.

However, if you have to make changes to the OrderDataModel objects (or
their creators) to reflect the new DB schema, you cannot have that
certainty. That's because you are mucking with object implementations
/in/ the solution subsystem but you are testing it at the system or
subsystem level. No matter how isolated they are supposed to be from
the business objects like DataModel, you cannot guarantee that those
changes don't break the solution logic. If you were very careful (and
somebody looked over the maintainer's shoulder for any interim
maintenance) the probability of breaking the business logic will be low.
But it will still be non-zero. [I can give you some marvelous
examples of what-could-possibly-go-wrong? situations that did. B-)]

Why would the DB schema change without requirements changes to the
business logic? This is actually fairly common for enterprise data
where requirements changes for some client applications require schema
changes but those requirements aren't relevant to other applications
using the same data. However, in your case I would guess that is quite
likely.

You indicated the legacy DB is ill-formed. If so, I would expect there
to be mounting pressure to fix it or replace it with a well-formed DB.
No matter how you solve your current problem, an ill-formed DB is going
to lead to problems later. Fixing those in a kludged solution is going
to require disproportionate effort. The more problems that show up and
the more resources it takes to fix them, the more pressure there will be
to replace it.

<aside>
Note that there is an added bonus for my solution here. Once you have
the interface in place to talk to the database it becomes much easier to
replace the database. That's because (A) all the changes are in one
place and (B) the changes are isolated from the business logic. In
fact, what I have suggested is part of a technique for piecemeal legacy
replacement in general. That is, the subsystems don't have to be
DBAccess; they can be any part of the legacy code. Basically the steps are:

(1) Design the new system's ideal partitioning into subsystems. This is
where you want to be with the final replacement application.

(2) Select a subsystem, design it, implement it, and test it.

(3) Insert the interface for the new subsystem into the legacy code. It
accesses the legacy functionality. This simply isolates the legacy code
that will be replaced by the new subsystem.

(4) Regression test the legacy code. This ensures that the new
interface works properly.

(5) Excise the legacy code whose requirements are addressed by the new
subsystem.

(6) Rebuild the application with the new subsystem. This should be
trivial because the legacy code already accesses the new interface from (3).

(7) Test the whole application. This should mostly Just Work at this point.

The third step is where all the project risk resides and it will be the
toughest one because it will inevitably require legacy surgery.
However, it is pretty well focused because one really isn't changing
anything; the legacy code is still solving the problem and one is just
isolating it. That is pretty much the problem I think you are facing
with the legacy database. You need to find a way to isolate the legacy
database from the business logic. But once you do that properly,
replacing the database becomes relatively simple.
</aside>

--

*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
(e-mail address removed)
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
 

timasmith

Hmm, I might be having an epiphany... so a case in point:

I had a FormControl object which extended FormDataControl, which is
regenerated from the data model and populated by generated database
code.

Among the fields inherited by FormControl I had Application, View and Type
- represented in the database table. A module or two used FormControl
to dynamically generate controls on a form.

Later on I moved Application, View and Type into their own table, called
ApplicationView - then I added ApplicationViewId to the FormControl
table as a foreign key. All good normalizing for a greater purpose -
but I broke (and quickly fixed) the control generation.

Your point is that if I had a FormControl object with fields
appropriate to its function, I could change the mapping from one to two
tables and retest the subsystem minimizing the impact.

As a complex system grows and grows, being able to retest just a subsystem
provides ever greater value. Certainly I couldn't foresee the need to
normalize the table - it grew as I added fields due to unforeseen
requirements.

I guess I fear resorting to hand-crafting mapping code, but point well
taken that it is often easier than rewriting (or even just
testing) business logic.

Now moving on to a suggestion that meets both sides...

So my OrderDataModel does not actually know anything about persistence;
an OrderDataAdapter (also auto-generated) has the generated select,
insert and update and all the mapping from DB result sets to the data
model. So in that respect I am covered - I could change the database
quite easily (actually I support several databases).

The reason I use OrderModel is, for one, to isolate the business logic from
the data fields for ease of readability, and from a practical standpoint
because I recreate the source file OrderDataModel.java.

In my problem above with normalizing one table into two, *IF* I had
changed my code generator to continue to recreate FormDataControl, but
from a query which combined the two tables, then it would be win-win - my
FormControl does not change and I can continue to regenerate the source
files.
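
A sketch of what that regenerated query behind FormDataControl might look like; the table and column names below are guesses based on the description above:

public final class FormDataControlQueries {

    // Join the normalised tables so FormDataControl still sees the flattened fields.
    static final String SELECT_FLATTENED =
        "SELECT fc.FORM_CONTROL_ID, av.APPLICATION_NAME, av.VIEW_NAME, av.CONTROL_TYPE " +
        "FROM FORM_CONTROL fc " +
        "JOIN APPLICATION_VIEW av ON av.APPLICATION_VIEW_ID = fc.APPLICATION_VIEW_ID";

    private FormDataControlQueries() { }
}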

Perhaps it only works with simple normalization.

I guess my other realization is that in my client-side code I probably
have far too many dependencies on my OrderModel, FormControl etc. I
should replace as many as possible with interfaces within the local
package - increasing the cohesiveness of a package and reducing
external dependencies.

In some ways that also meets your goal - sure, I can recreate the
data models; as long as they continue to implement the appropriate
interfaces, there is no need to retest those packages.
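
A minimal sketch of that interface idea; the interface and its fields are illustrative:

interface OrderView {
    String getOrderNumber();
    java.math.BigDecimal getTotal();
}

// The regenerated model keeps implementing OrderView, so client packages that
// depend only on OrderView need no retest when the model source is recreated.
class OrderModel implements OrderView {          // simplified stand-in
    private final String orderNumber;
    private final java.math.BigDecimal total;

    OrderModel(String orderNumber, java.math.BigDecimal total) {
        this.orderNumber = orderNumber;
        this.total = total;
    }

    @Override public String getOrderNumber() { return orderNumber; }
    @Override public java.math.BigDecimal getTotal() { return total; }
}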

Tim

 

frebe73

We have been around on these issues before. There has been far too much
of this stuff on comp.object recently so I am not going to feed the DBMS
trolls by responding.

Yes, it is of course much easier to make a lot of claims without having
to support them. You prefer writing about how to use an RDBMS without
having to debate with people who actually know how to use an
RDBMS.

Fredrik Bertilsson
http://moonbird.sourceforge.net
 

H. S. Lahman

Responding to Timasmith...

I had a FormControl object which extended FormDataControl, which is
regenerated from the data model and populated by generated database
code.

Among the fields inherited by FormControl I had Application, View and Type
- represented in the database table. A module or two used FormControl
to dynamically generate controls on a form.

Later on I moved Application, View and Type into their own table, called
ApplicationView - then I added ApplicationViewId to the FormControl
table as a foreign key. All good normalizing for a greater purpose -
but I broke (and quickly fixed) the control generation.

Your point is that if I had a FormControl object with fields
appropriate to its function, I could change the mapping from one to two
tables and retest the subsystem minimizing the impact.

Exactly. Even better, you should not even need to retest the solution
logic if it is encapsulated in its own subsystem. Since it isn't
touched, all you need to demonstrate is that the DB access subsystem
interface still provides the same data values. [Prudence would suggest
a system regression test was in order just in case the analysis that the
interface didn't need to change was flawed. B-)]

As a complex system grows and grows, being able to retest just a subsystem
provides ever greater value. Certainly I couldn't foresee the need to
normalize the table - it grew as I added fields due to unforeseen
requirements.

Quite so. Aside from the DB access issues, application partitioning into
subsystems encapsulated by pure message-based data transfer interfaces
is a very good idea in general. By separating concerns and
encapsulating them one can perform fairly intense functional testing of
the individual subsystems. In effect one can do large scale unit tests.
Better yet, it can be done in complete isolation from any other parts
of the application because the interfaces are simple. (If I didn't
mention it before, my blog has a category on Application Partitioning.)

I guess I fear resorting to hand-crafting mapping code, but point well
taken that it is often easier than rewriting (or even just
testing) business logic.

It's really not that difficult. You are going to have to hand craft SQL
queries and whatnot anyway. The encode/decode of message data packets
is somewhat tedious but it is pretty mundane and pretty much the same
everywhere so one is unlikely to screw up other than typos.

How much effort you put into making the subsystem reusable is another
question. At one extreme you can just hard-code the SQL queries in the
DB access subsystem just like they would have been sprinkled in the
solution code. You just isolate them around Facade interface method
implementations. [It is not uncommon for the DB access "subsystem" to
consist solely of the Facade interface class for simple data stores.
The "subsystem" exists in the Facade class' implementation. Your
compromise solution below is getting very close to that already.]
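
A small sketch of that simple extreme, with the SQL hard-coded behind one facade method; the class, table and column names are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class DBAccessFacade {
    private final Connection con;

    DBAccessFacade(Connection con) { this.con = con; }

    /** The solution only sees "give me the balance for this account". */
    java.math.BigDecimal getAccountBalance(long accountId) throws SQLException {
        String sql = "SELECT BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getBigDecimal("BALANCE") : null;
            }
        }
    }
}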

At the other extreme you provide generic identity mapping through
external configuration data so that you can reuse the subsystem for
other applications. That's not so difficult to code but it requires
more design effort to get it right and there will be more infrastructure
code. You can also get exotic with read/write caching and whatnot.
Whether that is worthwhile is a basic development trade-off between
today and tomorrow.

Now moving on to a suggestion that meets both sides...

So my OrderDataModel does not actually know anything about persistence;
an OrderDataAdapter (also auto-generated) has the generated select,
insert and update and all the mapping from DB result sets to the data
model. So in that respect I am covered - I could change the database
quite easily (actually I support several databases).

If you are not encapsulating these in a persistence access subsystem you
still have a test problem. The objects actually doing the DB access may
be logically separate from the solution classes _in theory_, but you
can't be sure. That's because the testing scope still includes both
solution objects and DB access objects. So when you modify the DB
access objects you still need to test the solution logic thoroughly
because you can't be sure you didn't break the solution logic despite
all the good intentions. That problem just gets worse as someone
performs additional maintenance subsequently (e.g., a maintainer
modifying solution functionality may put it in one of your pristine DB
access objects).

If you put the DB access in a separate subsystem, then you don't have
that problem because the solution logic is no longer within the
necessary test scope. All you need to demonstrate is that the DB access
subsystem provides the same data through the interface as it provided
prior to changing its guts. [The caveat about prudence above still
applies, though.]

Another benefit is that the scope of maintenance is better isolated.
That doesn't sound like a big deal but a lot of problems stemming from
maintenance can be traced to making changes in the wrong place. When
the schema changes there is no doubt where the change needs to go if you
have a DB access subsystem (i.e., no debate about putting it in
OrderDataModel or OrderDataAdapter). In addition, there is no
distracting code in the DB access subsystem because all it does is DB
access so it is easier to focus on getting the changes right.

<Hot Button>
Testing can get one to 5-Sigma reliability. But to achieve 6-Sigma and
beyond requires militant defect prevention. One needs to eliminate
/opportunities/ for inserting defects. Isolating DB access (or any
other significantly complex functionality) in a subsystem tends to
foster better focus for both design and maintenance. That translates
into less likelihood of getting things wrong because the context is
simpler. IOW, it is harder to break code that isn't visible in the
scope of change than it is to break code that is visible.

The reason I use OrderModel is, for one, to isolate the business logic from
the data fields for ease of readability, and from a practical standpoint
because I recreate the source file OrderDataModel.java.

In my problem above with normalizing one table into two, *IF* I had
changed my code generator to continue to recreate FormDataControl, but
from a query which combined the two tables, then it would be win-win - my
FormControl does not change and I can continue to regenerate the source
files.

If I understand your concern here, I would argue that encapsulation in a
subsystem would provide even better readability and maintainability.
The separation of concerns is better because it eliminates distractions
when dealing with either the business logic or the DB access logic.
That is, /all/ the objects are business objects or /all/ the objects are
DB objects, depending on which subsystem one is in.

I think maintainability would improve because of better isolation and
decoupling. The decoupling through subsystem interfaces is much
stronger because the objects in each subsystem do not even know that the
objects in the other subsystem exist.

As far as the files are concerned, the code has to go somewhere so you
are going to have .java files lying around everywhere. However, a
separate subsystem also offers some advantages for deployment, source
control, and configuration management -- such as separately deliverable
DLLs.

Perhaps it only works with simple normalization.

The subsystem approach works for any changes on the DB side. You can
switch to clay tablets using CODASYL if you want. Which segues to
another advantage. Some changes are more extensive than others.
Suppose you decide that to enhance DB performance you need to provide
read-ahead caching of large joins. How difficult is that going to be to
do with an object like OrderDataAdapter? You are probably going to need
some major surgery and a few more objects -- all right in the middle of
your business logic. My point is that once one has encapsulated in a
subsystem, one has open-ended enhancement capability independent of the
business logic.


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
(e-mail address removed)
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
 

Michael Gaab

While I agree with the concept of having a stable set of interfaces
between the business layer and data layer, I have found that generating
business objects which directly map to database tables works for me.

When I originally replied to this post I disagreed with you, but since
then I have reconsidered. For some applications, mapping your
business (or otherwise) objects directly from the db tables is probably
the most pragmatic course to take. Coupling exists between the data in
the database and the UI regardless of how layered your design is.

Mike
 
