Data Model:

Anthony

I'm struggling with whether to implement GroupItem (below) as
two separate models, or as one model that has a distinguishing key:

Given:
class ParentGroup:
    a group of values represented by class GroupItem

class ChildGroup:
    a group of values represented by class GroupItem
    foreign-key to ParentGroup (many Children sum to one Parent)

Option A:
class GroupItem:
    foreign-key to ParentGroup
    foreign-key to ChildGroup
    GroupItemType in (ParentItem, ChildItem)
    value
    value-type

Option B:
class ParentGroupItem:
    foreign-key to ParentGroup
    value
    value-type

class ChildGroupItem:
    foreign-key to ChildGroup
    value
    value-type

What are my considerations when making this decision?

Thanks!
 
Aaron Watters

It looks to me that the two designs
might be useful for different
purposes. What are you trying to do?

-- Aaron Watters

====
whiff.sourceforge.net
http://aaron.oirt.rutgers.edu/myapp/root/misc/erdTest
 
Anthony

The group values represent statistics that I'm tracking, based on
activity groups. Some samples:

Group: Johnson, Total Units Produced = 10, Total Units Consumed = 5
Chris Johnson, Units Produced = 6, Units Consumed = 3
Jim Johnson, Units Produced = 4, Units Consumed = 2

Group: Smith, Total Units Produced = 15, Total Units Consumed = 8
Mark Smith, Units Produced = 7, Units Consumed = 5
Bob Smith, Units Produced = 8, Units Consumed = 3

The groups will be responsible for entering their own statistics, so I
will have to do some validation at data entry. The ability to create
new statistic types (e.g. Units Broken) for new groups in the future
is important.

What would be the advantages of using option A versus option B?

Thanks for the quick response.
 
Aaron Brady

You want a ChildItem to have membership in two collections:
ParentGroup and ChildGroup. You also want a ParentItem to have
membership in one collection. For example:

parentA: itemPA1, itemPA2, childA, childB
childA: itemCA1, itemCA2
childB: itemCB1, itemCB2

Or, listing by child,

itemPA1: parentA
itemPA2: parentA
itemCA1: childA
itemCA2: childA
itemCB1: childB
itemCB2: childB
childA: parentA
childB: parentA

Correct so far?
 
Anthony

Thanks for the insightful response.

Yes, everything you say is correct, with one clarification: The
ChildItem can be a member of ParentGroup OR ChildGroup, but never both
at the same time.
 
Aaron Brady

I see. You described a collection class. Its members are items or
other collections. They are never nested more than two levels deep.

However, in your example, you implied a collection class whose
attributes are aggregates of its members'. For simplicity, you can
use methods to compute the aggregate attributes.

class Group:
    def calculate_total_produced(self):
        # Sum the attribute over every member of the group.
        return sum(x.total_produced for x in self.members)

If you want to cache them for performance, the children will have to
notify the parent when one of their attributes changes, which is at
least a little more complicated. The class in the simpler structure
could even derive from 'set' or other built-in collection if you
want. Are you interested in the more complicated faster technique?
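As a rough illustration of the "more complicated faster technique", here is a minimal sketch in which a child tells its parent that a cached total is stale. The names Item, Group, add, invalidate and the produced attribute are made up for this example and are not from the thread; it only shows one way the notification could be wired.

```python
class Item:
    """A leaf entry; notifies its parent group when a value changes."""

    def __init__(self, name, produced=0):
        self.name = name
        self.parent = None            # set by Group.add()
        self._produced = produced

    @property
    def produced(self):
        return self._produced

    @produced.setter
    def produced(self, value):
        self._produced = value
        if self.parent is not None:
            self.parent.invalidate()  # tell the parent its cache is stale


class Group:
    """A collection that caches the aggregate of its members."""

    def __init__(self, name):
        self.name = name
        self.members = []
        self._total_produced = None   # None means "not computed yet"

    def add(self, item):
        item.parent = self
        self.members.append(item)
        self.invalidate()

    def invalidate(self):
        self._total_produced = None

    @property
    def total_produced(self):
        # Recompute only when a member has changed since the last read.
        if self._total_produced is None:
            self._total_produced = sum(m.produced for m in self.members)
        return self._total_produced


johnson = Group("Johnson")
chris = Item("Chris Johnson", produced=6)
jim = Item("Jim Johnson", produced=4)
johnson.add(chris)
johnson.add(jim)
total_before = johnson.total_produced   # 6 + 4
chris.produced = 7                      # triggers johnson.invalidate()
total_after = johnson.total_produced    # 7 + 4
```

The extra moving part is the back-reference from child to parent; the uncached version needs none of it.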
 
Anthony

Yes, in my example, the top level collection class is implicitly the
aggregate of the lower level class. However, data entry will take
place at the top level, not necessarily at the lower level. This
means that the lower level values will never drive the top level
value. Instead, the aggregate of the lower levels will be validated
against the top level. If there is a discrepancy, then the remainder
will be applied to an additional "Unregistered" instance of the lower
level.

e.g.

Group: Johnson - Total Units Produced 25; Units Consumed 18;
Chris Johnson - Units Produced 18; Units Consumed 10;
Jim Johnson - Units Produced 3; Units Consumed 5;

The group totals are the basis for any validations. In this example,
another entry will be created to account for the discrepancy:

Unregistered - Units Produced 4; Units Consumed 3
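The check described above could be sketched roughly as follows. The function name unregistered_remainder and the dictionary layout are hypothetical. Treating a member sum that exceeds the entered total as an error is also an assumption of this sketch, since the rule for that case is not stated here.

```python
def unregistered_remainder(group_totals, member_stats):
    """Return, per statistic, the entered total minus the sum over members."""
    remainder = {}
    for stat, total in group_totals.items():
        member_sum = sum(stats.get(stat, 0) for stats in member_stats.values())
        diff = total - member_sum
        if diff < 0:
            # Assumption: members may never add up to more than the total.
            raise ValueError(f"{stat}: members sum to {member_sum}, total is {total}")
        if diff:
            remainder[stat] = diff
    return remainder


totals = {"Units Produced": 25, "Units Consumed": 18}
members = {
    "Chris Johnson": {"Units Produced": 18, "Units Consumed": 10},
    "Jim Johnson": {"Units Produced": 3, "Units Consumed": 5},
}
unregistered = unregistered_remainder(totals, members)
# unregistered == {"Units Produced": 4, "Units Consumed": 3}
```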

As far as child notification of the parent, I plan to only allow data
entry on a form that includes both parent and child level values.
Validation of top level to child level aggregates can happen at this
time. This should remove the need for notification, right?

Am I looking at 6 of one and half dozen of the other between options A
and B at this point? I'm currently leaning towards option B. Is
there anything I will be losing performance-wise by not choosing
option A?

Thanks again for conversing with me on this.
 
Aaron Brady

It sounds like you want your total to also be a degenerate instance of
the Item class: it has a name and two numeric attributes. I once
learned that it "meant the right thing" to simply derive the Total
class from the Item class, and raise an exception if anything tries
to access the non-aggregate values.

There is a little wasted space on A, which you probably needn't worry
about. In C, you can create a 'union' type, that holds the parent
foreign-key -or- the child foreign-key, and use the "GroupItemType in"
flag to signal which, but not both. The space required is whichever
is larger. </trivia> In Python, you can just use one object, and
treat it differently depending on the 'itemtype in' flag.

From what I understand so far, it's more consistent with
object-oriented ideals to write a separate class for separate behavior.
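To make the comparison concrete, here is a minimal sketch of both options as plain Python dataclasses. Field names such as item_type, parent_group_id and child_group_id are invented for illustration, and integer ids stand in for real foreign keys. The point is only that Option A needs an explicit check that exactly one key is set, while Option B gets that from the class itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupItem:
    """Option A: one class, with a flag saying which foreign key is in use."""
    item_type: str                        # "parent" or "child"
    value: float
    value_type: str
    parent_group_id: Optional[int] = None
    child_group_id: Optional[int] = None

    def __post_init__(self):
        # (parent key set?, child key set?) expected for each item type
        expected = {"parent": (True, False), "child": (False, True)}
        if self.item_type not in expected:
            raise ValueError(f"unknown item_type: {self.item_type!r}")
        actual = (self.parent_group_id is not None,
                  self.child_group_id is not None)
        if actual != expected[self.item_type]:
            raise ValueError("foreign keys do not match item_type")


@dataclass
class ParentGroupItem:
    """Option B: a parent item can only ever point at a parent group."""
    parent_group_id: int
    value: float
    value_type: str


@dataclass
class ChildGroupItem:
    """Option B: a child item can only ever point at a child group."""
    child_group_id: int
    value: float
    value_type: str


a_item = GroupItem("child", 3, "Units Consumed", child_group_id=1)
b_item = ChildGroupItem(child_group_id=1, value=3, value_type="Units Consumed")
```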

If I were writing a relational structure, i.e. tables in a relational
database, I would probably create a separate table for the totals,
since they don't have the same or equivalent (or isomorphic) semantics
to the items. Does that help?

On the other hand, you could just have a separate Unregistered class
for unregistered entries, or just leave the name blank or set to None
on a normal item instance.

It doesn't sound like you need to modify both totals and individuals
at the same time, but you do need to modify both at different times.
Some incoming data is 'total' data, and some is individual.

Anthony said:
As far as child notification of the parent, I plan to only allow data
entry on a form that includes both parent and child level values.
Validation of top level to child level aggregates can happen at this
time. This should remove the need for notification, right?

Yes. If you won't be changing child members, or will be doing so from
a uniform subset of code, then they won't need to notify their
parents. You can do that from the input section.

On child entry:
create child
add child
update parent

This is more risky:
On multiple child entries:
for each entry:
create child
add child
update parent

If it occurs more than once, you might consider creating a function to
guarantee they all accomplish the same, as usual.
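
One way to read that advice is the sketch below, where every entry path goes through a single helper. The names add_child and add_children and the dictionary layout are hypothetical, and "update parent" is assumed here to mean adding the child's values to the parent's running totals.

```python
def add_child(parent, name, produced, consumed):
    """Create a child, attach it, and update the parent in one place."""
    child = {"name": name, "produced": produced, "consumed": consumed}  # create child
    parent["children"].append(child)                                    # add child
    parent["produced"] += produced                                      # update parent
    parent["consumed"] += consumed
    return child


def add_children(parent, entries):
    """Multiple entries reuse the same helper, so no step can be skipped."""
    for name, produced, consumed in entries:
        add_child(parent, name, produced, consumed)


smith = {"name": "Smith", "children": [], "produced": 0, "consumed": 0}
add_children(smith, [("Mark Smith", 7, 5), ("Bob Smith", 8, 3)])
# smith["produced"] is now 15 and smith["consumed"] is now 8
```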

Anthony said:
I'm currently leaning towards option B. Is
there anything I will be losing performance-wise by not choosing
option A?

If you have to recalculate the totals every time you need them, that
will be slower than just accessing a data field, but the alternative
will take more space. It is another 'time-space' trade-off.
 
Peter Otten

I may be missing something, but your example looks more like option C:

class Group:
    name

class Member:
    group # foreign key to Group
    name

class Item:
    member # foreign key to Member
    type
    value

You can calculate the totals for members or groups on the fly; the classical
tool would be a relational database.

Peter
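
For readers who want to see option C with totals calculated on the fly, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (grp, member, item, grp_id, member_id) are assumptions made for this example; the group table is called grp because GROUP is an SQL keyword. The sample rows are the Johnson figures from earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grp (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE member (
        id INTEGER PRIMARY KEY,
        grp_id INTEGER NOT NULL REFERENCES grp(id),
        name TEXT NOT NULL
    );
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES member(id),
        type TEXT NOT NULL,
        value INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO grp (id, name) VALUES (1, 'Johnson')")
conn.executemany(
    "INSERT INTO member (id, grp_id, name) VALUES (?, ?, ?)",
    [(1, 1, "Chris Johnson"), (2, 1, "Jim Johnson")],
)
conn.executemany(
    "INSERT INTO item (member_id, type, value) VALUES (?, ?, ?)",
    [
        (1, "Units Produced", 6), (1, "Units Consumed", 3),
        (2, "Units Produced", 4), (2, "Units Consumed", 2),
    ],
)

# Group totals are never stored; they are summed from the items on request.
group_totals = conn.execute("""
    SELECT grp.name, item.type, SUM(item.value)
    FROM item
    JOIN member ON member.id = item.member_id
    JOIN grp ON grp.id = member.grp_id
    GROUP BY grp.name, item.type
    ORDER BY grp.name, item.type
""").fetchall()
```

A new statistic type such as Units Broken is then just another value in the type column, with no schema change.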
 
Anthony

You're absolutely right. This is also probably why I'm struggling
with my two options, with neither one of them "feeling right."

What got me going down this road is the fact that users will almost
always only have the total level values to work with. The detail
entries would be almost optional, from their point of view. This
means that if I were to set it up as option C, then the default data
entry use case would be to set up an "Unregistered" instance of the
child.

Now that you've highlighted that important point though, it seems to
me like the proper course to follow. It implies a little bit of
redesign on my part, but better now than further down the road. I
just need to make sure now that there are no exception cases for top
level calculation:

i.e.

Most of the time:

Total Units Consumed = sum of (all Child Units Consumed)


I need to develop a contingency plan for when the top level is not a
straight sum of the child levels.
 
