Design question: polymorphism after object creation

Marcel Müller

Hi,

I am looking for a neat solution to the following problem:

There are objects of different types with a common base class, let's say
folders and files (although that does not completely hit the nail on the
head). These objects have a common primary key, the fully qualified path.
But the polymorphic type of the objects emerges only *after* the object
has been referenced. See below.

The implementation currently uses a static instance repository of all
objects. The objects are reference counted and only instantiated by a
global factory function. Two objects with the same key must not coexist
in memory.

// int_ptr<T> is mostly the same as boost::intrusive_ptr<T>

class Base;

class Repository
{ ...
public:
// Singleton!
static Repository Instance;

// Seek whether an object is already loaded.
// May return NULL
int_ptr<Base> FindByURL(const string& url);
// FACTORY! Get a new or an existing instance of this URL.
// Does not return NULL
int_ptr<Base> GetByURL(const string& url);
};

When instantiating the objects by their path (e.g. while enumerating a
folder's content) the exact type of the object is not yet known. The type
has to be determined only when the application requests certain
information about the object. Unfortunately this is not always a cheap
operation. In fact it cannot be done synchronously at the time the
factory is called.

class Base
{public:
const string URL;

enum InfoFlags // Different kinds of information
{ IF_None = 0,
IF_Metadata = 1,
IF_Content = 2,
IF_Type = 4, // The polymorphic type of the object
...
};

// Retrieve information asynchronously if the requested information is
// not yet available. The function returns the flags of the requested
// information that is immediately available.
// For all the missing bits you should wait for the matching
// InfoChange event to get the requested information.
// If no information is yet available, the function returns IF_None.
// Of course, you have to register the event handler before the call
// to EnsureInfoAsync. If EnsureInfoAsync returned all requested bits
// the InfoChange event is not raised as result of this call.
virtual InfoFlags EnsureInfoAsync(InfoFlags what);

// Observable pattern
event<InfoFlags> InfoChange;

// Retrieve the current meta information. Returns only valid info
// if EnsureInfoAsync has returned IF_Metadata or the InfoChange event
// has fired with IF_Metadata set.
virtual Metadata GetMetadata() const;
...

protected:
Base(const string& url) : URL(url) {}
};

class File : public Base
{public:
virtual Metadata GetMetadata() const;
...

protected:
File(const string& url) : Base(url) {}
friend class Repository; // For the factory
};

class Folder : public Base
{public:
virtual Metadata GetMetadata() const;
...

protected:
Folder(const string& url) : Base(url) {}
friend class Repository; // For the factory
};
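
The calling protocol described in the comments on EnsureInfoAsync (register the handler first, then ask, then wait for the missing bits) might be used roughly as follows. This is only a sketch under assumptions: Item, CompletePending and RequestDemo are hypothetical names, the observable is a plain vector of callbacks rather than the event<> template of the original code, and the "worker" is simulated synchronously.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the thread's types; only the calling
// protocol is the point here.
enum InfoFlags { IF_None = 0, IF_Metadata = 1, IF_Content = 2, IF_Type = 4 };

class Item
{public:
  // Minimal observable: handlers receive the flags that became available.
  std::vector<std::function<void(InfoFlags)> > InfoChange;

  // Returns the requested bits that are available right now and
  // remembers the missing ones as pending.
  InfoFlags EnsureInfoAsync(InfoFlags what)
  { Pending |= what & ~Available;
    return InfoFlags(what & Available);
  }

  // Stands in for a worker thread finishing its I/O.
  void CompletePending()
  { InfoFlags done = InfoFlags(Pending);
    Available |= done;
    Pending = 0;
    if (done != IF_None)
      for (auto& h : InfoChange)
        h(done);
  }

 private:
  int Available = IF_Type; // assumption: the type is already known
  int Pending = 0;
};

// Returns the bits still outstanding after the event has fired.
inline int RequestDemo()
{ Item item;
  const InfoFlags wanted = InfoFlags(IF_Type | IF_Metadata);
  int missing = 0;
  // 1. Register the handler first, so no event can be missed.
  item.InfoChange.push_back([&missing](InfoFlags got) { missing &= ~got; });
  // 2. Ask; anything not returned here arrives through InfoChange.
  missing = wanted & ~item.EnsureInfoAsync(wanted);
  assert(missing == IF_Metadata);
  // 3. Later the worker completes and the event clears the rest.
  item.CompletePending();
  return missing;
}
```

In real multi-threaded code the bookkeeping of the missing bits would need synchronization; that is left out here.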


Initially, no information is available about the object. Only when
some information is requested do I/O operations take place to specialize
the object. Of course, once specialized, the type will never change.

Now the problem is that at the time the object is created, I have only an
empty shell that forwards EnsureInfoAsync to plug-ins to determine the
effective type of the object. But at this time observers are already
registered and strong references to the object are held throughout the
application. Therefore it is not sufficient to atomically replace the
generalized object by the specialized one in the repository.

The only idea I have so far is to introduce another level of indirection
and replace all occurrences of int_ptr<Base> by int_ptr<BaseProxy>.
BaseProxy contains the int_ptr<Base> and forwards the functionality. But
this is neither clean nor fast. Furthermore some services
have to be provided by BaseProxy only, e.g. the observable pattern or
the synchronization. This causes tight coupling between BaseProxy
and Base, although these classes are unrelated from the C++ point of
view. That is not well-designed OOP.

So, are there any other ideas to implement something like that?


Marcel
 
Balog Pal

Marcel Müller said:
I am seeking for a neat solution for the following problem:

There are objects of different types with a common base class, let's say
folders and files (although that does not completely hit the nail on the
head). These objects have a common primary key, the full qualified path.
But the polymorphic type of the objects emerges only *after* the object
has been referenced. See below.
[...]

I get this, but the following seem contradictory, especially the observer
part.
If the object is not there yet why on earth is it observed?

From my limited understanding of the problem, I'd have something like:

map<string, shared_ptr<Base> > repository;

As the only info up front is PK, in this phase you just put that in the map,
the ptr is null, signaling the not-accessed state.

When access comes, you have the extra info, create the object by factory and
put it in the ptr.

If your initial info is more elaborate, you can create a struct of it, and
use that instead of string in the map. To avoid duplication, the object body
can have a pointer to the PK part, passed in the ctor. If the object lives only
in this repository, and it is a map, the nodes are stable. (Certainly I
would internally firewall the access to the PK by a function, so later it can
simply be converted into a local copy, or a shared_ptr instead of raw, etc...)
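
The two-phase map described above could be sketched like this. It is a minimal illustration, not the original poster's code: Register, IsLoaded, Get and Kind are hypothetical names, and the rule that decides between File and Folder (a trailing '/') is only a placeholder for whatever extra information is available at access time.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Base
{ virtual ~Base() {}
  virtual const char* Kind() const = 0;
};
struct File : Base { const char* Kind() const override { return "file"; } };
struct Folder : Base { const char* Kind() const override { return "folder"; } };

class Repository
{public:
  // Phase 1: only the key is known; a null pointer marks "not accessed yet".
  void Register(const std::string& url) { items[url]; }

  bool IsLoaded(const std::string& url) const
  { auto it = items.find(url);
    return it != items.end() && it->second != nullptr;
  }

  // Phase 2: on first access the type is known and the object is created.
  // How the type is determined is outside this sketch; a trailing '/' is
  // used here purely as a placeholder rule.
  std::shared_ptr<Base> Get(const std::string& url)
  { std::shared_ptr<Base>& slot = items[url];
    if (!slot)
    { if (!url.empty() && url.back() == '/')
        slot = std::make_shared<Folder>();
      else
        slot = std::make_shared<File>();
    }
    return slot;
  }

 private:
  std::map<std::string, std::shared_ptr<Base> > items;
};
```

Note that nothing can observe or hold a reference to an entry while its pointer is still null, which is exactly the limitation discussed in the replies.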
 
Marcel Müller

Balog said:
Marcel Müller said:
I am seeking for a neat solution for the following problem:

There are objects of different types with a common base class, let's say
folders and files (although that does not completely hit the nail on the
head). These objects have a common primary key, the full qualified path.
But the polymorphic type of the objects emerges only *after* the object
has been referenced. See below.
[...]

I get this, but the following seem contradictory, especially the observer
part.
If the object is not there yet why on earth is it observed?

The object exists, but is not specialized. You may request different
kinds of information. This triggers a worker thread to obtain the
requested information. If the object is not yet specialized (e.g. File
or Folder) the first step is to determine the object type. This may
require an analysis of the file content or the http object. Then an
instance of the matching class is instantiated and this class fills in the
requested kinds of information. Once completed, the observer is notified
that the information is now available. If some information is available
fast and other information takes longer, the event may also fire twice
with different flags.

In the current implementation there is a hard-coded function that guesses
the object type from the PK without any I/O. So File and Folder in the
example can derive from Base.
But this guess sometimes fails. This results in an object with invalid
state.

From my limited understanding of the problem, I'd have something like:

map<string, shared_ptr<Base> > repository;

Yes, similar. But the PK is an intrusive part of the object.
So it is more like set<int_ptr<Base> > repository;

As the only info up front is PK, in this phase you just put that in the map,
the ptr is null, signaling the not-accessed state.

Even an object, that is not fully complete, has some basic properties.
It may be selected, it may be referenced, it has a changeable display
name and so on. NULL is not sufficient.

When access comes, you have the extra info, create the object by factory and
put it in the ptr.

Access is done from the GUI thread. There must not be any I/O within
this context, to keep the application responsive. All I/O is done
asynchronously. In fact hundreds of objects may be in pending state at
the same time. A set of parallel workers does the jobs.

If your initial info is more elaborate, you can create a struct of it, and
use instead of string in the map. To avoid duplication, the object body
can have a pointer to the PK part, passed in ctor.

OK, you say that what I called BaseProxy could be the key part of a map
and the polymorphic part is the value. This is possible. But I see no
difference to my idea with BaseProxy that owns the polymorphic part. I
have really many calls from the specialized classes to the Base. Most of
them belong to the observable pattern and the synchronization. These
services have to be stable for the observable to work properly.
Each kind of information has a condition variable. So I can lock
individual parts of the object without the need for thousands of mutexes.
All these calls have to go through the pointer to the PK part. This
makes the code look very ugly.
However, if I have no other choice, I will go that way.

If the object lives only
in this repository, and it is a map, the nodes are stable.

Yes. The objects are non-copyable anyway.

(certainly I
would internally firewall the access to PK by a function, so later it can be
simply converted in a local copy, or a shared_ptr instead of raw, etc...)

This is required anyway, because the repository hooks into that interface
to look up the PK for comparison.


A union with a few types and a stable common base class would be nice.
If the base is virtual, the memory layout could match the requirements
in theory - but only in theory.
Like this:
+------------------------+
|          Base          |
+-------+--------+-------+
| File  | Folder | Other |
+-------+        |       |
+ free  |        +-------+
+-------+--------+-------+

This could turn into 4 different classes (if Base is not abstract)
without the need to change the Pointer to Base.


Marcel
 
Alf P. Steinbach

* Marcel Müller:
[...]

So, are there any other ideas to implement something like that?

Yes, you should do the natural thing instead of trying to force an unnatural
abstraction.

The natural thing is that objects that can be created for free are one type,
those that are expensive to create are another type.

The point of static types is to support type checking. The point of static type
checking is to help you. Using it to create silly problems is, well, just silly.


Cheers & hth.,

- Alf
 
Boris Rasin

A union with a few types and a stable common base class would be nice.
If the base is virtual, the memory layout could match the requirements
in theory - but only in theory.
Like this:
+------------------------+
|          Base          |
+-------+--------+-------+
| File  | Folder | Other |
+-------+        |       |
+ free  |        +-------+
+-------+--------+-------+

This could turn into 4 different classes (if Base is not abstract)
without the need to change the Pointer to Base.

Just out of curiosity, here is what I came up with:

------------------ polymorph.h ------------------
#include <cstddef>
#include <new>         // placement new
#include <type_traits> // C++11; with TR1 use <boost/tr1/type_traits.hpp> and std::tr1::
using std::aligned_storage;
using std::alignment_of;

template <typename T1, typename T2>
struct max_size
{
    static const std::size_t value = sizeof (T1) > sizeof (T2) ? sizeof (T1) : sizeof (T2);
};

template <class base, class derived1, class derived2>
class polymorph
{
    //static_assert (std::has_virtual_destructor<base>::value);
    //static_assert (std::is_base_of<base, derived1>::value);
    //static_assert (std::is_base_of<base, derived2>::value);
public:
    polymorph() : object_ptr (reinterpret_cast<base*> (&object_space))
    {
        new ((void*)object_ptr) base;
    }

    ~polymorph()
    {
        object_ptr->~base();
    }

    base* operator -> () { return object_ptr; }

    template <class T>
    void change_type()
    {
        //base temp (std::move (*object_ptr));
        object_ptr->~base();

        object_ptr = const_cast<base* volatile> (object_ptr);

        new ((void*)object_ptr) T; // (std::move (temp));
    }
private:
    // Even though object_ptr always points to object_space,
    // it is needed to provide memory barrier when object type changes
    // (invalidate compiler's cache of object's const members including vptr).

    base* object_ptr;

    typename aligned_storage<max_size<derived1, derived2>::value,
                             alignment_of<base>::value>::type object_space;
    //char object_space /*[[align(base)]]*/ [max_size<derived1, derived2>::value];
};
------------------ polymorph.h ------------------

And the test code:

------------------ polymorph_test.cpp -----------
#include <iostream>
using std::cout;

#include "polymorph.h"

struct base
{
base() { cout << "base()\n"; }
virtual ~base() { cout << "~base()\n"; }
virtual void func() { cout << "base::func()\n"; }
};

struct derived1 : base
{
derived1() { cout << "derived1()\n"; }
~derived1() { cout << "~derived1()\n"; }
virtual void func() { cout << "derived1::func()\n"; }
};

struct derived2 : base
{
derived2() { cout << "derived2()\n"; }
~derived2() { cout << "~derived2()\n"; }
virtual void func() { cout << "derived2::func()\n"; }
};

typedef polymorph<base, derived1, derived2> object_type;

int main()
{
object_type obj;
obj->func();

obj.change_type<derived1>();
obj->func();

obj.change_type<derived2>();
obj->func();
}
------------------ polymorph_test.cpp -----------
Here is the output:

base()
base::func()
~base()
base()
derived1()
derived1::func()
~derived1()
~base()
base()
derived2()
derived2::func()
~derived2()
~base()

When the type is changed, the object is actually destroyed and re-created, so
you need some way to transfer state. But all references remain valid.
This implementation does provide some performance benefits compared to
the standard proxy approach, but unless you really need those, I don't
know if I would use something like this in production code.

Boris.
 
James Kanze

Balog said:
"Marcel Müller" wrote:
I am seeking for a neat solution for the following problem:
There are objects of different types with a common base
class, let's say folders and files (although that does not
completely hit the nail on the head). These objects have a
common primary key, the full qualified path. But the
polymorphic type of the objects emerges only *after* the
object has been referenced. See below.
[...]
I get this, but the following seem contradictory, especially
the observer part. If the object is not there yet why on
earth is it observed?
The object is existing, but not specialized. You may request
different kinds of information. This triggers a worker thread
to obtain the requested information. If the object is not yet
specialized (e.g. File or Folder) the first step is to
determine the object type. This may require an analysis of the
file content or the http object. Than an instance of the
matching class is instantiated and this class fills the
requested kind of informations. Once completed, the observer
is notified that the information is now available. If some
information is available fast and other information take
longer the event may also fire twice with different flags.

Will the object be called on to change its type later, once it
has established the type the first time? If not, I'd certainly
try to defer actual creation until after the type has been
established. If you can't, or if you need to change the type
later, there's always the letter/envelope idiom. But I have a
sneaking suspicion that what you actually need is to use the
strategy pattern. Your object isn't really polymorphic with
regards to everything (the handling of the observer, for
example); what you want to do is use a delegate for the
polymorphic part, which will only be allocated once the real
type is known (and which can be freed and reallocated with a
different type, if necessary).

In the current implementation there is a hard-coded function
that guess the object type from the PK without any I/O. So
File and Folder in the example can derive from Base. But this
guess sometimes fails. This results in an object with invalid
state.
Yes, similar. But the PK is intrusive part of the object. So
it is more like set<int_ptr<Base> > repository;

From the sound of things, these objects have the logical
equivalent of a static lifetime (i.e. they are created on
program start-up, and live until program shutdown). If this is
the case, why do you want a smart pointer? It doesn't buy you
anything, and it sends the wrong message to the user. (If you
use the strategy pattern, an std::auto_ptr or a
boost::scoped_ptr in the base object might make sense,
especially if you have to change the strategy dynamically.)

Even an object, that is not fully complete, has some basic
properties. It may be selected, it may be referenced, it has
a changeable display name and so on. NULL is not sufficient.

In other words, your base class has some behavior. Logically,
this means either the template method pattern or the strategy
pattern---the latter means that you can defer the decision, and
change the implementation at will.

[...]
A union with a few types and a stable common base class would
be nice. If the base is virtual, the memory layout could
match the requirements in theory - but only in theory.
Like this:
+------------------------+
| Base |
+-------+--------+-------+
| File | Folder | Other |
+-------+ | |
+ free | +-------+
+-------+--------+-------+
This could turn into 4 different classes (if Base is not
abstract) without the need to change the Pointer to Base.

This can be made to work, but only if there are no pointers to
the object which outlive the change of type. In other words, if
you use some sort of smart pointer, with an additional
indirection so that there is only one instance of the pointer to
the actual object, and if all of the type changes go through the
smart pointer---the function which changes the type should
return this, and the smart pointer use the return value to
"update" its pointer.

Quite frankly, it sounds like a maintenance nightmare. I'd go
with the strategy pattern.
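
A minimal sketch of the strategy approach suggested above, under stated assumptions: Item, Kind, FileKind, Specialize, Describe and TypeKnown are hypothetical names, std::unique_ptr stands in for the std::auto_ptr or boost::scoped_ptr mentioned earlier, and the observable is a plain vector of callbacks. The stable object keeps the URL, the mutex and the observers; only the delegate is allocated once the type is known.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

class Item; // the stable object that everyone holds references to

// The polymorphic part (the "strategy"). It keeps a reference to its
// owner so that it can use the owner's stable services.
class Kind
{public:
  explicit Kind(Item& owner) : Owner(owner) {}
  virtual ~Kind() {}
  virtual std::string Describe() const = 0;
 protected:
  Item& Owner;
};

class Item
{public:
  explicit Item(const std::string& url) : URL(url) {}
  Item(const Item&) = delete;
  Item& operator=(const Item&) = delete;

  const std::string URL;
  std::vector<std::function<void()> > TypeKnown; // observers survive specialization

  // Called once the type has been determined (e.g. by a worker thread).
  void Specialize(std::unique_ptr<Kind> kind)
  { { std::lock_guard<std::mutex> lock(Mtx);
      Delegate = std::move(kind);
    }
    for (auto& h : TypeKnown) // notify outside the lock
      h();
  }

  // Forwarding function with a defined behavior while the type is unknown.
  std::string Describe() const
  { std::lock_guard<std::mutex> lock(Mtx);
    return Delegate ? Delegate->Describe() : "unknown: " + URL;
  }

 private:
  mutable std::mutex Mtx;         // non-copyable state stays in the stable part
  std::unique_ptr<Kind> Delegate; // null until the type is known
};

class FileKind : public Kind
{public:
  explicit FileKind(Item& owner) : Kind(owner) {}
  std::string Describe() const override { return "file: " + Owner.URL; }
};
```

Pointers and references to Item taken before Specialize is called remain valid afterwards, and registered observers are untouched. The cost is the one raised elsewhere in the thread: every service the delegate needs has to be reached through the owner reference.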
 
Alf P. Steinbach

* James Kanze:
In other words, your base class has some behavior. Logically,
this means either the template method pattern or the strategy
pattern---the latter means that you can defer the decision, and
change the implementation at will.

As I wrote else-thread, IMHO it's not a good idea to add complexity just for
ideological reasons. Objects that modify themselves to provide richer
functionality, where that metamorphosis is (1) costly and (2) can fail, well
it's just silly. Instead keep the distinction between URI or directory entry or
whatever, on the one hand, and what's referred to, on the other hand. They might
implement some common interface. But I can't for the life of me see the point in
having e.g. a directory entry metamorphose into a file it represents.

Having said that, I also have an issue with this "pattern talk". Some few
patterns are very often needed in C++ because C++ doesn't provide the relevant
support, e.g. visitor pattern. But I had to look up "strategy pattern" and
darned if I can see what's different from "template pattern"; I guess it's some
detail that Someone thinks is important, like maintaining that red Fords are
intrinsically different from blue Fords, hey even a child can see the difference.

Also, I really don't see how it can be helpful for the OP's question to be able
to dynamically replace the implementation of a method or set of methods. It's
not like he just wants the behavior of methods to change. He wants dynamic
switching between different but overlapping sets of behaviors, which means
complexity and ample opportunities for bugs to creep in and start families.


Cheers,

- Alf
 
Marcel Müller

Hi,
Instead keep the distinction between URI or
directory entry or whatever, on the one hand, and what's referred to, on
the other hand. They might implement some common interface. But I can't
for the life of me see the point in having e.g. a directory entry
metamorphose into a file it represents.

It is not that easy, since file/directory was only an example. In fact
almost any URI can turn into a container once the server is connected
and some protocol talk has happened.

Also, I really don't see how it can be helpful for the OP's question to
be able to dynamically replace the implementation of a method or set
of methods. It's not like he just wants the behavior of methods to
change.

Exactly. Furthermore it is impossible, since the base class must provide
non-copyable and non-swappable objects, first of all the mutex that protects
the instance.

He wants dynamic switching between different but overlapping
sets of behaviors, which means complexity and ample opportunities for
bugs to creep in and start families.

No more bugs than any other derived class. In fact I only want to
intercept the construction process at the point where the base object is
fully constructed and before the derived class starts to build. I see no
point where this breaks with some OOP rule, at least as long as the base
is not abstract. Unfortunately the language does not provide a feature
to do this, because I cannot call the constructor of a derived class
without invoking the constructor of the base anew.


In fact, I am currently trying to design the two-class solution. But it turns
out that most of the services have to be placed in the common 'URI
class', because their state must survive the specialization. On the
other hand, the specialized type uses these services very often. Any
access from the specialized class has to be done through some owner
reference to the common class. This blows up the code a lot, making it
rather unreadable.
At runtime it is similar to a virtual base class, so there will not be
much difference.


Marcel
 
Alf P. Steinbach

* Marcel Müller:
Hi,


it is not that easy, since file/directory was only an example. In fact
almost any URI can turn into a container once the server is connected
and some protocol talk has happened.

It's IMHO silly to turn URI's into what they refer to.

Exactly. Furthermore it is impossible, since the base class must provide
non-copyable and non-swapable objects. First of all the Mutex to protect
the instance.


No more bugs than any other derived class. In fact I only want to
intercept the construction process at the point where the base object is
fully constructed and before the derived class starts to build. I see no
point where this breaks with some OOP rule, at least as long as the base
is not abstract. Unfortunately the language does not provide a feature
to do this,

It's trivially easy: last statement of base class constructor body.

because a cannot call the constructor of a derived class
without invoking the constructor of the base anew.

This may be some other problem, or more in the direction of the real problem. I
have the feeling your descriptions are less than complete, because it doesn't
fit in with what you have described so far. Anyway see the FAQ about how to do
derived class specific initialization.

In fact, I currently try to design the two class solution. But it turns
out, that most of the services have to be placed in the common 'URI
class', because their state must survive the specialization. On the
other side, the specialized type use these services very often. Any
access from the specialized class have to be done through some owner
reference to the common class. This blows up the code a lot making it
rather unreadable.
At runtime it is similar to a virtual base class, so there will be no
much difference.

What's difficult about copying state, or referring to it?


Cheers & hth.,

- Alf
 
Marcel Müller

Alf said:
It's IMHO silly to turn URI's into what they refer to.

URI is a property of what they refer to. Not more, not less.

It's trivially easy: last statement of base class constructor body.

If and only if I can block the thread that executes the constructor.
Unfortunately having one thread per object is absolutely not an option.

This may be some other problem, or more in the direction of the real
problem. I have the feeling your descriptions are less than complete,
because it doesn't fit in with what you have described so far.

Well, reducing the application code to a minimal example is always
complicated. Maybe I did not hit the nail on the head.

Anyway see the FAQ about how to do derived class specific initialization.

I don't know which part you refer to.

What's difficult about copying state, or referring to it?

I cannot copy the state of a mutex. I have to refer to it. Similar
things apply to the observable property (but this one can be swapped).

But referring from the specialized class to its 'base' is not that
straightforward. Also the base has to forward all calls to 'virtual
functions' to its specialized counterpart.


Marcel
 
Alf P. Steinbach

* Marcel Müller:
URI is a property of what they refer to. Not more, not less.

I'm sorry, that's not meaningful to me.

If and only if I can block the thread that executes the constructor.
Unfortunately having one thread per object is absolutely not an option.

Again, I'm sorry, that's not meaningful to me.

Well, reducing the application code to a minimal example is always
complicated. Maybe I did not hit the nail on the head.


I don't know which part you refer to.

How about checking the FAQ table of contents.

I cannot copy the state of a mutex. I have to refer to it. Similar
things applies to the observable property (But this one can be swapped).

But referring from the specialized class to it's 'base' is not that
straight forward. Also the base has to forward all calls to 'virtual
functions' to it's specialized counter part.

I'm sorry, that's not meaningful to me.

In particular, there's nothing in what you have explained about your problem
that would make it meaningful to have a mutex object contained in a URI object,
to make URI's non-copyable and to make your super-multi-threaded scenario
essentially single-threaded -- but still with the overhead of threading.

This sounds like a spaghetti nightmare.


Cheers,

- Alf
 
