ABC vs. CRTP?

Mike Smith

Sorry about the multiple posts, but I just realized my prior post was in
a thread that dates back a couple of days, so people might not see it.

Which is considered better for writing interfaces in C++: abstract base
classes, or templates - and why? I.e. when is one preferable over the
other?

Let's suppose, just by way of an example, that I want to specify an
interface for classes that can "pickle" themselves; i.e. read/write
their state information to a "flat" binary storage (for serialization,
or message-passing, etc.) It occurs to me that there are two ways one
could do this, one using an abstract base class:

typedef std::vector<unsigned char> ByteVector;

struct IPickle
{
    virtual ~IPickle() {}                        // virtual dtor: safe deletion via base pointer
    virtual void Pickle(ByteVector &bv) = 0;     // write state to vector
    virtual void Unpickle(ByteVector &bv) = 0;   // get state from vector
};

then one could create classes that inherit from IPickle and implement
the functions:

class PickleSample : public IPickle
{
public:
    void Pickle(ByteVector &bv) { /* ... */ }
    void Unpickle(ByteVector &bv) { /* ... */ }
};

Or, one could use the Curiously Recurring Template Pattern (that is, if
I'm using it correctly):

template <class T> struct TPickle
{
    // forward to the derived class via static_cast; the derived class
    // must define Pickle/Unpickle itself, or these would recurse forever
    void Pickle(ByteVector &bv) { static_cast<T *>(this)->Pickle(bv); }
    void Unpickle(ByteVector &bv) { static_cast<T *>(this)->Unpickle(bv); }
};

class PickleSample : public TPickle<PickleSample>
{
public:
    void Pickle(ByteVector &bv) { /* ... */ }    // found by the base's static_cast
    void Unpickle(ByteVector &bv) { /* ... */ }
};

What are the pros and cons of each? The ABC approach lets you do things
like this:

void PickleAndSendMessage(IPickle *p_obj)
{
    ByteVector bv;
    p_obj->Pickle(bv);
    SendMessageUsingSomeTransport(&bv[0], bv.size());
}

i.e. where PickleAndSendMessage() can work with any object that
"exposes" the IPickle interface. If I understand correctly, this would
not be possible with the CRTP approach. I thought maybe that an
advantage of the CRTP would be that it would be easier to "bolt on"
Pickle functionality to an existing class, e.g.:

// pre-existing class
class Blorf
{
protected:          // protected (rather than private) so derived classes can pickle them
    int foo, bar;
public:
    // some functions here
};

// new class with pickling
class PickleBlorf : public Blorf, public TPickle<PickleBlorf>
{
public:
    PickleBlorf() : Blorf() {}
    void Pickle(ByteVector &bv) { /* write foo and bar into bv */ }
    void Unpickle(ByteVector &bv) { /* get foo and bar from bv */ }
};

but then I realized that this could just as easily be done using the ABC
approach - instead of inheriting from TPickle<PickleBlorf> above, you
would just inherit from IPickle instead.
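For instance, a minimal sketch (reusing the Blorf and IPickle definitions
above; this is an alternative to the TPickle version, not an addition to it):

class PickleBlorf : public Blorf, public IPickle
{
public:
    void Pickle(ByteVector &bv) { /* write foo and bar into bv */ }
    void Unpickle(ByteVector &bv) { /* get foo and bar from bv */ }
};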

So what's the downside of the ABC approach? What am I missing about the
Curiously Recurring Template Pattern that's got everybody talking about
it? Why not just use ABCs instead? Is it the fact that they lead to
the creation of a vtable? Is that such a big deal?

Thanks,
 

christopher diggins

Mike Smith said:
So what's the downside of the ABC approach? What am I missing about the
Curiously Recurring Template Pattern that's got everybody talking about
it? Why not just use ABCs instead? Is it the fact that they lead to the
creation of a vtable? Is that such a big deal?


The answer depends on your needs. An ABC can be slightly slower, and does
have the overhead of an extra pointer to the vtable stored in each object.
Another issue that some people point to is that CRTP leads to less coupling
between the objects. This is not automatically a good thing. I definitely
see an unhealthy paranoia w.r.t. coupling. I believe that coupling is in fact
an important part of design when used appropriately.

Most of the time I prefer ABCs over CRTP (warning: highly subjective list):
- shorter compile times
- easier to track errors
- better typing
- easier to understand the design
- easier to test

However, I prefer BIL Interfaces to ABCs,

see http://www.kangaroologic.com/interfaces/

YMMV
 

Mike Smith

christopher said:
The answer depends on your needs. An ABC can be slightly slower, and does
have the overhead of an extra pointer to the vtable stored in each object.

In my case there's a good chance I'll have that anyway; it's not really
a concern.
Another issue that some people point to, is that CRTP leads to less coupling
between the objects.

How so? Because you're "bolting on" the interface instead of building
it in? But you get the choice to either bolt-on or build-in, with both
ABC interfaces and CRTP interfaces.
This is not automatically a good thing. I definitely
see an unhealthy paranoia w.r.t. coupling. I believe that coupling is in fact
an important part of design when used appropriately.

I think the paranoia is about *unnecessary* coupling, and I would agree
that's not a good thing.
However, I prefer BIL Interfaces to ABCs,

see http://www.kangaroologic.com/interfaces/

Thanks, I'll check this out, but I must admit the first sight of having
to use an IDL turns me off. If I wanted to go heavyweight, I'd just go
COM or CORBA or XPCOM and be done with it.
 

Cy Edmunds

Mike Smith said:
Sorry about the multiple posts, but I just realized my prior post was in a
thread that dates back a couple of days, so people might not see it.

Which is considered better for writing interfaces in C++: abstract base
classes, or templates - and why? I.e. when is one preferable over the
other?

[snip]

Templates are preferable for template libraries (otherwise they would be
called something else!) and IMHO ABCs are preferable for frameworks. I
usually wind up using a template library for low-level stuff and a framework
for the whole application. In both cases I use an existing one if possible
and write it myself otherwise.

Template Library

genericity by templates
low level
optimized for run time performance
minimal error checking
weak coupling of components (useful by themselves)
work process: use components to help build your code

Framework

genericity by polymorphism
high level
optimized for programmer convenience
strong error checking with exception-based error handling
strong coupling of components (useless by themselves)
work process: write a modest amount of code to create an application

HTH
 

Dietmar Kuehl

Mike said:
Which is considered better for writing interfaces in C++: abstract base
classes, or templates - and why?

This is pretty much like asking: "Which looks better for
wearing when going to a club: a short, black dress or blue
jeans - and why?" The answers to this question can easily be
"The short black dress!" or "The blue jeans," both times with
the same reason: "Because it looks better!" Of course, you will
notice that the answer strongly depends on the context of the
question: the answer for a gorgeous, young female could be, of
course, either (by definition she looks great), while for me, a
fat, elderly male, the answer is pretty much limited to one
option.
I.e. when is one preferable over the other?

The short answer is: you always use templates! The short reason
is: you want their flexibility, they are sexy, and they are
simply better.

Of course, this short answer is essentially bait for
everybody stupid enough to take it literally and discuss this
with me :) It still summarizes the essence, though: In most
cases you want to leave the exact types used open and thus you
would use a template. In many cases, there is no inherent need
for dynamic polymorphism and thus you would not use any
abstract base classes. Abstract base classes or abstract base
class templates are only used if you really need the dynamic
polymorphism. However, even if you need dynamic polymorphism,
it is likely that you still have a class template.

This, of course, assumes you are creating a general component.
In a specific application it may not pay off to use an abstract
base class template because it would be instantiated for just
one type.
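To make that concrete with the pickling example from the original post,
here is a minimal sketch (reusing the ByteVector, TPickle and
SendMessageUsingSomeTransport names from there) of how the "generic"
send function remains possible without any dynamic polymorphism:

// works with any T derived from TPickle<T>; dispatch is static
template <class T>
void PickleAndSendMessage(TPickle<T> &p_obj)
{
    ByteVector bv;
    p_obj.Pickle(bv);   // forwards to T::Pickle through the CRTP base
    SendMessageUsingSomeTransport(&bv[0], bv.size());
}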
So what's the downside of the ABC approach?

I discussed this issue several times in the past in some
newsgroups. I will probably forget several details I described
in previous articles, so you might want to try locating them.

The key difference actually looks quite innocent at first sight
but is much more problematic than it may appear: abstract base
classes in statically typed languages have only very limited
support for variation of associated types. Essentially, they can
only support covariance of return types and contravariance of
argument types - if they support these kinds of variation at
all. This restriction effectively follows directly from the
Liskov substitution principle.

The thing about associated types is that they pop up naturally
in signatures of member functions: for a virtual function the
base class has to choose exactly what types the function gets.
It may take its arguments by reference and return its results
by reference (of course, in C++ the latter would be some form
of smart pointer) but this is already a serious restriction
forcing associated types into class hierarchies.

The return type actually highlights another problem: even if
the object is returned as some form of reference, its dynamic
type (i.e. the actual type of the object) is not directly
accessible because the static type is just a base class. That
means you need some form of cast to access extended
interfaces. In its extreme form, functions always take and
return references to the type "object", the common base class
(of course, in C++ there is no such thing, and a common base
class could only be imposed on a subset of the types anyway).
This was exhibited e.g. by many Java interfaces (at least prior
to the introduction of "generics", which, of course, do not
really solve the problem, but this is a tradition in Java
already). That is, you invariably need downcasts to actually
use objects returned from dynamically polymorphic functions.
This makes the code more error-prone and prevents compile-time
detection of appropriate algorithms.
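A minimal sketch (with hypothetical Shape/Circle names) of the downcast
problem, contrasted with the template alternative:

struct Shape { virtual ~Shape() {} };
struct Circle : public Shape { double radius() const { return 1.0; } };

void use_dynamic(Shape *s)
{
    // the static type is just Shape, so Circle's extended interface
    // is only reachable through a downcast:
    if (Circle *c = dynamic_cast<Circle *>(s))
        c->radius();
}

template <class ConcreteShape>
void use_static(ConcreteShape &s)
{
    // the exact type stays visible to the compiler: no cast, and a
    // missing member is a compile-time error, not a run-time surprise
    s.radius();
}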

Another problem is the abstraction cost: a virtual function
cannot be inlined unless it is known that the static type is
identical to the dynamic type. This condition only holds very
rarely in systems relying on dynamic polymorphism, because you
generally process all objects through references, and the
compiler has to assume that they refer to derived objects.

Of course, it is generally argued that a virtual function call
merely takes an additional pointer look-up into the virtual
function table compared to a normal function call. This is
true but not the issue: if the static type is known, many
trivial functions can be inlined, which has multiple related
effects. There is no function call at all, immediately saving
the entire function call overhead, not just the additional
pointer look-up, which might cost just one processor cycle
(if the function is called often, the necessary data will be
in the processor cache; if not, the additional cost does not
really matter because the function is called quite infrequently
anyway). Even where the function call is cheap there is a huge
benefit: the optimizer gets a bigger chunk of code it can twist
to get optimal performance.

The net effect is that virtual functions are too expensive if
they are more or less trivial. It becomes necessary to fold
multiple operations into just one.
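A minimal sketch (hypothetical names) of the effect on a trivial
operation: the virtual version performs an indirect call per element,
while the template version is typically inlined into a plain comparison:

struct Comparator
{
    virtual ~Comparator() {}
    virtual bool less(int a, int b) const = 0;   // a trivial operation
};

// dynamic polymorphism: an opaque, indirect call on every element
int count_less_dynamic(const int *p, int n, int x, const Comparator &cmp)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (cmp.less(p[i], x))
            ++count;
    return count;
}

// static polymorphism: the call is usually inlined away and the
// whole loop becomes fair game for the optimizer
template <class Compare>
int count_less_static(const int *p, int n, int x, Compare less)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (less(p[i], x))
            ++count;
    return count;
}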

The differences between static and dynamic polymorphism are
clearly exposed by the different approaches to accessing
sequences. The C++ approach uses three different operations
(actually, there are groups of operations, but I just pick one
representative):

- operator==() to test whether iteration is complete.
- operator*() returns [a reference to] the current object.
  Note that the static and dynamic type normally match, but
  this is actually not a requirement (the sequence's value type
  could be a reference type).
- operator++() to move the iterator forward.

In addition to these basic operations, specialized algorithms
may take advantage of additional properties, e.g. the ability
to also walk backwards because this is easily detected.

In contrast, the typical interface used in languages like Java
or C# consists of just one function called "Next()". This
function does all three operations! It moves to the next
element and returns it, and the result is "null" if there is
no such object. Since such functions always take the
corresponding interface, they have no way to use a specialized
algorithm because they are unaware of any additional
functionality (the best they could do is start testing for
special properties by attempting downcasts).
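A minimal sketch (with hypothetical Widget names) of the two styles
side by side:

struct Widget {};
void use(const Widget &);            // some consumer, defined elsewhere

// C++ style: three separate operations, all visible to the compiler
template <class Iterator>
void process(Iterator first, Iterator last)
{
    for (; first != last; ++first)   // operator!= and operator++
        use(*first);                 // operator*
}

// Java/C# style: one virtual function folding all three together
struct WidgetEnumerator
{
    virtual ~WidgetEnumerator() {}
    virtual Widget *Next() = 0;      // advance, test, and access at once
};

void process(WidgetEnumerator &e)
{
    while (Widget *w = e.Next())     // null signals the end
        use(*w);
}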

Although these may seem to be just minor nits, they are actually
not. Why should a user of some component suffer the drawbacks
of dynamic polymorphism even when using it with a fixed type?
What am I missing about the Curiously Recurring Template
Pattern that's got everybody talking about it?

I don't talk about it, nor do I use it all that frequently. It
is rare that I need it (because I rarely have need for dynamic
polymorphism), but sometimes it comes in handy. The key to this
pattern is actually that associated types can be parameterized
for your base class.
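A minimal sketch (hypothetical names) of what that buys you: the base
class template can mention the derived type in its own interface, which
a plain ABC cannot do:

template <class Derived>
struct Cloneable
{
    // the return type varies with the derived class, so callers get
    // a Derived* back without any downcast:
    Derived *clone() const
    {
        return new Derived(static_cast<const Derived &>(*this));
    }
};

class Document : public Cloneable<Document>
{
    // Document's own state; the copy constructor does the work
};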
Why not just use ABCs instead?

Because they are normally more trouble than they are worth.
Essentially, they are only useful if you really need dynamic
polymorphism. Of course, if you really do need dynamic
polymorphism, virtual functions are the only reasonable way to go.
Is it the fact that they lead to the creation of a vtable?
Is that such a big deal?

The virtual function table itself is no problem. Virtual
functions do not necessarily cause a problem either, but for
trivial operations they can turn into a serious performance
problem. Although I think this should be considered, I consider
the semantic implications to be much more serious.
 

Dietmar Kuehl

Cy said:
Framework

genericity by polymorphism
high level
optimized for programmer convenience

Huh? You consider it convenient to downcast to get at your objects?
Actually, I think you need generic approaches for convenience, e.g.
to fix up the associated types. In addition, I consider it more
convenient to have more options for customization, because the
default setting is rarely appropriate.
 

Cy Edmunds

Dietmar Kuehl said:
Huh? You consider it convenient to downcast to get at your objects?

I almost never use downcasting. Perhaps you have been victimized by some
poor polymorphic designs.
Actually, I think you need generic approaches for convenience, e.g.
to fix up the associated types. In addition, I consider it more
convenient to have more options for customization, because the
default setting is rarely appropriate.

The point is that the more complex the interface, the more difficult it is to
express using templates. Consider the interface to std::copy. It is very
simple indeed, but the fact that the first two iterators are const can only
be expressed in the documentation. If ambiguity creeps in with such a simple
interface, imagine how bad it gets with a large number of complex
interfaces. And the error messages you get when the inevitable mistakes
happen are often extremely difficult to interpret.
 

Dietmar Kuehl

Cy said:
I almost never use downcasting. Perhaps you have been victimized by some
poor polymorphic designs.

This may very well be the case, but then you will almost certainly use
templatized containers. The occasional use of virtual functions can
be reasonably convenient because it is embedded in a templatized world.
The point is that the more complex the interface the more difficult it is to
express using templates.

I disagree. For example, I consider algorithms on graphs to use a
fairly complex interface, yet it is much easier to express it using
template approaches.
Consider the interface to std::copy. It is very
simple indeed, but the fact that the first two iterators are const can only
be expressed in the documentation.

I entirely agree with you that the iterators used throughout the
standard C++ library are actually broken! They combine two concepts,
positioning and data access, into one and should be split into
cursors and property maps. In the other direction, the current
standard C++ library also misses concepts combining the two
iterators into just one object. However, just because the first
truly generic library made some unfortunate choices, it does not
mean that the overall concept does not work. Put into paraphrased
words of yours: you have been victimized by some poor generic
design :) Well, it is actually not that bad and conveys the overall
idea quite neatly, but the current standard C++ library was created
without any real-world experience with generic programming.
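A minimal sketch (hypothetical names) of the cursor/property-map split:
the cursor handles positioning only, and a property map handles access
to the data attached to a position:

// positioning via the cursor, data access via the property map
template <class Cursor, class PropertyMap>
void zero_out(Cursor first, Cursor last, PropertyMap pm)
{
    for (; first != last; ++first)
        pm.put(first, 0);
}

// a trivial property map that just writes through the cursor
struct WriteThrough
{
    template <class Cursor>
    void put(Cursor c, int value) const { *c = value; }
};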

BTW, the fact that the first sequence is only read and the second
one only written to is actually also represented in the code:
trying to use it wrongly, e.g. passing two input sequences, will
cause compiler errors. That is, I see a problem with the current
sequence interface, but it is a different one from the one you
are seeing!
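For example, this (deliberately ill-formed) misuse of std::copy does
not compile, because assigning through a const_iterator is rejected:

#include <algorithm>
#include <vector>

void demo(const std::vector<int> &src, const std::vector<int> &dst)
{
    // both ranges are read-only here; writing through dst's
    // const_iterator is a compile-time error:
    std::copy(src.begin(), src.end(), dst.begin());   // error
}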
And the error messages you get when the inevitable mistakes
happen are often extremely difficult to interpret.

Indeed. It is time for compilers to catch up, possibly with the
help of appropriate language changes: some form of concept support
is being worked on, but there is currently no clear idea what this
will eventually look like.
 
