use template or non template version?

abir · Apr 15, 2009

Hi,
I have a lot of image processing algorithms working on a vector of
Point.
The Point returns x,y coordinate in different coordinate frame, and
the algorithms are working on the specific coordinate frame.
All of the coordinate frames are known at compile time, and hence all
of the algorithms deep down to the bottom level functions are
templated over coordinate frame.
I also evaluate on all of the coordinate frames always.

a simple compilable program can be given as,
enum Dir //these are coordinate frames (there are 4 coordinate frames)
{
D1, D2, D3, D4
};
struct Point //the basic point class.
{
int x_;
int y_;
int x1_;
int y1_;
Point(int x,int y)
: x_(x)
, y_(y)
, x1_ (2*x_+3*y_)
, y1_ (-3*x_+2*y_)
{

}
template<Dir d>
int x()const;
template<Dir d>
int y()const;
};
template<>
int Point::x<D1>()const
{
return x_;
}
template<>
int Point::x<D2>()const
{
return y_;
};
template<>
int Point::x<D3>()const
{
return x1_;
}
template<>
int Point::x<D4>()const
{
return y1_;
}
template<>
int Point::y<D1>() const
{
return y_;
}
template<>
int Point::y<D2>()const
{
return -x_;
}
template<>
int Point::y<D3>()const
{
return y1_;
}
template<>
int Point::y<D4>()const
{
return -x1_;
}

The actual algorithms are far more complex. Here i only showing how
they are used.
my algorithms are (templated version)

template<Dir d>
void doStuff(Point& p)
{
int i = p.x<d>()*p.x<d>()+ p.y<d>()*p.y<d>();
i+= p.x<d>()*p.y<d>();
}
template<Dir d,typename Itr>
void algo(Itr b,Itr e)
{
for(; b!= e; ++b)
{
doStuff<d>(*b);
}
}

a non- template version (non template over direction)
typedef int (Point::*ptPtr)()const;

void doStuff(Point& p,ptPtr x,ptPtr y)
{
int i = (p.*x)()*(p.*x)()+ (p.*y)()*(p.*y)();
i+= (p.*x)()*(p.*y)();
}
template<typename Itr>
void algo(Itr b,Itr e,ptPtr xPtr,ptPtr yPtr)
{
for(; b!= e;++b)
{
doStuff(*b,xPtr,yPtr);
}
}

I am calling them as,

//get some random points in vector.
int count = 50000;
typedef std::vector<Point> PV;
PV v; v.reserve(5000);
for(PV::size_type i = 0; i!= 5000; ++i)
{
v.push_back(Point(rand()%100,rand()%100));
}

for(int i = 0; i!= count; ++i)
{ algo<D1>(v.begin(),v.end());
algo<D2>(v.begin(),v.end());
algo<D3>(v.begin(),v.end());
algo<D4>(v.begin(),v.end());
}

or
ptPtr xPtr = &Point::x<D1>;
ptPtr yPtr = &Point::y<D1>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}

xPtr = &Point::x<D2>;
yPtr = &Point::y<D2>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D3>;
yPtr = &Point::y<D3>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}

xPtr = &Point::x<D4>;
yPtr = &Point::y<D4>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}

Now, in this test case, the templated algorithm executes much faster.
But my concerns are,
1) templated (over direction) always creates 4 binary image for all of
the algorithms.
(for Itr only one version instantiated while for Dir all of 4)
2) in deep down to function hierarchy the templates propagate.
3) the compile time increases & due to slightest change a recompile
triggers (not much concern to me)
4) while the Point::x<D1>() etc methods are one liner, they always
gets inlined for templated version,
while for the member function pointer version, i am not sure whether
they can be(i need to get little more assembly knowledge to check this
case)

in actual implementation number of algorithms are large (so i expect
almost all of the codes will get 4 binary copy) and number of
iterations are huge (1-1.5 million ,so without inline member pointer
will get large penalty)
Though in test case (as shown) the template version is 7-8 times
faster than the member pointer one, but i expect that is not true
measure, and it is difficult to measure in actual code, as i don't
have function pointer version.

What i am interested to know, is there any kind of thumb rule i can
follow for when to run time code over compile time one (even though
information is available at compile time), esp when all of the cases
for template instance are to be generated (i my case it is always 4)

Thanks
abir

Victor Bazarov · Apr 15, 2009

abir said:
I have a lot of image processing algorithms working on a vector of
Point.
The Point returns x,y coordinate in different coordinate frame, and
the algorithms are working on the specific coordinate frame.
All of the coordinate frames are known at compile time, and hence all
of the algorithms deep down to the bottom level functions are
templated over coordinate frame.
I also evaluate on all of the coordinate frames always.

[..code comparison: templates versus pointers to functions..]

What i am interested to know, is there any kind of thumb rule i can
follow for when to run time code over compile time one (even though
information is available at compile time), esp when all of the cases
for template instance are to be generated (i my case it is always 4)

It is the cost to you that you need to consider. As you have
discovered, nothing will come out of nowhere. You have to compromise:
once you use optimal code, you can either improve the memory footprint
or performance. By definition, if you can improve both, your algorithm
is not optimal. Now, you need to decide which one is more important to
you. It is possible that initially the gains in performance will
outweigh the loss in memory footprint. Then, with further inlining of
the code, unrolling of the loops, etc., come diminishing returns. IME
one will always have to stop at some point where the gain in performance
is too low compared to the loss in memory, but nobody can't tell what
that point is.

Also consider, that finding improvements also costs *you* valuable
development time. You need to consider how much time *you* can be
losing while not implementing that new feature in your program which
might save your customers time better than any performance improvement
you can achieve in the same amount of work...

Good luck!

V

P.S. The answer is, as you can see, not C++ specific.

template member specialization of template class	2	Feb 22, 2008
How do you create template array at compile-time?	1	Jul 9, 2010
syntax problem for partial template specialization for a template class.	1	Dec 28, 2006
How do vector choose const or non-const?	3	Sep 2, 2011
Instantiation of template method in template class	1	Aug 27, 2008
template question	4	Nov 3, 2003
dependent template parameter?	9	Sep 24, 2009
explicit calling of member template constructor...	7	Mar 24, 2010

use template or non template version?

abir

Victor Bazarov

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads