A
abir
Hi,
I have a lot of image processing algorithms working on a vector of
Point.
The Point returns x,y coordinate in different coordinate frame, and
the algorithms are working on the specific coordinate frame.
All of the coordinate frames are known at compile time, and hence all
of the algorithms deep down to the bottom level functions are
templated over coordinate frame.
I also evaluate on all of the coordinate frames always.
a simple compilable program can be given as,
enum Dir //these are coordinate frames (there are 4 coordinate frames)
{
D1, D2, D3, D4
};
struct Point //the basic point class.
{
int x_;
int y_;
int x1_;
int y1_;
Point(int x,int y)
: x_(x)
, y_(y)
, x1_ (2*x_+3*y_)
, y1_ (-3*x_+2*y_)
{
}
template<Dir d>
int x()const;
template<Dir d>
int y()const;
};
template<>
int Point::x<D1>()const
{
return x_;
}
template<>
int Point::x<D2>()const
{
return y_;
};
template<>
int Point::x<D3>()const
{
return x1_;
}
template<>
int Point::x<D4>()const
{
return y1_;
}
template<>
int Point::y<D1>() const
{
return y_;
}
template<>
int Point::y<D2>()const
{
return -x_;
}
template<>
int Point::y<D3>()const
{
return y1_;
}
template<>
int Point::y<D4>()const
{
return -x1_;
}
The actual algorithms are far more complex. Here i only showing how
they are used.
my algorithms are (templated version)
template<Dir d>
void doStuff(Point& p)
{
int i = p.x<d>()*p.x<d>()+ p.y<d>()*p.y<d>();
i+= p.x<d>()*p.y<d>();
}
template<Dir d,typename Itr>
void algo(Itr b,Itr e)
{
for(; b!= e; ++b)
{
doStuff<d>(*b);
}
}
a non- template version (non template over direction)
typedef int (Point::*ptPtr)()const;
void doStuff(Point& p,ptPtr x,ptPtr y)
{
int i = (p.*x)()*(p.*x)()+ (p.*y)()*(p.*y)();
i+= (p.*x)()*(p.*y)();
}
template<typename Itr>
void algo(Itr b,Itr e,ptPtr xPtr,ptPtr yPtr)
{
for(; b!= e;++b)
{
doStuff(*b,xPtr,yPtr);
}
}
I am calling them as,
//get some random points in vector.
int count = 50000;
typedef std::vector<Point> PV;
PV v; v.reserve(5000);
for(PV::size_type i = 0; i!= 5000; ++i)
{
v.push_back(Point(rand()%100,rand()%100));
}
for(int i = 0; i!= count; ++i)
{ algo<D1>(v.begin(),v.end());
algo<D2>(v.begin(),v.end());
algo<D3>(v.begin(),v.end());
algo<D4>(v.begin(),v.end());
}
or
ptPtr xPtr = &Point::x<D1>;
ptPtr yPtr = &Point::y<D1>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D2>;
yPtr = &Point::y<D2>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D3>;
yPtr = &Point::y<D3>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D4>;
yPtr = &Point::y<D4>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
Now, in this test case, the templated algorithm executes much faster.
But my concerns are,
1) templated (over direction) always creates 4 binary image for all of
the algorithms.
(for Itr only one version instantiated while for Dir all of 4)
2) in deep down to function hierarchy the templates propagate.
3) the compile time increases & due to slightest change a recompile
triggers (not much concern to me)
4) while the Point::x<D1>() etc methods are one liner, they always
gets inlined for templated version,
while for the member function pointer version, i am not sure whether
they can be(i need to get little more assembly knowledge to check this
case)
in actual implementation number of algorithms are large (so i expect
almost all of the codes will get 4 binary copy) and number of
iterations are huge (1-1.5 million ,so without inline member pointer
will get large penalty)
Though in test case (as shown) the template version is 7-8 times
faster than the member pointer one, but i expect that is not true
measure, and it is difficult to measure in actual code, as i don't
have function pointer version.
What i am interested to know, is there any kind of thumb rule i can
follow for when to run time code over compile time one (even though
information is available at compile time), esp when all of the cases
for template instance are to be generated (i my case it is always 4)
Thanks
abir
I have a lot of image processing algorithms working on a vector of
Point.
The Point returns x,y coordinate in different coordinate frame, and
the algorithms are working on the specific coordinate frame.
All of the coordinate frames are known at compile time, and hence all
of the algorithms deep down to the bottom level functions are
templated over coordinate frame.
I also evaluate on all of the coordinate frames always.
a simple compilable program can be given as,
enum Dir //these are coordinate frames (there are 4 coordinate frames)
{
D1, D2, D3, D4
};
struct Point //the basic point class.
{
int x_;
int y_;
int x1_;
int y1_;
Point(int x,int y)
: x_(x)
, y_(y)
, x1_ (2*x_+3*y_)
, y1_ (-3*x_+2*y_)
{
}
template<Dir d>
int x()const;
template<Dir d>
int y()const;
};
template<>
int Point::x<D1>()const
{
return x_;
}
template<>
int Point::x<D2>()const
{
return y_;
};
template<>
int Point::x<D3>()const
{
return x1_;
}
template<>
int Point::x<D4>()const
{
return y1_;
}
template<>
int Point::y<D1>() const
{
return y_;
}
template<>
int Point::y<D2>()const
{
return -x_;
}
template<>
int Point::y<D3>()const
{
return y1_;
}
template<>
int Point::y<D4>()const
{
return -x1_;
}
The actual algorithms are far more complex. Here i only showing how
they are used.
my algorithms are (templated version)
template<Dir d>
void doStuff(Point& p)
{
int i = p.x<d>()*p.x<d>()+ p.y<d>()*p.y<d>();
i+= p.x<d>()*p.y<d>();
}
template<Dir d,typename Itr>
void algo(Itr b,Itr e)
{
for(; b!= e; ++b)
{
doStuff<d>(*b);
}
}
a non- template version (non template over direction)
typedef int (Point::*ptPtr)()const;
void doStuff(Point& p,ptPtr x,ptPtr y)
{
int i = (p.*x)()*(p.*x)()+ (p.*y)()*(p.*y)();
i+= (p.*x)()*(p.*y)();
}
template<typename Itr>
void algo(Itr b,Itr e,ptPtr xPtr,ptPtr yPtr)
{
for(; b!= e;++b)
{
doStuff(*b,xPtr,yPtr);
}
}
I am calling them as,
//get some random points in vector.
int count = 50000;
typedef std::vector<Point> PV;
PV v; v.reserve(5000);
for(PV::size_type i = 0; i!= 5000; ++i)
{
v.push_back(Point(rand()%100,rand()%100));
}
for(int i = 0; i!= count; ++i)
{ algo<D1>(v.begin(),v.end());
algo<D2>(v.begin(),v.end());
algo<D3>(v.begin(),v.end());
algo<D4>(v.begin(),v.end());
}
or
ptPtr xPtr = &Point::x<D1>;
ptPtr yPtr = &Point::y<D1>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D2>;
yPtr = &Point::y<D2>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D3>;
yPtr = &Point::y<D3>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
xPtr = &Point::x<D4>;
yPtr = &Point::y<D4>;
for(int i = 0; i!= count; ++i)
{
algo(v.begin(),v.end(),xPtr,yPtr);
}
Now, in this test case, the templated algorithm executes much faster.
But my concerns are,
1) templated (over direction) always creates 4 binary image for all of
the algorithms.
(for Itr only one version instantiated while for Dir all of 4)
2) in deep down to function hierarchy the templates propagate.
3) the compile time increases & due to slightest change a recompile
triggers (not much concern to me)
4) while the Point::x<D1>() etc methods are one liner, they always
gets inlined for templated version,
while for the member function pointer version, i am not sure whether
they can be(i need to get little more assembly knowledge to check this
case)
in actual implementation number of algorithms are large (so i expect
almost all of the codes will get 4 binary copy) and number of
iterations are huge (1-1.5 million ,so without inline member pointer
will get large penalty)
Though in test case (as shown) the template version is 7-8 times
faster than the member pointer one, but i expect that is not true
measure, and it is difficult to measure in actual code, as i don't
have function pointer version.
What i am interested to know, is there any kind of thumb rule i can
follow for when to run time code over compile time one (even though
information is available at compile time), esp when all of the cases
for template instance are to be generated (i my case it is always 4)
Thanks
abir