Hi,
This is a distilled version of what I'm trying to do (the code snippet is
not guaranteed to compile as posted). I have a class Fn whose operator()
results I want to cache. Fn is used as the basis for a functional algebra
for building more complex function expressions.
The baseline for caching performance is the EvalCache class defined below.
I'm comparing the performance of EvalCacheWithVec, which uses ideas kindly
suggested by Ivan Vecerina, against EvalCache.
For my application, I'm finding that EvalCache outperforms EvalCacheWithVec
by a factor of close to two; in fact, EvalCacheWithVec runs quicker with
smaller cache sizes.
I don't really want to move the cache outside of the class Fn.
Any suggestions on how I can improve the performance of EvalCacheWithVec
further?
Thanks in advance.
#include <utility>   // std::pair, std::make_pair
#include <vector>    // used by EvalCacheWithVec below
#include <bitset>    // used by EvalCacheWithVec below

template<typename T1, typename T2, typename R> class EvalCache;        // defined below
template<typename T1, typename T2, typename R> class EvalCacheWithVec; // defined below

class Fn
{
private:
    typedef EvalCache<int,int,double> CACHE;
    // typedef EvalCacheWithVec<int,int,double> CACHE;
    SharedPtr<CACHE> _fnCache;
    SharedPtr<blah>  _rep;   // SharedPtr and blah stand in for the real types
public:
    double operator()( int date_, int evalDate_ ) const
    {
        //! Check whether a cached value is valid, and use it if so.
        std::pair<int,int> args = std::make_pair(date_, evalDate_);
        const double* cached = _fnCache->lookup(args);
        if (cached == 0) //! i.e. invalid
        {
            double res = (*_rep)(date_, evalDate_);
            _fnCache->add(args, res);
            return res;
        }
        else
        {
            return *cached;
        }
    }
};
template< typename T1, typename T2, typename R >
class EvalCache
{
private:
    typedef std::pair<T1,T2> FNARGS;
    FNARGS _args;
    R      _cached;
    bool   _valid;
public:
    ~EvalCache(){}
    EvalCache() : _valid(false) {}
    //! A return value of 0 indicates that the cached value cannot be used.
    inline const R* lookup(const FNARGS& args_)
    {
        if (_valid == false) return 0;
        if (_args == args_) return &_cached;
        else                return 0;
    }
    inline void add(const FNARGS& args_, const R& res_)
    {
        _args   = args_;
        _cached = res_;
        _valid  = true;
    }
    inline void clear() { _valid = false; }
};
template<typename T1, typename T2, typename R>
class EvalCacheWithVec
{
private:
    typedef std::pair<T1,T2>    FNARGS;
    typedef std::pair<FNARGS,R> ENTRY;
    std::vector<ENTRY> _cached;
    R _cachedVal;
    enum { kCacheSize = 64 };
    std::bitset<kCacheSize> _valid;
    bool _noneValid;
    size_t getHashedIndex(T1 a, T2 b)
    {   // A LOT OF ROOM FOR IMPROVEMENT HERE
        // Cast to size_t first: with plain int arithmetic the modulo can go
        // negative for negative arguments, yielding an out-of-range index.
        return (17u*static_cast<size_t>(a) + 0x193u*static_cast<size_t>(b))
               % kCacheSize;
    }
public:
    ~EvalCacheWithVec(){}
    EvalCacheWithVec()
    :
        _cached(kCacheSize),
        _valid(),
        _noneValid(true)
    {}
    //! A return value of 0 indicates that the cached value cannot be used.
    const R* lookup(const FNARGS& args_)
    {
        if (_noneValid == true) return 0;
        T1 arg1(args_.first);
        T2 arg2(args_.second);
        size_t idx = getHashedIndex(arg1, arg2);
        if (_valid[idx] == false) return 0;
        const ENTRY& entry = _cached[idx];
        if (entry.first != args_) return 0;
        _cachedVal = entry.second;  // copies the hit into a member; see the variant below
        return &_cachedVal;
    }
    void add(const FNARGS& args_, const R& res_)
    {
        T1 arg1(args_.first);
        T2 arg2(args_.second);
        size_t idx = getHashedIndex(arg1, arg2);
        _cached[idx] = std::make_pair(args_, res_);
        _valid[idx]  = true;
        _noneValid   = false;
    }
    void clear()
    {
        _noneValid = true;
        _valid.reset();
    }
};
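Aside: on a hit, lookup() above copies entry.second into _cachedVal before
returning a pointer to it. Since the slot it points at stays put until the
next add() or clear(), a variant returning a pointer straight into the
vector would skip that copy. A minimal sketch, assuming it is added to
EvalCacheWithVec (lookupNoCopy is my name, not from the post); for
R = double the copy is cheap anyway, so this matters mostly for heavier R
types:

    //! Like lookup(), but returns a pointer into the table itself; the
    //! pointer is only valid until the next add() or clear().
    const R* lookupNoCopy(const FNARGS& args_)
    {
        if (_noneValid) return 0;
        size_t idx = getHashedIndex(args_.first, args_.second);
        if (!_valid[idx]) return 0;
        const ENTRY& entry = _cached[idx];
        return (entry.first == args_) ? &entry.second : 0;
    }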
Ivan Vecerina said:
| I have a routine where I'm using a std::map from a pair of ints to double
| for caching evaluation results.
| The cache depth could be up to, say, 1000.
| However, my application needs to clear and replenish the cache repeatedly,
| and profiling seems to suggest that the performance is handicapped by
| repeated calls to new.
| Can anybody advise me on how to improve the performance, or suggest
| alternatives, please?
As others suggested, some type of hash table is likely to provide
the best results.
Here's a basic example:
double expensiveCompute(int a, int b); // the expensive function

class CachedCompute
{
    struct Entry {
        Entry(int a_, int b_)
        : a(a_), b(b_), result(expensiveCompute(a_,b_)) {}
        int a, b;
        double result;
    };
    enum { kCacheSize = 1024 };
    std::vector<Entry> cache_;
    static int getHashedIndex(int a, int b)
    {   // A LOT OF ROOM FOR IMPROVEMENT HERE
        return (17*a + 0x193*b) % kCacheSize;
    }
public:
    CachedCompute()
    : cache_( kCacheSize, Entry(0,0) ) // init cache with dummy contents
    { }
    double compute(int a, int b)
    {
        Entry& entry = cache_[ getHashedIndex(a,b) ];
        if( entry.a != a || entry.b != b )
            entry = Entry(a,b); // overwrite the non-matching entry
        return entry.result;
    }
};
Create an instance of CachedCompute and use its compute()
member function instead of the global expensiveCompute()
to benefit from the caching.
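For concreteness, a minimal usage sketch (this main() and the sample
arguments are my illustration, not part of the original post):

    int main()
    {
        CachedCompute cache;
        double first  = cache.compute(3, 7); // computed via expensiveCompute()
        double second = cache.compute(3, 7); // served from the hash table
        return first == second ? 0 : 1;
    }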
The getHashedIndex() function should be optimized to avoid collisions
(different values of a and b that are used repeatedly but lead to the
same index).
An application-specific optimization is sometimes possible.
Alternatively, consider a proven hashing approach such as the
FNV hash:
http://www.isthe.com/chongo/tech/comp/fnv/index.html
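For instance, getHashedIndex() could be replaced with 32-bit FNV-1a run
over the bytes of both arguments. A sketch: the offset basis and prime are
the published FNV-1a constants, the adaptation to a pair of ints is mine,
and <cstring> is needed for std::memcpy:

    static size_t getHashedIndexFNV(int a, int b)
    {
        unsigned char bytes[2 * sizeof(int)];
        std::memcpy(bytes, &a, sizeof(int));
        std::memcpy(bytes + sizeof(int), &b, sizeof(int));
        unsigned long h = 2166136261UL;  // FNV-1a 32-bit offset basis
        for (size_t i = 0; i < sizeof(bytes); ++i)
        {
            h ^= bytes[i];
            h *= 16777619UL;             // FNV-1a 32-bit prime
        }
        // kCacheSize (1024) is a power of two, so a mask replaces the modulo.
        return static_cast<size_t>(h & (kCacheSize - 1));
    }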
The table size (kCacheSize) can also be tuned. (The std::vector is in any
case more memory-efficient than the std::map; the saving is probably
about 50%.)
Another improvement might be to look at multiple hash table
entries when calling compute (e.g. the 4 following ones).
There are various ways to handle table refreshes in this case...
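As a sketch of that idea (my code, not from the original reply): probe the
hashed slot and the next three, wrapping around the table, and on a miss
overwrite the last slot probed. The eviction policy is a free design
choice; overwriting the last probed slot is just the simplest option. This
reuses the getHashedIndexFNV() sketch above so the starting index is
always in range:

    double computeProbing(int a, int b)
    {
        size_t idx = getHashedIndexFNV(a, b);
        for (int probe = 0; probe < 4; ++probe)
        {
            Entry& entry = cache_[(idx + probe) % kCacheSize];
            if (entry.a == a && entry.b == b)
                return entry.result;          // hit in one of the 4 slots
        }
        // Miss: overwrite the last probed slot (simplest possible policy).
        Entry& victim = cache_[(idx + 3) % kCacheSize];
        victim = Entry(a, b);                 // calls expensiveCompute(a,b)
        return victim.result;
    }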
Anyway, I would expect the snippet above to provide much better
performance than an approach based on std::map ...
But please let me know of your results.
I hope this helps,
Ivan