Best database for implementing a cache


vj

Hi all,

I need to implement a cache of images that are being fetched from a
remote server, so I am in need of a very fast file-based database that
can fulfill my needs. The database should be embeddable in my
application and should store all its data in a file. I have explored
Apache Derby and HSQLDB, but they are probably overkill for my problem
since I do not need ACID transaction support; moreover, all data will
be stored in a single table rather than a single table.
Any suggestion which database might serve me well?

Thanks,

VJ
 

Eric Sosman

vj wrote on 03/30/07 14:42:
Hi all,

I need to implement a cache of images that are being fetched from a
remote server, so I am in need of a very fast file-based database that
can fulfill my needs.

How fast is the connection to the remote server? How
much faster than that connection must your database be in
order to qualify as "very fast?" ;-)
The database should be embeddable in my
application and should store all its data in a file. I have explored
Apache Derby and HSQLDB, but they are probably overkill for my problem
since I do not need ACID transaction support; moreover, all data will
be stored in a single table rather than a single table.

A distinction of considerable subtlety ...
Any suggestion which database might serve me well?

If I'm asking a question whose answer is obvious to
everyone else, I apologize, but: What do you want the
database to do for you? The only requirement you've
mentioned is that you want to deposit the images in it
"very fast," but you haven't said what you want to do
with them afterwards. Do you need to retrieve the images
in "as received" condition, or do you need to pluck out
bits and pieces of transformed image data? Do you need
to do OCR on the images and index them by all the words
that are recognized? Do you need to do face recognition,
fingerprint matching, ...?

If all you need is rapid writing, you're in luck:
Every Unix variant comes with an absolutely free, VERY
fast, write-only database called /dev/null ;-)
 

Daniel Pitts

Hi all,

I need to implement a cache of images that are being fetched from a
remote server, so I am in need of a very fast file-based database that
can fulfill my needs. The database should be embeddable in my
application and should store all its data in a file. I have explored
Apache Derby and HSQLDB, but they are probably overkill for my problem
since I do not need ACID transaction support; moreover, all data will
be stored in a single table rather than a single table.
Any suggestion which database might serve me well?

Thanks,

VJ

Why not just write them to the disk directly? Most file systems can be
treated as a form of database, and it seems like that's kind of what
you want anyway.
 

vj

Why not just write them to the disk directly? Most file systems can be
treated as a form of database, and it seems like that's kind of what
you want anyway.

Yes, I am writing the image files directly to the filesystem as is.
However, I need to limit the size of the cache so that it may not grow
arbitrarily large. That would require me to maintain an access-count
metric for evicting existing stored images. Since the filesystem itself
would not let me associate this info with the image, I thought of
utilizing a database that would store the access count as well as all
the available images in the cache.

How fast is the connection to the remote server? How
much faster than that connection must your database be in
order to qualify as "very fast?" ;-)

The remote server is situated on another continent and will be
connected via a measly 100 kbps ISDN line. The average image size is
about 1 MB, which would theoretically take 80 seconds to download. The
clients are connected to the server via a 1000 Mbps LAN that can
receive this image in 0.008 seconds. So a database connection that
would qualify as "very fast" should let me approach these theoretical
limits as closely as possible ;-)
 

vj

If I'm asking a question whose answer is obvious to
everyone else, I apologize, but: What do you want the
database to do for you? The only requirement you've
mentioned is that you want to deposit the images in it
"very fast," but you haven't said what you want to do
with them afterwards. Do you need to retrieve the images
in "as received" condition, or do you need to pluck out
bits and pieces of transformed image data? Do you need
to do OCR on the images and index them by all the words
that are recognized? Do you need to do face recognition,
fingerprint matching, ...?


Sorry, I think I failed to mark the word cache in bold letters. May I
opine that it would also be obvious to everyone else here that a cache
server simply stores frequently accessed data and usually does not
transform it in any way.
 

Lew

vj said:
Sorry, I think I failed to mark the word cache in bold letters. May I
opine that it would also be obvious to everyone else here that a cache
server simply stores frequently accessed data and usually does not
transform it in any way.

Not hardly - it wasn't all that obvious to me, anyway. And even knowing what
something "usually" does is no help at all in knowing what you, particularly,
want.

May I opine that snide remarks to people who are trying to help you might have
an unfortunate effect on that help?

So how about you help people who want to help you?

-- Lew
 

Arne Vajhøj

vj said:
Yes, I am writing the image files directly to the filesystem as is.
However, I need to limit the size of the cache so that it may not grow
arbitrarily large. That would require me to maintain an access-count
metric for evicting existing stored images. Since the filesystem itself
would not let me associate this info with the image, I thought of
utilizing a database that would store the access count as well as all
the available images in the cache.

What about keeping the files on disk, having the metadata in memory,
and simply rebuilding the metadata at startup (list all files on disk
and start with zero counters)?
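
A minimal sketch of that approach, assuming the images live as
individual files in a single cache directory (class and method names
are only illustrative):

import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class CacheIndex {
    private final Map<String, Integer> accessCounts =
            new HashMap<String, Integer>();

    /** Rebuild the in-memory metadata from the cache directory. */
    public CacheIndex(File cacheDir) {
        File[] files = cacheDir.listFiles();
        if (files != null) {
            for (File f : files) {
                accessCounts.put(f.getName(), 0); // start with zero counters
            }
        }
    }

    /** Record a cache hit for the given image file. */
    public void recordHit(String name) {
        Integer n = accessCounts.get(name);
        accessCounts.put(name, (n == null) ? 1 : n + 1);
    }
}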

Arne
 

vj

I apologize for my sneaky comments on what seems to have been an honest
reply, which I unfortunately misinterpreted as sarcastic. I am indeed
grateful to you all, because you have bailed me out countless times,
but honestly the reply posted to my query was not what I was expecting.
 

vj

What about keeping the files on disk, having the metadata in memory,
and simply rebuilding the metadata at startup (list all files on disk
and start with zero counters)?

Arne

Certainly that's a good idea. How about maintaining a hashtable and a
priority queue? The hashtable will store all the metadata associated
with a file, and the priority queue will implement the file aviction
policy. As soon as a file is hit, I will increase the hit count. On a
cache miss, I will fetch the image, create an entry in the hash table,
and push an entry onto the queue. If the queue size is greater than a
certain threshold, I will pop the element from its head (the one with
the lowest access count) and delete its corresponding file / hashtable
entry.

This seems to be a good idea, but I have a small doubt: does a priority
queue dynamically order its elements? Since the priority queue is
implemented as a heap in Java, element ordering only takes place when
elements are added. This will create problems, because if I update an
access count in the hashtable, the change won't be reflected in the
structure of the queue. Hence, when I pop the element from its head for
eviction, it might not be the one with the minimum access count.

What do you think? Is there any other data structure that we can use
for fast lookup of elements?
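
That doubt is well founded: java.util.PriorityQueue re-heapifies only
on insertion and removal, so changing a stored element's priority in
place leaves the heap inconsistent. A hedged sketch of the usual
workaround, removing and re-inserting on every hit (all names here are
illustrative, and PriorityQueue.remove is O(n), so this only suits
small caches):

import java.util.PriorityQueue;

public class EvictionQueue {
    static class Entry implements Comparable<Entry> {
        final String fileName;
        int accessCount;

        Entry(String fileName) { this.fileName = fileName; }

        public int compareTo(Entry other) {
            return this.accessCount - other.accessCount; // least used first
        }
    }

    /** On a cache hit, re-insert the entry so the heap sees the new count. */
    static void recordHit(PriorityQueue<Entry> queue, Entry e) {
        queue.remove(e);  // remove while the stored ordering is still valid
        e.accessCount++;  // update the priority
        queue.offer(e);   // re-insert; it sifts to its correct position
    }
}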
 

Eric Sosman

vj said:
Sorry, I think I failed to mark the word cache in bold letters. May I
opine that it would also be obvious to everyone else here that a cache
server simply stores frequently accessed data and usually does not
transform it in any way.

Fine, but you still haven't explained what this <b>cache</b>
is supposed to do. (Well, you <i>have</i> explained, but saying
that it "simply stores frequently accessed data" is obviously
wrong: if it were right, you'd have already followed my earlier
suggestion to use /dev/null.)

So: What search criterion or criteria is or are used to
retrieve things from this <b>cache</b>? Can the <b>cache</b>
expand indefinitely, or is some kind of housecleaning policy
envisioned (and if so, what kind)? And how much faster than
your network connection must the <b>cache</b> be in order to
meet your requirement of "very fast?"

No one can become a successful programmer without the ability
to describe the problems he or she is trying to solve. More often
than not, getting the problem description right is the hardest
part of doing the job.
 

Arne Vajhøj

vj said:
Certainly that's a good idea. How about maintaining a hashtable and a
priority queue? The hashtable will store all the metadata associated
with a file, and the priority queue will implement the file aviction
policy. As soon as a file is hit, I will increase the hit count. On a
cache miss, I will fetch the image, create an entry in the hash table,
and push an entry onto the queue. If the queue size is greater than a
certain threshold, I will pop the element from its head (the one with
the lowest access count) and delete its corresponding file / hashtable
entry.

This seems to be a good idea, but I have a small doubt: does a priority
queue dynamically order its elements? Since the priority queue is
implemented as a heap in Java, element ordering only takes place when
elements are added. This will create problems, because if I update an
access count in the hashtable, the change won't be reflected in the
structure of the queue. Hence, when I pop the element from its head for
eviction, it might not be the one with the minimum access count.

What do you think? Is there any other data structure that we can use
for fast lookup of elements?

What about:
HashMap, key=filename, value=(file content, list with access times);
cleanup code triggered either by time or on every n'th access iterates
over the HashMap and:
- removes entries that have too few or too old accesses
- trims the list of access times for the entries kept, dropping
  accesses that are too old
?
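
A rough sketch of that cleanup pass (the value type, thresholds, and
names are all made up for illustration):

import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class CacheCleaner {
    /** Value type: file content plus its recent access times. */
    public static class CacheEntry {
        public byte[] content;
        public List<Long> accessTimes; // epoch millis, oldest first
    }

    private static final long MAX_AGE_MS = 60L * 60 * 1000; // keep the last hour
    private static final int MIN_ACCESSES = 2; // evict entries hit fewer times

    public static void cleanup(Map<String, CacheEntry> cache) {
        long cutoff = System.currentTimeMillis() - MAX_AGE_MS;
        Iterator<Map.Entry<String, CacheEntry>> it =
                cache.entrySet().iterator();
        while (it.hasNext()) {
            CacheEntry e = it.next().getValue();
            // trim access times that are too old
            while (!e.accessTimes.isEmpty() && e.accessTimes.get(0) < cutoff) {
                e.accessTimes.remove(0);
            }
            // remove entries with too few recent accesses
            if (e.accessTimes.size() < MIN_ACCESSES) {
                it.remove();
            }
        }
    }
}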

Arne
 

vj

What about:
HashMap, key=filename, value=(file content, list with access times);
cleanup code triggered either by time or on every n'th access iterates
over the HashMap and:
- removes entries that have too few or too old accesses
- trims the list of access times for the entries kept, dropping
  accesses that are too old
?

Arne

But won't the iteration over the HashMap be slow? O(n) ...
 

vj

Fine, but you still haven't explained what this <b>cache</b>
is supposed to do. (Well, you <i>have</i> explained, but saying
that it "simply stores frequently accessed data" is obviously
wrong: if it were right, you'd have already followed my earlier
suggestion to use /dev/null.)

So: What search criterion or criteria is or are used to
retrieve things from this <b>cache</b>? Can the <b>cache</b>
expand indefinitely, or is some kind of housecleaning policy
envisioned (and if so, what kind)? And how much faster than
your network connection must the <b>cache</b> be in order to
meet your requirement of "very fast?"

No one can become a successful programmer without the ability
to describe the problems he or she is trying to solve. More often
than not, getting the problem description right is the hardest
part of doing the job.

OK, let me describe the problem that I am working on in detail.

I am building a map server that provides images from the Google Maps
database. I went through the JavaScript at maps.google.com and was able
to reverse engineer the URLs and query formats that they use. The good
(or the bad) thing is that they provide the map image in block pieces.
So I am building another HTTP server that will provide the clients with
these blocks using a comparatively very simple API, with the side
effect of exploiting a cache of these block images in order to provide
quicker service.

Since fetching these blocks will be much faster from a local server
than from a remote one, I want to implement this cache. The client
application will request the appropriate blocks from the server and
display them in a prescribed format. Target platforms for the client
apps are .NET and probably a webapp making use of AJAX.

I hope that this makes things a lot clearer than they previously were,
and further strengthens my point that I just need to save the images
and provide them back as-is, rather than doing any transformation on
the server, which is why I initially used the term
<strong>cache</strong>.
 

Patrick May

vj said:
OK, let me describe the problem that I am working on in detail.

I am building a map server that provides images from the Google Maps
database. I went through the JavaScript at maps.google.com and was able
to reverse engineer the URLs and query formats that they use. The good
(or the bad) thing is that they provide the map image in block pieces.
So I am building another HTTP server that will provide the clients with
these blocks using a comparatively very simple API, with the side
effect of exploiting a cache of these block images in order to provide
quicker service.

Since fetching these blocks will be much faster from a local server
than from a remote one, I want to implement this cache. The client
application will request the appropriate blocks from the server and
display them in a prescribed format. Target platforms for the client
apps are .NET and probably a webapp making use of AJAX.

Let me preface this with the full disclosure that I work for a
company (GigaSpaces) that produces a commercial version of the
solution I'm going to suggest. I am not responding to push our
product, but because I believe the underlying technology is a good fit
for your requirements.

The underlying technology is Jini (http://www.jini.org), and more
specifically the JavaSpaces service. A JavaSpace makes an ideal cache
for your image blocks, allowing them to be served at in-memory speeds
from your local server. The API is very simple, consisting of the
four core methods write(), read(), take(), and notify().
Pre-processing of your image blocks to the required formats is easily
accomplished through the use of the standard JavaSpaces master-worker
pattern.
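
As a rough sketch of what caching in the JavaSpaces style might look
like (assuming a space proxy has already been discovered via Jini
lookup; the ImageBlock class and its fields are illustrative, not part
of any product):

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// JavaSpaces entries are plain classes with public, object-typed fields.
public class ImageBlock implements Entry {
    public String url;  // the block's source URL, used as the lookup key
    public byte[] data; // the raw image bytes

    public ImageBlock() {} // public no-arg constructor is required
    public ImageBlock(String url, byte[] data) {
        this.url = url;
        this.data = data;
    }
}

// Given a JavaSpace proxy named "space" (discovery code omitted):
//   space.write(new ImageBlock(url, bytes), null, Lease.FOREVER);
//   ImageBlock template = new ImageBlock(url, null); // null fields match anything
//   ImageBlock hit = (ImageBlock) space.read(template, null, 1000L);
//   // hit == null means a cache miss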

The capabilities of Jini in general and JavaSpaces in particular
are not limited to caching, but they do fill that role well. Blitz
(http://www.dancres.org/blitz/) is an excellent open source
implementation and the commercial implementation I mentioned above is
available at http://www.gigaspaces.com.

Regards,

Patrick
 

Arne Vajhøj

vj said:
But won't the iteration over the HashMap be slow? O(n) ...

It is O(n). Slow is a relative concept.

When we are talking about network I/O and disk I/O, any access to
in-memory data structures is usually insignificant.

And remember, the iteration is only done every X minutes or on every
Y'th lookup.

Arne
 

steve

It is O(n). Slow is a relative concept.

When we are talking about network I/O and disk I/O, any access to
in-memory data structures is usually insignificant.

And remember, the iteration is only done every X minutes or on every
Y'th lookup.

Arne

Then do it in another thread.
 

vj

Then do it in another thread.

OK, I agree that iterating over the hash map will be faster than the
network & disk I/O, but it would still be better if I could avoid it.
 

Chris Uppal

vj said:
I am building a map server that provides images from the Google Maps
database. I went through the JavaScript at maps.google.com and was able
to reverse engineer the URLs and query formats that they use. The good
(or the bad) thing is that they provide the map image in block pieces.
So I am building another HTTP server that will provide the clients with
these blocks using a comparatively very simple API, with the side
effect of exploiting a cache of these block images in order to provide
quicker service.

Legal question: have you checked this scheme with a real lawyer? It
sounds very dodgy to me.
http://maps.google.com/intl/en_ALL/help/terms_local.html

Since fetching these blocks will be much faster from a local server
than from a remote one, I want to implement this cache.

Why not just put a caching proxy server between yourself and
maps.google.com? That sounds easier than reinventing the wheel, and may
work better when it comes to things like expiring cache entries in a
timely manner.

-- chris
 

alexandre_paterson

Certainly that's a good idea. How about maintaining a hashtable and a
priority queue? The hashtable will store all the metadata associated
with a file, and the priority queue will implement the file aviction
policy. As soon as a file is hit, I will increase the hit count. On a
cache miss, I will fetch the image, create an entry in the hash table,
and push an entry onto the queue. If the queue size is greater than a
certain threshold, I will pop the element from its head (the one with
the lowest access count) and delete its corresponding file / hashtable
entry.
What do you think? Is there any other data structure that we can use
for fast lookup of elements?

I may be misunderstanding, but aren't you simply looking for an
LRU / MRU cache (Least Recently Used / Most Recently Used)? If that's
the case, you can simply use a LinkedHashMap and override one method to
get the exact "aviction policy" [sic] you want. This is well detailed
in LinkedHashMap's JavaDocs.
 

Arne Vajhøj

vj said:
OK, I agree that iterating over the hash map will be faster than the
network & disk I/O, but it would still be better if I could avoid it.

Only if you could do it without adding code and complexity
to your project.

Adding code and complexity for optimizations that are not
measurable is bad practice.

Arne
 
