What is best searching algorithm for URL

S

sandeep

Our team is developing proxy server(in VC++)which can handle 5000
clients. I have to implement cache part so when ever a new request com
from client I have to check the request URL content is in cache of
proxy and send to client if it is cache, if it is not there then it
have to get data from web server and store in proxy server cache.

so i am thinking to use binary tree search(or AVL tree) to search
request URL content in cache if it is not there to insert in it
is it a good idea so that insertion and searching is faster
I also used hash table and key I has chosen according to first
character in URL
So now in that bucket it contain double linked list now I have search
in it, for that I am thinking to use binary tree
 
V

Vladimir Oka

sandeep said:
Our team is developing proxy server(in VC++)which can handle 5000
clients. I have to implement cache part so when ever a new request com
from client I have to check the request URL content is in cache of
proxy and send to client if it is cache, if it is not there then it
have to get data from web server and store in proxy server cache.

so i am thinking to use binary tree search(or AVL tree) to search
request URL content in cache if it is not there to insert in it
is it a good idea so that insertion and searching is faster
I also used hash table and key I has chosen according to first
character in URL
So now in that bucket it contain double linked list now I have search
in it, for that I am thinking to use binary tree

Your question is best answered in comp.programming, feel free to come
back here if you encounter C specific problems while implementing your
solution. Note: you say you're using VC++ -- if you're writing C++,
comp.lang.c++ is the place to go (but still ask algorithm questions in
comp.programming).
 
M

Malcolm

sandeep said:
Our team is developing proxy server(in VC++)which can handle 5000
clients. I have to implement cache part so when ever a new request com
from client I have to check the request URL content is in cache of
proxy and send to client if it is cache, if it is not there then it
have to get data from web server and store in proxy server cache.

so i am thinking to use binary tree search(or AVL tree) to search
request URL content in cache if it is not there to insert in it
is it a good idea so that insertion and searching is faster
I also used hash table and key I has chosen according to first
character in URL
So now in that bucket it contain double linked list now I have search
in it, for that I am thinking to use binary tree
C has no build in hash tables nor binary trees, so you will have to
implement them yourself This is fiddly, though Chuck Falconer has a hash
library on his website which he is happy for people to use.
Anything else is comp.programming, as it is the algorithm which is your
problem. However either will probably provide perfectly acceptable
performance. .
 
W

Walter Roberson

Our team is developing proxy server(in VC++)which can handle 5000
clients. I have to implement cache part
binary tree search(or AVL tree) to search
hash table and key I has chosen according to first
character in URL

Implement something clean and maintainable first, and *then*
measure to see if the performance is acceptable.

If you need unusually high performance searching, you need a lot
more information about your architecture -- points such as the
amount of primary cache you have; the size of the cache line
fetched from secondary cache; the interprocess communications
mechanisms to notify that something has entered or gone out of cache;
effective shared-memory mechanisms; locks and semaphores to
ensure cache coherency; details about the filesystem, about block
allocation strategies within the filesystem, details about the
seek times (i.e., you always want to go to a -nearby- disk block, but
"nearby" will depend upon in-track seek times, track-to-track seek
times, head-switch times, intelligence of the controller, level
at which the RAID is happening...)

And you shouldn't be getting too far into any of these until you
first read Knuth's "The Art of Computer Programming" volume on
"Searching and Sorting".
 
C

CBFalconer

Malcolm said:
.... snip ...

C has no build in hash tables nor binary trees, so you will have
to implement them yourself This is fiddly, though Chuck Falconer
has a hash library on his website which he is happy for people to
use. Anything else is comp.programming, as it is the algorithm
which is your problem. However either will probably provide
perfectly acceptable performance. .

At <http://cbfalconer.home.att.net/download/>. The nmalloc package
is licensed under GPL (not GLPL), but other licenses can be
negotiated.

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
 
T

tedu

sandeep said:
I also used hash table and key I has chosen according to first
character in URL

the first character of a url is likely to be a spectacularly bad choice
of key.
 
M

Malcolm

CBFalconer said:
Correction - the hashlib package is ... nmalloc is unrestricted.
Why did you not LGPL it?
Not all commerical programmers work for evil huge corporations, you know.
Some are kids in bedrooms trying to make a useful living.
 
C

CBFalconer

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,183
Messages
2,570,967
Members
47,520
Latest member
KrisMacono

Latest Threads

Top