Building a good search for your site is a very challenging project.
Here are some tips from my experience. I had to build search for my
e-commerce site
http://www.mspiercing.com
Biggest problem is misspells. People often misspell words and if you do not
account for it they end up with empty results and quit the site thinking you
do not have what you looking for. There is a free NetSpell component you can
download from
http://sourceforge.net/projects/netspell/ . I end up modifying
the code to suit my needs. But the major part is ok there.
Then I played with Free Text but my conclusion it's not suitable for phrase
searching. So i end up writing my own. Here are the steps.
1. Create dictionary of dead words (such as 'the', 'in'....). That list
depends on your site. For example my site has a dead word 'ring'. Almost
anything there can be called 'ring'
2. Create substitution dictionary. For example on my site people look for
needle or needles. Essentially same product must come up in a search. Hence
I do the substitution on the fly. Mostly plural to singular form. But not
always. For example on my site people might look for 'belly ring' or 'navel
ring'. Hence 'Navel' = 'Belly' in my substitution dictionary.
3. Create dictionary of words that is used on a site. Simply by scanning
your Title and Description fields. Pass it through 'Dead Words' and
'Substitution' dictionaries and create a table (WordId, Word)
gold = 1
plated = 2
....
4. Create 'Search Helper' table. Sometimes it's called Markov's chain. You
do not need PHD to understand what it is.
I choose to use chain with length 3.
(ItemId, WordId1, WordId2, WordId3, Weight)
So for example ItemId: 25, Title: "Gold plated belly button ring with gem"
becomes following
gold = 1
plated = 2
belly = 3
gem = 4
(ring, with - dead words)
Now Markov's chaing with lenght 5 for this item with weight = 1.01
25,1,2,3,1.01
25,2,3,4,1.01
25,3,4,-1,1.01
25,4,-1,-1,1.01
I assign higher weight for keywords that are in Title that to those that are
in Description.
So idea is to bring up those where 3 words match together higher than 3
words match separately.
So if someone looking for 'Gold plated ring' item 25 will be ranked first.
Cause 'Gold plated' go together.
If someone looking for 'Gold belly ring' the ones that have a title 'Gold
belly ring' will be ranked first. And 'Gold plated' will be bellow them.
So if someone is looking for 'Gold belly ring' you issue following SQL
statements (something like this, i did not check it for SQL correctness)
"Gold Belly ring" = 1,3
SELECT ItemId FROM (
SELECT ItemId, 10*(Weight) FROM tblSearchHelper WHERE WordId1 = 1 AND
Word2=3
UNION
SELECT ItemId, Weigth FROM tblSearchHelper WHERE WordId1 = 1 OR Word2 = 3
) tblTemp
ORDER BY SUM(Weight) DESC
Then items will be sorted by weight in descending order. We multiplied
weight by 10 in first SELECT so the items that have 'gold ring' together
outweighed those that do not have them together.
Obviously it works for my type of site where i control the content.
George.