On 9/4/09 1:11 AM, Stefan Weiss wrote:
On 9/3/09 3:43 PM, Stefan Weiss wrote:
Well, who will choose the terms to index?
Who will build, for each file, its own array of terms?
Who will build the links for each term (to the files and within them)?
The indexer will do all of that.
Once the data are complete and held in an object (or a
simple array), I suppose most of the job is done.
Not necessarily. You need both parts for an efficient search engine: the
index and the lookup algorithm. The index lookup needs to be fast, and
able to sort the results in a meaningful way.
<http://cjoint.com/?jdvO4bUE6Q> 1500 items
(without index ... not in SANstore)
| var liste = [
| '00.htm',
| '000.htm',
| '0000000000000001.txt',
| '001.htm',
| '12-1.gif',
| '20-100_100tre.htm',
| '20-100_100tre2.htm',
That's just a list of file names again, not a full-text index. It has
only 1500 entries, which isn't even close to what we're dealing with here.
It has 1500 entries; will the CD contain more than 1500 files?
These are simple entries (they could have been lines of a CSV file,
each line being a card for the file, with name, date, list of indexed
terms, short introduction ...)
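If each line really were such a CSV "card", splitting it back into its fields is straightforward. A minimal sketch, assuming a semicolon-separated layout of name;date;terms;intro (the field order and separator are my assumptions, not something from the thread):

```javascript
// One "card" per line: name;date;terms;intro (layout assumed).
// Semicolons are used here so commas can appear in the intro text.
function parseCard(line) {
  var f = line.split(';');
  return {
    name: f[0],
    date: f[1],
    terms: f[2].split(' '),  // indexed terms, space-separated
    intro: f[3]
  };
}

var card = parseCard('001.htm;2009-09-03;add addition;Short intro here');
card.terms; // -> ['add', 'addition']
```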
I didn't understand the "not in SANstore" part - how is that relevant?
I don't have a more complicated example in stock (in store? in SAM's shop).
If you have one, I'd be glad to see it.
Searching for one or more terms in this list is very fast, because we
only have to keep each line containing one of the terms: a single loop
over the 1500 lines (or entries). The new list of files, expected to be
relatively short, can then be easily manipulated to display whatever is wanted.
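That single-loop filter could be sketched like this, reusing the `liste` variable name from the earlier snippet (the function name and sample terms are mine; note that `filter`/`every`/`map` are ES5 array methods, so 2009-era browsers would need a shim or a plain for-loop instead):

```javascript
// Keep every entry of the flat list that contains all of the
// search terms (case-insensitive substring match).
function searchList(liste, terms) {
  var lower = terms.map(function (t) { return t.toLowerCase(); });
  return liste.filter(function (entry) {
    var e = entry.toLowerCase();
    return lower.every(function (t) { return e.indexOf(t) !== -1; });
  });
}

var liste = ['00.htm', '001.htm', '12-1.gif', '20-100_100tre.htm'];
searchList(liste, ['100']); // -> ['20-100_100tre.htm']
```

With full "card" lines instead of bare file names, the same loop would also match against dates, terms, and introductions for free.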
As for indexing the terms found in the files, I suppose we can
have an array of them:
terms = [
'add 12 125 956',
'addition 1 8 274 315 977 1235',
...
where the numbers are the indexes used to find the correct files stored in
another array.
This method ought to be faster.
Maybe it takes more room in memory? I'm not sure.
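A minimal sketch of that inverted-index idea, assuming the "term idx1 idx2 ..." string format shown above (the `buildIndex` helper and the lookup-table approach are my assumptions about how the numbers would be used):

```javascript
// Each entry is "term idx1 idx2 ...", where the numbers point
// into a separate array of file names (format as in the example).
var terms = [
  'add 12 125 956',
  'addition 1 8 274 315 977 1235'
];

// Build a lookup table once: term -> array of file indexes.
// After this, each search is a single property access instead
// of a scan over all entries.
function buildIndex(terms) {
  var index = {};
  for (var i = 0; i < terms.length; i++) {
    var parts = terms[i].split(' ');
    index[parts[0]] = parts.slice(1).map(Number);
  }
  return index;
}

var index = buildIndex(terms);
index['add']; // -> [12, 125, 956]
```

The memory cost is roughly one object property per term plus one number per (term, file) pair, so for a CD-sized collection it should stay modest.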
Regarding your other post: Spotlight is only available on OS X, and
(AFAIK) doesn't have a JavaScript front-end. It may be possible to burn
its index to a CD, but without the Spotlight executable, that won't
help much.
At least that could be a solution for a specific environment ;-)
TNO's suggestion has a similar problem: it requires WSH to be installed
and accessible from an HTML page (unlikely). It will be awfully slow as
well, because each search will have to read the complete contents of the
CD. And then it probably won't find "à bientôt" because the source
encoding doesn't match the search encoding.
I suppose it would be better to have all the content loaded into memory.
Once regular expressions accept that \w covers more than just ASCII
characters, taking in more complete charsets, perhaps we will be able to
match non-English words more seriously (or easily), even if the search
functions were written by an illiterate guy from the US.
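For what it's worth, this is exactly what ES2018's Unicode property escapes later addressed: with the `u` flag, `\p{L}` matches letters from any script, where `\w` stays stuck at `[A-Za-z0-9_]`. A quick sketch (requires a modern engine; none of this existed in 2009 browsers):

```javascript
// \w only matches [A-Za-z0-9_], so accented words get split apart:
'à bientôt'.match(/\w+/g);     // -> ['bient', 't']

// With Unicode property escapes (ES2018+), \p{L} matches any letter:
'à bientôt'.match(/\p{L}+/gu); // -> ['à', 'bientôt']
```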
JSSINDEX still looks like the way to go (didn't test it, though). BTW, I
just checked, Lush is available as Debian and Ubuntu packages. If there
aren't any other requirements, getting the indexer to work should be a
piece of cake.
Something in Ruby?
<http://books.google.fr/books?id=OBh...esult&ct=result&resnum=8#v=onepage&q=&f=false>