mk
Hello everyone,
I need to do the following:
(0. transform words in a document into word roots)
1. analyze a set of documents to see which words are highly frequent
2. detect clusters of those highly frequent words
3. map the clusters to some "special" keywords
4. rank the documents on clusters and "top n" most frequent words
5. provide search that would rank documents according to whether search
words were "special" cluster keywords or frequent words
Is there a good open source engine out there that would be suitable
for the task at hand? Does anybody have experience with one?
Now, I do know about NLTK and the Python bindings to UIMA. The thing is,
I do not know whether they are a good fit for the above task. If anybody
has experience with those or with other tools and can say whether they
are good for this, please post.
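To make steps 0, 1 and 4 more concrete, here is a rough sketch of the kind of processing meant, using only the Python standard library. The `stem` function is a crude placeholder that strips a few English suffixes and is not a real stemmer, and the names `tokenize`, `top_words` and `rank_documents` are made up for illustration. Clustering (step 2) and the keyword mapping (step 3) are left out.

```python
import re
from collections import Counter


def stem(word):
    # Placeholder for step 0: strip a few common English suffixes.
    # A real stemmer would handle far more cases.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, keep alphabetic runs only, then reduce each word to its root.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_words(documents, n):
    # Step 1: count word roots over the whole document set.
    counts = Counter()
    for text in documents:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def rank_documents(documents, keywords):
    # Step 4 (simplified): score each document by how many of its
    # word roots are in the keyword set; ties keep document order.
    keywords = set(keywords)
    scored = []
    for index, text in enumerate(documents):
        score = sum(1 for w in tokenize(text) if w in keywords)
        scored.append((score, index))
    return [index for score, index in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

For example, `rank_documents(docs, top_words(docs, 10))` would order the document indices by how many of the ten most frequent word roots each document contains.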
Regards,
mk