Text mining in Python

mk

Hello everyone,

I need to do the following:

(0. transform words in a document into word roots)

1. analyze a set of documents to see which words are highly frequent

2. detect clusters of those highly frequent words

3. map the clusters to some "special" keywords

4. rank the documents on clusters and "top n" most frequent words

5. provide search that would rank documents according to whether search
words were "special" cluster keywords or frequent words
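
To make the steps concrete, here is a rough sketch of steps 0, 1 and 4 using only the Python standard library. Everything in it is an assumption made for illustration: `crude_stem` is a hypothetical, very naive suffix stripper standing in for a real stemmer (such as a Porter stemmer), the function names are made up, the sample documents are invented, and stop words are not filtered. Clustering (steps 2 and 3) is not covered.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude stemmer: strips a few common English
# suffixes. A real stemmer would replace this in practice.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Step 0: reduce a word to an approximate root."""
    for suffix in SUFFIXES:
        # Keep at least three characters of the root.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_words(documents, n):
    """Step 1: the n most frequent stems across all documents."""
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def rank_documents(documents, keywords):
    """Step 4: order documents by how often they contain the keywords."""
    keywords = set(keywords)
    scores = {
        name: sum(1 for token in tokenize(text) if token in keywords)
        for name, text in documents.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data, for illustration only.
docs = {
    "a": "Cats chase mice. The cat jumped.",
    "b": "Dogs bark at cats.",
    "c": "Mice eat cheese.",
}
frequent = top_words(docs, 2)
ranking = rank_documents(docs, frequent)
print(frequent)
print(ranking)
```

The search in step 5 could presumably reuse `rank_documents`, with a higher weight for query words that are "special" cluster keywords than for words that are merely frequent.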

Is there a good open source engine out there that would be suitable
for the task at hand? Does anybody have experience with one?

Now, I do know about NLTK and the Python bindings to UIMA. The thing is, I do
not know whether those are good for the above task. If somebody has
experience with those or others and can say whether they're good
for this, please post.

Regards,
mk