Hi ...
I am a new member here.
I have done an assignment on text analysis. In the first phase, I extracted text from multiple web pages and applied some Latent Semantic Analysis rules. I removed stop words (prepositions, articles, and so on) as well as brackets and similar symbols. Then I applied a stemming algorithm (the Porter algorithm) to remove suffixes. I stored the remaining words in a separate hash table for each document, along with their frequencies.
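To make phase 1 concrete, here is a minimal Python sketch of that pipeline. It is only an illustration under assumptions: the stop-word list is a tiny made-up sample, `crude_stem` is a hypothetical stand-in that strips a few suffixes and is not the Porter algorithm, and the two sample documents are invented.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list is much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "and", "is", "was"}


def crude_stem(word):
    """Hypothetical placeholder for the Porter stemmer: strips a few suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_frequencies(text):
    """Build one frequency table (a hash table of stem -> count) for a document."""
    # Keeping only alphabetic runs also drops brackets and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


# Invented sample documents, one frequency table per document.
docs = {
    "doc1": "The patient was sick and coughing.",
    "doc2": "An ill patient coughed in the clinic.",
}
tables = {name: word_frequencies(text) for name, text in docs.items()}
```

Each entry of `tables` plays the role of the per-document hash table, with the stemmed word as the key and its frequency as the value.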
Now, in the second phase, I have to find the list of the most common words across all the documents. Mind you, "most common" doesn't mean the most frequently occurring words. It means that if one document contains "sick" and the same document or any other contains "ill", then both count toward the same entry in the common-words list. The words will not be supplied by the user; only the words already extracted are used for the search. Any ideas or algorithms, please?
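To show what I mean by "common", here is a rough Python sketch of the kind of result I am after. It is only one possible direction, not a known solution: `SYNONYM_GROUPS` is a hand-made, hypothetical table (in practice it would have to come from a thesaurus such as WordNet), the function name `common_concepts` and the sample tables are invented, and ranking by number of documents is my assumption about what "most common" should mean.

```python
from collections import defaultdict

# Hypothetical synonym groups (assumption); a real system would load these
# from a thesaurus rather than hard-code them.
SYNONYM_GROUPS = [{"sick", "ill", "unwell"}, {"doctor", "physician"}]

# Map every word to one representative per group (alphabetically first).
WORD_TO_CONCEPT = {w: min(group) for group in SYNONYM_GROUPS for w in group}


def common_concepts(tables):
    """Merge synonyms across per-document tables and rank the merged entries.

    Returns (concept, variants seen, document count, total frequency) tuples,
    ordered by document count, then total frequency, then name.
    """
    doc_sets = defaultdict(set)   # concept -> documents containing it
    totals = defaultdict(int)     # concept -> summed frequency
    variants = defaultdict(set)   # concept -> actual words encountered
    for doc, freqs in tables.items():
        for word, count in freqs.items():
            # Words without a known synonym stand for themselves.
            concept = WORD_TO_CONCEPT.get(word, word)
            doc_sets[concept].add(doc)
            totals[concept] += count
            variants[concept].add(word)
    ranked = sorted(doc_sets, key=lambda c: (-len(doc_sets[c]), -totals[c], c))
    return [(c, sorted(variants[c]), len(doc_sets[c]), totals[c]) for c in ranked]


# Invented per-document frequency tables, as produced by phase 1.
example_tables = {
    "doc1": {"sick": 2, "patient": 1},
    "doc2": {"ill": 1, "doctor": 1},
    "doc3": {"physician": 3, "patient": 1},
}
result = common_concepts(example_tables)
```

Here "sick" in doc1 and "ill" in doc2 end up in the same entry, so that entry is counted as appearing in two documents. One caveat with this sketch: stemmed words may no longer match thesaurus entries, so the synonym lookup might need to run on the unstemmed words.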