Hp said:
Hi All,
Thanks a lot for all your replies.
My requirement is as follows:
I need to read a text file, eliminate certain special characters (like
! , - = +), convert it to lower case, and then remove certain stopwords
(like and, a, an, by, the, etc.) which are listed in another text
file.
Then I need to run it through a stemmer (a program which converts words
like "running" to "run", i.e., reduces them to their root words).
Then I need to create a term-by-document matrix, that is, a matrix
where M(i,j) gives the number of times term j occurs in
document i.
My situation as of now is as below:
I have read the file contents into a string variable, removed/replaced
the special characters with a space using the replace function, and
then converted the string completely to lower case, using the transform
function.
I would really appreciate any help. Thanks in advance.
Thanks,
Hp
I know this may sound sacrilegious in a C++ newsgroup and all, but
does the text processing program have to be written in C++?
There are several dedicated text processing tools such as awk or sed,
or scripting languages (like Perl) that are specifically designed for
text stream editing. While none of these alternatives is entirely
trivial to pick up, none has a steep learning curve either.
The power of regular expressions for manipulating text is difficult to
match in a C++ program without such support, at least in my experience.
And since I am not (too much of) a language snob, I recommend choosing
the best language for the job, even if it's not the best language. For
example, lowercasing a file's content with sed is a simple command:
sed -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' inputfile
Writing a C++ program to do the same would be more involved. The good news
is that TR1's regex brings regular expression support to C++. So if a
C++ solution is required, I would look at regex to see whether it can
help solve your problem.
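As a rough sketch of what that might look like, here is the cleanup step
(special characters to spaces, then lowercase) written against the C++11
<regex> header, which is the standardized form of TR1's regex. The
function name normalize and the exact character set are my own
assumptions based on the examples in your post, so adjust them to taste:

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper: replaces each run of the characters ! , = + -
// with a single space, then lowercases the whole string.
std::string normalize(const std::string& text)
{
    // The hyphen comes last in the bracket expression so that it is
    // treated as a literal character rather than as a range operator.
    static const std::regex special("[!,=+-]+");

    std::string out = std::regex_replace(text, special, " ");

    // Go through unsigned char so std::tolower is never handed a
    // negative value.
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return out;
}
```

You would call normalize() on the string you have already read from the
file. Stopword removal and stemming would still be separate steps after
this one.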
And if you do write the program in a language other than C++, some here
will be able to forgive you. But just don't tell your friends what you
have done.
Greg