Unknown Poster
Assuming I have used LWP to fetch an HTML document into a response
object, I'd like to know what module(s) to use for the following task.
I would like to discard the HTML tags (that is, everything
in angle brackets) and then break the remaining "real" text into words
so that I can do a frequency count, etc.
Would someone experienced in this sort of task recommend
a combination of the following (or others I may have missed)?
Text::WordParse
HTML::Parse
HTML::Parser
HTML::PullParser
HTML::TokeParser
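
To make the task concrete, here is a minimal sketch of what I am after, assuming HTML::TokeParser turns out to be a reasonable choice. The helper name word_frequencies and the sample HTML string are made up for illustration; with LWP the input would presumably come from $response->decoded_content instead. Note that entities such as &amp; are not decoded here, and text inside script or style elements would be counted too.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: count the words in the text portions of an HTML string.
sub word_frequencies {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my %count;
    while (my $token = $parser->get_token) {
        # Token type 'T' is text; tags, comments, etc. are skipped.
        next unless $token->[0] eq 'T';
        my $text = $token->[1];

        # Treat each run of word characters as a word, case-insensitively.
        $count{ lc($1) }++ while $text =~ /(\w+)/g;
    }
    return \%count;
}

# Stand-in for the body of an LWP response (an assumption for this sketch).
my $html = '<html><body><p>The cat saw <b>the</b> dog.</p></body></html>';
my $freq = word_frequencies($html);

# Print the words in alphabetical order with their counts.
printf "%-5s %d\n", $_, $freq->{$_} for sort keys %$freq;
```

The \w+ pattern is only a rough definition of a "word"; it splits on apostrophes and hyphens, so something stricter may be needed for real text.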