goodarm
Gurus,
I am relatively new to Perl, so please bear with me. I am trying to
write a simple scraper for non-English Web pages. For that purpose I
am using HTML::TokeParser. I want to extract some content I need and
generate another HTML page (which may contain notes in multiple
languages). The pages I am scraping are written using numeric
character references, e.g. &#1086;&#1076;&#1091;. When I extract them
and inject them into my HTML page, they get converted into the actual
characters. All I want is to preserve the original numeric character
references, as that seems to be the easiest way to build my result page.
How do I do that?
Sample code:
sub parseResponce($$) {
    my $data = shift;
    my $stream = new HTML::TokeParser($data);
    while (my $tag = $stream->get_tag("p")) {
        if (...) {
            $buff = $stream->get_trimmed_text("/p");
        }
    }
}
Thanks in advance, Victor