Want to extract the proxy list by using regexp.

Hongyi Zhao

Hi all,

I want to extract the proxy list given in the following url:

http://www.cybersyndrome.net/pla5.html

which is in the following form:

---------------
[snipped]

202.99.29.27:80
221.11.27.110:8080
ip-72-55-191-6.static.privatedns.com:3128
114.30.47.10:80
116.52.155.237:80
204.73.37.112:80
220.227.90.154:8080
211.136.253.234:80
host04.wilsonareasdips.w.subnet.rcn.com:8080

[snipped]
-----------------

First, I use wget to fetch the page:

wget -c http://www.cybersyndrome.net/pla5.html -O pla5

Then I want to use regular expressions to extract the proxy list.
Can anyone give me some hints?

Regards,
 
Tad J McClellan

Regular expressions are most often not the Right Tool for processing
HTML data.

A module that understands HTML is best for processing HTML data.


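------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;
use LWP::Simple;

# Fetch the page; get() returns undef on failure.
my $url  = 'http://www.cybersyndrome.net/pla5.html';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

# Parse the HTML into a tree rather than matching it with regexes.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Print the text of every element whose onmouseout attribute is 'd()'.
foreach my $elem ( $tree->find_by_attribute( 'onmouseout', 'd()' ) ) {
    print $elem->as_text, "\n";
}

$tree->delete;    # free the parse tree
------------------------------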
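
Once the text of each entry has been pulled out of the HTML, a regular expression is a reasonable tool for splitting it into host and port. A minimal sketch, assuming the entries look like the host:port lines shown in the question; parse_proxy is a hypothetical helper name, not part of the thread or of any module:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not from the thread): split one "host:port" string.
# Returns (host, port) on success, or an empty list if the text does not
# look like a proxy entry.
sub parse_proxy {
    my ($text) = @_;

    # Host: letters, digits, dots and hyphens; port: 1 to 5 digits.
    if ( $text =~ /\A\s*([A-Za-z0-9][A-Za-z0-9.-]*):(\d{1,5})\s*\z/ ) {
        my ( $host, $port ) = ( $1, $2 );
        return ( $host, $port ) if $port >= 1 && $port <= 65535;
    }
    return;
}

# Sample entries copied from the listing in the question, plus one
# deliberately invalid string.
my @samples = (
    '202.99.29.27:80',
    'ip-72-55-191-6.static.privatedns.com:3128',
    'not a proxy',
);

foreach my $text (@samples) {
    if ( my ( $host, $port ) = parse_proxy($text) ) {
        print "$host\t$port\n";
    }
}
```

In the script above, the same check could be applied to $elem->as_text before printing, so that stray elements that happen to carry the same attribute are skipped.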
 
Hongyi Zhao

Very good, thanks a lot.
 
