Velocity Reviews - Computer Hardware Reviews

Want to extract the proxy list by using regexp.

Hongyi Zhao
01-29-2009
Hi all,

I want to extract the proxy list given in the following url:

http://www.cybersyndrome.net/pla5.html

which is in the following form:

---------------
[snipped]

202.99.29.27:80
221.11.27.110:8080
ip-72-55-191-6.static.privatedns.com:3128
114.30.47.10:80
116.52.155.237:80
204.73.37.112:80
220.227.90.154:8080
211.136.253.234:80
host04.wilsonareasdips.w.subnet.rcn.com:8080

[snipped]
-----------------

First, I use wget to download the above web page:

wget -c http://www.cybersyndrome.net/pla5.html -O pla5

Then I want to use regular expressions to extract the proxy list.
Can anyone give me some hints?
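A minimal regex-only sketch of the approach asked about here, assuming each proxy appears in the saved file `pla5` as a plain `host:port` string (as in the excerpt above) and is not split across HTML tags. The subroutine name `extract_proxies` is hypothetical, and the pattern is deliberately loose: it will also pick up any other dotted `name:number` text that happens to be on the page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every "host:port" token found in $text, in order of appearance.
# Hypothetical assumption: each proxy appears as plain text such as
# 202.99.29.27:80 or some.host.name:8080, not split across tags.
sub extract_proxies {
    my ($text) = @_;
    my @proxies;
    # host = two or more dot-separated labels; port = 1 to 5 digits
    while ( $text =~ /\b((?:[A-Za-z0-9-]+\.)+[A-Za-z0-9-]+):(\d{1,5})\b/g ) {
        push @proxies, "$1:$2";
    }
    return @proxies;
}

# Usage: perl extract_proxies.pl pla5   (the file saved by wget)
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for extract_proxies($html);
}
```

Run it as `perl extract_proxies.pl pla5`. Because it matches against raw HTML, it can break or over-match as soon as the page's markup changes.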

Regards,

--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

Tad J McClellan
01-29-2009
Hongyi Zhao <(E-Mail Removed)> wrote:


> I want to extract the proxy list given in the following url:
>
> http://www.cybersyndrome.net/pla5.html



> Then I want to use some regular expressions to extract the proxy list,
> who can give me some hints?



Regular expressions are most often not the Right Tool for processing
HTML data.

A module that understands HTML is best for processing HTML data.


------------------------------
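#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;
use LWP::Simple;

my $url  = 'http://www.cybersyndrome.net/pla5.html';
my $html = get($url) or die "Could not fetch $url\n";
my $tree = HTML::TreeBuilder->new_from_content($html);

# The proxy entries are the elements carrying onmouseout="d()"
foreach my $elem ( $tree->find_by_attribute('onmouseout', 'd()') ) {
    print $elem->as_text, "\n";
}

$tree->delete;    # free the parse tree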
------------------------------
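Since the page has already been saved with wget, a variant of the same idea can parse the local file instead of fetching it again. This is a minimal sketch, assuming the proxy entries are still the elements carrying `onmouseout="d()"`; the subroutine name `proxies_from_html` is hypothetical. It uses `look_down`, which selects elements by attribute value much as `find_by_attribute` does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return the text of every element whose onmouseout attribute is "d()",
# the selector used in the script above for the proxy entries.
sub proxies_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @proxies = map { $_->as_text } $tree->look_down( onmouseout => 'd()' );
    $tree->delete;    # free the tree; older HTML::Tree versions need this to avoid leaks
    return @proxies;
}

# Usage: perl proxies.pl pla5   (the page already saved by wget -O pla5)
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $html = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for proxies_from_html($html);
}
```

If you do not need the string-based subroutine, `HTML::TreeBuilder->new_from_file('pla5')` builds the tree directly from the file.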


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Hongyi Zhao
01-29-2009
On Thu, 29 Jan 2009 06:50:36 -0600, Tad J McClellan
<(E-Mail Removed)> wrote:

>Hongyi Zhao <(E-Mail Removed)> wrote:
>
>
>> I want to extract the proxy list given in the following url:
>>
>> http://www.cybersyndrome.net/pla5.html

>
>
>> Then I want to use some regular expressions to extract the proxy list,
>> who can give me some hints?

>
>
>Regular expressions are most often not the Right Tool for processing
>HTML data.
>
>A module that understands HTML is best for processing HTML data.


Very good, thanks a lot.

--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.