Velocity Reviews - Computer Hardware Reviews

Need help with parsing data

Shan
      08-09-2006
So I need code that will go through a list of URLs (formatted as
http://www.google.com) and for each url get the following information:

1. The URL after href= in the tag that starts with <link
rel="alternate" and ends with />

So if there is <link rel="alternate" type="application/atom+xml"
title="Atom" href="http://hello.typepad.com/hello/atom.xml" />, I want
http://hello.typepad.com/hello/atom.xml


2. Everything between the tags <title> and </title>.
So if there is <title>hello, typepad</title>, I want hello, typepad

3. Everything between the tags <h2 id="banner-description"> and </h2>.


4. Finally, I would like the results to be saved to a delimited file in
the following format:

column 1: original url
column 2: data obtained from step 1
column 3: data obtained from step 2
column 4: data obtained from step 3

If there is no result for any one of the steps, a null should be saved.
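Step 4 and the fetching loop need nothing beyond core Perl. A minimal sketch, assuming a tab as the delimiter and the literal string NULL as the null marker (the post names neither), and using the core HTTP::Tiny module to download each URL; fetch_html and format_row are hypothetical helper names:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumption: the "null" the post asks for is the literal string NULL.
my $NULL = 'NULL';

# Download one page; returns its HTML, or undef if the request failed.
sub fetch_html {
    my ( $http, $url ) = @_;
    my $res = $http->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# Join the four columns with tabs. Missing or empty values become $NULL,
# and embedded tabs/newlines are flattened so each URL stays on one line.
sub format_row {
    my @fields = @_;
    for my $f (@fields) {
        $f = $NULL unless defined $f && length $f;
        $f =~ s/[\t\r\n]+/ /g;
    }
    return join( "\t", @fields ) . "\n";
}
```

A driver would read the URL list one line at a time, call fetch_html with an HTTP::Tiny->new object, hand the HTML to whatever routine extracts steps 1 to 3, and print format_row($url, $feed, $title, $banner) to the output file. A page that fails to download then simply yields NULL in the last three columns.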


I would like to thank whoever can provide me with the code in advance,
Thank you.

 
DJ Stunks
      08-09-2006
Shan wrote:
> I would like to thank whoever can provide me with the code in advance,
> Thank you.


It is highly unlikely that anyone will do so for a simple "thanks".
Check out jobs.perl.org for someone willing to follow orders in return
for compensation.

-jp

 
John Bokma
      08-09-2006
"Shan" wrote:

> So I need code that will go through a list of URLs (formatted as
> http://www.google.com) and for each url get the following information:
>
> 1. The url after the href= within the following tags <link
> rel="alternate" and />
>
> So if there is <link rel="alternate" type="application/atom+xml"
> title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
> the http://hello.typepad.com/hello/atom.xml
>
>
> 2. everything between the following tags <title> and </title>
> so if there is <title>hello, typepad</title> I want hello, typepad
>
> 3. everything between the tags <h2 id="banner-description"> and </h2>



I use HTML::TreeBuilder for this, since it makes life really easy. See
http://johnbokma.com/perl/ for several examples (Web automation).

For example, step 3 can be done as:

use HTML::TreeBuilder;

my $root = HTML::TreeBuilder->new_from_content( $content );

# ...

my @column4;
push @column4, $_->as_trimmed_text
    for $root->look_down( _tag => 'h2', id => 'banner-description' );
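The fragment above covers only step 3. A slightly fuller sketch along the same lines, assuming HTML::TreeBuilder is installed, could pull steps 1 to 3 out of one page at once. The helper name extract_fields and the case-insensitive regex on rel are illustrative choices, not part of the thread:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns (feed URL, title, banner text) for one page.
# Anything that is not found comes back as undef.
sub extract_fields {
    my ($html) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);

    # Step 1: href of the first <link> whose rel contains "alternate"
    my $link = $root->look_down( _tag => 'link', rel => qr/\balternate\b/i );
    my $feed = $link ? $link->attr('href') : undef;

    # Step 2: text inside <title>
    my $t     = $root->look_down( _tag => 'title' );
    my $title = $t ? $t->as_trimmed_text : undef;

    # Step 3: text inside <h2 id="banner-description">
    my $h2     = $root->look_down( _tag => 'h2', id => 'banner-description' );
    my $banner = $h2 ? $h2->as_trimmed_text : undef;

    $root->delete;    # free the parse tree explicitly
    return ( $feed, $title, $banner );
}
```

In scalar context look_down returns only the first matching element (or undef), which is what makes the "null if missing" case fall out naturally.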

> I would like to thank whoever can provide me with the code in advance,
> Thank you.


I can provide the code; ways to thank me are listed here:
http://johnbokma.com/wish-list.html

Either Object Oriented Perl or Perl Best Practices would be fine with me,
since that way you will contribute back to the Perl community, both
directly and indirectly.

--
John Bokma    Freelance software developer & Experienced Perl programmer: http://castleamber.com/
 
Tad McClellan
      08-10-2006
Shan wrote:

> Subject: Need help with parsing data



What part is it that you need help with?


(You should use a module that understands XHTML if you need to
process XHTML data.)
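As one possible illustration of that advice (the module choice is an assumption, not Tad's), XML::LibXML can parse a well-formed XHTML page and query it with XPath. Because XHTML elements live in the http://www.w3.org/1999/xhtml namespace, the namespace has to be registered under a prefix (here h) before XPath will match them. This assumes XML::LibXML is installed and that the page really is well-formed XML: load_xml dies otherwise, and with the external DTD switched off, named entities such as &nbsp; will also make it die, so tag-soup pages are better left to HTML::TreeBuilder:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: same three values as the steps in the original post,
# but read from well-formed XHTML via XPath.
sub extract_fields_xhtml {
    my ($xhtml) = @_;

    # load_ext_dtd => 0: do not try to download the XHTML DTD
    my $doc = XML::LibXML->load_xml( string => $xhtml, load_ext_dtd => 0 );
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs( h => 'http://www.w3.org/1999/xhtml' );

    # Take the first match of each query; undef if nothing matched.
    my ($feed)   = map { $_->value } $xpc->findnodes('//h:link[@rel="alternate"]/@href');
    my ($title)  = map { $_->textContent } $xpc->findnodes('//h:title');
    my ($banner) = map { $_->textContent } $xpc->findnodes('//h:h2[@id="banner-description"]');

    return ( $feed, $title, $banner );
}
```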


> I would like to thank whoever can provide me with the code in advance,



What makes you think that someone will write your program for you?


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
Shan
      08-10-2006
Thanks for your advice. I will work on writing a script today and see
what kind of results I get.

 