search question

 
 
hgwoss@gmx.de
      09-18-2005
Hi,

I would like to extract a certain link URL and link title from an HTML
document, which is stored in a text file.

It may look like this:

"A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."

My question is: What is the most efficient way of doing that?

 
Matija Papec
 
      09-18-2005
(E-Mail Removed) wrote:
>I would like to extract a certain link url and link title from an html
>document, which is stored in a text file.
>
>it may look like this:
>
>"A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."
>
>My question is: What is the most efficient way of doing that?


from perldoc,
perldoc -q extract
=========
How do I extract URLs?
You can easily extract all sorts of URLs from HTML with
"HTML::SimpleLinkExtor" which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
"HTML::LinkExtor" or "HTML::Parser". You might even use
"HTML::SimpleLinkExtor" as an example for something specifically
suited to your needs.

You can use URI::Find to extract URLs from an arbitrary text
document.


--
Matija
 
William James
 
      09-18-2005

(E-Mail Removed) wrote:
> Hi,
>
> I would like to extract a certain link url and link title from an html
> document, which is stored in a text file.
>
> it may look like this:
>
> "A lot of text. <a href="linkurl.html">Link Title</a> Even more Text."
>
> My question is: What is the most efficient way of doing that?


text = <<HERE
A lot of text. <a href="linkurl.html">Link Title</a>
Even more Text.
HERE

if text =~ /<a href="(.*?)">(.*?)<\/a>/m
printf "%s links to %s.\n", $2, $1
end

 
 
Jürgen Exner
 
      09-19-2005
William James wrote:
> (E-Mail Removed) wrote:
>> Hi,
>>
>> I would like to extract a certain link url and link title from an
>> html document, which is stored in a text file.
>>
>> it may look like this:
>>
>> "A lot of text. <a href="linkurl.html">Link Title</a> Even more
>> Text."
>>
>> My question is: What is the most efficient way of doing that?

>
> text = <<HERE
> A lot of text. <a href="linkurl.html">Link Title</a>
> Even more Text.
> HERE
>
> if text =~ /<a href="(.*?)">(.*?)<\/a>/m


That works for the given example, but of course it fails for a myriad of
other, probably legitimate examples. See the FAQ and Google for why using
simple regular expressions to parse HTML is not a good idea at all.

jue
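
To illustrate the kind of failure described above, here is a minimal sketch
that rewrites the posted pattern as a Perl regex (Ruby's /m becomes Perl's /s)
and runs it against a few variations of the same link. The sample strings and
the helper name naive_matches are made up for illustration; they are not from
the thread.

```perl
use strict;
use warnings;

# The pattern from the post above, written as a Perl regex.
# Ruby's /m (dot matches newline) corresponds to Perl's /s.
my $naive = qr{<a href="(.*?)">(.*?)</a>}s;

# Hypothetical inputs: each is a legitimate way to write the same link.
my %samples = (
    plain         => '<a href="linkurl.html">Link Title</a>',
    single_quotes => q{<a href='linkurl.html'>Link Title</a>},
    extra_attr    => '<a class="nav" href="linkurl.html">Link Title</a>',
    uppercase_tag => '<A HREF="linkurl.html">Link Title</A>',
    extra_space   => '<a  href = "linkurl.html" >Link Title</a>',
);

# Returns 1 if the naive pattern finds the link, 0 otherwise.
sub naive_matches {
    my ($html) = @_;
    return $html =~ $naive ? 1 : 0;
}

for my $name (sort keys %samples) {
    printf "%-14s %s\n", $name,
        naive_matches($samples{$name}) ? "matched" : "missed";
}
```

Only the input that matches the pattern character for character is found.
Quoting style, attribute order, case, and spacing all vary freely in real
HTML, which is why a parser is the safer tool.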


 
 
John Bokma
 
      09-19-2005
"Bill Segraves" <(E-Mail Removed)> wrote:

> "William James" <(E-Mail Removed)> wrote in message


> In Perl, the above code has numerous errors, and as such, is
> undeserving of the implied superlatives you assigned to it. Perhaps
> you intended to post the code to a different Usenet newsgroup.


Please ignore the Ruby troll.

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html

 
 
John Bokma
 
      09-20-2005
"Bill Segraves" <(E-Mail Removed)> wrote:

> "John Bokma" <(E-Mail Removed)> wrote in message
> news:Xns96D68FB6434E8castleamber@130.133.1.4...
>> "Bill Segraves" <(E-Mail Removed)> wrote:
>>
>> > "William James" <(E-Mail Removed)> wrote in message

>>
>> > In Perl, the above code has numerous errors, and as such, is
>> > undeserving of the implied superlatives you assigned to it. Perhaps
>> > you intended to post the code to a different Usenet newsgroup.

>>
>> Please ignore the Ruby troll.

>
> Normally, I do.
>
> In this case, however, the Ruby troll neglected to mention his code
> was written in Ruby, which might have been misleading to the OP,
> especially re: "jue" Exner's response. My response was intended for
> the benefit of the OP.


Ah, OK, I understand. Apologies.

> For the further benefit of the OP, what could be simpler than the
> first example given in the documentation for HTML::TokeParser? This
> "correct" code parses HTML with <A> tags and textual information
> spread across multiple lines, while the code the Ruby troll posted
> fails miserably on similarly-mangled HTML.


Yup, it's a troll. I mean, why is it hanging out in a Perl-related group?
There must be a Ruby group.

--
John Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
Happy Customers: http://castleamber.com/testimonials.html
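
For reference, the HTML::TokeParser approach mentioned above could look
roughly like the sketch below. It follows the pattern of the first example in
the module's documentation, wrapped in a hypothetical helper named
extract_links. The sample HTML is invented, and the code assumes the
HTML-Parser distribution is installed from CPAN (it is not part of core Perl).

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN, not core

# Returns a list of [url, title] pairs, one per <a> element that has an href.
sub extract_links {
    my ($html) = @_;

    # A scalar reference means "parse this string"; a plain scalar
    # would be treated as a file name instead.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @links;
    while (my $tag = $parser->get_tag("a")) {
        my $url = $tag->[1]{href};       # attribute names arrive lowercased
        next unless defined $url;        # skip <a name="..."> anchors
        # Text up to the closing </a>, with whitespace collapsed and trimmed.
        my $title = $parser->get_trimmed_text("/a");
        push @links, [$url, $title];
    }
    return @links;
}

# Hypothetical input: uppercase tag, single quotes, and line breaks
# inside both the tag and the link text.
my $html = <<'HTML';
A lot of text. <A
   HREF='linkurl.html'>Link
Title</A> Even more Text.
HTML

printf "%s links to %s.\n", $_->[1], $_->[0] for extract_links($html);
```

To read the document from the text file the original question mentions, pass
the file name to HTML::TokeParser->new instead of a string reference.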

 