PERL - Re: Capture only first match in regular expression

 
04-19-2009, 08:17 AM   #1



Zapanaz <http://joecosby.com/code/mail.pl> wrote:

> I am parsing/page scraping some HTML. I know the first anchor tag <a>
> contains information I want.
>
> So I do this:
>
> if($content =~ /.*(<a.*<\/a>).*/i){
> $anchorContent = $1;
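
For reference, a small sketch of what the quoted pattern actually captures. The sample string is made up for illustration. Because the leading .* is greedy, the capture begins at the last "<a" on the line rather than the first; dropping the leading .* fixes the start but the inner .* still runs on to the last "</a>".

```perl
use strict;
use warnings;

# Hypothetical sample input: two anchors on one line.
my $content = '<p><a href="one.html">First</a> and <a href="two.html">Second</a></p>';

# The quoted pattern: the leading .* is greedy, so the capture
# starts at the last "<a" that still allows a match.
my ($greedy) = $content =~ /.*(<a.*<\/a>).*/i;

# Without the leading .* the match starts at the first "<a",
# but the inner greedy .* still extends to the last "</a>".
my ($no_lead) = $content =~ /(<a.*<\/a>)/i;

print "with leading .*:    $greedy\n";
print "without leading .*: $no_lead\n";
```

Neither result is the first anchor on its own, which is why the pattern needs to be more specific about what may appear inside the tag.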


Another poster suggested that regular expressions aren't sufficient
for this. But you may be able to do it anyway if you can confidently
predict features of the incoming HTML.

That is, if you know that "the first anchor tag <a> contains
information" you want, you may also know other things about the HTML
you're trying to parse.

Given an anchor of the general form:

<a href=foo possible-other-attributes=bar> Anchor-text </a>

If you know in advance that the "Anchor-text" is *not* an <IMG
src=...> tag and that the "Anchor-text" does not itself contain any
other tags (such as, say, "<i>Anchor-text</i>"), then you could use:

if ($content =~ /(<a\s[^>]+>[^<]*<\/a>)/i)
{
    $anchorContent = $1;
}
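
A minimal runnable sketch of that pattern, assuming a made-up page in which the anchors contain plain text only. The sample HTML is hypothetical; the pattern is the one given above.

```perl
use strict;
use warnings;

# Hypothetical sample page with two simple, text-only anchors.
my $content = <<'HTML';
<html><body>
<p>See <a href="first.html" class="x">the first link</a> and
<a href="second.html">the second link</a>.</p>
</body></html>
HTML

my $anchorContent;
if ($content =~ /(<a\s[^>]+>[^<]*<\/a>)/i) {
    # Without /g the regex stops at the leftmost match,
    # so $1 holds the first anchor only.
    $anchorContent = $1;
}

print defined $anchorContent ? "$anchorContent\n" : "no simple anchor found\n";
```

Since neither [^>]+ nor [^<]* can run past a tag boundary, the match cannot spill over into the second anchor.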

Reading the pattern from left to right:

Match the "<a" literally
Require a whitespace character after the 'a'
Match anything that can occur within an opening <a ...> tag
Match the closing '>' of the opening <a ...> tag
Match any text up to the '<' that will signal the closing </a> tag
Match the closing </a> tag
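
The same breakdown can be written into the pattern itself with the /x modifier, which lets whitespace and comments sit inside the regex. This is only a sketch: the name $simple_anchor and the test string are invented for illustration.

```perl
use strict;
use warnings;

# The pattern from above, spread out with /x so each piece has a comment.
my $simple_anchor = qr{
    (               # start of the capture group
      <a            # the literal "<a"
      \s            # one whitespace character after the "a"
      [^>]+         # attributes: anything up to the closing ">"
      >             # the ">" that ends the opening tag
      [^<]*         # anchor text: anything except "<"
      </a>          # the closing tag
    )               # end of the capture group
}xi;

# Hypothetical input modelled on the general form shown earlier.
my $content = 'Intro <a href=foo possible-other-attributes=bar> Anchor-text </a> outro';

# In list context the match returns the captured group.
my ($anchorContent) = $content =~ $simple_anchor;
print "$anchorContent\n";
```

Using braces as the qr delimiter means the "/" in "</a>" needs no backslash.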

This won't work if the incoming HTML is arbitrary, because you might have:

<a href=foo><img src=bar></a> or
<a href=foo> <i> Yow<b>!</b></i> </a>
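
A quick check of those two cases, as a sketch. The second pattern is not the one suggested above but a variation: it swaps [^<]* for a non-greedy .*? so that tags inside the anchor are allowed. It is still only a regex and can be fooled, for example by a '>' inside an attribute value or an anchor-like string inside a comment.

```perl
use strict;
use warnings;

my @samples = (
    '<a href=foo><img src=bar></a>',
    '<a href=foo> <i> Yow<b>!</b></i> </a>',
);

# The text-only pattern from above, and a non-greedy variation.
my $strict = qr/(<a\s[^>]+>[^<]*<\/a>)/i;
my $lazy   = qr/(<a\s[^>]+>.*?<\/a>)/is;   # /s lets . cross newlines

my (@strict_hits, @lazy_hits);
for my $html (@samples) {
    push @strict_hits, ($html =~ $strict) ? $1 : undef;
    push @lazy_hits,   ($html =~ $lazy)   ? $1 : undef;
}

for my $i (0 .. $#samples) {
    printf "strict: %-8s lazy: %s\n",
        defined $strict_hits[$i] ? 'match' : 'no match',
        defined $lazy_hits[$i]   ? $lazy_hits[$i] : 'no match';
}
```

The text-only pattern rejects both samples, while the non-greedy one stops at the first "</a>" it can reach.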

I'm no expert, but I suspect that to reliably match what you want from
arbitrary HTML, you'll have to use a more general parser.
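
One existing option, rather than writing a parser from scratch, is the HTML::TokeParser module from CPAN. The sketch below assumes that module is installed (it is not part of core Perl) and uses a made-up input string. Note that it returns the href and the anchor's text with inner tags dropped, not the raw "<a ...>...</a>" markup.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN module (HTML-Parser distribution), not core Perl

# Hypothetical input: the first anchor contains nested tags.
my $content = '<p>Intro</p><a href=foo> <i> Yow<b>!</b></i> </a><a href=bar>later</a>';

my $parser = HTML::TokeParser->new(\$content)
    or die "could not create parser";

my ($href, $text);
if (my $tag = $parser->get_tag('a')) {        # first <a ...> start tag
    $href = $tag->[1]{href};                  # element 1 is the attribute hash
    $text = $parser->get_trimmed_text('/a');  # text up to </a>, inner tags dropped
}

print "href=$href text=$text\n" if defined $href;
```

Calling get_tag('a') only once is what restricts this to the first anchor; wrapping it in a while loop would walk through all of them.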

--
Mike Spencer Nova Scotia, Canada

