Velocity Reviews


dj 07-02-2003 02:18 AM

regular expression
 
Hi,
I am writing a script that parses an HTML file (which has been retrieved as
a scalar by LWP::UserAgent). The script looks for everything in between the
first <P> tag and the last </P> tag, with any number of <P> and </P> tags in
between. I am sure I have done something like this before, but for the life
of me I can't remember how... (maybe I did it before in lex). Anyone got
any neato suggestions?

Thanks for any help,
Drew



Nicholas Knight 07-02-2003 02:54 AM

Re: regular expression
 
on Tuesday 01 July 2003 07:18 pm, dj <thecommissioner@hotmail.com> wrote
in <3f02410e$0$30568$afc38c87@news.optusnet.com.au> :

> Hi,
> I am writing a script that parses an html file (which has been retrieved
> as a scalar by LWP::UserAgent). The script looks for everything in between
> the first <P> tag and the last </P> tag, with any number of <P> and </P>
> tags in between. I am sure I have done something like this before, but for
> the life of me I can't remember how... (maybe i did it before in lex).
> Anyone got any neato suggestions?


Are you looking for the /s modifier? It makes '.' match newlines, too. I'd
probably do it like this (the /i is there to ignore case, as some people
capitalize all tags and some don't):

/<p>(.*)<\/p>/si
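
A minimal sketch of how that pattern behaves, assuming the page is held in a scalar. The variable names and the sample HTML are made up for illustration; in the original question the string would come from LWP::UserAgent. The greedy (.*) is what makes the capture run from the first <p> to the last </p>; a non-greedy (.*?) is shown only for contrast.

```perl
use strict;
use warnings;

# Hypothetical sample page (not from the thread); note the mixed-case tags
# and the newlines between paragraphs.
my $html = "<html><body>\n<p>first</p>\n<P>second</P>\n<p>third</p>\n</body></html>\n";

# Greedy capture: starts after the first <p> and backtracks from the end of
# the string to the last </p>. /s lets '.' cross newlines, /i ignores case.
my ($greedy) = $html =~ /<p>(.*)<\/p>/si;

# Non-greedy capture, for contrast: stops at the first </p> it can reach.
my ($lazy) = $html =~ /<p>(.*?)<\/p>/si;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

Note that the pattern only matches a bare <p> tag; an opening tag with attributes, such as <p class="x">, would not match as written.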


--
Nicholas Knight <nknight@runawaynet.com>

dj 07-02-2003 03:10 AM

Re: regular expression
 
Hi Nicholas,

Yep, I had something along these lines,

while ($_ =~ s/.+<P>(.+)<\/P>.+/$1/gsi) {
    print;
}

but no substitution occurs. I have tried a few combinations, but no match :)
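
A rough sketch of what the substitution in this post does on a small sample, which may explain the symptoms. The input strings and variable names are hypothetical, since the real page is not shown in the thread. Two things stand out: the leading greedy .+ runs as far as the last <P>, so only the final paragraph survives, and each .+ demands at least one character, so input that begins with <P> or ends with </P> does not match at all.

```perl
use strict;
use warnings;

# Hypothetical input; the actual page from the thread is unknown.
my $page = "<html><body>\n<P>one</P>\n<P>two</P>\n</body></html>\n";

# Pattern from the post: the leading greedy .+ consumes everything up to
# the LAST <P>, so $1 holds only the final paragraph.
(my $original = $page) =~ s/.+<P>(.+)<\/P>.+/$1/gsi;

# Same pattern on input with nothing before <P> or after </P>:
# each .+ needs at least one character, so the substitution fails.
my $bare    = "<P>one</P>";
my $changed = ($bare =~ s/.+<P>(.+)<\/P>.+/$1/gsi) ? 1 : 0;

# One possible variant: a lazy .*? up to the first <P>, a greedy capture
# to the last </P>, and .* so surrounding text is optional.
(my $fixed = $page) =~ s/.*?<P>(.*)<\/P>.*/$1/si;

print "original pattern: [$original]\n";
print "bare input changed: $changed\n";
print "variant: [$fixed]\n";
```

A single substitution is enough here, so the variant drops the while loop and the /g flag.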




Martien Verbruggen 07-02-2003 03:27 AM

Re: regular expression
 
On 02 Jul 2003 03:02:47 GMT,
Martien Verbruggen <mgjv@tradingpost.com.au> wrote:

> A _very_ simpleminded approach could do this:
>
> my ($stuff) = /<P>(.*)<\/P>/i;


Addition: You also need the /s flag so that '.' matches newlines. But again,
I wouldn't use this approach.
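
To illustrate the point about /s, here is a small sketch comparing the same list-context match with and without the flag. The sample string and variable names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input with a newline between two paragraphs.
my $text = "<P>one</P>\n<P>two</P>\n";

# Without /s, '.' does not match a newline, so the capture cannot leave
# the first line and ends at the first </P>.
my ($without_s) = $text =~ /<P>(.*)<\/P>/i;

# With /s, '.' also matches newlines, so the greedy capture reaches
# the last </P> in the string.
my ($with_s) = $text =~ /<P>(.*)<\/P>/is;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```

Without /s the match still succeeds on this input, just with less text than intended, which makes the missing flag easy to overlook.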

Martien
--
                        |
Martien Verbruggen      | If at first you don't succeed, try again.
Trading Post Australia  | Then quit; there's no use being a damn fool
                        | about it.


All times are GMT.
