regular expression

 
 
dj
      07-02-2003
Hi,
I am writing a script that parses an HTML file (which has been retrieved as
a scalar by LWP::UserAgent). The script looks for everything in between the
first <P> tag and the last </P> tag, with any number of <P> and </P> tags in
between. I am sure I have done something like this before, but for the life
of me I can't remember how... (maybe I did it before in lex). Anyone got
any neato suggestions?

Thanks for any help,
Drew


 
Nicholas Knight
      07-02-2003
On Tuesday 01 July 2003 07:18 pm, dj wrote:

> Hi,
> I am writing a script that parses an HTML file (which has been retrieved
> as a scalar by LWP::UserAgent). The script looks for everything in between
> the first <P> tag and the last </P> tag, with any number of <P> and </P>
> tags in between. I am sure I have done something like this before, but for
> the life of me I can't remember how... (maybe I did it before in lex).
> Anyone got any neato suggestions?


Are you looking for the /s modifier? It makes '.' match newlines, too. I'd
probably do it like this (the 'i' to ignore case, as some people capitalize
all tags and some don't):

/<p>(.*)<\/p>/si
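
A minimal sketch of what the /s modifier changes, assuming the page is already in a scalar. The sample string in `$html` is made up for illustration; a real page would come from LWP::UserAgent.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original post.
my $html = "<html><body>\n<P>first</P>\n<p>second\nparagraph</p>\n</body></html>";

# Without /s, '.' does not match a newline, so the capture cannot
# extend past the end of the line the first <P> sits on.
my ($without_s) = $html =~ /<p>(.*)<\/p>/i;

# With /s, the greedy '.*' runs from the first <p> to the last </p>,
# crossing newlines and any inner <p>...</p> pairs.
my ($with_s) = $html =~ /<p>(.*)<\/p>/si;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

Because `.*` is greedy, the inner `</P>` and `<p>` tags end up inside the capture, which is what the original question asks for.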


--
Nicholas Knight
 
dj
      07-02-2003
Hi Nicholas,

Yep, I had something along these lines:

while ($_ =~ s/.+<P>(.+)<\/P>.+/$1/gsi) {
    print;
}

but no substitution occurs. I have tried a few combinations, but nothing
matches.
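
One possible reason the pattern above misbehaves, offered as a guess since the thread does not show the input: the leading and trailing `.+` each demand at least one character, and the leading one is greedy, so the capture starts after the last <P> instead of the first. The substitution also acts on `$_`, so the page has to be in `$_` for anything to happen. A small sketch with a made-up `$html` string:

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original post.
my $html = "<body>\n<P>one</P>\n<P>two</P>\n</body>";

# The leading greedy '.+' consumes as much as it can, so the capture
# begins after the LAST <P> rather than the first.
my ($last_only) = $html =~ /.+<P>(.+)<\/P>.+/si;

# Dropping the leading and trailing '.+' lets the capture start at the
# first <P> and end at the last </P>.
my ($first_to_last) = $html =~ /<P>(.*)<\/P>/si;

print "with leading .+: $last_only\n";
print "without it:      $first_to_last\n";
```

A plain match with a capture is enough here; the `while` loop and the `s///` are not needed just to extract the text.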



 
Martien Verbruggen
      07-02-2003
On 02 Jul 2003 03:02:47 GMT, Martien Verbruggen wrote:

> A _very_ simpleminded approach could do this:
>
> my ($stuff) = /<P>(.*)<\/P>/i;


Addition: You also need the /s flag to match newlines. But again, I
wouldn't use it.
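
For illustration only, the simpleminded approach with the /s flag added might look like the sketch below. The helper name `between_paragraphs` and the sample strings are hypothetical, and `m{...}` is used so the `/` in `</P>` needs no escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the first <P> and the
# last </P>, or undef when no such pair exists.
sub between_paragraphs {
    my ($html) = @_;
    # In list context a failed match returns an empty list,
    # which leaves $stuff undefined.
    my ($stuff) = $html =~ m{<P>(.*)</P>}is;
    return $stuff;
}

my $found   = between_paragraphs("<P>a</P>\n<P>b</P>");
my $missing = between_paragraphs("<div>no paragraphs</div>");

print defined $found   ? "found: $found\n"   : "no match\n";
print defined $missing ? "found: $missing\n" : "no match\n";
```

Note that this pattern only matches a bare `<P>`; an opening tag with attributes, such as `<P class="x">`, would be missed, which is one reason to be wary of the regex approach.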

Martien
--
|
Martien Verbruggen | If at first you don't succeed, try again.
Trading Post Australia | Then quit; there's no use being a damn fool
| about it.
 