How to match more than 1 line?

 
 
Geoff Cox
 
      12-07-2003
Hello,

How do I capture text that goes over 2 lines?

The text could be say

<TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
London N500 5JJJ</TD></TR>

The following code only gets the text up to and including Northgate,

if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
print OUT ("$1 \n");
}

Ideas please?!

Thanks

Geoff
 
Jay Tilton
 
      12-07-2003
Geoff Cox <(E-Mail Removed)> wrote:

: How do I capture text that goes over 2 lines?
:
: The text could be say
:
: <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
: London N500 5JJJ</TD></TR>
:
: The following code only gets the text up to and including Northgate,
:
: if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
                                          ^
------------------------------------------^
The /m switch affects only how ^ and $ match, and your regex contains
neither of those metacharacters.

You want the /s switch, which lets . match a newline character.

: print OUT ("$1 \n");
: }
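
A minimal sketch of the /s fix, assuming both lines of the cell are already in a single string (the heredoc below stands in for that). The pattern is slightly changed from the one in the question: [^>]*> is added so that only the cell contents are captured rather than the remaining attributes, and the final substitution that joins the two lines is an extra step, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the whole cell, line break included, in one string.
my $text = <<'HTML';
<TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
London N500 5JJJ</TD></TR>
HTML

# [^>]*> skips the rest of the opening tag.
# /s lets "." match "\n", so the non-greedy .*? can cross the line break.
my ($cell) = $text =~ /<TD vAlign=top[^>]*>(.*?)<\/TD>/s;

if (defined $cell) {
    $cell =~ s/\s*\n\s*/ /g;    # join the two lines with a single space
    print "$cell\n";
}
```

Without the /s the same pattern fails on this input, because "." stops at the newline after "Northgate,".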

 
Gunnar Hjalmarsson
 
      12-07-2003
Geoff Cox wrote:
> How do I capture text that goes over 2 lines?
>
> The text could be say
>
> <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London N500 5JJJ</TD></TR>
>
> The following code only gets the text up to and including
> Northgate,
>
> if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
> print OUT ("$1 \n");
> }
>
> Ideas please?!


Use the right modifier. /m seems not to be what you want. Look up in

perldoc perlre

what to use instead.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Tintin
 
      12-07-2003

"Geoff Cox" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hello,
>
> How do I capture text that goes over 2 lines?
>
> The text could be say
>
> <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London N500 5JJJ</TD></TR>
>
> The following code only gets the text up to and including Northgate,
>
> if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
> print OUT ("$1 \n");
> }
>
> Ideas please?!


You've discovered that regexes aren't very robust/easy/flexible when it
comes to parsing HTML. Use one of the HTML parsers on CPAN.


 
 
Geoff Cox
 
      12-07-2003
On Sun, 07 Dec 2003 09:39:24 GMT, (E-Mail Removed) (Jay Tilton)
wrote:

>Geoff Cox <(E-Mail Removed)> wrote:
>
>: How do I capture text that goes over 2 lines?
>:
>: The text could be say
>:
>: <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>: London N500 5JJJ</TD></TR>
>:
>: The following code only gets the text up to and including Northgate,
>:
>: if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>                                          ^
>------------------------------------------^
>The /m switch affects only how ^ and $ match, and your regex contains
>neither of those metacharacters.
>
>You want the /s switch, which lets . match a newline character.
>
>: print OUT ("$1 \n");
>: }



Jay,

thanks for that - I'm still not quite there - I am trying to get the
name and address only out of the following - how should I do this? Geoff

<TR>
<TD vAlign=top align=left colSpan=4>
<H6><IMG height=10 alt=bullet
src="barnet_files/blue_bullet2.gif"
width=7>&nbsp;&nbsp;The College</H6></TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex N777 5RJ</TD></TR>


 
 
Geoff Cox
 
      12-07-2003
On Sun, 07 Dec 2003 10:52:05 +0100, Gunnar Hjalmarsson
<(E-Mail Removed)> wrote:

>Geoff Cox wrote:
>> How do I capture text that goes over 2 lines?
>>
>> The text could be say
>>
>> <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>> London N500 5JJJ</TD></TR>
>>
>> The following code only gets the text up to and including
>> Northgate,
>>
>> if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>> print OUT ("$1 \n");
>> }
>>
>> Ideas please?!

>
>Use the right modifier. /m seems not to be what you want. Look up in
>
> perldoc perlre
>
>what to use instead.


Gunnar,

I think you are correct about the /m, but could you take a look at my
other post, which shows the text I am trying to use?

Thanks

Geoff

 
 
Geoff Cox
 
      12-07-2003
On Sun, 7 Dec 2003 23:10:48 +1300, "Tintin" <(E-Mail Removed)> wrote:

>
>"Geoff Cox" <(E-Mail Removed)> wrote in message
>news:(E-Mail Removed)...
>> Hello,
>>
>> How do I capture text that goes over 2 lines?
>>
>> The text could be say
>>
>> <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
>> London N500 5JJJ</TD></TR>
>>
>> The following code only gets the text up to and including Northgate,
>>
>> if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {
>> print OUT ("$1 \n");
>> }
>>
>> Ideas please?!

>
>You've discovered that regexes aren't very robust/easy/flexible when it
>comes to parsing HTML. Use one of the HTML parsers on CPAN.
>


There seem to be a large number of them! Any recommendations?

Cheers

Geoff

 
 
Gunnar Hjalmarsson
 
      12-07-2003
Geoff Cox wrote:
> I am trying to get the name and address only out of the following - how
> should I do this? Geoff
>
> <TR>
> <TD vAlign=top align=left colSpan=4>
> <H6><IMG height=10 alt=bullet
> src="barnet_files/blue_bullet2.gif"
> width=7>&nbsp;&nbsp;The College</H6></TD></TR>
> <TR>
> <TD align=left width="20%" colSpan=2><B>Head
> Teacher</B></TD>
> <TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
> <TR>
> <TD align=left width="20%" colSpan=2><B>Address</B></TD>
> <TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
> Sussex N777 5RJ</TD></TR>


That was quite a different question. This might do what you want:

if ( $line =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
               .+?
               Address.+?<TD[^>]+>([^<]+)
              /isx ) {
    print "Name: $1\nAddress: $2\n";
}

But don't use it if you don't understand it. And even if you do
understand it, you may want to use a module for parsing HTML instead.
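
One caveat: the pattern above can only match if the variable holds the whole block at once. If $line is filled one line at a time (an assumption, since the reading loop is not shown in this thread), no modifier will help. A rough sketch of reading everything in one go, using an in-memory filehandle and a shortened copy of the sample so it is self-contained; $page, $name and $address are names made up for the illustration:

```perl
use strict;
use warnings;

# Shortened sample from the thread, opened as an in-memory file.
# A real script would open the saved page instead.
my $html = <<'HTML';
<TR>
<TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex N777 5RJ</TD></TR>
HTML

open my $fh, '<', \$html or die "Cannot open in-memory file: $!";
my $page = do { local $/; <$fh> };    # $/ undefined: read to end of file
close $fh;

my ($name, $address);
if ( $page =~ /Head\s+Teacher.+?<TD[^>]+>([^<]+)
               .+?
               Address.+?<TD[^>]+>([^<]+)
              /isx ) {
    ($name, $address) = ($1, $2);
    s/\s+/ /g for $name, $address;    # collapse the embedded newline
    print "Name: $name\nAddress: $address\n";
}
```

Localizing $/ inside the do block keeps the change from leaking into other reads in the same program.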

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Jürgen Exner
 
      12-07-2003
Geoff Cox wrote:

You are asking the wrong question, but anyway...

> How do I capture text that goes over 2 lines?
>
> The text could be say
>
> <TD vAlign=top width="80%" colSpan=2>White Road, Northgate,
> London N500 5JJJ</TD></TR>
>
> The following code only gets the text up to and including Northgate,
>
> if ($line =~ /<TD vAlign=top(.*?)<\/TD>/m) {


To answer the question you did ask in the subject:
You are using the wrong modifier. Actually you are using exactly the
opposite one to the one you need.
Please read "perldoc perlre" about what 'm' and what 's' do.
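
For what it's worth, the difference between the two modifiers can be seen on a toy string. This is only an illustration with made-up data, not the HTML from the question:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Count how often ^\w+ matches. Without /m, ^ anchors only at the very
# start of the string; with /m it also matches after each "\n".
my $plain_caret = () = $text =~ /^\w+/g;
my $m_caret     = () = $text =~ /^\w+/mg;

# Without /s, "." does not match "\n"; with /s it does.
my $plain_dot = $text =~ /first.*second/  ? 1 : 0;
my $s_dot     = $text =~ /first.*second/s ? 1 : 0;

print "^ matches: $plain_caret without /m, $m_caret with /m\n";
print ". crosses newline: $plain_dot without /s, $s_dot with /s\n";
```

So /m changes the anchors and leaves "." alone, while /s changes "." and leaves the anchors alone.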

[...]
>
> Ideas please?!


The question you should have asked but didn't ask is: what is the right tool
to parse HTML?

And as has been answered a gazillion times: parsing HTML correctly is
rocket science and nobody with a sane mind would attempt to do it using REs.
See 'perldoc -q "remove HTML"' for why and how and what to do instead.

jue


 
 
ko
 
      12-07-2003
Geoff Cox wrote:
> On Sun, 7 Dec 2003 23:10:48 +1300, "Tintin" <(E-Mail Removed)> wrote:
>
>
>>"Geoff Cox" <(E-Mail Removed)> wrote in message
>>news:(E-Mail Removed)...


[snip]

>>>Ideas please?!

>>
>>You've discovered that regexes aren't very robust/easy/flexible when it
>>comes to parsing HTML. Use one of the HTML parsers on CPAN.

>
> There seem to be a large number of them! any recommendation?!


HTML::Parser. If you're only interested in extracting text, here's an
example to get you started:

http://search.cpan.org/src/GAAS/HTML...-3.34/eg/htext

There are other example scripts in the parent directory.
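
For the table in this thread, something along these lines might be a starting point. It is a sketch only, not the htext script linked above: it uses HTML::TokeParser, which comes with the HTML-Parser distribution, on a shortened copy of the sample, and it assumes the cells always come in label/value pairs. The names @cells and %field are made up for the illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # part of the HTML-Parser distribution on CPAN

# Shortened sample from the thread (the heading row is left out).
my $html = <<'HTML';
<TR>
<TD align=left width="20%" colSpan=2><B>Head
Teacher</B></TD>
<TD vAlign=top width="80%" colSpan=2>Fred Smith</TD></TR>
<TR>
<TD align=left width="20%" colSpan=2><B>Address</B></TD>
<TD vAlign=top width="80%" colSpan=2>Cedar Road, Northgate,
Sussex N777 5RJ</TD></TR>
HTML

my $parser = HTML::TokeParser->new(\$html)
    or die "Cannot parse sample markup";

# Collect the whitespace-trimmed text of every <TD> cell in order.
my @cells;
while ( my $tag = $parser->get_tag('td') ) {
    push @cells, $parser->get_trimmed_text('/td');
}

# Assumption: cells alternate label, value, label, value.
my %field = @cells;
print "Name: $field{'Head Teacher'}\n";
print "Address: $field{'Address'}\n";
```

The parser copes with the line breaks inside the cells and the mixed-case tag names, so no /s or /i juggling is needed. Rows with a different number of cells would break the pairing and need extra handling.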

HTH - keith

 