Velocity Reviews

Geoff Cox 08-23-2003 09:53 PM

a backreference problem?
 
Hello,

I can use

$string =~ /="(.*)"\.doc/;
print $1;

which will get "docs/path/word"
from <a href="docs/path/word.doc">link to word doc</a> (A)

But! What if I have a file with say 100 lines similar to A above? How
do I deal with multiple values of $1?

Cheers

Geoff




Tad McClellan 08-23-2003 10:23 PM

Re: a backreference problem?
 
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:

> I can use
>
> $string =~ /="(.*)"\.doc/;
> print $1;



Yes, but you shouldn't.

You should never use the dollar-digit variables unless you
have first ensured that the match _succeeded_.


if ( $string =~ /="(.*)"\.doc/ )
{ print $1 }


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Tad McClellan 08-23-2003 10:29 PM

Re: a backreference problem?
 
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:


> $string =~ /="(.*)"\.doc/;
                    ^^^
                    ^^^
> which will get "docs/path/word"
        ^^^^^^^^
        ^^^^^^^^ No it won't.


> from <a href="docs/path/word.doc">link to word doc</a> (A)



Your pattern requires a double quote before a dot.

The string does not contain a double quote before a dot.

The match must fail, and $1 will *not* be set, it will be left
with the same value that it had before the match was attempted.
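
A minimal sketch of that behaviour, assuming an unrelated earlier match has already set $1 (the 'leftover value' string and the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $string = '<a href="docs/path/word.doc">link to word doc</a>';

# An unrelated earlier match succeeds and sets $1 to 'leftover'.
'leftover value' =~ /(\w+)/;

# The quote is in the wrong place here, so this match fails ...
my $matched = ($string =~ /="(.*)"\.doc/) ? 1 : 0;

# ... and $1 still holds the value from the earlier match.
my $stale = $1;

# With the quote moved after ".doc" the match succeeds,
# and only then is it safe to read $1.
my $fixed;
if ($string =~ /="(.*)\.doc"/) {
    $fixed = $1;
}

print "matched=$matched stale=$stale fixed=$fixed\n";
```

This is why the match should be tested before $1 is used: a failed match does not clear $1, it just leaves the old value in place.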


> But! What if I have a file with say 100 lines similar to A above? How
> do I deal with multiple values of $1?



It depends on what "deal with" means when you say it.

The answer would probably involve one of Perl's looping constructs
and/or aggregate data types.

We would need a better question in order to give a better answer.

If "deal with" means "print dollar one" for instance, then the
answer would be "use a while(<FILE>) loop".
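
As a rough sketch of such a loop: the sample data below is hypothetical, and an in-memory filehandle stands in for the real file so the example runs on its own. The pattern has the closing quote moved after ".doc".

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real file.
my $data = <<'END';
<a href="docs/path/word.doc">link to word doc</a>
<p>a line without a link</p>
<a href="docs/path/word2.doc">link to word doc</a>
END

# Open the string as if it were a file; always check open().
open(my $fh, '<', \$data) or die "could not open in-memory file: $!";

my @paths;
while (my $line = <$fh>) {
    # Only touch $1 when this line's match succeeded.
    if ($line =~ /="(.*)\.doc"/) {
        print "$1\n";
        push @paths, $1;
    }
}
close $fh;
```

Each pass through the loop gets its own fresh $1, so there is never more than one value to deal with at a time; pushing onto an array keeps them all if they are needed later.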


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Peter Cooper 08-23-2003 11:05 PM

Re: a backreference problem?
 
"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote:
> which will get "docs/path/word"
> from <a href="docs/path/word.doc">link to word doc</a> (A)
>
> But! What if I have a file with say 100 lines similar to A above? How
> do I deal with multiple values of $1?


You can match 'many' things into an array like so:

my $data1 = q{
<a href="docs/path/word.doc">link to word doc</a>
<a href="docs/path/word2.doc">link to word doc</a>
<a href="docs/path/word3.doc">link to word doc</a>
};

my @names = $data1 =~ /="(.*?)\.doc"/gsi;
print "$_\n" for @names;

However, if you really want to parse HTML, and aren't just using HTML as an
example here, you will want to look into modules which are dedicated to this
purpose. Look at the HTML Parser set at
http://search.cpan.org/author/GAAS/HTML-Parser-3.31/ . HTML::LinkExtor (a
link extractor) may be of particular use to you.

Regards,
Peter Cooper



Geoff Cox 08-24-2003 08:41 AM

Re: a backreference problem?
 
On Sun, 24 Aug 2003 00:05:44 +0100, "Peter Cooper"
<newsfeed@boog.co.uk> wrote:

Peter et al ...

I am now trying the code below, which may show better what I am
trying to do. The problem is with passing $1 to the sub getintro: I get
an "uninitialized value in pattern match" error ...

Cheers

Geoff

open(IN, "a2-left.htm");
open(OUT, ">>out");
open(INN, "total");

if (open(IN, "a2-left.htm")) {

$line = <IN>;

while ($line ne "") {
if ($line =~ /^<a href/) {
if ($line =~ /="(.*)\.doc/) {
&getintro($1);
}
}
$line = <IN>;

}
}
sub getintro {

@intro = <INN>;
for ($n=0;$n<900;$n++) {
if ($into[$n] =~ /$1/) {
print OUT ("$into[$n]\n");
print OUT ("$line[$n-1]\n");
}
}
}

close (IN);
close (OUT);
close (INN);







Geoff Cox 08-24-2003 08:46 AM

Re: a backreference problem?
 
On Sun, 24 Aug 2003 09:41:55 +0100, Geoff Cox
<geoff.cox@blueyonder.co.uk> wrote:


I know there are two mistakes where $into should read $intro, etc.
I have corrected these but still get the same error message....

Geoff




Geoff Cox 08-24-2003 10:17 AM

Re: a backreference problem?
 
On Sun, 24 Aug 2003 10:42:09 +0100, Geoff Cox
<geoff.cox@blueyonder.co.uk> wrote:


>which is odd...the value for $1 does get into the sub getintro but get
>the error message "uninitialized value in pattern match" for the line
>
>if ($into[$n] =~ /$1/) {


I have improved the code by adding "use strict" but still get the above error message?!

use strict;

open(IN, "a2-left.htm");
open(OUT, ">>out");
open(INN, "total");


my $line = <IN>;

while ($line ne "") {
if ($line =~ /^<a href/) {
if ($line =~ /="(.*)\.doc/) {
&getintro($1);
}
}
$line = <IN>;
}

sub getintro {
my $n;
my @intro = <INN>;
for ($n=0;$n<900;$n++) {
if ($intro[$n] =~ /$1/) {
print OUT ("$intro[$n]\n");
print OUT ("$intro[$n-1]\n");
}
}
}
close (IN);
close (OUT);
close (INN);


James E Keenan 08-24-2003 12:35 PM

Re: a backreference problem?
 

"Geoff Cox" <geoff.cox@blueyonder.co.uk> wrote in message
news:beugkvoagpkce4ihrp76qimdbsi6onpr61@4ax.com...
> On Sun, 24 Aug 2003 00:05:44 +0100, "Peter Cooper"
> <newsfeed@boog.co.uk> wrote:
>
> Peter et al ...
>
> Now trying this - you will perhaps see better what I am trying to
> do...problem with the passing of $1 to the sub getintro - I get an
> uninitialized value in pattern match error ...
>
> Cheers
>
> Geoff
>
> open(IN, "a2-left.htm");
> open(OUT, ">>out");
> open(INN, "total");
>
> if (open(IN, "a2-left.htm")) {


Why are you asking to do something if and only if the filehandle is open?
You opened it 3 lines above.

>
> $line = <IN>;
>
> while ($line ne "") {


better for 2 above lines:

while (defined($line = <IN>)) {
next if $line =~ /^$/;

> if ($line =~ /^<a href/) {


Right here it becomes apparent that you're trying to parse HTML -- which
means you should heed Peter's advice to check out HTML::Parser.

> if ($line =~ /="(.*)\.doc/) {
> &getintro($1);
> }
> }
> $line = <IN>;
>

What's the purpose of the line above?

> }
> }
> sub getintro {
>
> @intro = <INN>;


You don't appear to do anything with the content of @intro, so why read from
<INN> at all?

> for ($n=0;$n<900;$n++) {
> if ($into[$n] =~ /$1/) {


.... unless, that is, you have a typo in line above and meant $intro

But here $1 contains the result of the first captured expression on the last
matching line ... which may not always be what you want.

> print OUT ("$into[$n]\n");
> print OUT ("$line[$n-1]\n");
> }
> }
> }
>
> close (IN);
> close (OUT);
> close (INN);
>


Note: The subject of your OP was "backreference problem." But at no point
in the discussion have you used any backreferences (e.g., \1 as part of a
pattern match). This leads me to suspect that you just don't understand
Perl regexes very well. I recommend going to a good Perl text (e.g., the
llama) and carefully working through the exercises on regexes.
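
To illustrate the distinction, here is a small sketch; the sample strings and variable names are invented for the example. \1 is used inside a pattern to match again whatever group 1 matched, while $1 is read after a successful match to see what group 1 captured.

```perl
use strict;
use warnings;

my $text = 'this is is a test';   # hypothetical sample text

# Backreference: \1 inside the pattern must match the same text
# that the first group just matched, so this finds a doubled word.
my ($doubled) = $text =~ /\b(\w+)\s+\1\b/;

# Capture variable: $1 is read after the match has succeeded.
my $captured;
if ('docs/path/word.doc' =~ /^(.*)\.doc$/) {
    $captured = $1;
}

print "doubled word: $doubled\n";
print "captured: $captured\n";
```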




Tad McClellan 08-24-2003 02:54 PM

Re: a backreference problem?
 
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:


Have you seen the Posting Guidelines that are posted here frequently?


> any ideas?



Indent your code for human readability if you want humans to read it.

Many people will not take the time to read your code because you
did not take the time to make it easy for them to read your code.


> open(IN, "a2-left.htm");



You should always, yes *always*, check the return value from open().

You were doing that, but now you've taken it back out.

open(IN, 'a2-left.htm') or die "could not open 'a2-left.htm' $!";
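
A small sketch of what the check buys you, using a deliberately missing file (the file name is hypothetical) and a lexical filehandle with three-argument open, which is a common modern variant of the line above rather than anything required by it:

```perl
use strict;
use warnings;

my $file = 'no-such-file-for-this-example.htm';   # hypothetical name

# eval traps the die so the example can show the message and carry on.
my $ok = eval {
    open(my $in, '<', $file) or die "could not open '$file': $!";
    close $in;
    1;
};
my $error = $ok ? '' : $@;

print "open failed as expected: $error" if $error;
```

Without the "or die", a failed open goes unnoticed and every later read from the handle quietly returns nothing.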


> sub getintro {
>
> my $n;
>
> print ("$1\n");
>
> my @intro = <INN>;
> for ($n=0;$n<900;$n++) {



foreach my $n ( 0 .. 899 ) { # does the same thing


> if ($intro[$n] =~ /$1/) {
> &print;
> }
> }
>
> sub print {
> print OUT ("$intro[$n]\n");
                     ^^
                     ^^ $n is undefined



[snip TOFU, please do not do that anymore]

--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Tad McClellan 08-24-2003 02:58 PM

Re: a backreference problem?
 
Geoff Cox <geoff.cox@blueyonder.co.uk> wrote:


> &getintro($1);



Why are you passing an argument when the subroutine definition
never makes use of the argument that you passed?


> sub getintro {



my( $file ) = @_;


> my $n;
>
> print ("$1\n");



print ("$file\n");


> if ($intro[$n] =~ /$1/) {



if ($intro[$n] =~ /$file/) {


> &print;



print OUT "$intro[$n]\n"



[snip TOFU]

--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
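
Pulling the advice in this thread together, one possible shape for the program is sketched below. It is only a sketch under stated assumptions: the two in-memory strings stand in for 'a2-left.htm' and 'total', their contents are invented, and @found replaces the OUT filehandle so the result is easy to inspect.

The likely sources of the "uninitialized value" warning in the posted code are that the loop always runs to index 899 whether or not @intro has that many lines, that <INN> is read again on every call (so it returns nothing after the first), and that $1 is overwritten by each successful match inside the sub.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two files in the thread.
my $links = <<'END_LINKS';
<a href="docs/path/word.doc">link to word doc</a>
<p>not a link line</p>
<a href="docs/path/word2.doc">link to word doc</a>
END_LINKS

my $total = <<'END_TOTAL';
Heading for word
docs/path/word.doc intro text
Heading for word2
docs/path/word2.doc intro text
END_TOTAL

# Read the lookup file once, not once per call to getintro.
open(my $inn, '<', \$total) or die "could not open in-memory 'total': $!";
chomp(my @intro = <$inn>);
close $inn;

my @found;   # collected output lines (stands in for printing to OUT)

sub getintro {
    my ($file) = @_;                  # use the argument, not $1
    foreach my $n (0 .. $#intro) {    # only indexes that really exist
        # \Q...\E treats the path as literal text, not as a regex.
        next unless $intro[$n] =~ /\Q$file\E\./;
        push @found, $intro[$n];
        push @found, $intro[$n - 1] if $n > 0;
    }
    return;
}

open(my $in, '<', \$links) or die "could not open in-memory links: $!";
while (defined(my $line = <$in>)) {
    next unless $line =~ /^<a href/;
    getintro($1) if $line =~ /="(.*)\.doc"/;
}
close $in;

print "$_\n" for @found;
```

For real HTML, HTML::LinkExtor remains the safer way to collect the links; the regex here is only kept to stay close to the posted code.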

