Pattern Matching problem!

 
 
Francis Sylvester
 
      11-14-2005
Hi,

I'm a Perl newbie and am having a nightmare trying to get the code below
working. I'm trying to fetch a webpage and if a link within the page matches
the search criterion - return the text after the link. It doesn't seem to be
working and I'm wondering if it's because the pattern match is within the
while loop. If anybody can shed some light I'd be eternally grateful!

Cheers,
Francis

# --------------------------
use LWP::Simple;
use HTML::TokeParser;

my $document = get("http://www.anexamplesite.com");
my $mymatch = "searchstring";

my $parser = HTML::TokeParser->new(\$document);

while ($token = $parser->get_tag("a")) {
if ($token->[1]->{"href"} =~ /$mymatch/) {
# print $server.$token->[1]->{href}."\n";
$document =~ /$searchstring(.+?)someidentifier/;
print "$1";
}
}


 
A. Sinan Unur
 
      11-14-2005
"Francis Sylvester" <(E-Mail Removed)> wrote in
news:AH8ef.16551$(E-Mail Removed) k:

> I'm a Perl newbie and am having a nightmare trying to get the code
> below working. I'm trying to fetch a webpage and if a link within the
> page matches the search criterion - return the text after the link. It
> doesn't seem to be working and I'm wondering


As it is, we have no idea what "doesn't seem to be working" means. Please
read the posting guidelines to find out how you can help yourself, and,
in the process, help others help you.

use strict;
use warnings;

are missing.

> use LWP::Simple;
> use HTML::TokeParser;
>
> my $document = get("http://www.anexamplesite.com");
> my $mymatch = "searchstring";
>
> my $parser = HTML::TokeParser->new(\$document);
>
> while ($token = $parser->get_tag("a")) {
> if ($token->[1]->{"href"} =~ /$mymatch/) {
> # print $server.$token->[1]->{href}."\n";
> $document =~ /$searchstring(.+?)someidentifier/;


The exact contents of $mymatch, $searchstring and whatever
someidentifier is might have something to do with what's actually being
matched, no?

> print "$1";


You are not capturing anything, why do you expect there to be anything
valid in $1?

Sinan
--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

 
it_says_BALLS_on_your forehead
 
      11-14-2005

Francis Sylvester wrote:
> Hi,
>
> I'm a Perl newbie and am having a nightmare trying to get the code below
> working. I'm trying to fetch a webpage and if a link within the page matches
> the search criterion - return the text after the link. It doesn't seem to be
> working and I'm wondering if it's because the pattern match is within the
> while loop. If anybody can shed some light I'd be eternally grateful!
>
> Cheers,
> Francis
>
> # --------------------------
> use LWP::Simple;
> use HTML::TokeParser;
>
> my $document = get("http://www.anexamplesite.com");
> my $mymatch = "searchstring";
>
> my $parser = HTML::TokeParser->new(\$document);
>
> while ($token = $parser->get_tag("a")) {
> if ($token->[1]->{"href"} =~ /$mymatch/) {


try:
if ( $token->[1]{href} =~ /$mymatch/o ) {

> # print $server.$token->[1]->{href}."\n";
> $document =~ /$searchstring(.+?)someidentifier/;
> print "$1";
> }
> }


 
Gunnar Hjalmarsson
 
      11-15-2005
it_says_BALLS_on_your forehead wrote:
> Francis Sylvester wrote:
>>
>>while ($token = $parser->get_tag("a")) {
>> if ($token->[1]->{"href"} =~ /$mymatch/) {

>
> try:
> if ( $token->[1]{href} =~ /$mymatch/o ) {


I fail to see why that would make a difference. Could you please explain
why you think it would?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
it_says_BALLS_on_your forehead
 
      11-15-2005

Gunnar Hjalmarsson wrote:
> it_says_BALLS_on_your forehead wrote:
> > Francis Sylvester wrote:
> >>
> >>while ($token = $parser->get_tag("a")) {
> >> if ($token->[1]->{"href"} =~ /$mymatch/) {

> >
> > try:
> > if ( $token->[1]{href} =~ /$mymatch/o ) {

>
> I fail to see why that would make a difference. Could you please explain
> why you think it would?
>


I looked up HTML::TokeParser on CPAN.

The first example shown there gets the href like this:

my $url = $token->[1]{href} || "-";

I noticed that the OP did not use the same syntax, and I didn't know
whether that was causing his problem. The 'o' at the end of the pattern
was just to optimize the pattern match, since it doesn't seem like the
OP needs to recompile the regex every time.
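
For what it's worth, /o only asks perl to interpolate and compile the
pattern once; it does not change what the pattern matches, so it cannot
fix a wrong result. As far as I know, perl already skips recompiling when
the interpolated variable has not changed. If the goal is simply to
compile once, qr// is the usual way to say so. A minimal sketch, where
@hrefs is made-up sample data and not taken from the OP's page:

```perl
use strict;
use warnings;

my $mymatch = 'searchstring';

# Compile the pattern once, up front, instead of relying on /o.
my $re = qr/$mymatch/;

# Hypothetical href values standing in for the links on the page.
my @hrefs = ( '/a/searchstring.html', '/b/other.html', '/c/searchstring2.html' );

# Reuse the precompiled pattern for every candidate link.
my @hits = grep { $_ =~ $re } @hrefs;

print "$_\n" for @hits;
```

Either spelling selects the same links; only the point at which the
pattern is compiled differs.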

 
Gunnar Hjalmarsson
 
      11-15-2005
it_says_BALLS_on_your forehead wrote:
> Gunnar Hjalmarsson wrote:
>>it_says_BALLS_on_your forehead wrote:
>>>Francis Sylvester wrote:
>>>>
>>>>while ($token = $parser->get_tag("a")) {
>>>> if ($token->[1]->{"href"} =~ /$mymatch/) {
>>>
>>>try:
>>>if ( $token->[1]{href} =~ /$mymatch/o ) {

>>
>>I fail to see why that would make a difference. Could you please explain
>>why you think it would?

>
> I looked up HTML::TokeParse in CPAN.


That's a good start, I suppose.

> The first Example displayed illustrated that the way to get the href
> was:
>
> my $url = $token->[1]{href} || "-";
>
> ...i noticed that the OP did not use the same syntax. I didn't know if
> this was causing his problem.


The reason why I asked is that I thought that

$token->[1]->{"href"}

is always the same as

$token->[1]{href}

following Perl's syntax for references and data structures.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
it_says_BALLS_on_your forehead
 
      11-15-2005

Gunnar Hjalmarsson wrote:
> it_says_BALLS_on_your forehead wrote:
> > Gunnar Hjalmarsson wrote:
> >>it_says_BALLS_on_your forehead wrote:
> >>>Francis Sylvester wrote:
> >>>>
> >>>>while ($token = $parser->get_tag("a")) {
> >>>> if ($token->[1]->{"href"} =~ /$mymatch/) {
> >>>
> >>>try:
> >>>if ( $token->[1]{href} =~ /$mymatch/o ) {
> >>
> >>I fail to see why that would make a difference. Could you please explain
> >>why you think it would?

> >
> > I looked up HTML::TokeParse in CPAN.

>
> That's a good start, I suppose.
>
> > The first Example displayed illustrated that the way to get the href
> > was:
> >
> > my $url = $token->[1]{href} || "-";
> >
> > ...i noticed that the OP did not use the same syntax. I didn't know if
> > this was causing his problem.

>
> The reason why I asked is that I thought that
>
> $token->[1]->{"href"}
>
> is always the same as
>
> $token->[1]{href}
>
> following Perl's syntax for references and data structures.


Ahh, I think you're right. See p. 254 of Programming Perl, 3rd ed.:

"The arrow is optional between brackets or braces, or between a closing
bracket or brace and a parenthesis for an indirect function call."
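
A quick way to convince yourself is to build a token by hand and compare
the spellings. The array below is only a stand-in shaped like what
get_tag returns for a start tag (tag name, attribute hash, attribute
order, original text); the URL in it is made up:

```perl
use strict;
use warnings;

# Hand-built stand-in for a start-tag token; not produced by a real parse.
my $token = [
    'a',
    { href => 'http://www.anexamplesite.com/page' },
    ['href'],
    '<a href="http://www.anexamplesite.com/page">',
];

my $with_arrow    = $token->[1]->{"href"};   # explicit arrow, quoted key
my $without_arrow = $token->[1]{href};       # arrow omitted, bareword key
my $block_deref   = ${ $token->[1] }{href};  # block dereference, same element

print "all three spellings agree\n"
    if $with_arrow eq $without_arrow && $without_arrow eq $block_deref;
```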

 
A. Sinan Unur
 
      11-15-2005
Abigail <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> A. Sinan Unur ((E-Mail Removed)) wrote on MMMMCDLVIII
> September MCMXCIII in
> <URL:news:Xns970EBA81F5AA9asu1cornelledu@127.0.0.1 >:
> > "Francis Sylvester" <(E-Mail Removed)> wrote in
> > news:AH8ef.16551$(E-Mail Removed) k:

....

> > > $document =~ /$searchstring(.+?)someidentifier/;
> >
> > The exact contents of $mymatch, $searchstring and whatever
> > someidentifier might have something to do with what's actually
> > being matched, no?
> >
> > > print "$1";
> >
> > You are not capturing anything, why do you expect there to be
> > anything valid in $1?
>
> Not capturing? I'd say the parens in
> /$searchstring(.+?)someidentifier/ capture (if the match is
> successful), or there's a bug in perl.


Arrgh! Thank you very much for catching that.

Sinan
--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

 
Gunnar Hjalmarsson
 
      11-15-2005
Francis Sylvester wrote:
> I'm a Perl newbie and am having a nightmare trying to get the code below
> working. I'm trying to fetch a webpage and if a link within the page matches
> the search criterion - return the text after the link.
>
> use LWP::Simple;
> use HTML::TokeParser;


Yes, using a module for parsing an HTML document is a good idea.

> my $document = get("http://www.anexamplesite.com");
> my $mymatch = "searchstring";
>
> my $parser = HTML::TokeParser->new(\$document);
>
> while ($token = $parser->get_tag("a")) {
> if ($token->[1]->{"href"} =~ /$mymatch/) {
> # print $server.$token->[1]->{href}."\n";
> $document =~ /$searchstring(.+?)someidentifier/;


What's that? After you have possibly found your search string, you let
the program search the whole document using a simple regex. Doing so
makes no sense to me.

Either stick to a simple regex and skip the parsing module, or (better)
take advantage of the module you are using and do something like:

    while ( my $token = $parser->get_tag('a') ) {
        if ($token->[1]{href} =~ /$mymatch/) {
            print $parser->get_text('a')."\n";
        }
    }

(I'm not sure if that's what you're looking for, but hopefully you get
the idea.)
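
If "the text after the link" means the text between the closing </a> and
the next link, one possible variation is sketched below. It runs on an
inline HTML string rather than a fetched page; the sample markup and the
@after array are assumptions made for illustration, and the pattern is
wrapped in \Q...\E on the assumption that $mymatch is a literal string.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample page standing in for the document fetched with get().
my $document = <<'HTML';
<p><a href="/news/searchstring.html">News</a> Latest headlines.
<a href="/other.html">Other</a> Not wanted.</p>
HTML

my $mymatch = 'searchstring';

my $parser = HTML::TokeParser->new( \$document )
    or die "Cannot create parser";

my @after;
while ( my $token = $parser->get_tag('a') ) {
    my $href = $token->[1]{href};

    # Skip anchors without an href, and links that do not match.
    next unless defined $href && $href =~ /\Q$mymatch\E/;

    # Move past the link text to the closing tag ...
    $parser->get_tag('/a');

    # ... then collect the text up to the next <a> start tag.
    push @after, $parser->get_trimmed_text('a');
}

print "$_\n" for @after;
```

Everything here goes through the parser, so there is no second regex run
over the whole document.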

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
Francis Sylvester
 
      11-15-2005
> Either you'd better stick to a simple regex, and skip the parsing module,
> or (better) taking advantage of the module you are using, and doing
> something like:
>
> while ( my $token = $parser->get_tag('a') ) {
> if ($token->[1]{href} =~ /$mymatch/) {
> print $parser->get_text('a')."\n";
> }
> }
>
> (I'm not sure if that's what you're looking for, but hopefully you get the
> idea.)
>


Many thanks for all your replies. I'm sorry, I should have been clearer:
the code executes without error messages, but I sometimes get unwanted
results in $1. After closer inspection, I think it's because $1 sometimes
comes from the earlier pattern match
(if ($token->[1]->{"href"} =~ /$mymatch/)) rather than from the pattern
match I wanted ($document =~ /$searchstring(.+?)someidentifier/).
Is there a way to reset the value of $1?

Many thanks,
Francis
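
On the $1 question: the capture variables are only set by a successful
match. After a failed match, $1 still holds whatever the last successful
match in that scope left there, so printing it unconditionally can show
stale data. Rather than resetting $1, the usual approach is to read it
only when the match succeeded. A minimal sketch, in which the sample
document and the markers '</a>', 'END' and 'NOSUCHMARKER' are
assumptions made for illustration:

```perl
use strict;
use warnings;

# Made-up sample data standing in for the fetched page.
my $document     = 'intro <a href="/news">News</a> latest headlines END more';
my $searchstring = '</a>';    # assumed marker, not from the original post

# \Q...\E quotes regex metacharacters in the interpolated string.
# $1 is read only inside the branch where the match succeeded.
my $found;
if ( $document =~ /\Q$searchstring\E(.+?)END/ ) {
    $found = $1;
}
print defined $found ? "matched: [$found]\n" : "no match\n";

# List-context alternative: a failed match returns an empty list,
# so $text stays undef instead of picking up an older capture.
my ($text) = $document =~ /\Q$searchstring\E(.+?)NOSUCHMARKER/;
print defined $text ? "matched: [$text]\n" : "no match\n";
```

Copying the capture into a variable of your own straight away also
protects it from being overwritten by any later successful match.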


 