Velocity Reviews - Computer Hardware Reviews

simple regex

 
 
r3gis
      06-05-2007
Hi
I am trying to extract all URLs ending with php or cgi or pl from one
website with the following code :
foreach $judge( $res->content=~m#((http://[a-z-\/\.~]+\.(php|cgi|pl)))#g)
{
print $judge,"\n";
}
But for some reason I get redundant results :

http://www.kanazawa-gu.ac.jp/~hayash...in/log/env.cgi
cgi
http://www.bsnoop.de/cgi-bin/jenv.cgi
cgi

etc.

Could someone explain to me why the file extension is present in this
result set . What am I doing wrong ?

 
Paul Lalli
      06-05-2007
On Jun 5, 12:33 pm, r3gis <regi...@gmail.com> wrote:
> Hi
> I am trying to extract all URLs ending with php or cgi or pl from one
> website with the following code :
> foreach $judge( $res->content=~m#((http://[a-z-\/\.~]+\.(php|cgi|pl)))#g)
> {
> print $judge,"\n";
> }
> But for some reason I get redundant results :
>
> http://www.kanazawa-gu.ac.jp/~hayash...in/log/env.cgi
> cgi
> http://www.bsnoop.de/cgi-bin/jenv.cgi
> cgi
>
> etc.
>
> Could someone explain to me why the file extension is present in this
> result set . What am I doing wrong ?


You have multiple capturing parentheses in your pattern match. A
pattern match in list context (such as that imposed by the foreach
loop) returns a list of ALL captured parentheses.

Change the ones you don't want to capture to be noncapturing, by
adding a ?: right after the (
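
To make that concrete, here is a minimal sketch. The $content string is a made-up stand-in for $res->content, and the example.org URL is purely illustrative; the patterns are the one from the original post and a non-capturing variant of it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for $res->content.
my $content = 'see http://www.bsnoop.de/cgi-bin/jenv.cgi '
            . 'and http://example.org/a/index.php';

# Original pattern: three capturing groups, so with /g in list context
# every match contributes three items (URL, URL again, extension).
my @with_captures = $content =~ m#((http://[a-z-\/\.~]+\.(php|cgi|pl)))#g;

# One capturing group around the whole URL; the alternation is only
# grouped, not captured, thanks to (?:...).
my @urls_only = $content =~ m#(http://[a-z-\/\.~]+\.(?:php|cgi|pl))#g;

print scalar(@with_captures), " items with the original pattern\n";
print "$_\n" for @urls_only;
```

With the second pattern the loop only ever sees whole URLs, which is presumably what was wanted here.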

See also:
perldoc perlre
perldoc perlretut
perldoc perlreref

Paul Lalli

 
jeevs
      06-06-2007
Well Paul, I tested this on Windows and it works fine... So is it really
related to multiple parentheses?
I think the input has to be checked. But r3gis, please follow
Paul's advice, as I may be wrong, being a newbie.

#!/usr/bin/perl
use strict;
use warnings;
my @arr = ('http://www.kanazawa-gu.ac.jp/~hayashiy/cgi-bin/log/env.cgi',
           'http://www.bsnoop.de/cgi-bin/jenv.cgi', 'asdadada');
foreach (@arr) {
    # Prints the whole element when it matches; the captures are never used.
    if ($_ =~ m!((http://[a-z-\/\.~]+\.(php|cgi|pl)))!g) {
        print $_;
    }
}






 
Paul Lalli
      06-06-2007
On Jun 6, 2:11 am, jeevs <jeevan.ing...@gmail.com> wrote:
> Well Paul, I tested this on Windows and it works fine... So is it really
> related to multiple parentheses?


You tested *what* on Windows? The code that r3gis posted, or the code
that you posted? The code that r3gis posted is incomplete, so I'd
like to see the actual program you used. The code that you posted has
nothing at all to do with the original problem.


Confused,
Paul Lalli

 
jeevs
      06-06-2007
On Jun 6, 3:38 pm, Paul Lalli <mri...@gmail.com> wrote:
> On Jun 6, 2:11 am, jeevs <jeevan.ing...@gmail.com> wrote:
>
> > Well Paul, I tested this on Windows and it works fine... So is it really
> > related to multiple parentheses?


> You tested *what* on Windows? The code that r3gis posted, or the code
> that you posted?


Sorry for my irrelevant post... I meant the code I posted, which I
accept was not at all related to the original problem, and I apologize
for taking up your time and everyone else's.

r3gis as suggested by Paul you can replace the following line in your
code

foreach $judge( $res->content=~m#((http://[a-z-\/\.~]+\.(php|cgi|pl)))#g)

by

foreach $judge( $res->content=~m!(http://[a-z-\/\.~]+\.(?:php|cgi|pl))!g)

Thanks Paul. I will be careful next time.





 
r3gis
      06-16-2007
I did not realize that the capturing parentheses can be nested in
another one independently of the whole regex.
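
For anyone else reading, a small sketch of how nested groups are numbered: each group gets its number from the position of its opening parenthesis, counted left to right, so the outer group is $1 and the inner one is $2. The $url value below is just one of the URLs from the output above, and $whole and $ext are names chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = 'http://www.bsnoop.de/cgi-bin/jenv.cgi';

my ($whole, $ext);
# Outer parentheses open first, so they are group 1;
# the nested alternation opens second, so it is group 2.
if ($url =~ m#(http://[a-z-\/\.~]+\.(php|cgi|pl))#) {
    ($whole, $ext) = ($1, $2);
    print "1: $whole\n";
    print "2: $ext\n";
}
```

In list context with /g, both groups are returned for every match, which is where the extra "cgi" lines came from.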

Thanks for help. Everything is working right now as it should :]

 
Michele Dondi
      06-16-2007
On Sat, 16 Jun 2007 09:02:16 -0000, r3gis wrote:

>I did not realize that the capturing parentheses can be nested in
>another one independently of the whole regex.


Yep: <http://perlmonks.org/?node_id=442322>


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
 