Velocity Reviews: Perl Misc

Jason C 09-24-2012 04:09 AM

Can't find a syntax error, hoping a second set of eyes will help
 
Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)

while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
    if ($2 =~ /^http/i) {
        $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
    }
}

The error is on the while() line (at least, when I remove it, there's no more error). The error just says:

syntax error at blah.cgi line 239, near "if"
syntax error at blah.cgi line 246, near "}"

The purpose of the function is to remove the <a href=...></a> code in submitted text, but only if the linked text begins with http.

TIA,

Jason

Uri Guttman 09-24-2012 05:22 AM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
>>>>> "JC" == Jason C <jwcarlton@gmail.com> writes:

JC> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)
JC> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

why do you think the # marks the start of a regex? only if you use m//
can you change the regex delim from /.
and ^ will not invert a char class for \1 as \1 isn't a char class
element. so even if you fix the regex delim, that will fail. finally,
why are you parsing out urls with a regex when there are modules that do
it correctly?
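Uri's point about `[^\1>]` can be illustrated with a short sketch. Because a backreference cannot be used inside a character class, one common way to say "any characters up to the same quote that opened the attribute" is a tempered negative lookahead, `(?:(?!\1).)*`. The sample tags and the variable names below are made up for illustration; this is still a regex over HTML, not a parser.

```perl
use strict;
use warnings;

# Capture the opening quote in group 1, then consume characters one at a
# time as long as they are not that same quote; finally require the quote.
my $href_re = qr{href=(["'])((?:(?!\1).)*)\1}s;

my @samples = (
    q{<a href="it's.html">x</a>},        # double quotes around a single quote
    q{<a href='say "hi".html'>x</a>},    # single quotes around double quotes
);

my @urls;
for my $tag (@samples) {
    push @urls, $2 if $tag =~ $href_re;   # $2 is the attribute value
}

print "$_\n" for @urls;
```

Each value stops at the quote character that actually opened it, so a `'` inside a double-quoted value (or the reverse) does not end the match early.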

uri


Jason C 09-24-2012 09:28 AM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:
>
> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

> ^^ m
>
> (I would suggest finding a highlighting editor. It makes this sort of
> syntactic mistake much easier to spot.)


Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.

I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?

Jason C 09-24-2012 09:35 AM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:

> why do you think the # marks the start of a regex? only if you use m//
> can you change the regex delim from /.


Thanks to you, too, Uri. Like I replied to Ben a second ago, I thought that since you could replace the delimiter in s/// ad hoc, that you could in m//, too. Learn something new every day! :-)


> and ^ will not invert a char class for \1 as \1 isn't a char class
> element. so even if you fix the regex delim, that will fail.


Oh. Now THAT I did NOT know at all! It does explain a few other errors I've had, though, and couldn't figure out.


> finally,
> why are you parsing out urls with a regex when there are modules that do
> it correctly?


Two reasons:

1. I've been working with regex for a year or two, and while it's by no means a strong point in my vocabulary (yet), I'm at least familiar enough with it to usually figure it out.

2. I briefly looked for a module that would handle this correctly, but wasn't sure what to look for. And, I'm not sure that it warrants the including of a full module if it could potentially be done in a simple regex. If you can recommend a module that would be more stable and/or faster than what I'm doing, though, then I would definitely appreciate the reference!

FWIW, this modification did work:

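while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
    my $pattern = "$1$2$3";    # the whole matched link, as a plain string
    my $repl = $2;             # the link text

    if ($2 =~ /^http/i) {
        # \Q...\E so characters like ? and . in the HTML are matched literally
        $text =~ s/\Q$pattern\E/$repl/gsi;
    }
}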

Admittedly, I'm not sure why $2 is stored long enough for the if() statement, but inside of the if() statement it's empty. Storing them to a different variable worked for this purpose, but if there's a better way, I'm very much open to it.
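The empty `$2` has a specific cause: the test `$2 =~ /^http/i` is itself a successful match, and a successful match replaces `$1`, `$2`, and so on, so inside the `if` block `$2` no longer holds the link text. A minimal sketch of one way around this, assuming the goal stated in the first post (unwrap links whose text starts with http), copies the captures into lexicals first and does the whole job in a single `s{}{}e` with balanced delimiters, with no outer loop. The sub name `strip_http_links` and the sample input are hypothetical, and the regex still has the limits Uri describes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace <a ...>TEXT</a> with TEXT when TEXT starts
# with "http"; leave every other link untouched.
sub strip_http_links {
    my ($text) = @_;
    $text =~ s{(<a\b[^>]*>)(.*?)(</a>)}{
        # Copy the captures first: the /^http/i test below is a successful
        # match of its own and would replace $1, $2 and $3.
        my ($open, $link_text, $close) = ($1, $2, $3);
        $link_text =~ /^http/i ? $link_text : "$open$link_text$close";
    }gsie;
    return $text;
}

my $in = 'See <a href="http://example.com/">http://example.com/</a> or '
       . '<a href="/faq">the FAQ</a>.';
print strip_http_links($in), "\n";
# prints: See http://example.com/ or <a href="/faq">the FAQ</a>.
```

The `/e` flag makes the replacement part ordinary Perl code, so the decision (keep the text or rebuild the original link) happens per match instead of in a separate `while` loop.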

Peter Makholm 09-24-2012 09:49 AM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
Jason C <jwcarlton@gmail.com> writes:

>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

>> ^^ m

>
> Thanks, Ben. I didn't realize the m//; was required; since you can
> change the delimiter with s/// ad hoc, I thought you could here, too.


You can change the delimiter, but the m is only optional when you use
the // delimiters.
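Makholm's rule in a few lines: a bare `/.../` is a match, but any other delimiter has to be introduced by `m`; otherwise Perl sees `#` as the start of a comment. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $html = '<a href="/faq">the FAQ</a>';

my $slashes = $html =~ /<\/a>/  ? 1 : 0;   # default delimiter: m is optional
my $hashes  = $html =~ m#</a>#  ? 1 : 0;   # '#' delimiter: m is required
my $braces  = $html =~ m{</a>}  ? 1 : 0;   # balanced delimiters: m is required

# Without the m, everything from '#' to the end of the line is a comment.
# In Jason's loop that left "($text =~" waiting for an operand, and Perl
# took the "if" on the next line, hence: syntax error ... near "if".

print "$slashes $hashes $braces\n";
```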

//Makholm

Marc Girod 09-24-2012 10:30 AM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
On Sep 24, 10:28 am, Jason C <jwcarl...@gmail.com> wrote:

> What editor do you recommend?


GNU emacs with cperl-mode

Marc

anotheranne 09-24-2012 12:19 PM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
Jason C wrote:

> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)
>
> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
> if ($2 =~ /^http/i) {
> $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
> }
> }


Whatever other errors your regex may have, I would suggest that
you stick with the regular m// and s/// constructs. You should of
course then escape the '/' in </a> . Changing this should make it run.

Don't use # as an eye-easy replacement for / because a) it is the perl
character for a comment, and b) in a regex (at least with the /x
modifier) it is also a metacharacter. Trouble will come your way if
you use this.
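The `/x` part of that warning is easy to check. Under `/x` an unescaped `#` in the pattern starts a comment that runs to the end of the line, so `#` is special inside the regex as well as outside it. A small sketch with made-up strings:

```perl
use strict;
use warnings;

# With /x, whitespace is ignored and '#' starts a comment that runs to the
# end of the line, so this pattern is effectively just /^a/ and matches "ab".
my $comment_ate_it = ("ab" =~ m{^a#b$}x) ? 1 : 0;

# Escaping the '#' keeps it as a literal character under /x.
my $literal_hash = ("a#b" =~ m{^a\#b$}x) ? 1 : 0;
my $needs_hash   = ("ab"  =~ m{^a\#b$}x) ? 1 : 0;

print "$comment_ate_it $literal_hash $needs_hash\n";   # 1 1 0
```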

If you do want to get away from // and ///, then use balanced
delimiters like m{} and s{}{}. See p319 in Friedl, Mastering Regular
Expressions (O'Reilly).
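A short sketch of the balanced-delimiter forms: with `{}` the `/` in a closing tag needs no escaping, and Perl counts nested braces, so a `{2}` quantifier inside the pattern does not end it. The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Slash delimiters: every '/' inside the pattern must be escaped.
(my $escaped = $html) =~ s/<\/?b>//g;

# Balanced braces: no escaping needed, same result.
(my $braced = $html) =~ s{</?b>}{}g;

# Nested braces are fine because Perl counts them: {2} is a quantifier here.
my $two_as = ("aa" =~ m{^a{2}$}) ? 1 : 0;

print "$escaped\n";   # bold and <i>italic</i>
print "$braced\n";    # bold and <i>italic</i>
print "$two_as\n";    # 1
```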

When you use any alternative to m//, the m is mandatory. Only when
using // can you omit the m. Thus // or m{} are valid constructs.

Also you can remove the ';' after the gsi

hope this helps.

anotheranne



anotheranne 09-24-2012 12:42 PM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
Jason C wrote:

> On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:
>>
>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

>> ^^ m
>>
>> (I would suggest finding a highlighting editor. It makes this sort of
>> syntactic mistake much easier to spot.)

>
> Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.
>
> I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?


Padre is a nice perl IDE.

http://padre.perlide.org/

anotheranne

Scott Bryce 09-24-2012 03:11 PM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
On 9/24/2012 3:28 AM, Jason C wrote:
> I'm using Notepad++,


I assume that means you are on a Windows box.

> What editor do you recommend?


I like UltraEdit.



Uri Guttman 09-24-2012 07:43 PM

Re: Can't find a syntax error, hoping a second set of eyes will help
 
>>>>> "JC" == Jason C <jwcarlton@gmail.com> writes:

JC> On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:
>> why do you think the # marks the start of a regex? only if you use m//
>> can you change the regex delim from /.


JC> Thanks to you, too, Uri. Like I replied to Ben a second ago, I
JC> thought that since you could replace the delimiter in s/// ad hoc,
JC> that you could in m//, too. Learn something new every day! :-)

but s/// has the s to mark the next char. =~ ## has no leading marker so it
would just be a comment. also using # for the delimiter is just a bad
idea as it confuses many readers.

>> finally,
>> why are you parsing out urls with a regex when there are modules that do
>> it correctly?


JC> Two reasons:

JC> 1. I've been working with regex for a year or two, and while it's
JC> by no means a strong point in my vocabulary (yet), I'm at least
JC> familiar enough with it to usually figure it out.

good that you are studying them but it still is the wrong tool for
this. learning when regexes aren't a good solution is part of learning
regexes.

JC> 2. I briefly looked for a module that would handle this correctly,
JC> but wasn't sure what to look for. And, I'm not sure that it
JC> warrants the including of a full module if it could potentially be
JC> done in a simple regex. If you can recommend a module that would
JC> be more stable and/or faster than what I'm doing, though, then I
JC> would definitely appreciate the reference!

JC> FWIW, this modification did work:

JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {

it will fail if the opening quote is " and the string has a ' inside
it. perfectly legal html but you can't parse it that way.

JC> Admittedly, I'm not sure why $2 is stored long enough for the if()
JC> statement, but inside of the if() statement it's empty. Storing
JC> them to a different variable worked for this purpose, but if
JC> there's a better way, I'm very much open to it.

you need to read more about regexes and the $1 stuff. they live until
the next successful match or the end of the enclosing block (they are
dynamically scoped).
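A small sketch of that lifetime rule, with made-up strings: a failed match leaves `$1`, `$2`, etc. alone, while a successful match replaces them even if it has fewer groups, which is exactly what happens inside an `if ($2 =~ /^http/i) { ... }` block.

```perl
use strict;
use warnings;

my $link_text = 'http://example.com/';

my $ok = '<a>x</a>' =~ /<(a)>(x)</;       # successful match: $1 = 'a', $2 = 'x'
my $before = $2;

my $miss = 'nothing here' =~ /(zzz)(yyy)/;  # failed match: $1 and $2 untouched
my $after_failure = $2;

my $after_success;
if ($link_text =~ /^http/i) {               # successful match with no groups...
    $after_success = defined $2 ? $2 : 'undef';   # ...so $2 is now undef
}

print "$before $after_failure $after_success\n";   # x x undef
```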

uri


All times are GMT.