Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Can't find a syntax error, hoping a second set of eyes will help


Can't find a syntax error, hoping a second set of eyes will help

 
 
Jason C
 
      09-24-2012
Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy

while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
if ($2 =~ /^http/i) {
$text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
}
}

The error is on the while() line (at least, I remove it and no more error). The error just says:

syntax error at blah.cgi line 239, near "if"
syntax error at blah.cgi line 246, near "}"

The purpose of the function is to remove the <a href=...></a> code in submitted text, but only if the linked text begins with http.

TIA,

Jason
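For reference, here is a minimal sketch of the stated goal, assuming reasonably well-formed HTML. It swaps the while loop plus second s/// for a single s{}{}e substitution that decides per anchor. The sub name strip_http_links and the sample input are hypothetical, and as Uri points out below, a regex like this misses cases a real HTML parser handles (a > inside an attribute value, nested tags, comments).

```perl
use strict;
use warnings;

# Hypothetical helper: unwrap <a ...>...</a> when the link text starts
# with "http", and leave every other anchor exactly as it was.
sub strip_http_links {
    my ($text) = @_;
    $text =~ s{
        (                 # capture 1: the whole anchor
          <a\b[^>]*>      #   opening tag, assumes no > inside attributes
          (.*?)           #   capture 2: the link text
          </a>            #   closing tag
        )
    }{
        # Copy the captures first: the test below is itself a match.
        my ($anchor, $label) = ($1, $2);
        $label =~ /^http/i ? $label : $anchor;
    }gsixe;
    return $text;
}

# Made-up sample input.
print strip_http_links(
    q{See <a href="http://x.com/">http://x.com/</a> and <a href='/faq'>FAQ</a>.}
), "\n";
# prints: See http://x.com/ and <a href='/faq'>FAQ</a>.
```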
 
 
 
 
 
Uri Guttman
 
      09-24-2012
>>>>> "JC" == Jason C <(E-Mail Removed)> writes:

JC> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy
JC> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

why do you think the # marks the start of a regex? only if you use m//
can you change the regex delim from /.
and ^ will not invert a char class for \1 as \1 isn't a char class
element. so even if you fix the regex delim, that will fail. finally,
why are you parsing out urls with a regex when there are modules that do
it correctly?
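The point about [^\1>] is easy to check. Inside a character class Perl does not treat \1 as a backreference (it is read as an octal character escape), so the class cannot mean "anything except the quote captured earlier". The sketch below, using a made-up tag, compares it with a tempered pattern (?:(?!\1)[^>])*, which is one common workaround rather than the only one.

```perl
use strict;
use warnings;

my $tag = q{<a href="a.html" title="b">};    # made-up sample tag

# [^\1>]* excludes > and the character written as \1 (an octal escape),
# NOT the captured quote, so it runs past the first closing " and then
# backtracks to the last one.
my $broken = $tag =~ m{href=(["'])([^\1>]*)\1} ? $2 : undef;

# Tempered version: take a character only if the captured quote does
# not start at that position.
my $fixed = $tag =~ m{href=(["'])((?:(?!\1)[^>])*)\1} ? $2 : undef;

print "broken: $broken\n";    # a.html" title="b
print "fixed:  $fixed\n";     # a.html
```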

uri

 
 
 
 
 
Jason C
 
      09-24-2012
On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:
>
> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

> ^^ m
>
> (I would suggest finding a highlighting editor. It makes this sort of
> syntactic mistake much easier to spot.)


Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.

I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?
 
 
Jason C
 
      09-24-2012
On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:

> why do you think the # marks the start of a regex? only if you use m//
> can you change the regex delim from /.


Thanks to you, too, Uri. Like I replied to Ben a second ago, I thought that since you could replace the delimiter in s/// ad hoc, you could in m//, too. Learn something new every day!


> and ^ will not invert a char class for \1 as \1 isn't a char class
> element. so even if you fix the regex delim, that will fail.


Oh. Now THAT I did NOT know at all! It does explain a few other errors I've had, though, and couldn't figure out.


> finally,
> why are you parsing out urls with a regex when there are modules that do
> it correctly?


Two reasons:

1. I've been working with regex for a year or two, and while it's by no means a strong point in my vocabulary (yet), I'm at least familiar enough with it to usually figure it out.

2. I briefly looked for a module that would handle this correctly, but wasn't sure what to look for. And, I'm not sure that it warrants the including of a full module if it could potentially be done in a simple regex. If you can recommend a module that would be more stable and/or faster than what I'm doing, though, then I would definitely appreciate the reference!

FWIW, this modification did work:

while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
$pattern = "$1$2$3";    # the whole anchor
$repl = $2;             # just the link text

if ($2 =~ /^http/i) {
$text =~ s/\Q$pattern\E/$repl/gsi;    # \Q...\E: match the anchor text literally
}
}

Admittedly, I'm not sure why $2 is stored long enough for the if() statement, but inside the if() block it's empty. Storing them in a different variable worked for this purpose, but if there's a better way, I'm very much open to it.
 
 
Peter Makholm
 
      09-24-2012
Jason C <(E-Mail Removed)> writes:

>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

>> ^^ m

>
> Thanks, Ben. I didn't realize the m//; was required; since you can
> change the delimiter with s/// ad hoc, I thought you could here, too.


You can change the delimiter, but the m is only optional when you use
the // delimiters.
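To make the rule concrete, the three matches below are equivalent; only the delimiters differ. The sample string is made up.

```perl
use strict;
use warnings;

my $text = '<a href="http://example.com/">http://example.com/</a>';    # made-up sample

# Plain slashes: the m may be omitted, but a "/" inside must be escaped.
my $slash = $text =~ /<a[^>]*>(.*?)<\/a>/si;

# Any other delimiter needs the m; without it, "#" starts a comment.
my $hash = $text =~ m#<a[^>]*>(.*?)</a>#si;

# A balanced pair such as {} also avoids escaping the "/".
my $brace = $text =~ m{<a[^>]*>(.*?)</a>}si;

print $slash && $hash && $brace ? "all three matched\n" : "mismatch\n";
```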

//Makholm
 
 
Marc Girod
 
      09-24-2012
On Sep 24, 10:28 am, Jason C <(E-Mail Removed)> wrote:

> What editor do you recommend?


GNU emacs with cperl-mode

Marc
 
 
anotheranne
 
      09-24-2012
Jason C wrote:

> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy
>
> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
> if ($2 =~ /^http/i) {
> $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
> }
> }


Whatever other errors your regex may have, I would suggest that
you stick with the regular m// and s/// constructs. You should of
course then escape the '/' in </a>. Changing this should make it run.

Don't use # as an eye-easy replacement for / because a) it is the perl
character for a comment, and b) in a regex (at least with the /x
modifier) it is also a metacharacter. Trouble will come your way if
you use this.

If you do want to get away from // and /// then use balanced
delimiters like m{} and s{}{} . See p319 in Friedl MASTERING REGULAR
EXPRESSIONS. O'Reilly.

When you use any delimiter other than //, the m is mandatory. Only when
using // can you omit the m. Thus // and m{} are both valid constructs.
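As an illustration of the balanced-delimiter suggestion, here is a substitution written with s{}{} and the /x modifier, using a made-up anchor. It also shows point b): under /x an unescaped # starts a comment, so a literal # in the pattern has to be written \#.

```perl
use strict;
use warnings;

my $html = '<a href="#top">Back to top</a>';    # made-up sample anchor

# With /x, whitespace in the pattern is ignored and "#" begins a
# comment, so the literal "#" in the href is escaped as \#.
(my $label = $html) =~ s{
    <a \s+ href="\#top">    # opening tag
    (.*?)                   # the link text
    </a>                    # closing tag
}{$1}x;

print "$label\n";    # Back to top
```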

Also you can remove the ';' after the gsi

hope this helps.

anotheranne
>
> The error is on the while() line (at least, I remove it and no more error). The error just says:
>
> syntax error at blah.cgi line 239, near "if"
> syntax error at blah.cgi line 246, near "}"
>
> The purpose of the function is to remove the <a href=...></a> code in submitted text, but only if the linked text begins with http.
>
> TIA,
>
> Jason


 
 
anotheranne
 
      09-24-2012
Jason C wrote:

> On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:
>>
>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

>> ^^ m
>>
>> (I would suggest finding a highlighting editor. It makes this sort of
>> syntactic mistake much easier to spot.)

>
> Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.
>
> I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?


Padre is a nice perl IDE.

http://padre.perlide.org/

anotheranne
 
 
Scott Bryce
 
      09-24-2012
On 9/24/2012 3:28 AM, Jason C wrote:
> I'm using Notepad++,


I assume that means you are on a Windows box.

> What editor do you recommend?


I like UltraEdit.


 
 
Uri Guttman
 
      09-24-2012
>>>>> "JC" == Jason C <(E-Mail Removed)> writes:

JC> On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:
>> why do you think the # marks the start of a regex? only if you use m//
>> can you change the regex delim from /.


JC> Thanks to you, too, Uri. Like I replied to Ben a second ago, I
JC> thought that since you could replace the delimiter in s/// ad hoc,
JC> that you could in m//, too. Learn something new every day!

but s/// has the s to mark the next char. =~ ## has no leading marker so it
would just be a comment. also using # for the delimiter is just a bad
idea as it confuses many readers.
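This also accounts for the error messages in the first post. The sketch below compiles a cut-down, hypothetical copy of the loop with a string eval: Perl reads everything after the # as a comment, so the statement continues on the next line as "$t =~ if (...)", which is a syntax error near "if". Adding the m makes the same code compile.

```perl
use strict;
use warnings;

# Cut-down, hypothetical copy of the original loop, compiled at run time
# so the syntax error can be caught instead of stopping this script.
my $code = <<'END_CODE';
my $t = '';
while ($t =~ #<a[^>]*>(.*?)</a>#gsi) {
    if (1) { }
}
1;
END_CODE

my $ok  = eval $code;
my $err = $@;    # save it: the next eval resets $@
print defined $ok ? "compiled\n" : "error: $err";

# Adding the m turns the same line into a valid match.
(my $fixed = $code) =~ s/=~ #/=~ m#/;
my $ok_fixed = eval $fixed;
print defined $ok_fixed ? "fixed version compiled\n" : "error: $@";
```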

>> finally,
>> why are you parsing out urls with a regex when there are modules that do
>> it correctly?


JC> Two reasons:

JC> 1. I've been working with regex for a year or two, and while it's
JC> by no means a strong point in my vocabulary (yet), I'm at least
JC> familiar enough with it to usually figure it out.

good that you are studying them but it still is the wrong tool for
this. learning when regexes aren't a good solution is part of learning
regexes.

JC> 2. I briefly looked for a module that would handle this correctly,
JC> but wasn't sure what to look for. And, I'm not sure that it
JC> warrants the including of a full module if it could potentially be
JC> done in a simple regex. If you can recommend a module that would
JC> be more stable and/or faster than what I'm doing, though, then I
JC> would definitely appreciate the reference!

JC> FWIW, this modification did work:

JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {

it will fail if the opening quote is " and the string has a ' inside
it. perfectly legal html but you can't parse it that way.
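The quote-pairing problem shows up as soon as the href value is captured. In this minimal sketch with a made-up tag, ["']...["'] lets the opening " pair with the apostrophe, while capturing the opening quote and requiring the same one to close (\1) keeps the value whole. It does not address the other limits of regex parsing mentioned above.

```perl
use strict;
use warnings;

my $tag = q{<a href="/it's-here.html">};    # made-up sample tag

# Either quote may close the value, so the ' inside it ends the match.
my $loose = $tag =~ m{href=["'](.*?)["']} ? $1 : undef;

# Remember which quote opened the value and require the same one (\1).
my $paired = $tag =~ m{href=(["'])(.*?)\1} ? $2 : undef;

print "loose:  $loose\n";     # /it
print "paired: $paired\n";    # /it's-here.html
```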

JC> Admittedly, I'm not sure why $2 is stored long enough for the if()
JC> statement, but inside of the if() statement it's empty. Storing
JC> them to a different variable worked for this purpose, but if
JC> there's a better way, I'm very much open to it.

you need to read more about regexes and the $1 stuff. they live until
the next regex is run (they are global).
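That is what happened to $2 in the earlier loop. Strictly, the numbered variables are replaced by the next successful match (a failed match leaves them alone), and "$2 =~ /^http/i" is itself a successful match with no capture groups, so inside the if block $2 is undef. A minimal sketch with a made-up anchor, copying the capture into a lexical before testing it:

```perl
use strict;
use warnings;

my $text = '<a href="x.html">http://example.com/</a>';    # made-up sample
my ($label, $inside);

if ($text =~ m{<a[^>]*>(.*?)</a>}) {
    $label = $1;                  # copy it while it still holds the text
    if ($label =~ /^http/i) {     # this successful match replaces $1
        $inside = $1;             # undef: the last match had no groups
    }
}

print "label:  $label\n";
print "inside: ", (defined $inside ? $inside : 'undef'), "\n";
```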

uri
 
 
 
 