Can't find a syntax error, hoping a second set of eyes will help

 
 
Jason C
Guest
Posts: n/a
 
      09-25-2012
On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:
> > FWIW, this modification did work:
> >
> > while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
> > $pattern = $1$2$3;

> ^^ ^^
> I think not...


Blah, sorry; that's what I get for trying to type up dummy code at 5am. In practice, I put it in quotes:

$pattern = "$1$2$3";


> > if ($2 =~ /^http/i) {
> > $text =~ s/$pattern/$repl/gsi;

>
> This almost certainly doesn't do what you think. If nothing else, you
> want to \Q $pattern.


Excellent point about \Q. What do you mean, though, that it doesn't do what I think?
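
To illustrate the \Q point: a URL interpolated into a regex brings its own metacharacters with it ("?" and "." among them), so the pattern no longer means the literal string. A minimal sketch, with made-up sample data; only the shape of the link markup comes from this thread:

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen so the URL contains "?" and ".".
my $text    = 'See <a href="http://example.com/?q=1">http://example.com/?q=1</a> now';
my $pattern = '<a href="http://example.com/?q=1">http://example.com/?q=1</a>';
my $repl    = 'http://example.com/?q=1';

# Unquoted: "/?" is read as "an optional slash", so the literal "/?q"
# in the text never matches and nothing is replaced.
(my $unquoted = $text) =~ s/$pattern/$repl/gi;

# Quoted: \Q...\E escapes every metacharacter, so $pattern matches literally.
(my $quoted = $text) =~ s/\Q$pattern\E/$repl/gi;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

The unquoted version fails silently here, which is the worse of the two ways it can go wrong; with other input it could just as well match text that was never intended.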


> What are you trying to do here: strip tags?


Yes and no. I'm using a contenteditable instead of a textarea, and I've discovered that when someone copy-and-pastes a URL from Chrome or FF, it's automatically making the URL a link. Eg:

<a href="http://www.google.com">http://www.google.com</a>

But of course, if you just type the address, then it doesn't. So on my end, I was using URI::Find to convert addresses to links, and ending up with a mess like:

<a href="<a href="http://www.google.com">http://www.google.com</a>"><a href="http://www.google.com">http://www.google.com</a></a>

So, my goal here is to remove the <a href> tag, but only if the linked text is a URL.


> Why not
> just do one s/// (or, you know, use a module)?


I had originally tried doing it with a simple s///, but couldn't figure out how to make it conditional. Like this:

$text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi
if ($3 =~ /^http/i);

This worked correctly if I removed the if() statement. In testing, I changed the replacement to:

1 - $1, 2 - $2, 3 - $3

just to make sure that $3 did begin with http, and it did, so I couldn't figure out why the if() wasn't catching it unless it was dropping the $3 value before reaching the if().
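
For what it's worth, the condition of a statement modifier is evaluated before the statement it modifies, so in "s#...#...# if ($3 =~ ...)" the $3 being tested comes from whatever matched earlier, not from the substitution that has yet to run. One way to keep this a single s/// is to move the test into the pattern itself. A minimal sketch, assuming that link text counts as a URL when it starts with http:// or https:// and contains no nested tags; the sample input is made up:

```perl
use strict;
use warnings;

# Hypothetical input: one pasted URL link and one ordinary link.
my $text = '<a href="http://www.google.com">http://www.google.com</a> and '
         . '<a href="http://example.com">a site</a>';

# Unwrap an anchor only when its link text is itself a URL.
# $1 is the link text; anchors with any other text do not match and are kept.
(my $clean = $text) =~ s{<a\b[^>]*>\s*(https?://[^<]*?)\s*</a>}{$1}gi;

print "$clean\n";
```

This replaces the anchor with its link text rather than its href; for a pasted URL the two are the same string anyway.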


> > Admittedly, I'm not sure why $2 is stored long enough for the if()
> > statement, but inside of the if() statement it's empty. Storing them to
> > a different variable worked for this purpose, but if there's a better
> > way, I'm very much open to it.

>
> The $N variables last until the next successful pattern match. In this
> case, the '$2 =~ /^http/i' in the condition of the if clears them all
> (even though it doesn't capture anything).


Ahh, that makes sense. I mistakenly thought that, since I wasn't assigning $N, then they would retain the previous value.
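
A small sketch of that behaviour, using the pattern from earlier in the thread and a made-up input string. The names $link_text, $saved and $inside_if are only for illustration:

```perl
use strict;
use warnings;

my $text = '<a href="http://www.google.com">http://www.google.com</a>';
my ($saved, $inside_if);

if ($text =~ m#(<a[^>]*>)(.*?)(</a>)#si) {
    # Copy the capture to a lexical before running any other match.
    my $link_text = $2;

    if ($link_text =~ /^http/i) {
        # The match in the condition succeeded and has no groups,
        # so $2 is undef in here, but the copy still holds the link text.
        $inside_if = defined $2 ? 'defined' : 'undef';
        $saved     = $link_text;
    }
}

print "\$2 inside the if: $inside_if\n";
print "saved copy: $saved\n";
```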


> In general I prefer to assign captures to real variables right away:
>
> while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {
>
> (notice also that captures can be nested, and DTRT).


Great to know! Thanks.
 
 
 
 
 
Jason C
Guest
Posts: n/a
 
      09-25-2012
On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:

> while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {


In this, how does it know that we're testing $text? Or, did you mean to type something like:

while (my ($tag, $url) = $text =~ m#(<a...>(.*?)</a>)#gsi)
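
Without the "$text =~" the match runs against $_, so the explicit binding is needed here. One thing to watch with either spelling: a list assignment puts m//g in list context, where it returns every match at once instead of stepping through them, so as far as I can tell the while would keep restarting from the first match. A sketch of the scalar-context form with the captures copied on the first line of the loop body. The "<a...>" in the quoted line is elided, so the "<a\b[^>]*>" below is only an assumption about what it stands for, and the input is made up:

```perl
use strict;
use warnings;

# Hypothetical input with two links.
my $text = '<a href="http://a.example/">http://a.example/</a> '
         . '<a href="http://b.example/">B</a>';
my @seen;

# Scalar-context m//g resumes after the previous match on each iteration.
while ($text =~ m#(<a\b[^>]*>(.*?)</a>)#gsi) {
    my ($tag, $inner) = ($1, $2);   # nested groups: $1 is the whole element
    push @seen, $inner;
    print "$inner <= $tag\n";
}
```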
 
 
 
 
 
Jason C
Guest
Posts: n/a
 
      09-25-2012
On Monday, September 24, 2012 3:44:44 PM UTC-4, Uri Guttman wrote:

> JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
>
> it will fail if the opening quote is " and the string has a ' inside
> it. perfectly legal html but you can't parse it that way.


I'll probably discard this idea and pursue a module, like you guys suggested. But for the sake of learning...

I recognized this issue, too, which is why I was originally using [^\1], like so:

(["'])*([^\1>]*)\1

I think it was you that pointed out that I can't negate a backreference like that, though.

What would be the correct way to do this, if I can't negate a backreference as a character class?
 
 
Jim Gibson
Guest
Posts: n/a
 
      09-25-2012
In article <(E-Mail Removed)>,
Jason C <(E-Mail Removed)> wrote:

> On Monday, September 24, 2012 3:44:44 PM UTC-4, Uri Guttman wrote:
>
> > JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
> >
> > it will fail if the opening quote is " and the string has a ' inside
> > it. perfectly legal html but you can't parse it that way.

>
> I'll probably discard this idea and pursue a module, like you guys suggested.
> But for the sake of learning...
>
> I recognized this issue, too, which is why I was originally using [^\1], like
> so:
>
> (["'])*([^\1>]*)\1
>
> I think it was you that pointed out that I can't negate a backreference like
> that, though.
>
> What would be the correct way to do this, if I can't negate a backreference as a character class?


Capture the leading delimiter and use a backreference that is not in a
character class:

while ($text =~ m{(<a[^>]* href=(["']).*?\2.*?>)(.*?)(</a>)}gsi) {
                                         ^^
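
Note that the extra group shifts the numbering: in the pattern above the link text is now $3, not $2. A runnable variation on the same idea, assuming a made-up input whose double-quoted href contains an apostrophe. It adds a capture for the href value, and also shows a negative lookahead standing in for what [^\1] was meant to do:

```perl
use strict;
use warnings;

# Hypothetical input: a double-quoted href that contains an apostrophe.
my $text = q{<a class='x' href="http://example.com/it's">http://example.com/it's</a>};

# (["']) captures the opening quote; \2 then requires the same character to close.
# Groups: $1 opening tag, $2 quote, $3 href value, $4 link text, $5 closing tag.
my ($href, $link_text);
if ($text =~ m{(<a[^>]* href=(["'])(.*?)\2[^>]*>)(.*?)(</a>)}si) {
    ($href, $link_text) = ($3, $4);
}

# What [^\1] was meant to say: "any character that is not the opening quote".
# (?!\1) is checked in front of each character instead of inside a class.
my $href2;
if ($text =~ m{ href=(["'])((?:(?!\1).)*)\1}si) {
    $href2 = $2;
}

print "$href\n$link_text\n$href2\n";
```

Both forms stop at the matching double quote and leave the apostrophe inside the value alone.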

--
Jim Gibson
 
 
Kaz Kylheku
Guest
Posts: n/a
 
      09-26-2012
On 2012-09-26, Eli the Bearded <*@eli.users.panix.com> wrote:
>:r! cat $PHTML/some.links.html


UUOC infects the vi command line!

:r!cat <file> -> :r <file>

 
 
 
 