'+' messing up regular expression

 
 
Chris Johnson
Guest
Posts: n/a
 
      09-15-2005
I've written a CGI script that basically emulates the Apache default
page, but with more customizations. One of these is the addition of
content above the file list, and I've decided to use Wikipedia-esque
shorthand.

I've got it pretty much working. Except there are some problems with
the link conversion. (In case you've never seen it,
[[http://www.google.com|Google]] translates to <a
href="http://www.google.com">Google</a>)

I've found that if there's a '+' in the string to be replaced, it
simply won't be replaced. Here's the code that works on most every
situation:

while(/\[\[(.*?)\]\]/g){
$new = $1;
if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
s/\[\[$1\|$2\]\]/$new/g;
}
}

The specific input that's having trouble is

[[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]

but the peculiar thing is that if I remove the +'s, it makes the
replacement fine (except for the fact that the link is no longer
valid). So does anyone see why this is happening?

Thanks for your time,
Chris

 
 
 
 
 
A. Sinan Unur
Guest
Posts: n/a
 
      09-16-2005
"Chris Johnson" <(E-Mail Removed)> wrote in
news:(E-Mail Removed) oups.com:

> while(/\[\[(.*?)\]\]/g){
> $new = $1;
> if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
> s/\[\[$1\|$2\]\]/$new/g;
> }
> }
>
> The specific input that's having trouble is
>
> [[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]


#!/usr/bin/perl

use strict;
use warnings;

my $s = '[[http://fy.chalmers.se/~appro/linux/D...-tools]]';

if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
print qq{<a href="$1">$2</a>\n};
}

__END__

D:\Home\asu1\UseNet\clpmisc> c
<a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
 
 
 
Chris Johnson
Guest
Posts: n/a
 
      09-16-2005
A. Sinan Unur wrote:
> "Chris Johnson" <(E-Mail Removed)> wrote in
> news:(E-Mail Removed) oups.com:
>
> > while(/\[\[(.*?)\]\]/g){
> > $new = $1;
> > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
> > s/\[\[$1\|$2\]\]/$new/g;
> > }
> > }
> >
> > The specific input that's having trouble is
> >
> > [[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]

>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $s = '[[http://fy.chalmers.se/~appro/linux/D...-tools]]';
>
> if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
> print qq{<a href="$1">$2</a>\n};
> }
>
> __END__
>
> D:\Home\asu1\UseNet\clpmisc> c
> <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>


I should clarify, it seems. The input is a text file. I do not simply
want to print the matched patterns; I want to replace the text, and
then print the entire contents of the file. What I'm curious about is
why it won't run the s/$old/$new/g if there's a '+' in $old.

Incidentally, if I change the code to:

while(/\[\[(.*?)\]\]/g){
$new = $1;
if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
$old = "[[$1|$2]]";
s/$old/$new/g;
}
}

I get the following error:

Invalid [] range "w-t" in regex; marked by <-- HERE in
m/[[http://fy.chalmers.se/~appro/linux/DVD+RW/|dvd+rw-t <-- HERE
ools]]/ at index.cgi line 89.

 
 
A. Sinan Unur
Guest
Posts: n/a
 
      09-16-2005
"Chris Johnson" <(E-Mail Removed)> wrote in
news:(E-Mail Removed) oups.com:

> A. Sinan Unur wrote:
>> "Chris Johnson" <(E-Mail Removed)> wrote in
>> news:(E-Mail Removed) oups.com:
>>
>> > while(/\[\[(.*?)\]\]/g){
>> > $new = $1;
>> > if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
>> > s/\[\[$1\|$2\]\]/$new/g;
>> > }
>> > }
>> >
>> > The specific input that's having trouble is
>> >
>> > [[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]

>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>>
>> my $s =
>> '[[http://fy.chalmers.se/~appro/linux/D...-tools]]';
>>
>> if($s =~ /^\[\[(.+)\|(.+)\]\]$/) {
>> print qq{<a href="$1">$2</a>\n};
>> }
>>
>> __END__
>>
>> D:\Home\asu1\UseNet\clpmisc> c
>> <a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>

>
> I should clarify, it seems. The input is a text file. I do not simply
> want to print the matched patterns; I want to replace the text, and
> then print the entire contents of the file. What I'm curious about is
> why it won't run the s/$old/$new/g if there's a '+' in $old.


Because + and - are special in regexes.

It seems like you need to read the docs.

From perldoc perlop:

\Q quote non-word characters till \E

So, for example:

use strict ;
use warnings;

my $test = 'Sinan+Unur';
my $old = '+';
my $new = ' ';

$test =~ s/$old/$new/g;

print "$test\n";


__END__

D:\Home\asu1\UseNet\clpmisc> c
Quantifier follows nothing in regex; marked by <-- HERE in m/+ <-- HERE
/ at D:\Home\asu1\UseNet\clpmisc\c.pl line 8.

Whereas:

use strict ;
use warnings;

my $test = 'Sinan+Unur';
my $old = '+';
my $new = ' ';

$test =~ s/\Q$old\E/$new/g;

print "$test\n";

__END__

D:\Home\asu1\UseNet\clpmisc> c
Sinan Unur
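
Applied to the loop from the original post, the same fix might look like the sketch below. It assumes the markers have the [[url|text]] form; convert_links and the example.com URL are made up for illustration and are not from the thread. Without \Q, the D+ in DVD+RW is read as "one or more D", so the pattern never matches the literal plus sign and the substitution silently does nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert [[url|text]] markers in a string to HTML links,
# keeping the two-step approach but quoting the interpolated text.
sub convert_links {
    my ($text) = @_;

    # Collect the markers first so the text is not modified while scanning it.
    my @markers = $text =~ /\[\[(.*?)\]\]/g;

    for my $inner (@markers) {
        # Skip markers that have no '|' separator.
        next unless $inner =~ /^(.*)\|(.*)$/;
        my ($url, $label) = ($1, $2);
        my $new = qq{<a href="$url">$label</a>};

        # \Q...\E escapes '+', '.', '/' and other non-word characters
        # in the interpolated variables so they match literally.
        $text =~ s/\[\[\Q$url\E\|\Q$label\E\]\]/$new/g;
    }
    return $text;
}

print convert_links("See [[http://example.com/DVD+RW/|dvd+rw-tools]] here\n");
```

The quotemeta function does the same escaping as \Q...\E and may read better when the pattern is built up in a separate variable.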

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
 
Chris Johnson
Guest
Posts: n/a
 
      09-16-2005
Thank you. I was under the impression that those characters only made a
difference if they were typed explicitly, but not if they were part of
a variable.

 
 
Jürgen Exner
Guest
Posts: n/a
 
      09-16-2005
Chris Johnson wrote:
[...]
> then print the entire contents of the file. What I'm curious about is
> why it won't run the s/$old/$new/g if there's a '+' in $old.


Well, it does, but you probably didn't mean for the '+' sign to indicate
one or more instances of the preceding unit in the RE,
as in /a+/, which matches any non-empty sequence of the letter 'a'.

> Incidentally, if I change the code to:



> I get the following error:
>
> Invalid [] range "w-t" in regex;


Well, yeah, how many characters are there between 'w' and 't'? Note: I
didn't ask for characters between 't' and 'w'.
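
To see the range problem in isolation, here is a small sketch (the compiles helper is a made-up name, not something from the thread). In the interpolated string the leading '[[' opens a bracketed character class, so the 'w-t' in 'rw-tools' is parsed as a range, and a range whose start sorts after its end is rejected when the regex is compiled.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the string compiles as a regex, 0 if not.
sub compiles {
    my ($pattern) = @_;
    # qr// dies on an invalid pattern; eval traps that and yields false.
    return eval { qr/$pattern/; 1 } ? 1 : 0;
}

for my $pattern ('[w-t]', '[t-w]', quotemeta('[w-t]')) {
    printf "%-12s %s\n", $pattern, compiles($pattern) ? 'compiles' : 'rejected';
}
```

The first pattern is rejected, the second is a valid ascending range, and the third compiles because quotemeta has turned the brackets and the hyphen into literal characters.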

I strongly recommend you familiarize yourself with regular expressions.
"perldoc perlretut" is a reasonably good introduction.

jue


 
 
Tad McClellan
Guest
Posts: n/a
 
      09-16-2005
A. Sinan Unur <(E-Mail Removed)> wrote:

> Because + and - are special in regexes.



Hyphen (-) is not meta in a regular expression, while plus (+) is meta.

Hyphen (-) is meta in a character class, while plus (+) is not meta.


We must peel our "language onion" to know what funny characters are funny.

We have a language inside of a language inside of a language. The
teeny-tiny character class language is inside of the larger regular
expression language which is inside of big ol' Perl.

So we must identify which language we are currently in before we
know what metacharacters apply.

eg:

Hyphen (-):

Perl: subtraction
RE: not meta
CC: range

Caret (^):

Perl: bitwise exclusive or
RE: beginning of string
CC: negates the class
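
A few one-liners make the layers concrete. This is only an illustrative sketch; the matches helper and the sample strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string matches the compiled regex, else 0.
sub matches {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

# Each entry: description, pattern, subject string.
my @cases = (
    [ 'hyphen outside a class is literal', qr/^a-c$/,   'a-c' ],
    [ 'hyphen inside a class is a range',  qr/^[a-c]$/, 'b'   ],
    [ 'plus outside a class quantifies',   qr/^a+$/,    'aaa' ],
    [ 'plus inside a class is literal',    qr/^[a+]$/,  '+'   ],
    [ 'caret outside a class anchors',     qr/^b/,      'bc'  ],
    [ 'caret inside a class negates',      qr/^[^b]$/,  'c'   ],
);

for my $case (@cases) {
    my ($what, $re, $subject) = @$case;
    printf "%-36s %s\n", $what, matches($re, $subject) ? 'match' : 'no match';
}
```

Every case reports a match, which is the point: the same character means different things depending on which layer it appears in.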


--
Tad McClellan SGML consulting
(E-Mail Removed) Perl programming
Fort Worth, Texas
 
 
T Beck
Guest
Posts: n/a
 
      09-16-2005

Chris Johnson wrote:
[snip early description]
>
> while(/\[\[(.*?)\]\]/g){
> $new = $1;
> if($new =~ s/^(.*)\|(.*)$/<a href="$1">$2<\/a>/){
> s/\[\[$1\|$2\]\]/$new/g;
> }
> }
>
> The specific input that's having trouble is
>
> [[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]
>
> but the peculiar thing is that if I remove the +'s, it makes the
> replacement fine (except for the fact that the link is no longer
> valid). So does anyone see why this is happening?
>


Everyone's pointed out how it's happening... here's some code to get
around it. The trick is to not try to use what you get to do an entire
second substitution (Sinan alluded to this with his first post, but
this might be a more usable version for you).

#!/usr/bin/perl
use strict;
use warnings;

my $input =
q{[[http://fy.chalmers.se/~appro/linux/D...dvd+rw-tools]]
other text
[[http://www.google.com|google]] Final text};

$input =~ s/\[\[(.*?)\|(.*?)\]\]/<a href="$1">$2<\/a>/sg;

print "Output:\n$input\n";

../test.pl
Output:
<a href="http://fy.chalmers.se/~appro/linux/DVD+RW/">dvd+rw-tools</a>
other text
<a href="http://www.google.com">google</a> Final text
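
One possible refinement of the single-pass idea, offered as a sketch rather than as anything proposed in the thread: with .*? and the /s flag, a marker that has no '|' can be merged with the next marker that does. Negated character classes avoid that. The linkify name and the "[[no pipe here]]" input are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: one-pass conversion using negated character classes.
sub linkify {
    my ($text) = @_;

    # [^|\]]* cannot cross a '|' or a ']', so a marker without a pipe is
    # left alone instead of being merged with the marker that follows it.
    $text =~ s/\[\[([^|\]]*)\|([^\]]*)\]\]/<a href="$1">$2<\/a>/g;
    return $text;
}

print linkify("[[no pipe here]] and [[http://www.google.com|Google]]\n");
```

Because the URL and the link text are captured and reused directly, nothing is interpolated back into a pattern, so '+' and '-' in the input need no quoting here.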


--T Beck

 
 
A. Sinan Unur
Guest
Posts: n/a
 
      09-17-2005
Tad McClellan <(E-Mail Removed)> wrote in
news:(E-Mail Removed):

> A. Sinan Unur <(E-Mail Removed)> wrote:
>
>> Because + and - are special in regexes.

>
>
> Hyphen (-) is not meta in a regular expression, while plus (+) is
> meta.
>
> Hyphen (-) is meta in a character class, while plus (+) is not meta.
>
>
> We must peel our "language onion" to know what funny characters are
> funny.


Absolutely. Thank you for the clarification.

Sinan

--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 