Velocity Reviews


Will 04-06-2005 04:17 PM

Basic Regular Expressions question...
 
Hi,
I have a longer program that finds and recursively replaces text in
many html files that works beautifully for most cases, but I think I'm
getting hung up on a s/// and regular expressions glitch. I wrote a
very short program that gets to the heart of the matter...

################################################################################
use strict;
use warnings;


my $find="https://sinaicentral.mssm.edu/intranet/intranet/ct_public/view?trial_id=MSM03204&searchNow=no";
my $replace="http://www.excite.com";


my $thisPage=
"https://sinaicentral.mssm.edu/intranet/intranet/ct_public/view?trial_id=MSM03204&searchNow=no";

$thisPage =~ s#$find#$replace#g;

print $thisPage;
################################################################################

To my understanding, this program should take the long string in $find
and then replace it with $replace and the output should be
"http://www.excite.com". I think the "?" in the $find variable is
being treated as a Regular Expression but I can't figure out a way to
nullify that effect. I'm a librarian not a programmer! Somebody please
help! I'm working for a worthy non-profit that is strapped for cash, so
I have to figure this out! It will bring you good karma! Thanks a
bunch!

Will Jiang


A. Sinan Unur 04-06-2005 04:29 PM

Re: Basic Regular Expressions question...
 
"Will" <kd3qc@yahoo.com> wrote in
news:1112804221.588102.16580@o13g2000cwo.googlegroups.com:

> To my understanding, this program should take the long string in $find
> and then replace it with $replace and the output should be
> "http://www.excite.com". I think the "?" in the $find variable is
> being treated as a Regular Expression but I can't figure out a way to
> nullify that effect.


To put it correctly, ? is special in a regular expression.

perldoc perlreref

? Matches the preceding element 0 or 1 times

also from the same document

\Q Disable pattern metacharacters until \E

$thisPage =~ s{\Q$find\E}{$replace};

should work.
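
A minimal sketch of the difference, using the strings from the original post (the names $raw, $quoted and $via_quotemeta are only for illustration). Without \Q, the "w?" in "view?trial_id" means "an optional w", so the pattern never matches the literal "?" in the page. With \Q...\E, or the equivalent quotemeta() function, every metacharacter in $find is taken literally.

```perl
use strict;
use warnings;

my $find    = 'https://sinaicentral.mssm.edu/intranet/intranet/ct_public/view?trial_id=MSM03204&searchNow=no';
my $replace = 'http://www.excite.com';

# Unquoted: "?" and "." act as regex metacharacters, so nothing matches.
# s/// returns a false value when no substitution was made, hence "|| 0".
my $raw = $find;
my $raw_count = ( $raw =~ s{$find}{$replace}g ) || 0;

# Quoted with \Q...\E: the whole of $find is matched literally.
my $quoted = $find;
my $quoted_count = ( $quoted =~ s{\Q$find\E}{$replace}g ) || 0;

# quotemeta() is the function form of \Q; the escaped pattern can be reused.
my $pattern = quotemeta $find;
my $via_quotemeta = $find;
$via_quotemeta =~ s{$pattern}{$replace}g;

print "unquoted:  $raw_count substitution(s)\n";
print "quoted:    $quoted_count substitution(s), result $quoted\n";
print "quotemeta: result $via_quotemeta\n";
```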

> I'm a librarian not a programmer! Somebody
> please help! I'm working for a worthy non-profit that is strapped for
> cash, so I have to figure this out! It will bring you good karma!


None of this increases your chances of getting help. Describing your
problem accurately, as you did, is the crucial part.

For further information on how to help others help you, please see the
posting guidelines for this group if you haven't already done so.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html

Will 04-06-2005 04:37 PM

Re: Basic Regular Expressions question...
 
THANKS SO MUCH! I really appreciate the help! Have a wonderful day!

Will Jiang


Gunnar Hjalmarsson 04-06-2005 05:05 PM

Re: Basic Regular Expressions question...
 
A. Sinan Unur wrote:
>
> \Q Disable pattern metacharacters until \E
>
> $thisPage =~ s{\Q$find\E}{$replace};


Since escaping all the characters in PATTERN makes it a non-regex
problem, I played with using index() and substr() instead:

substr $thisPage, index($thisPage, $find), length $find, $replace;
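
One caveat with the one-liner above: index() returns -1 when $find is not present, and substr() treats a negative offset as counting from the end of the string, so the last character would be overwritten. A hedged sketch of a guarded version follows; the sub name replace_first_literal is hypothetical, not something from this thread.

```perl
use strict;
use warnings;

# Replace the first literal occurrence of $find in $text, if there is one.
# Returns the (possibly unchanged) text.
sub replace_first_literal {
    my ($text, $find, $replace) = @_;
    my $pos = index($text, $find);
    return $text if $pos < 0;    # not found: leave the text untouched
    substr($text, $pos, length($find), $replace);
    return $text;
}

my $page = "It's a ball. The ball is brown.";
print replace_first_literal($page, 'ball', 'football'), "\n";
print replace_first_literal($page, 'kite', 'football'), "\n";
```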

However, to take the /g modifier into consideration (which the OP
originally used), you seem to need something like:

my ($i, $length) = (0,0);
while ( ( $i = index $thisPage, $find, $i+$length ) >= 0 ) {
    $length = length $find;
    substr $thisPage, $i, $length, $replace;
}

That's much typing to 'emulate'

$thisPage =~ s/\Q$find/$replace/g;

Assuming that using index() and substr() is more efficient than using
the s/// operator, is there any easier way to combine them to achieve
the same result?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Tad McClellan 04-06-2005 11:46 PM

Re: Basic Regular Expressions question...
 
Will <kd3qc@yahoo.com> wrote:

> I have a longer program that finds and recursively replaces text in



There is no recursion in what you are doing.


--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas

Anno Siegel 04-07-2005 11:06 AM

Re: Basic Regular Expressions question...
 
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in comp.lang.perl.misc:
> A. Sinan Unur wrote:
> >
> > \Q Disable pattern metacharacters until \E
> >
> > $thisPage =~ s{\Q$find\E}{$replace};

>
> Since escaping all the characters in PATTERN makes it a non-regex
> problem, I played with using index() and substr() instead:
>
> substr $thisPage, index($thisPage, $find), length $find, $replace;
>
> However, to take the /g modifier into consideration (which the OP
> originally used), you seem to need something like:
>
> my ($i, $length) = (0,0);
> while ( ( $i = index $thisPage, $find, $i+$length ) >= 0 ) {
> $length = length $find;
> substr $thisPage, $i, $length, $replace;
> }
>
> That's much typing to 'emulate'
>
> $thisPage =~ s/\Q$find/$replace/g;


A bit tighter:

my $i = -1;
substr $thisPage, $i, length $find, $replace while
( $i = index $thisPage, $find, $i + 1) >= 0;

Anno

Gunnar Hjalmarsson 04-07-2005 08:35 PM

Re: Basic Regular Expressions question...
 
Anno Siegel wrote:
> Gunnar Hjalmarsson wrote:
>> However, to take the /g modifier into consideration (which the OP
>> originally used), you seem to need something like:
>>
>> my ($i, $length) = (0,0);
>> while ( ( $i = index $thisPage, $find, $i+$length ) >= 0 ) {
>> $length = length $find;
>> substr $thisPage, $i, $length, $replace;
>> }
>>
>> That's much typing to 'emulate'
>>
>> $thisPage =~ s/\Q$find/$replace/g;

>
> A bit tighter:
>
> my $i = -1;
> substr $thisPage, $i, length $find, $replace while
> ( $i = index $thisPage, $find, $i + 1) >= 0;


Yeah, but I just realized that neither of the above index() + substr()
solutions would work on e.g. this set of input:

my $thisPage = "It's a ball. The ball is brown.";
my $find = 'ball';
my $replace = 'football';

Isn't it something like this that's needed:

my $repl_length = length $replace;
my $i = -$repl_length;
while ( ( $i = index $thisPage, $find, $i + $repl_length ) >= 0 ) {
    substr $thisPage, $i, length $find, $replace;
}

Maybe no wonder that the s/// operator is frequently used also for
replacing non-regex patterns when efficiency is not a restriction.
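
The forward-scanning idea can be wrapped in a sub and checked against s///g. This is a sketch only: the name replace_all_literal and the guard for an empty search string are assumptions, not part of the posts above. The key step is resuming the search just past the inserted replacement, so that a replacement containing the search text ('ball' inside 'football') is never matched again.

```perl
use strict;
use warnings;

# Replace every literal occurrence of $find in $text, scanning left to right.
sub replace_all_literal {
    my ($text, $find, $replace) = @_;
    return $text if $find eq '';          # assumed guard: an empty needle would never advance
    my $pos = 0;
    while ( ( $pos = index($text, $find, $pos) ) >= 0 ) {
        substr($text, $pos, length($find), $replace);
        $pos += length $replace;          # skip the text that was just inserted
    }
    return $text;
}

my $page    = "It's a ball. The ball is brown.";
my $find    = 'ball';
my $replace = 'football';

my $by_index = replace_all_literal($page, $find, $replace);

# Reference result from the regex version with the needle quoted.
( my $by_regex = $page ) =~ s/\Q$find\E/$replace/g;

print "$by_index\n";
print $by_index eq $by_regex ? "same as s///g\n" : "differs from s///g\n";
```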

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Anno Siegel 04-07-2005 10:06 PM

Re: Basic Regular Expressions question...
 
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> > Gunnar Hjalmarsson wrote:
> >> However, to take the /g modifier into consideration (which the OP
> >> originally used), you seem to need something like:
> >>
> >> my ($i, $length) = (0,0);
> >> while ( ( $i = index $thisPage, $find, $i+$length ) >= 0 ) {
> >> $length = length $find;
> >> substr $thisPage, $i, $length, $replace;
> >> }
> >>
> >> That's much typing to 'emulate'
> >>
> >> $thisPage =~ s/\Q$find/$replace/g;

> >
> > A bit tighter:
> >
> > my $i = -1;
> > substr $thisPage, $i, length $find, $replace while
> > ( $i = index $thisPage, $find, $i + 1) >= 0;

>
> Yeah, but I just realized that neither of the above index() + substr()
> solutions would work on e.g. this set of input:
>
> my $thisPage = "It's a ball. The ball is brown.";
> my $find = 'ball';
> my $replace = 'football';
>
> Isn't it something like this that's needed:
>
> my $repl_length = length $replace;
> my $i = -$repl_length;
> while ( ( $i = index $thisPage, $find, $i + $repl_length ) >= 0 ) {
> substr $thisPage, $i, length $find, $replace;
> }


You're right, though I wouldn't bother with storing the length of
anything. Working backwards runs smoother:

my $i = length $thisPage;
substr( $thisPage, $i, length $find) = $replace while
( $i = rindex $thisPage, $find, $i) >= 0;

That way only the unchanged part of the string is ever searched.
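
A hedged sketch of the backwards scan as a sub (the name replace_all_backwards is hypothetical). One assumption worth flagging: as far as I can tell, searching again from the same position $i would re-match at that spot whenever the replacement itself begins with the search text ('ball' replaced by 'ballgame'), so this version steps the position back by the length of the search text after each replacement. It scans right to left, so where occurrences overlap it can pick different matches than s///g does.

```perl
use strict;
use warnings;

# Replace every literal occurrence of $find in $text, scanning right to left.
sub replace_all_backwards {
    my ($text, $find, $replace) = @_;
    return $text if $find eq '';              # assumed guard against an empty needle
    my $pos = length $text;
    while ( $pos >= 0 && ( $pos = rindex($text, $find, $pos) ) >= 0 ) {
        substr($text, $pos, length($find), $replace);
        $pos -= length $find;                 # next match must end before this one began
    }
    return $text;
}

print replace_all_backwards("It's a ball. The ball is brown.", 'ball', 'football'), "\n";
print replace_all_backwards("ball ball", 'ball', 'ballgame'), "\n";
```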

> Maybe no wonder that the s/// operator is frequently used also for
> replacing non-regex patterns when efficiency is not a restriction.


I don't think I've used index for anything but simple location or just
presence/absence. Replacement is too much hassle with the substr() for
my taste.

Anno

John W. Krahn 04-07-2005 11:18 PM

Re: Basic Regular Expressions question...
 
Anno Siegel wrote:
> Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in comp.lang.perl.misc:
>
>>Anno Siegel wrote:
>>
>>>Gunnar Hjalmarsson wrote:
>>>
>>>>However, to take the /g modifier into consideration (which the OP
>>>>originally used), you seem to need something like:
>>>>
>>>> my ($i, $length) = (0,0);
>>>> while ( ( $i = index $thisPage, $find, $i+$length ) >= 0 ) {
>>>> $length = length $find;
>>>> substr $thisPage, $i, $length, $replace;
>>>> }
>>>>
>>>>That's much typing to 'emulate'
>>>>
>>>> $thisPage =~ s/\Q$find/$replace/g;
>>>
>>>A bit tighter:
>>>
>>> my $i = -1;
>>> substr $thisPage, $i, length $find, $replace while
>>> ( $i = index $thisPage, $find, $i + 1) >= 0;

>>
>>Yeah, but I just realized that neither of the above index() + substr()
>>solutions would work on e.g. this set of input:
>>
>> my $thisPage = "It's a ball. The ball is brown.";
>> my $find = 'ball';
>> my $replace = 'football';
>>
>>Isn't it something like this that's needed:
>>
>> my $repl_length = length $replace;
>> my $i = -$repl_length;
>> while ( ( $i = index $thisPage, $find, $i + $repl_length ) >= 0 ) {
>> substr $thisPage, $i, length $find, $replace;
>> }

>
>
> You're right, though I wouldn't bother with storing the length of
> anything. Working backwards runs smoother:
>
> my $i = length $thisPage;
> substr( $thisPage, $i, length $find) = $replace while
> ( $i = rindex $thisPage, $find, $i) >= 0;
>
> That way only the unchanged part of the string is ever searched.


Also, using the four argument substr() should be faster.

my $i = length $thisPage;
substr $thisPage, $i, length $find, $replace
while ( $i = rindex $thisPage, $find, $i ) >= 0;



John
--
use Perl;
program
fulfillment

Anno Siegel 04-08-2005 05:57 AM

Re: Basic Regular Expressions question...
 
John W. Krahn <krahnj@telus.net> wrote in comp.lang.perl.misc:
> Anno Siegel wrote:
> > Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in comp.lang.perl.misc:


[using index() instead of s///]

> > my $i = length $thisPage;
> > substr( $thisPage, $i, length $find) = $replace while
> > ( $i = rindex $thisPage, $find, $i) >= 0;
> >
> > That way only the unchanged part of the string is ever searched.

>
> Also, using the four argument substr() should be faster.


How so? I never heard of that.

I use "=" with substr() assignments because it reads better. Four argument
substr is for when I need the old value of the substring too.
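
For comparison, a small sketch of the two forms side by side (variable names are illustrative). Both leave the string in the same state; the four-argument form additionally returns the text it replaced. It says nothing about speed, which would need measuring, for example with the core Benchmark module.

```perl
use strict;
use warnings;

my $lvalue_form   = 'The ball is brown.';
my $four_arg_form = 'The ball is brown.';

# Lvalue form: assign to the substring at offset 4, length 4 ("ball").
substr($lvalue_form, 4, 4) = 'football';

# Four-argument form: same replacement, and the old substring is returned.
my $old = substr($four_arg_form, 4, 4, 'football');

print "$lvalue_form\n";
print "$four_arg_form\n";
print "replaced text: $old\n";
```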

> my $i = length $thisPage;
> substr $thisPage, $i, length $find, $replace
> while ( $i = rindex $thisPage, $find, $i ) >= 0;


Anno

