Velocity Reviews


Jason C 07-18-2012 04:01 AM

Stupid regex problem, s/// catching extra letter
 
I know better than to work late at night, but sometimes it just can't be helped :-)

I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:

$text = "www.example.com";
$text =~ s#[^(http://)]www\.#http://www\.#gi;
print $text;

If $text is this, though:

$text = "<div>www.example.com</div>";

the regex is catching the > in <div>, printing:

<divhttp://www.example.com</div>

Where am I screwing up?

Jason C 07-18-2012 05:05 AM

Re: Stupid regex problem, s/// catching extra letter
 
On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
> What you're trying to do is a zero width negative look-behind
> assertion.
> s#(?<!http://)www\.#http://www.#gi should do the trick.
> The "(?<!...)" tells the regex engine to only match the following
> pattern if it is not preceded by the pattern in the look-behind,
> without capturing anything.
>
> "perldoc perlre" has good explanations for character classes
> and look-around assertions.
>
> -Chris


Thanks for the help, Chris. Character classes aren't exactly intuitive when a symbol changes definition completely based on context, so I'm still struggling with that a little.

The modification you suggested was perfect, though! Thanks again :-)
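
For anyone landing here later, the look-behind substitution quoted above can be sketched as a small script. The helper name add_scheme is made up for illustration and is not from the thread; the pattern itself is the one Chris posted.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): prefix "www." with
# "http://" unless "http://" already comes directly before it.
sub add_scheme {
    my ($text) = @_;
    # (?<!http://) is a zero-width negative look-behind: it inspects the
    # characters before "www." but does not consume them, so nothing
    # in front of the match is replaced.
    $text =~ s#(?<!http://)www\.#http://www.#gi;
    return $text;
}

print add_scheme("www.example.com"), "\n";             # http://www.example.com
print add_scheme("<div>www.example.com</div>"), "\n";  # <div>http://www.example.com</div>
print add_scheme("http://www.example.com"), "\n";      # http://www.example.com
```

One caveat with this pattern as written: the look-behind only rules out "http://", so a string such as "https://www.example.com" would still be rewritten. Handling other schemes would need an additional look-behind, which is outside what was asked here.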

Rainer Weikusat 07-18-2012 12:30 PM

Re: Stupid regex problem, s/// catching extra letter
 
Jason C <jwcarlton@gmail.com> writes:
> On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
>> What you're trying to do is a zero width negative look-behind
>> assertion.
>> s#(?<!http://)www\.#http://www.#gi should do the trick.
>> The "(?<!...)" tells the regex engine to only match the following
>> pattern if it is not preceded by the pattern in the look-behind,
>> without capturing anything.
>>
>> "perldoc perlre" has good explanations for character classes
>> and look-around assertions.
>>
>> -Chris

>
> Thanks for the help, Chris. Character classes aren't exactly
> intuitive when a symbol changes definition completely based on
> context, so I'm still struggling with that a little.


A character class denotes an unordered set of characters, meaning

[^http://]
[^htp:/]
[^:pppppth/]
[^:/hpt]
[^h:t/p]

all represent the same set, and each of them matches exactly one
character that is not in that set (here: any character other than
h, t, p, : or /). But you wanted to match the string http://, and a
regex matching a string is just the string itself, IOW, THIS sequence
of characters.
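
The point about unordered sets can be checked with a short script. This is only a sketch: the helper names in_negated_class and broken_add_scheme are invented for illustration, and the second one simply wraps the original s/// from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: test one character against the same negated class
# written two ways; order and repetition inside [...] make no difference.
sub in_negated_class {
    my ($char) = @_;
    my $spelled = $char =~ m{^[^http://]\z} ? 1 : 0;
    my $sorted  = $char =~ m{^[^:/hpt]\z}   ? 1 : 0;
    return ($spelled, $sorted);
}

# Hypothetical wrapper around the substitution from the original question.
# [^(http://)] consumes ONE character that is not ( ) h t p : or /, and
# that character is then replaced along with "www.".
sub broken_add_scheme {
    my ($text) = @_;
    $text =~ s#[^(http://)]www\.#http://www.#gi;
    return $text;
}

for my $char ('h', '/', '>', 'x') {
    my ($spelled, $sorted) = in_negated_class($char);
    printf "%s: [^http://]=%d [^:/hpt]=%d\n", $char, $spelled, $sorted;
}

print broken_add_scheme("<div>www.example.com</div>"), "\n";  # <divhttp://www.example.com</div>
print broken_add_scheme("www.example.com"), "\n";             # www.example.com (unchanged)
```

The last line shows a second effect of the character class: because it has to match a real character, a "www." at the very start of the string is never rewritten at all.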

