Velocity Reviews

Tuxedo 11-30-2012 09:51 PM

Clickable link conversion regex?
 
Can anyone suggest a way to wrap bare URLs in <a href> tags?

open(my $fh, '<', 'urls.txt') or die "Cannot open urls.txt: $!";

while (my $line = <$fh>) {
    $line =~ s[...]   # match http or https instances
             [...]s;  # replace with enclosing hrefs
    print $line;
}

The input may contain one or more URLs per line.

Each URL begins with either http:// or https://, but not necessarily at the
start of a line.

Each URL ends at either the end of the line or the next whitespace character.

For example, the input file might look like this:

---------- urls.txt -------

http://www.example.com/hello
http://www.example.com/

bla https://www.example.com/a_page.htm plus a string not part of the URL

-----------

If an http or https string is already part of a link, i.e. it is preceded by
the closing ">" of an HTML tag, such as:
<a href=http://bla.com>http://bla.com</a>
.... then it should be left as is, with no replacement.

A bare URL can appear in one of two positions in the input file:

The 'http' or 'https' part will always either begin at the first character of
a line or be immediately preceded by whitespace, like:

http://someurl.com line w/ whitespace before
http://someother.com
hello http://bla.com also w/ a whitespace before

The match and replace output on the above three lines would then be:

<a href=http://someurl.com>http://someurl.com</a> line w/ whitespace before
<a href=http://someother.com>http://someother.com</a>
hello <a href=http://bla.com>http://bla.com</a> also w/ a whitespace before
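To make the idea concrete, here is a minimal best-guess sketch of the substitution described above. The sub name linkify is only a placeholder, and the pattern assumes exactly the rules stated so far: a URL starts at the beginning of a line or right after whitespace, and runs up to the next whitespace or the end of the line. Because already-linked URLs are preceded by "=" or ">" rather than whitespace, the same lookbehind leaves them alone.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap bare http/https URLs in <a href> tags.
sub linkify {
    my ($line) = @_;
    # (?<!\S)  the URL must start the line or follow whitespace, so URLs
    #          preceded by '=' or '>' (already linked) are skipped
    # \S+      the URL runs up to the next whitespace or end of line
    # /g       handle more than one URL per line
    $line =~ s{(?<!\S)(https?://\S+)}{<a href=$1>$1</a>}g;
    return $line;
}

# Sample lines taken from the examples above.
my @lines = (
    "http://someother.com\n",
    "hello http://bla.com also w/ a whitespace before\n",
    "<a href=http://bla.com>http://bla.com</a>\n",
);
print linkify($_) for @lines;
```

In the read loop, the same call would simply replace the placeholder substitution: print linkify($line);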

If something is written as http://bla, which, as in this sentence, isn't a
link, it would inadvertently be converted into one, but that would be a rare
occurrence. In other words, without additional validity checking, the regex
would be a best-guess procedure. For a stricter procedure, each match could
perhaps be checked against the is_web_uri($...) function from
Data::Validate::URI, which validates http and https URIs specifically. That
said, any example that illustrates the basic search-and-replace concept would
be much appreciated, even if it's only a best-guess URL type of procedure.
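A rough sketch of the stricter variant might look like the following. The names linkify_checked and link_if_valid are placeholders, and the validator is passed in as a code reference. If Data::Validate::URI is installed, its is_web_uri function is used; otherwise the sketch falls back to a crude stand-in of my own that only requires a dot in the host name, which is an assumption for illustration and not what the module does.

```perl
use strict;
use warnings;

# Hypothetical helper: link the URL only if the validator accepts it.
sub link_if_valid {
    my ($url, $is_valid) = @_;   # copy the value out of $1 right away
    return defined $is_valid->($url) ? "<a href=$url>$url</a>" : $url;
}

# Hypothetical helper: same best-guess match, with a validity check
# applied to each candidate before it is wrapped.
sub linkify_checked {
    my ($line, $is_valid) = @_;
    $line =~ s{(?<!\S)(https?://\S+)}{link_if_valid($1, $is_valid)}ge;
    return $line;
}

# Prefer is_web_uri from Data::Validate::URI (CPAN) when it is available.
my $is_valid = eval {
    require Data::Validate::URI;
    \&Data::Validate::URI::is_web_uri;
};

# Assumed stand-in when the module is missing: host must contain a dot.
$is_valid ||= sub {
    my ($url) = @_;
    return $url =~ m{\Ahttps?://[^\s/.]+\.[^\s/]+} ? $url : undef;
};

print linkify_checked("bla https://www.example.com/a_page.htm plus text\n", $is_valid);
```

Passing the check as a code reference keeps the regex unchanged, so a different validator can be swapped in without touching the substitution.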

Many thanks for any bright ideas!

Tuxedo

