Clickable link conversion regex?
Can anyone suggest a solution to enclose bare urls with href tags?
    open(my $fh, '<', 'urls.txt') or die $!;
    while (my $line = <$fh>) {
        $line =~ s[...]    # match http or https instances
                  [...]s;  # replace with enclosing hrefs
        print $line;
    }

The input format may be one or more URLs per line. Each URL begins with either http:// or https://, but not necessarily as the first string on a line. Each URL ends at either the end of the line or a whitespace character. The input file would look like this, for example:

    ---------- urls.txt -------
    http://www.example.com/hello http://www.example.com/
    bla https://www.example.com/a_page.htm plus a string not part of the URL
    -----------

If an http or https string is already preceded by the closing ">" of an HTML tag, such as:

    <a href=http://bla.com>http://bla.com</a>

.... then it should be excluded, with no replacement.

Two conditions exist in the input file: the 'http' or 'https' bit will always either begin at the first character of a new line or have a whitespace character immediately before it, like:

    http://someurl.com
    line w/ whitespace before http://someother.com
    hello http://bla.com also w/ a whitespace before

The match and replace output for the above three lines would then be:

    <a href=http://someurl.com>http://someurl.com</a>
    line w/ whitespace before <a href=http://someother.com>http://someother.com</a>
    hello <a href=http://bla.com>http://bla.com</a> also w/ a whitespace before

If something is written as http://bla, which as in this sentence isn't a link, it would inadvertently end up being converted into a link, but that would be a rare occurrence. In other words, without additional validity checking, the regex would be a best-guess procedure. For a stricter procedure, each match could perhaps be checked against the is_web_uri($...) function from Data::Validate::URI, which validates http and https URIs specifically.

That said, any example that illustrates a basic search and replace concept would be much appreciated, even if it's only a best-guess URL type of procedure.

Many thanks for any bright ideas!

Tuxedo
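One possible best-guess substitution along the lines described above, as a minimal sketch rather than a definitive answer. It assumes a URL runs from http:// or https:// up to the next whitespace or the end of the line, and that a URL only counts as "bare" when it starts the line or follows whitespace. The helper name linkify_line and the sample lines are made up for illustration; the output keeps the unquoted href form shown in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): wraps each bare
# http/https URL found in a line with an anchor tag.
sub linkify_line {
    my ($line) = @_;

    # (?<!\S)       the URL must start the line or follow whitespace, so a URL
    #               preceded by "=" or ">" (already inside a tag) is skipped
    # https?://\S+  the scheme, then everything up to the next whitespace
    #               or the end of the line
    # /g            handle more than one URL per line
    $line =~ s{(?<!\S)(https?://\S+)}{<a href=$1>$1</a>}g;

    return $line;
}

# Sample lines taken from the question, plus one already-linked URL.
my @samples = (
    'http://someurl.com',
    'line w/ whitespace before http://someother.com',
    'hello http://bla.com also w/ a whitespace before',
    '<a href=http://bla.com>http://bla.com</a>',
);

print linkify_line($_), "\n" for @samples;
```

In the file-reading loop from the question, the s[...][...] line could then become $line = linkify_line($line);. Because the lookbehind only requires "start of line or whitespace", the already-linked line is left alone without any special test for ">". Punctuation directly after a URL (a trailing comma or full stop) would be swallowed into the link, which is part of what makes this a best guess. Quoting the attribute value (href="$1") would be more robust HTML, and each captured $1 could additionally be passed through is_web_uri from Data::Validate::URI before wrapping, if that module is installed.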