Velocity Reviews - Computer Hardware Reviews

HTML::LinkExtor or me ?

 
 
Saya
      05-05-2004
Hi,

This is the code:

sub Escape{
    $item = shift;

    use HTML::LinkExtor;

    $p = HTML::LinkExtor->new(\&replaceURL, "");
    $p->parse($item);

    return $item;
}


sub replaceURL {

    my(@links) = @_;

    my $makeSubstitution = false;
    my $newLink;

    foreach my $link (@links) {
        #$link =~ s/\/$//i;
        $makeSubstitution = compareValues($link);

        if ($makeSubstitution eq true) {
            if($link =~ /http|www/) {
                if ($link !~ /http/) {
                    $newLink = "http://" . $link;
                }
                else {
                    $newLink = $link;
                }
                $item =~ s/href=\"$link/href=\"\/redirect.asp?forwardURL=$newLink/i;
            }
        }
        else {
            if($link =~ /http|www/) {
                if ($link !~ /http/) {
                    $item=~ s/href=\"$link/href=\"http:\/\/$link/i;
                }
            }
        }
    }
}

sub compareValues {
    my $link = shift;

    my @safeLinkArr;
    @safeLinkArr = getSafeSites();
    my $sizeOfArray = @safeLinkArr;
    my $result = true;

    if($sizeOfArray eq 0) {
        return $result;
    }

    foreach my $safeLink (@safeLinkArr) {

        if ( (0 <= (index($link, $safeLink))) or (0 <= (index($safeLink, $link))) ) {
            $result = false;
            last;
        }
        else {
            $result = true;
        }
    }

    return $result;
}



sub getSafeSites {
    use XML::DOM;

    my $count;
    my $WAPath;
    my @linkArr;

    foreach $arg (@ARGV)
    {
        if ($ARGV[$count] eq '-iw_include-location')
        {
            $WAPath = $ARGV[$count + 1];
        }
        $count++;
    }

    my $nonRedirectList = $WAPath . "/include/nonRedirectList.xml";

    # --- Parsing the XML file ---
    my $parser = XML::DOM::Parser->new();
    my $doc = $parser->parsefile($nonRedirectList);

    # --- get all tags ---
    my $links = $doc->getElementsByTagName('Link');
    my $link;

    for my $i (0..$links->getLength()-1) {
        $link = $links->item($i);

        if ($link->getFirstChild->getNodeValue) {
            @linkArr[$i] = $link->getFirstChild->getNodeValue;
        }
        $i++;
    }

    $doc->dispose;

    return @linkArr;
}

Escape($item);

In the real scenario, $item is text + <a> + text + <a>, etc.

For some reason that I do not understand, some links are not parsed
correctly. Does anyone know why this might be happening?

I have looked at this problem for 2 days now and cannot find the
cause, so any help will be greatly appreciated.

/Saya
 
Gisle Aas
      05-06-2004
Saya writes:

> This is the code:
>
> sub Escape{
> $item = shift;
>
> use HTML::LinkExtor;
>
> $p = HTML::LinkExtor->new(\&replaceURL, "");
> $p->parse($item);
>
> return $item;
> }


What are you actually trying to do? Please describe that and remove
unrelated details from your example program before you post.
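
A reduced test case in that spirit could isolate just the substitution step. The sketch below is only an illustration with made-up sample data (the URL and the helper name add_scheme are hypothetical, not from the original post). It shows one way a link can be silently skipped: when a URL is interpolated into a pattern without \Q...\E, characters such as "?" and "." act as regex metacharacters. Whether this is what affects the posted code is an assumption, not something the thread confirms.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the original post.
my $html = '<a href="www.example.com/page?id=1">one</a>';
my $link = 'www.example.com/page?id=1';

# Returns a copy of $html with "http://" put in front of $link.
# When $quote is true the link is wrapped in \Q...\E so that
# characters such as "?" and "." are matched literally.
sub add_scheme {
    my ($html, $link, $quote) = @_;
    my $pattern = $quote ? qr/href="\Q$link\E/i : qr/href="$link/i;
    $html =~ s/$pattern/href="http:\/\/$link/;
    return $html;
}

my $unquoted = add_scheme($html, $link, 0);
my $quoted   = add_scheme($html, $link, 1);

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

In the unquoted case "e?" is read as "an optional e", so the pattern never matches the literal "?" in the text and the string comes back unchanged. Only the \Q...\E version performs the substitution.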

If you want to do substitutions on links in an HTML document, then
this example program might be a good start.

http://search.cpan.org/src/GAAS/HTML....36/eg/hrefsub
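
For readers who cannot reach that file, here is a rough sketch of the general idea: rewrite link attributes while the document is being parsed, instead of collecting links first and running a regex over the raw text afterwards. This is not the hrefsub program itself. It uses the HTML::Parser version 3 handler API, and the names rewrite_href, escape_attr and rewrite_links are hypothetical. The redirect rule is a guess at what the question's code intends.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical rewrite rule: add a scheme to bare "www." links and send
# absolute http(s) links through /redirect.asp. Other links are left alone.
sub rewrite_href {
    my ($url) = @_;
    $url = "http://$url" if $url =~ /^www\./i;
    return $url unless $url =~ m{^https?://}i;
    return "/redirect.asp?forwardURL=$url";
}

# Minimal escaping for a double-quoted attribute value.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;
    $value =~ s/"/&quot;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

sub rewrite_links {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(api_version => 3);

    # Everything that is not a start tag is copied through unchanged.
    $p->handler(default => sub { $out .= $_[0] }, 'text');

    # Start tags: rebuild <a> tags that carry an href, pass others through.
    $p->handler(start => sub {
        my ($tagname, $attr, $attrseq, $text) = @_;
        if ($tagname eq 'a' && defined $attr->{href}) {
            $attr->{href} = rewrite_href($attr->{href});
            $text = "<$tagname"
                  . join('', map { qq( $_=") . escape_attr($attr->{$_}) . '"' } @$attrseq)
                  . '>';
        }
        $out .= $text;
    }, 'tagname, attr, attrseq, text');

    $p->parse($html);
    $p->eof;
    return $out;
}

print rewrite_links('<p>See <a href="www.example.com">this</a> and <a href="#top">that</a>.</p>'), "\n";
```

Because the parser hands over the attribute value already decoded, there is no need to find the link again in the source text, which avoids both the metacharacter problem and mismatches caused by entities such as &amp;. The sketch does not URL-encode the forwarded address and does not handle every tag that can carry a link, so treat it as a starting point only.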

--
Gisle Aas
 