On Oct 7, 10:28 am, "henryturnerli...@googlemail.com"
<henryturnerli...@googlemail.com> wrote:
> Well, I suppose there are incorrectly formatted links too... I was
> talking about correctly formatted links that point to a 400+ status
> code resource. Something libxml would not pick up since I guess you're
> talking about its syntax checking bit.
Well, libxml stores the line number of every element. So you can
extract all links, check them, and print out element.line_num for each
one that fails the check.
Here's some starter code:
#----------------------------------------------
require 'rubygems'
require 'xml'
XML::Parser.default_line_numbers = true
html = <<END_HTML
<html>
<head><title>test</title></head>
<body>
Here is a <a href="http://brok.en">broken link.</a>
</body>
</html>
END_HTML
parser = XML::Parser.string(html)
doc = parser.parse
# Stub: reports every link as broken. Replace with a real check.
def broken?(link)
  true
end

doc.find("//a[@href]").each do |link|
  if broken?(link)
    puts "Broken link to #{link['href']} on line #{link.line_num}"
  end
end
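
The broken? stub above always returns true, so here is a rough sketch of what a real check might look like, using Net::HTTP from the standard library. It is only one way to do it: broken_status? is a hypothetical helper, the 5 second timeout is an arbitrary choice, and this version takes the href string rather than the node. It also assumes the server answers HEAD requests sensibly, and it reports relative or non-HTTP hrefs (mailto: and so on) as broken, so you may want to resolve or skip those first.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: any 4xx or 5xx status counts as broken.
def broken_status?(code)
  code.to_i >= 400
end

# Hypothetical replacement for the stub. Takes the href string,
# not the node. The 5 second default timeout is an assumption.
def broken?(href, timeout = 5)
  uri = URI.parse(href)
  # Relative and non-HTTP links cannot be fetched here, so this
  # sketch simply reports them as broken.
  return true unless uri.is_a?(URI::HTTP) && uri.host

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = timeout
  http.read_timeout = timeout

  # HEAD avoids downloading the body; some servers reject it.
  response = http.request_head(uri.request_uri)
  broken_status?(response.code)
rescue StandardError
  # Unparseable URLs, DNS failures, refused connections and
  # timeouts are all treated as broken.
  true
end
```

With this version you would call broken?(link['href']) inside the loop instead of broken?(link).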