Velocity Reviews - Computer Hardware Reviews



Retrieving Hyperlinks for a web page in code

Enigma Boy
08-14-2007
Hi folks,

I am retrieving a web page from a site using HttpWebRequest. What I want to
do with the retrieved page is list all the hyperlinks in it. If I do a
simple regex search for <a, I also get links that are commented out in the
HTML, and I don't want those. I only want links that are actually active.
This is for a reciprocal link check.

Can someone please point me in the right direction?

Thanks.




 
Alexey Smirnov
08-14-2007
On Aug 14, 8:01 am, "Enigma Boy" <(E-Mail Removed)> wrote:
> Hi folks,
>
> I am retrieving a web page from a site using HttpWebRequest. What I want
> to do with the retrieved page is list all the hyperlinks in it. If I do a
> simple regex search for <a, I also get links that are commented out in
> the HTML, and I don't want those. I only want links that are actually
> active. This is for a reciprocal link check.


Hi, I think you can strip the HTML comments from the text before you
extract the links. For example:

html_code = Regex.Replace(html_code, "<!--((.|\n)*?)-->", "");

This replaces every HTML comment (<!-- ... -->) with an empty string;
after that you can search the remaining markup for the links.
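Putting both steps together, here is a minimal C# sketch, assuming the page's HTML is already in a string (for example, the body read from the HttpWebRequest response). The class name LinkExtractor and the href pattern are hypothetical and deliberately simple: the pattern only picks up quoted href values, so treat it as a starting point rather than a full HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: strip comments first, then collect href values.
public static class LinkExtractor
{
    // Lazy match of an HTML comment; Singleline lets '.' match newlines,
    // so comments spanning several lines are removed as well.
    private static readonly Regex CommentPattern =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Simplified anchor pattern (an assumption, not a complete HTML grammar):
    // finds <a ...> tags and captures an href value in double or single quotes.
    // Group 1 is the opening quote; \1 requires the same quote to close it.
    private static readonly Regex HrefPattern =
        new Regex(@"<a\s[^>]*?href\s*=\s*([""'])(?<url>.*?)\1",
                  RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractLinks(string html)
    {
        // Step 1: remove commented-out markup so its links are ignored.
        string cleaned = CommentPattern.Replace(html, string.Empty);

        // Step 2: collect the href of every anchor that is still active.
        var links = new List<string>();
        foreach (Match m in HrefPattern.Matches(cleaned))
        {
            links.Add(m.Groups["url"].Value);
        }
        return links;
    }
}
```

For a reciprocal link check you would then look for your own URL in the returned list. Regex stripping can still be fooled by comment-like text inside scripts or attribute values, which is one reason a real HTML parser is the sturdier choice.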

 
Jesse Houwing
08-14-2007
Hello Enigma,

> Hi folks,
>
> I am retrieving a web page from a site using HttpWebRequest. What I
> want to do with the retrieved page is list all the hyperlinks in it.
> If I do a simple regex search for <a, I also get links that are
> commented out in the HTML, and I don't want those. I only want links
> that are actually active. This is for a reciprocal link check.
>
> Can someone please point me in the right direction.
>
> Thanks.


Have a look at the HTML Agility Pack. It parses the HTML into a document
tree, so you can query it as if it were XML.

http://www.codeplex.com/Wiki/View.as...tmlagilitypack
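A minimal sketch of that approach, assuming the HtmlAgilityPack library is referenced in the project (the AgilityLinkExtractor class is a hypothetical name). The parser turns <!-- ... --> blocks into comment nodes, so the XPath query //a[@href] only sees anchors that are real elements and never the commented-out ones.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack package is referenced

// Hypothetical helper built on the HTML Agility Pack document tree.
public static class AgilityLinkExtractor
{
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // Comments become comment nodes, so anchors inside them are not
        // element nodes and are not matched by this XPath query.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null when nothing matches (in the versions
        // I am aware of), so guard against that before iterating.
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // Raw attribute value; relative URLs are returned as-is.
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return links;
    }
}
```

Relative hrefs come back unresolved, so for a reciprocal link check you may want to combine them with the page's base URL before comparing.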

--
Jesse Houwing
jesse.houwing at sogeti.n


 


Similar Threads
changing hyperlinks in code (started by Gary Larimer, ASP .Net, 0 replies, last post 08-14-2008 03:42 PM)
Save a web page - hyperlinks question (started by fiefie.niles@gmail.com, ASP General, 3 replies, last post 08-27-2006 07:50 AM)
data retrievel via perl (started by batista@bit.uni-bonn.de, Perl Misc, 2 replies, last post 12-08-2005 08:55 AM)
hyperlinks don't appear on Google page. Settings? (started by Bruce, Computer Support, 1 reply, last post 07-07-2004 07:50 PM)
Positioning hyperlinks dynamicaly (in code) (started by carlos.cruz@algeco.pt, ASP .Net, 2 replies, last post 03-04-2004 07:15 PM)


