Velocity Reviews - Computer Hardware Reviews

Replacing html tags

 
 
jumblesale
Guest
Posts: n/a
 
      10-04-2006
Hello all,
I'm not all that bad at regex, but I'm stumped on how to approach my
problem.

I need to parse a string and remove all HTML tags except hyperlinks.

I can remove all the HTML tags using:

Regex.Replace(inputText, @"<(/?[^\>]+)>", "");

but this also removes any hyperlinks, which I need to keep.
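A regex-only approach, offered as a minimal sketch rather than a tested answer: keep the tag-stripping pattern but add a negative lookahead, (?!/?a\b), so that opening <a ...> and closing </a> tags are never matched. The class and method names (HtmlTagFilter, StripTagsExceptLinks) are hypothetical, and the pattern assumes no attribute value or comment contains a literal ">".

```csharp
using System.Text.RegularExpressions;

public static class HtmlTagFilter
{
    // Matches any tag that is NOT an opening <a ...> or a closing </a>.
    // (?!/?a\b) is a negative lookahead: it rejects the match when the text
    // after "<" is "a" or "/a" followed by a word boundary, so tags such as
    // <abbr> and <area> are still removed.
    private static readonly Regex NonAnchorTag =
        new Regex(@"<(?!/?a\b)[^>]*>", RegexOptions.IgnoreCase);

    // Removes every tag except hyperlinks; text between tags is left in place.
    public static string StripTagsExceptLinks(string input)
    {
        return NonAnchorTag.Replace(input, string.Empty);
    }
}
```

For example, passing "<p>See <a href=\"http://example.com\">this <b>link</b></a>.</p>" should return See <a href="http://example.com">this link</a>. The \b matters: without it, <abbr> and <area> would be kept as well. Markup with ">" inside attribute values or comments will break any pattern like this, which is the main argument for a real parser such as the HTML Agility Pack.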

I've also written a regex for finding hyperlinks:
<a[\s]href=["'][^"]+[.\s]*["'][^<]+[.\s]*</a>
but I can't work out how to put the two together.

I've thought of using Regex.Matches and checking each match, but I
can't get that to work.
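Checking each match can be done in a single pass with the Regex.Replace overload that takes a MatchEvaluator: the delegate sees every tag and returns either the tag itself (keep it) or an empty string (drop it). This is a sketch under the same simple-markup assumption; LinkPreservingStripper is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkPreservingStripper
{
    // Matches one tag at a time: "<", an optional "/", the tag name
    // (captured in group 1), then everything up to the next ">".
    private static readonly Regex AnyTag =
        new Regex(@"</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    public static string StripTagsExceptLinks(string input)
    {
        // Regex.Replace calls the MatchEvaluator once per tag found:
        // returning m.Value keeps the tag, returning "" deletes it.
        return AnyTag.Replace(input, m =>
            string.Equals(m.Groups[1].Value, "a", StringComparison.OrdinalIgnoreCase)
                ? m.Value
                : string.Empty);
    }
}
```

Because the pattern requires the tag name to start with a letter, <!DOCTYPE ...> and <!-- ... --> are not matched as tags by this version. The evaluator is also a convenient place to keep other tags, by comparing the captured name against a list.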

Any ideas and/or code would be great. I'm used to C#, but VB is fine
as well.

Cheers in advance,
max

 
Chris Fulstow
Guest
Posts: n/a
 
      10-04-2006
You could do this with the HTML Agility Pack:
http://www.codeplex.com/Wiki/View.as...tmlagilitypack

I think it comes with an example that strips HTML tags, which you could
probably adapt quite quickly to keep <a> tags.

jumblesale
Guest
Posts: n/a
 
      10-04-2006
Wow, that's a great pack, but surely there's a simpler way of doing it
with regex? It seems like a lot of files to import just to check one
string.

Cheers for your quick response,
max

Mark Fitzpatrick
Guest
Posts: n/a
 
      10-04-2006
Woohoo! This is a great control library. I'm glad you posted it here, as
it saved me from writing a lot of code using the WebBrowser control to do
some similar HTML manipulation.


--
Thanks again,
Mark Fitzpatrick
Former Microsoft FrontPage MVP 199?-2006




Similar Threads
Thread | Started by | Forum | Replies | Last post
replacing tags between tags | beartiger@gmail.com | Perl Misc | 9 | 09-19-2005 02:32 AM
All style tags after the first 30 style tags on an HTML page are not applied in Internet Explorer | Rob Nicholson | ASP .Net | 3 | 05-28-2005 03:11 PM
Replacing - and not Replacing... | Rob Meade | ASP General | 5 | 04-11-2005 06:49 PM
html tags within meta tags allowed? | Donald Firesmith | XML | 5 | 01-08-2005 11:29 PM
RegEx to find CFML tags nested in HTML tags | Dean H. Saxe | Perl | 0 | 01-03-2004 06:11 PM


