RE: Easy way to remove HTML entities from an HTML document?

 
 
Robert Brewer
      07-25-2004
Robert Oschler wrote:
> Is there a module/function to remove all the HTML entities
> from an HTML document (e.g. &nbsp;, &amp;, &apos;, etc.)?


Grab cleanhtml.py from the bottom of
http://www.aminus.org/rbre/python/index.html. You should be able to
quickly rewrite its Plaintext class so that it only replaces (or
removes) entities; at least the regex is already written for you.
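
For readers who just want the entity-handling part, here is a minimal sketch of the regex approach. It is not the code from cleanhtml.py: the names ENTITY_RE, strip_entities and replace_entities are made up for illustration, and it assumes Python 3, where the standard library provides html.unescape. It also assumes entities are written with their trailing semicolon, so that text like "AT&T" is left alone.

```python
import re
from html import unescape

# Hypothetical pattern (not taken from cleanhtml.py): matches named entities
# (&amp;), decimal references (&#160;) and hex references (&#xA0;).
# The trailing semicolon is required so that "AT&T" is not treated as an entity.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def strip_entities(text):
    """Remove everything that looks like an entity, known or not."""
    return ENTITY_RE.sub("", text)


def replace_entities(text):
    """Replace each entity with the character it stands for.

    html.unescape leaves names it does not recognise unchanged.
    """
    return ENTITY_RE.sub(lambda match: unescape(match.group(0)), text)


if __name__ == "__main__":
    sample = "Tom &amp; Jerry&#33;"
    print(strip_entities(sample))
    print(replace_entities(sample))  # Tom & Jerry!
```

If replacing is all you need, calling html.unescape on the whole string does the same job without a regex; the pattern is only required when entities should be removed, or when you want control over which ones are touched.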

HTH!


Robert Brewer
MIS
Amor Ministries

 