Velocity Reviews - Computer Hardware Reviews

How to apply text changes to HTML, keeping it intact if inside "a" tags

 
 
vbfoobar@gmail.com
      09-27-2006
Hello,

I have HTML input to which I apply some changes.

Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave it as it is.

The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).

As a test case, suppose I want to uppercase all the text
except the text that is within "a href" tags:

ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

When applying the text transform, I want to obtain:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a> AND
<a href="junk2.html">typesetting <b>industry</b>.</a>
THANKS.
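
A minimal sketch of Feature 1, under some loud assumptions: it uses the standard-library html.parser as the "other tolerant parser" rather than BeautifulSoup, and the names UppercaseOutsideLinks and uppercase_outside_links are made up for illustration. It re-emits the markup as it streams past and only uppercases text seen outside an "a href" element. End tags are rewritten in lowercase, and text inside script or style elements would be uppercased too, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class UppercaseOutsideLinks(HTMLParser):
    """Re-emit markup unchanged; uppercase text that is not inside <a href=...>."""

    def __init__(self):
        # convert_charrefs=False: entities such as &amp; are reported
        # separately and can be passed through untouched.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.link_depth = 0  # > 0 while inside an <a href=...> element

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.link_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<br/>) never open a link region.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Simplification: any </a> closes the current link region.
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data if self.link_depth else data.upper())

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def uppercase_outside_links(html):
    parser = UppercaseOutsideLinks()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

print(uppercase_outside_links(ExampleString))
```

On ExampleString this should reproduce the expected output shown above. Replacing data.upper() with another function is the only change needed for a different text transform.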


Feature 2:
========
Another thing I may want to do: If the text I would normally
transform is inside an "a href" tag, then do not transform it,
but insert the result of text transformation just after the "</a>".

Using the same example as input, applying
Feature 2 would give something like this:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE PRINTING</feat2> AND
<a href="junk2.html">typesetting <b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
THANKS.
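
A rough sketch of Feature 2, again assuming the standard-library html.parser instead of BeautifulSoup; the class LinkEcho, the function echo_links and the transform parameter are hypothetical names, not part of any library. While inside an "a href" element it copies the content through untouched and, in parallel, builds a transformed copy that is written inside a feat2 element right after the closing "</a>". A link that is never closed loses its echo, and end tags come out in lowercase.

```python
from html.parser import HTMLParser


class LinkEcho(HTMLParser):
    """Transform text outside links; echo transformed link content after </a>."""

    def __init__(self, transform=str.upper):
        super().__init__(convert_charrefs=False)  # keep entities verbatim
        self.transform = transform
        self.out = []     # pieces of the final document
        self.echo = None  # transformed copy of the current link, None outside links

    def _emit(self, markup):
        # Tags and entities go to the output and, inside a link, to the echo.
        self.out.append(markup)
        if self.echo is not None:
            self.echo.append(markup)

    def handle_starttag(self, tag, attrs):
        is_link = tag == "a" and any(name == "href" for name, _ in attrs)
        if is_link and self.echo is None:
            self.out.append(self.get_starttag_text())
            self.echo = []  # start collecting the transformed copy
        else:
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.echo is not None:
            self.out.append("</a><feat2>%s</feat2>" % "".join(self.echo))
            self.echo = None
        else:
            self._emit("</%s>" % tag)

    def handle_data(self, data):
        if self.echo is None:
            self.out.append(self.transform(data))
        else:
            self.out.append(data)                     # link text stays as it is
            self.echo.append(self.transform(data))    # transformed copy for later

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def echo_links(html, transform=str.upper):
    parser = LinkEcho(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(echo_links('of <a href="junk2.html">typesetting <b>industry</b>.</a> thanks'))
```

Keeping the echo as a list of already-transformed pieces means nested tags such as the "b" element are copied into the feat2 element without re-parsing the link content.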

========
Thanks for your help

 
Diez B. Roggisch
      09-27-2006
vbfoobar@gmail.com wrote:

> Hello,
>
> I have HTML input to which I apply some changes.
>
> Feature 1:
> =======
> I want to transform all the text, but if the text is inside
> an "a href" tag, I want to leave the text as it is.
>
> The HTML is not necessarily well-formed, so
> I would like to do that using BeautifulSoup (or
> maybe another tolerant parser).
>


<snip/>

Use BeautifulSoup + XSL. Writing your two features in XSL is close to a
no-brainer, and I think it is the best tool for the job.

There are a few XSLT implementations available for Python.

Diez
 