Velocity Reviews - Computer Hardware Reviews

Normalizing XHTML with XML

Ryan Stewart
      05-11-2006
I'm getting XHTML input that can be in a number of formats, and I'm
trying to get it into a consistent format for later use. "Consistent"
in this case means that every child of the root (the body) is a p, table,
img, ol, or ul element. I'm processing only the body content; there is no
head section, so the body is the root of the tree that I'm processing.
I've got almost everything working except one thing. If I get input like
the following:
some text<br/>some more text

then I need that to become two paragraphs, like:
<p>some text</p>
<p>some more text</p>
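One detail matters before any grouping logic: in a parsed tree, the text runs sit around the `<br/>` elements, not inside them. The sketch below uses Python's standard `xml.etree.ElementTree` (a swapped-in tool for illustration, not XSLT). In that API, text before the first child is `body.text` and text after each child is that child's `tail`. The function name `split_on_br` is hypothetical, and the sketch assumes `<br/>` is the only child element.

```python
import xml.etree.ElementTree as ET


def split_on_br(body):
    """Wrap each <br/>-separated run of text in `body` in its own <p>.

    Assumption for this sketch: <br/> is the only child element of `body`.
    """
    runs = [body.text or ""]          # text before the first child
    for child in body:                 # each child here is a <br/>
        runs.append(child.tail or "")  # text after a <br/> starts a new run
    out = ET.Element("body")
    for run in runs:
        if run.strip():                # skip empty runs, e.g. from "<br/><br/>"
            ET.SubElement(out, "p").text = run
    return out


src = "<body>some text<br/>some more text</body>"
print(ET.tostring(split_on_br(ET.fromstring(src)), encoding="unicode"))
```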

That's easy enough. But if I get this input:
some text <a href="blah">link</a> some more text

that should all become one paragraph:
<p>some text <a href="blah">link</a> some more text</p>

And if a table, list, or image is encountered, that should be the end
of a paragraph if there is one:
some text<table> ... </table>some more text

becomes
<p>some text</p>
<table> ... </table>
<p>some more text</p>

Again, wrapping the text nodes in p tags is easy, but a
problem arises if there is a link or other tag inside some of that
text. (At this point other tags don't actually matter because I'm
stripping them out, but links need to be passed through.)

Basically, my problem boils down to this:
1) I need to select any text node child of the root and surround it
with p tags, but
2) if an a element is a child of the root, it should be joined with any
adjacent text nodes and the whole thing should be surrounded with p
tags.

Can someone give me an example of how to do this with XSL?
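The two rules can be handled in a single left-to-right pass. This is sketched here in Python with the standard `xml.etree.ElementTree` module rather than XSL, so treat it as an illustration of the logic only. It assumes that the input is well-formed with `body` as the root and no XHTML namespace (with `xmlns="http://www.w3.org/1999/xhtml"` the tags would read `{http://www.w3.org/1999/xhtml}p` and so on). It also assumes that tags other than `a`, `br`, `table`, `ol`, `ul`, and `img` have already been stripped, and that whitespace at the start or end of a paragraph is insignificant. `normalize_body`, `BLOCK_TAGS`, and `BREAK_TAGS` are hypothetical names. Text and `a` elements accumulate in an open `p`. A `br` closes it. A table, list, or image closes it and is copied through unchanged.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical tag classes based on the rules in this post.
BLOCK_TAGS = {"table", "ol", "ul", "img"}  # end any open paragraph, copied through
BREAK_TAGS = {"br"}                          # end any open paragraph, then dropped
# Everything else (here just <a>) is inline and goes into the current paragraph.


def normalize_body(body):
    out = ET.Element(body.tag, dict(body.attrib))
    para = None  # the <p> currently being filled, or None

    def close_para():
        nonlocal para
        if para is not None:
            # Assumption: whitespace at the edges of a paragraph is insignificant.
            para.text = (para.text or "").lstrip() or None
            if len(para):
                para[-1].tail = (para[-1].tail or "").rstrip() or None
            elif para.text:
                para.text = para.text.rstrip() or None
            if len(para) or para.text:  # drop paragraphs that held only whitespace
                out.append(para)
        para = None

    def open_para():
        nonlocal para
        if para is None:
            para = ET.Element("p")
        return para

    def add_text(s):
        if not s:
            return
        p = open_para()
        if len(p):
            p[-1].tail = (p[-1].tail or "") + s  # text after an inline child
        else:
            p.text = (p.text or "") + s

    add_text(body.text)
    for child in body:
        if child.tag in BLOCK_TAGS or child.tag in BREAK_TAGS:
            close_para()
            if child.tag in BLOCK_TAGS:
                block = copy.deepcopy(child)
                block.tail = None  # its tail is handled by add_text below
                out.append(block)
        else:
            inline = copy.deepcopy(child)
            inline.tail = None
            open_para().append(inline)
        # Text following the child continues the open paragraph or starts a new one.
        add_text(child.tail)
    close_para()
    return out


src = "<body>some text<table> ... </table>some more text</body>"
print(ET.tostring(normalize_body(ET.fromstring(src)), encoding="unicode"))
```

Run on the more complex example later in this thread, it should produce the same four elements shown there.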

 
Joe Kesselman
      05-11-2006
> 1) I need to select any text node child of the root and surround it
> with p tags, but


> 2) if an a element is a child of the root, it should be joined with any
> adjacent text nodes and the whole thing should be surrounded with p
> tags.


.... If I put those two rules together, I get "I want to wrap a <p>
element around all the root's children". Since that's trivial, I presume
there's some case where you don't want to do that....?
 
Ryan Stewart
      05-11-2006
Yes, only text nodes and links should be inside p tags. Tables, lists,
and images will also be present and must not be wrapped, especially
since tables and lists are block elements and p tags may only contain
inline elements. Maybe a more complex example:
some text <a href="blah">a link</a> some more text<br/>
third text node<table>...</table>final text node

should become:
<p>some text <a href="blah">a link</a> some more text</p>
<p>third text node</p>
<table>...</table>
<p>final text node</p>

Notice that the <br/> causes a new p element, the first two root-level
text nodes and the a element in between them become one paragraph, the
third text node becomes a paragraph, the table is not touched, and the
last text node becomes a paragraph.
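The target format described here can also be checked mechanically, which helps when testing a transform against many inputs. Below is a minimal sketch in Python's standard `xml.etree.ElementTree`. It assumes `body` is the root and that `a` is the only element allowed inside a `p`. The names `is_normalized`, `ALLOWED_TOP`, and `ALLOWED_IN_P` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical constants encoding the target format described in the thread.
ALLOWED_TOP = {"p", "table", "img", "ol", "ul"}  # allowed direct children of the body
ALLOWED_IN_P = {"a"}  # assumption: links are the only inline element kept


def is_normalized(body):
    """Return True if all body content sits inside an allowed top-level element."""
    if (body.text or "").strip():
        return False  # bare text before the first child
    for child in body:
        if child.tag not in ALLOWED_TOP:
            return False
        if (child.tail or "").strip():
            return False  # bare text between top-level elements
        if child.tag == "p":
            # Every descendant of a <p> must be an allowed inline element.
            if any(el.tag not in ALLOWED_IN_P for el in child.iter() if el is not child):
                return False
    return True
```

Applied to the input above it returns False. Applied to the desired output (wrapped in a `body` element) it returns True.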

 
Ryan Stewart
      05-11-2006
From looking around some more, I'm seeing that XSLT should be viewed as
transforming nodes from a source tree into nodes in a result tree. So a
different way of looking at my problem might be, "How do I grab
consecutive text and inline nodes (besides the br and img elements)
that are children of the root node from the source tree and put them
inside one node (a p element) in the result tree?"
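Framed that way, this is a grouping problem: split the root's child nodes into runs of adjacent inline content and wrap each run. XSLT 2.0 provides `xsl:for-each-group` with a `group-adjacent` attribute for this kind of task. In XSLT 1.0 it is commonly handled by recursing along the `following-sibling` axis. As an illustration of the same idea outside XSLT, here is a sketch using Python's `itertools.groupby` over `xml.etree.ElementTree` nodes. It assumes that only text, `a`, `br`, `table`, `ol`, and `img`/`ul` appear as children of the body, with no namespace. `regroup`, `child_nodes`, `kind`, and `BLOCK_TAGS` are hypothetical names. Unlike a version that trims edge whitespace, this one only drops runs that are entirely whitespace.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

# Hypothetical classification following the thread: these end a paragraph and pass through.
BLOCK_TAGS = {"table", "ol", "ul", "img"}


def child_nodes(elem):
    """Yield elem's child nodes in document order; text runs are plain str."""
    if elem.text:
        yield elem.text
    for child in elem:
        bare = copy.deepcopy(child)
        bare.tail = None  # the tail is yielded separately as its own text node
        yield bare
        if child.tail:
            yield child.tail


def kind(node):
    """Grouping key: 'inline' (text or <a>), 'break' (<br/>), or 'block'."""
    if isinstance(node, str):
        return "inline"
    if node.tag == "br":
        return "break"
    return "block" if node.tag in BLOCK_TAGS else "inline"


def regroup(body):
    out = ET.Element(body.tag, dict(body.attrib))
    # groupby merges only *adjacent* nodes with the same key, like group-adjacent.
    for group_kind, group in itertools.groupby(child_nodes(body), key=kind):
        if group_kind == "block":
            for node in group:  # each table/list/image is copied through as-is
                out.append(node)
        elif group_kind == "inline":
            p = ET.Element("p")
            for node in group:
                if isinstance(node, str):
                    if len(p):
                        p[-1].tail = (p[-1].tail or "") + node
                    else:
                        p.text = (p.text or "") + node
                else:
                    p.append(node)
            if len(p) or (p.text or "").strip():  # skip whitespace-only runs
                out.append(p)
        # 'break' groups (one or more <br/>) only separate paragraphs.
    return out
```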

 