XSLT Extract Text from Nodes

 
 
gregmcmullinjr@gmail.com
      10-10-2006
Hello,

I am new to the concept of XSL and am looking for some assistance.

Take the following XML document:

<binder>
<author>Greg</author>
<notes>
<time>11:45</time>
<content>
This would be some content... every once in a while you may run
into
<heading>A Heading!</heading>
Which could be followed by more content... and possible
<heading>More Headings.</heading>
and even more content!
</content>
</notes>
</binder>

What I would like to do is to be able to extract the value of the
<content> node, and have special formatting for the headings.

When I do something like:

<xsl:value-of select="content" />

I receive the data within <content>, including the values of the
nested <heading> nodes. What I really want is to have XSLT read the
text of the <content> node until a <heading> node is reached, format
and display the value of that heading, and then continue with the text
of the <content> node after the <heading> until another <heading> is
reached, and so on.

Could someone give me some pointers as to how this can be accomplished?

 
Martin Honnen
      10-10-2006


gregmcmullinjr@gmail.com wrote:


> <content>
> This would be some content... every once in a while you may run
> into
> <heading>A Heading!</heading>
> Which could be followed by more content... and possible
> <heading>More Headings.</heading>
> and even more content!
> </content>



> What I would like to do is to be able to extract the value of the
> <content> node, and have special formatting for the headings.


Use templates and xsl:apply-templates e.g.

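<!-- Wrap the content in a div and process its children in document order -->
<xsl:template match="content">
  <div>
    <xsl:apply-templates/>
  </div>
</xsl:template>

<!-- Give each heading its own formatting -->
<xsl:template match="heading">
  <h1>
    <xsl:apply-templates/>
  </h1>
</xsl:template>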

There is a built-in template for text nodes
<http://www.w3.org/TR/xslt#built-in-rule>
so you don't have to do anything for them; with the above approach they
end up in the result tree anyway.
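
For context, here is a minimal sketch of how those two templates might sit
in a complete stylesheet run against the <binder> document from the
question. The xsl:output setting, the html/body wrapper and the select
path binder/notes/content are assumptions added for illustration; they
are not part of the reply above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the result is serialized as HTML. -->
  <xsl:output method="html"/>

  <!-- Assumption: only <content> is wanted, so the root template selects
       it explicitly and <author> and <time> are never visited. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="binder/notes/content"/>
      </body>
    </html>
  </xsl:template>

  <!-- The two templates from the reply. -->
  <xsl:template match="content">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="heading">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- No template for text(): the built-in rule copies text nodes. -->

</xsl:stylesheet>
```

Because xsl:apply-templates visits child nodes in document order, the
text between the headings and the headings themselves come out in the
same sequence they have in the source.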


--

Martin Honnen
http://JavaScript.FAQTs.com/
 
gregmcmullinjr@gmail.com
      10-10-2006
Thanks for your quick reply Martin,

This has brought me closer to what I would like to accomplish; however,
I now have the following issue.

I was using the xsl:value-of element with disable-output-escaping="yes"
to produce HTML-formatted text in the browser. Within the <content>
node there may be HTML that should be rendered as such. Your method
outputs all of the text in the correct order, formatted according to
tag name, but the HTML tags are displayed as text when they should be
hidden.

For example:

<content>
There may be some <i>italicized</i> text...
<heading>Maybe even <u>formatting in a heading</u></heading>
...
</content>

Is there some way to overcome this?



 
gregmcmullinjr@gmail.com
      10-10-2006
I should say that the HTML tags within my XML document are stored as
entities (at least the < character is) i.e.

<content>
This is some &lt;i>italicized&lt;/i> text...
...
</content>

Thanks.




 
gregmcmullinjr@gmail.com
      10-10-2006
I have found a solution. The following is the built-in template for
text nodes:

<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>

It can be overridden simply by defining a custom template with the same
match pattern, which I did as follows:

<xsl:template match="text()|@*">
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

The result is that the HTML in the text nodes outputs as desired.
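
Combined with the content and heading templates suggested earlier in the
thread, the whole stylesheet might look roughly like the sketch below.
The root template, the html/body wrapper and the xsl:output setting are
assumptions added for illustration. The override only has an effect if
the processor supports disable-output-escaping and the result tree is
actually serialized.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the result is serialized as HTML. -->
  <xsl:output method="html"/>

  <!-- Assumption: start at <content> so other parts of <binder> are skipped. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="binder/notes/content"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="content">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="heading">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- Override of the built-in text rule: the text is written without
       escaping, so text stored as &lt;i>italicized&lt;/i> is emitted
       as <i>italicized</i> when the processor honors the attribute. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

</xsl:stylesheet>
```

Since nothing checks the unescaped text, any malformed markup stored in
the source is passed through to the output unchanged.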



 
roy axenov
      10-10-2006

Please don't top-post.

gregmcmullinjr@gmail.com wrote:
> Martin Honnen wrote:
> > gregmcmullinjr@gmail.com wrote:
> > > <content>
> > > This would be some content... every once in a
> > > while you may run into
> > > <heading>A Heading!</heading>
> > > Which could be followed by more content... and
> > > possible
> > > <heading>More Headings.</heading>
> > > and even more content!
> > > </content>

> >
> > Use templates and xsl:apply-templates e.g.
> >
> > <xsl:template match="content">
> > <div>
> > <xsl:apply-templates/>
> > </div>
> > </xsl:template>
> >
> > <xsl:template match="heading">
> > <h1>
> > <xsl:apply-templates/>
> > </h1>
> > </xsl:template>

>
> This has brought me closer to what I would like to
> accomplish, however I now have the following issue.
>
> I was using the xsl:value-of element with
> disable-output-escaping="yes" to produce HTML formatted
> text in the browser screen. You see within the <content>
> node there may be HTML that should be displayed as such.
> Your method produces all of the text in the correct order
> and formatted according to tag name, but produces HTML
> tags which should be hidden.
>
> ie.
>
> <content>
> There may be some <i>italicized</i> text...
> <heading>Maybe even <u>formatting in a
> heading</u></heading>
> ...
> </content>
>
> Is there some way to overcome this?
>
> I should say that the HTML tags within my XML document
> are stored as entities (at least the < character is) i.e.
>
> <content>
> This is some &lt;i>italicized&lt;/i> text...
> ...
> </content>


Don't do that; it seems to lead to innumerable problems.
Store your mark-up as XML instead:

<content>
This is some <i>italicized</i> text...
...
</content>

...and use the identity transformation to convert it into
HTML:

<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

This also has the virtue of fitting neatly with the
solution for your original problem that Martin Honnen has
proposed.

You might also need to write exclusion templates for some
nodes, but that's hardly a problem.
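
A minimal sketch of how the pieces might fit together, assuming the
embedded mark-up is well-formed XML. The root template, the html/body
wrapper, the xsl:output setting and the exclusion of <script> elements
are hypothetical choices made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the result is serialized as HTML. -->
  <xsl:output method="html"/>

  <!-- Assumption: start at <content> so other parts of <binder> are skipped. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="binder/notes/content"/>
      </body>
    </html>
  </xsl:template>

  <!-- Identity transformation: copies every node that no more specific
       template matches, e.g. <i> and <u> inside <content>. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific patterns take priority over the identity template. -->
  <xsl:template match="content">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="heading">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>

  <!-- Hypothetical exclusion template: an empty body drops the matched
       element and everything inside it. -->
  <xsl:template match="script"/>

</xsl:stylesheet>
```

The content and heading templates win over the identity template
because a plain element name has a higher default priority than node(),
so no explicit priority attributes are needed here.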

--
roy axenov

 
gregmcmullinjr@gmail.com
      10-10-2006
Not sure what a top-post is...

While I see what you're saying, Roy, the problem is that the contained
HTML is not necessarily well formed because of the way it is produced
at this time. Perhaps when I have figured out how to force it to be
well formed I can use this solution.

Thanks.



 
Johannes Koch
      10-10-2006
gregmcmullinjr@gmail.com wrote:
> roy axenov wrote:
>> Please don't top-post.


> Not sure what a top-post is...


Then ask a search engine. It will lead you to some documents like
<http://www.catb.org/~esr/jargon/html/T/top-post.html>.

--
Johannes Koch
Spem in alium nunquam habui praeter in te, Deus Israel.
(Thomas Tallis, 40-part motet)
 
Martin Honnen
      10-11-2006


gregmcmullinjr@gmail.com wrote:


> It can be overridden simply by creating a new custom template, which I
> did as the following:
>
> <xsl:template match="text()|@*">
> <xsl:value-of select="." disable-output-escaping="yes"/>
> </xsl:template>
>
> The result is that the HTML in the text nodes outputs as desired.


If that works for you, then you can use it. But you should be aware that
support for disable-output-escaping is an optional feature that applies
during serialization of the result tree. An XSLT processor might not
support it at all, and it has no effect when the result tree is not
serialized (e.g. when you chain transformations, or in a browser like
Mozilla, where the result tree is rendered directly without any
serialization happening).

--

Martin Honnen
http://JavaScript.FAQTs.com/
 
gregmcmullinjr@gmail.com
      10-11-2006
I think this will suffice for my needs as I am doing the
transformations on the server.

Thanks again.



 