Re: Problems creating an automatic index for XHTML with XSLT

 
 
Marrow
09-11-2003
Hi Alex,

It seems that you want to structure your flat <H?> elements into a
hierarchical item list. Something like the following stylesheet will give
you the output you wanted...

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<!-- key for grouping H? elements by their parent H? element -->
<xsl:key name="kHGroups" match="H2"
use="concat('1|',generate-id(preceding-sibling::H1[1]))"/>
<xsl:key name="kHGroups" match="H3"
use="concat('2|',generate-id(preceding-sibling::H2[1]))"/>
<xsl:key name="kHGroups" match="H4"
use="concat('3|',generate-id(preceding-sibling::H3[1]))"/>
<xsl:key name="kHGroups" match="H5"
use="concat('4|',generate-id(preceding-sibling::H4[1]))"/>
<xsl:key name="kHGroups" match="H6"
use="concat('5|',generate-id(preceding-sibling::H5[1]))"/>
<xsl:template match="HTML">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>

<xsl:template match="BODY">
<xsl:copy>
<xsl:copy-of select="@*"/>
<H1>Index</H1>
<OL>
<xsl:apply-templates select="H1" mode="struc"/>
</OL>
<xsl:apply-templates select="H1 | H2 | H3 | H4 | H5 | H6"/>
</xsl:copy>
</xsl:template>

<!-- template for structuring H? elements into item lists -->
<xsl:template match="H1 | H2 | H3 | H4 | H5 | H6" mode="struc">
<LI>
<a href="#{.}">
<xsl:value-of select="."/>
</a>
<!-- get the children of this H -->
<xsl:variable name="h-children"
select="key('kHGroups',concat(substring(name(),2,1 ),'|',generate-id()))"/>
<xsl:if test="$h-children">
<OL>
<xsl:apply-templates select="$h-children" mode="struc"/>
</OL>
</xsl:if>
</LI>
</xsl:template>

<!-- template for listing H? elements as <a> links -->
<xsl:template match="H1 | H2 | H3 | H4 | H5 | H6">
<a name="{.}">
<xsl:copy-of select="."/>
</a>
</xsl:template>

</xsl:stylesheet>

Hope this helps
Marrow
http://www.marrowsoft.com - home of Xselerator (XSLT IDE and debugger)
http://www.topxml.com/Xselerator
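
For readers less familiar with xsl:key, the lookup used by the kHGroups key can be sketched outside XSLT. The following is only a rough Python analogue under stated assumptions, not part of the posted stylesheet: the list index stands in for generate-id(), and the names headings, groups, last_seen and children are made up for illustration.

```python
from collections import defaultdict

# Flat list of sibling headings in document order: (element name, text).
headings = [
    ("H1", "H1 1"), ("H2", "H2 1.1"), ("H3", "H3 1.1.1"),
    ("H2", "H2 1.2"), ("H1", "H1 2"), ("H2", "H2 2.1"),
]

groups = defaultdict(list)  # plays the role of the kHGroups key
last_seen = {}              # level -> index of the nearest preceding heading at that level

for idx, (name, text) in enumerate(headings):
    level = int(name[1])
    if level > 1:
        # Analogue of concat('1|', generate-id(preceding-sibling::H1[1])) for an H2:
        # parent level, separator, identity of the nearest preceding parent-level heading.
        # An empty string mimics generate-id() of an empty node-set.
        parent_idx = last_seen.get(level - 1, "")
        groups["%d|%s" % (level - 1, parent_idx)].append(idx)
    last_seen[level] = idx


def children(idx):
    """Analogue of key('kHGroups', concat(substring(name(),2,1), '|', generate-id()))."""
    name = headings[idx][0]
    return groups.get(name[1:2] + "|" + str(idx), [])


for child in children(0):
    print(headings[child][1])
```

Each heading is filed once under the key of its parent, and a parent rebuilds the same key string from its own name and identity to find its children, which is why no text comparison is needed.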


"Alex Geller" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)-ig.de...
> Hi,
> I am trying to add an index at the front of an XHTML document.
> The stylesheet should find the H1, H2 and H3 elements and build an index from
> them (currently I use nested OL/LI).
> Example:
> $ cat test.html
> <HTML>
> <BODY>
> <H1>H1 1</H1>
> <H2>H2 1.1</H2>
> <H3>H3 1.1.1</H3>
> <H3>H3 1.1.2</H3>
> <H3>H3 1.1.3</H3>
> <H2>H2 1.2</H2>
> <H3>H3 1.2.1</H3>
> <H3>H3 1.2.2</H3>
> <H3>H3 1.2.3</H3>
> <H1>H1 2</H1>
> <H2>H2 2.1</H2>
> <H3>H3 2.1.1</H3>
> <H3>H3 2.1.2</H3>
> <H3>H3 2.1.3</H3>
> <H2>H2 2.2</H2>
> <H3>H3 2.2.1</H3>
> <H3>H3 2.2.2</H3>
> <H3>H3 2.2.3</H3>
> </BODY>
> </HTML>
> $ xalan -in test.html -xsl mkindex.xslt -out result.html
> $ cat result.html
> <?xml version="1.0" encoding="UTF-8"?>
> <HTML>
> <BODY>
> <H1>Index</H1>
> <OL>
> <LI><a href="#H1 1">H1 1</a>
> <OL>
> <LI><a href="#H2 1.1">H2 1.1</a>
> <OL>
> <LI><a href="#H3 1.1.1">H3 1.1.1</a></LI>
> <LI><a href="#H3 1.1.2">H3 1.1.2</a>
> </LI><LI><a href="#H3 1.1.3">H3 1.1.3</a></LI>
> </OL>
> </LI>
> <LI><a href="#H2 1.2">H2 1.2</a>
> <OL>
> <LI><a href="#H3 1.2.1">H3 1.2.1</a></LI>
> <LI><a href="#H3 1.2.2">H3 1.2.2</a></LI>
> <LI><a href="#H3 1.2.3">H3 1.2.3</a></LI>
> </OL>
> </LI>
> </OL>
> </LI>
> <LI><a href="#H1 2">H1 2</a>
> <OL>
> <LI><a href="#H2 2.1">H2 2.1</a>
> <OL>
> <LI><a href="#H3 2.1.1">H3 2.1.1</a></LI>
> <LI><a href="#H3 2.1.2">H3 2.1.2</a>
> </LI><LI><a href="#H3 2.1.3">H3 2.1.3</a></LI>
> </OL>
> </LI>
> <LI><a href="#H2 2.2">H2 2.2</a>
> <OL>
> <LI><a href="#H3 2.2.1">H3 2.2.1</a></LI>
> <LI><a href="#H3 2.2.2">H3 2.2.2</a></LI>
> <LI><a href="#H3 2.2.3">H3 2.2.3</a></LI>
> </OL>
> </LI>
> </OL>
> </LI>
> </OL>
> <a name="H1 1"><H1>H1 1</H1></a>
> <a name="H2 1.1"><H2>H2 1.1</H2></a>
> <a name="H3 1.1.1"><H3>H3 1.1.1</H3></a>
> <a name="H3 1.1.2"><H3>H3 1.1.2</H3></a>
> <a name="H3 1.1.3"><H3>H3 1.1.3</H3></a>
> <a name="H2 1.2"><H2>H2 1.2</H2></a>
> <a name="H3 1.2.1"><H3>H3 1.2.1</H3></a>
> <a name="H3 1.2.2"><H3>H3 1.2.2</H3></a>
> <a name="H3 1.2.3"><H3>H3 1.2.3</H3></a>
> <a name="H1 2"><H1>H1 2</H1></a>
> <a name="H2 2.1"><H2>H2 2.1</H2></a>
> <a name="H3 2.1.1"><H3>H3 2.1.1</H3></a>
> <a name="H3 2.1.2"><H3>H3 2.1.2</H3></a>
> <a name="H3 2.1.3"><H3>H3 2.1.3</H3></a>
> <a name="H2 2.2"><H2>H2 2.2</H2></a>
> <a name="H3 2.2.1"><H3>H3 2.2.1</H3></a>
> <a name="H3 2.2.2"><H3>H3 2.2.2</H3></a>
> <a name="H3 2.2.3"><H3>H3 2.2.3</H3></a>
> </BODY>
> </HTML>
> I have found a solution that works for me but which is not very good. Maybe
> it's helpful as a starting point or just to prove that I have tried a
> little before posting.
> The solution has at least the following problems:
> - In order to detect all H2 siblings between two H1 elements H1a and
> H1b, I create a node list of all siblings following H1a and then, with a
> conditional, check whether the previous H1 sibling of the current H2 is H1a.
> The check is done by comparing the text of the H1 nodes. This breaks as
> soon as two adjacent H1s have the same text.
> - The template fails as soon as the Hn tags are not siblings in the same
> list. Suppose we introduce a <div> in the document so that one or more Hn
> elements become descendants of this element, then my scheme breaks.
> $ cat mkindex.xslt
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <xsl:stylesheet
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="1.0">
> <xsl:output method="xml"/>
>
> <xsl:template match="@*|node()">
> <xsl:copy>
> <xsl:apply-templates select="@*|node()"/>
> </xsl:copy>
> </xsl:template>
> <xsl:template match="BODY">
> <BODY>
> <H1>Index</H1>
> <OL>
> <xsl:for-each select="H1">
> <xsl:variable name="h1text" select="text()"/>
> <LI>
> <a href="#{text()}">
> <xsl:value-of select="text()"/>
> </a>
> <OL>
> <xsl:for-each select="following-sibling::H2">
> <xsl:if test="preceding-sibling::H1[position()=1]/text()=$h1text">
> <xsl:variable name="h2text" select="text()"/>
> <LI>
> <a href="#{text()}">
> <xsl:value-of select="text()"/>
> </a>
> <OL>
> <xsl:for-each select="following-sibling::H3">
> <xsl:if test="preceding-sibling::H2[position()=1]/text()=$h2text">
> <LI>
> <a href="#{text()}">
> <xsl:value-of select="text()"/>
> </a>
> </LI>
> </xsl:if>
> </xsl:for-each>
> </OL>
> </LI>
> </xsl:if>
> </xsl:for-each>
> </OL>
> </LI>
> </xsl:for-each>
> </OL>
> <xsl:apply-templates/>
> </BODY>
> </xsl:template>
> <xsl:template match="H1">
> <a name="{text()}">
> <H1>
> <xsl:apply-templates/>
> </H1>
> </a>
> </xsl:template>
> <xsl:template match="H2">
> <a name="{text()}">
> <H2>
> <xsl:apply-templates/>
> </H2>
> </a>
> </xsl:template>
> <xsl:template match="H3">
> <a name="{text()}">
> <H3>
> <xsl:apply-templates/>
> </H3>
> </a>
> </xsl:template>
> </xsl:stylesheet>
> Thank you for your help
> Regards,
> Alex



 
Alex Geller
09-11-2003
Hi Marrow,
thank you for your help.
Marrow wrote:

> Hi Alex,
>
> It seems that you want to structure your flat <H?> elements into a
> hierarchical item list.

Exactly
>Something like the following stylesheet will give
> you the output you wanted...
>
> <?xml version="1.0"?>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>..

Your stylesheet works for the test case, and it seems that you have solved the
first problem, the comparison by name. The second problem, as I pointed out,
is that the template should work for almost arbitrary HTML documents where
the Hn elements are not necessarily siblings but may be partially enclosed in
a <div>, for example.
Consider the following fragment:
<H1>H1 1</H1>
<div class="examples">
<H2>H2 1.1</H2>
<H2>H2 1.2</H2>
</div>
<H1>H1 2...
In this case both my stylesheet and yours fail to see the two nested H2s.
Your stylesheet also has the problem that it does not copy the document
content. My first naive attempt at adding a general copy template didn't work.

Thank you however, I will study your style and try to understand how it
works.

Regards,
Alex
 
Dimitre Novatchev
09-11-2003
"Alex Geller" <(E-Mail Removed)> wrote in message news:(E-Mail Removed)-ig.de...
> Hi Dimitre,
> Dimitre Novatchev wrote:
>
> > Excuse me, but it is not clear what exactly you want to produce from your
> > source xhtml -- how the output is related to the input (they seem
> > essentially to have the same structure),

> Well, not quite. The input H?s have a flat structure (siblings) while in the
> output they are nested (Hn+1 become descendants of Hn). Maybe you were
> fooled by the indentation of the input HTML.
> >how the output must be structured

> Exactly as shown (best is, you view both the input and the output in a
> browser).
> > and what requirements it must satisfy.

>
> >
> > In other words, can you define what you mean by "index"?

>
> I want to automatically generate a table of contents at the front of an
> arbitrary HTML document where chapters and subchapters are denoted using H?
> tags. The stylesheet should search for these tags in the document, create the
> table of contents from those tags and then copy the document itself. The
> items in the table of contents should be linked via <a href=".."> to their
> respective chapters in the document. The table of contents should have a
> hierarchical structure using numbered lists as shown in the example. The
> rules for the structure could be defined as follows:
> Let v be a vector of all H? elements found in a pre-order traversal of the
> document tree.
> For example:
> v=H1,H2,H2,H3,H2,H3,H1,H1,H2,H3,H2,H3
> We call the n of an Hn element its hierarchy value.
> Create a result tree r so that it contains all nodes from the source vector
> v. In this result tree r every node vn from the source vector v
> becomes a child of its preceding node vn-1 if the hierarchy value of
> vn is higher than the hierarchy value of vn-1.
> r=H1(H2,H2(H3),H2(H3)),H1,H1(H2(H3),H2(H3))
>
> Thank you,
> Alex
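
The rule quoted above can be sketched with a stack of currently open headings. This is a minimal Python illustration of the rule as Alex states it, not the XSLT approach used in the replies; the function names nest and show are made up for the example, and headings are reduced to their hierarchy values.

```python
def nest(levels):
    """Group a flat, document-order list of heading levels into a tree."""
    root = {"level": 0, "children": []}
    stack = [root]  # headings that are still "open", outermost first
    for level in levels:
        node = {"level": level, "children": []}
        # Close every open heading at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def show(nodes):
    """Render the tree in the H1(H2,H2(H3)) notation used in the post."""
    parts = []
    for node in nodes:
        text = "H%d" % node["level"]
        if node["children"]:
            text += "(" + show(node["children"]) + ")"
        parts.append(text)
    return ",".join(parts)


v = [1, 2, 2, 3, 2, 3, 1, 1, 2, 3, 2, 3]
r = show(nest(v))
print(r)
```

One difference from a key-based stylesheet is worth noting: with a stack, a heading that skips a level (an H3 directly after an H1) is attached to that H1, whereas a key that looks for the nearest preceding H2 would attach it elsewhere.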


Hi Alex,

The following transformation implements all your requirements,
including the one that different Hx may not always be siblings, but
may be children of other elements, e.g. div.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output omit-xml-declaration="yes" indent="yes"/>

<xsl:key name="kChildren"
match="H2"
use="generate-id(
(ancestor::H1 | preceding::H1)
[last()]
)"/>
<xsl:key name="kChildren"
match="H3"
use="generate-id(
(ancestor::H2 | preceding::H2)
[last()]
)"/>
<xsl:key name="kChildren"
match="H4"
use="generate-id(
(ancestor::H3 | preceding::H3)
[last()]
)"/>
<xsl:key name="kChildren"
match="H5"
use="generate-id(
(ancestor::H4 | preceding::H4)
[last()]
)"/>
<xsl:key name="kChildren"
match="H6"
use="generate-id(
(ancestor::H5 | preceding::H5)
[last()]
)"/>
<xsl:template match="/">
<html>
<xsl:apply-templates select="/*/*/H1" mode="TOC"/>
<xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="H1 | H2 | H3 | H4 | H5 | H6"
mode="TOC">
<LI><a href="#{.}"><xsl:value-of select="."/></a>
<OL>
<xsl:apply-templates
select="key('kChildren', generate-id())"
mode="TOC"/>
</OL>
</LI>
</xsl:template>

<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="H1 | H2 | H3 | H4 | H5 | H6">
<a name="#{.}"/>
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

When applied on this source.xhtml:

<HTML>
<BODY>
<H1>H1 1</H1>
<H2>H2 1.1</H2>
<div>
<H3>H3 1.1.1</H3>
<H3>H3 1.1.2</H3>
</div>
<H3>H3 1.1.3</H3>
<H2>H2 1.2</H2>
<H3>H3 1.2.1</H3>
<H3>H3 1.2.2</H3>
<H3>H3 1.2.3</H3>
<H1>H1 2</H1>
<H2>H2 2.1</H2>
<H3>H3 2.1.1</H3>
<H3>H3 2.1.2</H3>
<H3>H3 2.1.3</H3>
<H2>H2 2.2</H2>
<H3>H3 2.2.1</H3>
<H3>H3 2.2.2</H3>
<H3>H3 2.2.3</H3>
</BODY>
</HTML>

the wanted result is produced:

<html>
<LI><a href="#H1 1">H1 1</a><OL>
<LI><a href="#H2 1.1">H2 1.1</a><OL>
<LI><a href="#H3 1.1.1">H3 1.1.1</a><OL></OL>
</LI>
<LI><a href="#H3 1.1.2">H3 1.1.2</a><OL></OL>
</LI>
<LI><a href="#H3 1.1.3">H3 1.1.3</a><OL></OL>
</LI>
</OL>
</LI>
<LI><a href="#H2 1.2">H2 1.2</a><OL>
<LI><a href="#H3 1.2.1">H3 1.2.1</a><OL></OL>
</LI>
<LI><a href="#H3 1.2.2">H3 1.2.2</a><OL></OL>
</LI>
<LI><a href="#H3 1.2.3">H3 1.2.3</a><OL></OL>
</LI>
</OL>
</LI>
</OL>
</LI>
<LI><a href="#H1 2">H1 2</a><OL>
<LI><a href="#H2 2.1">H2 2.1</a><OL>
<LI><a href="#H3 2.1.1">H3 2.1.1</a><OL></OL>
</LI>
<LI><a href="#H3 2.1.2">H3 2.1.2</a><OL></OL>
</LI>
<LI><a href="#H3 2.1.3">H3 2.1.3</a><OL></OL>
</LI>
</OL>
</LI>
<LI><a href="#H2 2.2">H2 2.2</a><OL>
<LI><a href="#H3 2.2.1">H3 2.2.1</a><OL></OL>
</LI>
<LI><a href="#H3 2.2.2">H3 2.2.2</a><OL></OL>
</LI>
<LI><a href="#H3 2.2.3">H3 2.2.3</a><OL></OL>
</LI>
</OL>
</LI>
</OL>
</LI>
<HTML>

<BODY>
<a name="#H1 1"></a><H1>H1 1</H1>
<a name="#H2 1.1"></a><H2>H2 1.1</H2>

<div>
<a name="#H3 1.1.1"></a><H3>H3 1.1.1</H3>
<a name="#H3 1.1.2"></a><H3>H3 1.1.2</H3>

</div>
<a name="#H3 1.1.3"></a><H3>H3 1.1.3</H3>
<a name="#H2 1.2"></a><H2>H2 1.2</H2>
<a name="#H3 1.2.1"></a><H3>H3 1.2.1</H3>
<a name="#H3 1.2.2"></a><H3>H3 1.2.2</H3>
<a name="#H3 1.2.3"></a><H3>H3 1.2.3</H3>
<a name="#H1 2"></a><H1>H1 2</H1>
<a name="#H2 2.1"></a><H2>H2 2.1</H2>
<a name="#H3 2.1.1"></a><H3>H3 2.1.1</H3>
<a name="#H3 2.1.2"></a><H3>H3 2.1.2</H3>
<a name="#H3 2.1.3"></a><H3>H3 2.1.3</H3>
<a name="#H2 2.2"></a><H2>H2 2.2</H2>
<a name="#H3 2.2.1"></a><H3>H3 2.2.1</H3>
<a name="#H3 2.2.2"></a><H3>H3 2.2.2</H3>
<a name="#H3 2.2.3"></a><H3>H3 2.2.3</H3>

</BODY>

</HTML>
</html>

Hope this helped.


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL
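
Why the kChildren keys cope with a <div> can be illustrated outside XSLT. The sketch below is a rough Python analogue, assuming the standard xml.etree.ElementTree module and a shortened copy of the sample input; group_children and the other names are invented for the example. It contrasts looking only at the direct children of BODY (roughly what preceding-sibling:: sees) with walking every element in document order (roughly what (ancestor::Hx | preceding::Hx)[last()] sees).

```python
import xml.etree.ElementTree as ET

SOURCE = """<HTML><BODY>
<H1>H1 1</H1>
<H2>H2 1.1</H2>
<div>
<H3>H3 1.1.1</H3>
<H3>H3 1.1.2</H3>
</div>
<H3>H3 1.1.3</H3>
<H2>H2 1.2</H2>
<H3>H3 1.2.1</H3>
</BODY></HTML>"""

HEADINGS = ("H1", "H2", "H3", "H4", "H5", "H6")


def group_children(nodes):
    """Map id(heading) to the headings one level below it.

    `nodes` must be in document order. The parent of an Hn is the most
    recent H(n-1) seen so far among the nodes that were passed in.
    """
    last_seen = {}  # level -> latest heading element at that level
    grouped = {}    # id(parent element) -> list of child elements
    for node in nodes:
        if node.tag not in HEADINGS:
            continue
        level = int(node.tag[1])
        parent = last_seen.get(level - 1)
        if parent is not None:
            grouped.setdefault(id(parent), []).append(node)
        last_seen[level] = node
    return grouped


body = ET.fromstring(SOURCE).find("BODY")
first_h2 = body.find("H2")

siblings_only = group_children(list(body))    # direct children of BODY only
document_order = group_children(body.iter())  # all descendants, document order

sibling_texts = [h.text for h in siblings_only.get(id(first_h2), [])]
document_texts = [h.text for h in document_order.get(id(first_h2), [])]
print(sibling_texts)
print(document_texts)
```

The sibling-only pass never visits the two H3 elements inside the <div>, so they are missing from the first list, while the document-order pass picks them up.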
 
Alex Geller
09-12-2003

> The following transformation implements all your requirements,
> including the one that different Hx may not always be siblings, but
> may be children of other elements, e.g. div.

Looks elegant, works, thank you very much!
Regards,
Alex
 