Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > XML > XML traversal in level-order (breadth-first) with XSLT

Reply
Thread Tools

XML traversal in level-order (breadth-first) with XSLT

 
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
Hi all!

I need to traverse a XML file in level-order (breadth-first) in order
to number it's nodes. The XML structure looks a little like this:

<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>

I want to append attributes to each node carrying its level and its
occurence as integers.
Is there a chance to do this using XSLT?

The result should then look like:

<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="4"/>
</Component>
<Component level="2" number="3"/>
</Component>

After googling a little I found this:
<http://www.tkachenko.com/blog/archives/000268.html>

What do you think? What is a simple way to start here?

Thanks in advance!

//Chris
 
Reply With Quote
 
 
 
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
Okay, just found out that setting each node's level ain't that tough.
This is done with:

<xsl:variable name="mncl"><xsl:value-of select="count(ancestor::*)"/></
xsl:variable>
<xsl:attribute name="level"><xsl:value-of select="$mncl"/></
xsl:attribute>

But how can I achieve a correct level-ordered numbering of my
"Component" nodes?
 
Reply With Quote
 
 
 
 
Pavel Lepin
Guest
Posts: n/a
 
      12-11-2007

Christian Rühl <(E-Mail Removed)> wrote in
<(E-Mail Removed)>:
> I want to append attributes to each node carrying its
> level and its occurence as integers.
>
> <Component level="1" number="1">
> <Component level="2" number="2">
> <Component level="3" number="4"/>
> </Component>
> <Component level="2" number="3"/>
> </Component>


<xsl:stylesheet version="1.0"
xmlnssl="http://www.w3.org/1999/XSL/Transform">
<xslutput method="xml" indent="yes"/>
<xsl:template match="@*|node()[not(self::*)]">
<xsl:copy/>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:attribute name="level">
<xsl:call-template name="calc-level"/>
</xsl:attribute>
<xsl:attribute name="number">
<xsl:call-template name="calc-number"/>
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template name="calc-level">
<xsl:value-of select="1+count(ancestor::*)"/>
</xsl:template>
<xsl:template name="calc-number">
<xsl:variable name="level">
<xsl:call-template name="calc-level"/>
</xsl:variable>
<xsl:value-of
select=
"
1
+count(//*[1+count(ancestor::*) &lt; $level])
+count(preceding::*[1+count(ancestor::*)=$level])
"/>
</xsl:template>
</xsl:stylesheet>

Note that while this works, calc-number named template is
computationally expensive. Using a general-purpose language
together with DOM API might be a much better solution for
large documents.

--
....also, I submit that we all must honourably commit seppuku
right now rather than serve the Dark Side by producing the
HTML 5 spec.
 
Reply With Quote
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
Thank you, Pavel!

> Note that while this works, calc-number named template is
> computationally expensive. Using a general-purpose language
> together with DOM API might be a much better solution for
> large documents.


I thought of that. But I don't use too large XML files, so that a XSLT
solution should not have noticeable effects here. The thing is, that
later I need to traverse the tree in pre-order - but of course then as
a DOM. So I was looking for a chance to avoid having two different DOM
traversals whereof one only gets the components numbered.

 
Reply With Quote
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
I played around a little and figured out, that this part always
returns 0:

<xsl:value-of select="count(//*[1+count(ancestor::*) &lt; $mncl])"/>

But I don't see whats wrong with it. In my eyes it looks okay though,
for it counts all direct ancestors with a smaller level, doesn't it?
 
Reply With Quote
 
Pavel Lepin
Guest
Posts: n/a
 
      12-11-2007

Please quote what you're replying to.

Christian Rühl <(E-Mail Removed)> wrote in
<(E-Mail Removed)>:
> I played around a little and figured out, that this part
> always returns 0:
>
> <xsl:value-of select="count(//*[1+count(ancestor::*) &lt;
> $mncl])"/>


Works fine for me, using a modified version of your sample
document, the transformation that I posted originally and
Saxon-8B:

pavel@debian:~/dev/xslt$ saxon -t comp.xml comp.xsl
Saxon 8.8J from Saxonica
Java version 1.5.0
Warning: at xsl:stylesheet on line 2 of
file:///var/www/dev/xslt/comp.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 778 milliseconds
Processing file:/var/www/dev/xslt/comp.xml
Building tree for file:/var/www/dev/xslt/comp.xml using
class net.sf.saxon.tinytree.TinyBuilder
Tree built in 11 milliseconds
Tree size: 45 nodes, 0 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="10"/>
<Component level="3" number="11"/>
<Component level="3" number="12"/>
</Component>
<Component level="2" number="3"/>
<Component level="2" number="4">
<Component level="3" number="13"/>
<Component level="3" number="14"/>
</Component>
<Component level="2" number="5"/>
<Component level="2" number="6">
<Component level="3" number="15"/>
<Component level="3" number="16"/>
<Component level="3" number="17"/>
</Component>
<Component level="2" number="7"/>
<Component level="2" number="8">
<Component level="3" number="18"/>
<Component level="3" number="19"/>
</Component>
<Component level="2" number="9"/>
</Component>Execution time: 241 milliseconds
Memory used: 15732736
NamePool contents: 19 entries in 17 chains. 7 prefixes, 8
URIs
pavel@debian:~/dev/xslt$

Using libxslt or Xalan-C++ yields the same results. Please
post minimal example that reproducibly demonstrates the
problem, and mention the transformation engine you're
using. Might be an XPath precedence problem, now that I
think about it. If that is the case, it's likely a problem
with your XSLT processor, although it is marginally
possible that three major engines would all get it wrong.

--
....also, I submit that we all must honourably commit seppuku
right now rather than serve the Dark Side by producing the
HTML 5 spec.
 
Reply With Quote
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
On 11 Dez., 14:34, Pavel Lepin <(E-Mail Removed)> wrote:
> Using libxslt or Xalan-C++ yields the same results. Please
> post minimal example that reproducibly demonstrates the
> problem, and mention the transformation engine you're
> using. Might be an XPath precedence problem, now that I
> think about it. If that is the case, it's likely a problem
> with your XSLT processor, although it is marginally
> possible that three major engines would all get it wrong.


Hm, then maybe I'm doing something wrong.

I'm transforming with javax.xml.transform.*; in Eclipse. Here's the
code I'm using:

------------------------------------------------------------------------------------
// set target location and xslt location
m_result = new StreamResult( new FileOutputStream(m_prodTreeFile) );
m_xsltSource = new StreamSource( new
FileInputStream(m_prodTreeXslt) );

// create transformer factory and transformer instance
m_factory = TransformerFactory.newInstance();
m_transformer = m_factory.newTransformer(m_xsltSource);

// set parameters (files to copy)
m_transformer.setParameter("tree", m_prodTree);
m_transformer.setParameter("archive", m_archiveFile);

// copy file contents and fill result target
m_transformer.transform(new StreamSource(), m_result);

// clear both parameters
m_transformer.clearParameters();
------------------------------------------------------------------------------------

I don't think that the problem is due to that code. Maybe I'm calling
my templates and/or parameters wrong.

Here's that code part (I modified the calculations due to 2 more node-
levels on top that I don't want to count):

------------------------------------------------------------------------------------
<xsl:template name="calculate-level">
<xsl:value-of select="count(ancestor::*)-1"/>
</xsl:template>

<xsl:template name="calculate-number">
<xsl:variable name="level">
<xsl:call-template name="calculate-level"/>
</xsl:variable>
<xsl:value-of select="1 + count(//*[count(ancestor::*)-1 &lt;
$level]) + count(preceding::*[count(ancestor::*)-1 = $level])"/>
</xsl:template>

<xsl:template name="top-level-component" match="/">
<xsl:for-each select="$treeDoc//PRODUCT_TREE">
<xsl:if test="Component">
<Component>
<xsl:attribute name="mncl"><xsl:call-template
name="calculate-level"/></xsl:attribute>
<xsl:attribute name="mncn"><xsl:call-template
name="calculate-number"/></xsl:attribute>
<xsl:if test="Component">
<xsl:call-template name="component"/>
</xsl:if>
</Component>
</xsl:if>
</xsl:for-each>
</xsl:template>

<xsl:template match="/">
<xsl:call-template name="top-level-component"/>
</xsl:template>
------------------------------------------------------------------------------------

My main problem here might be, that I can't put "$treeDoc" in the
match-tag of a template. Therefore I have one template for the top-
level-component which calls an analog template "component" that then
handles all following nodes. Maybe you can give me a hint here, for I
really doubt that this is a good way to go.

To show you the results I'm getting, my input file currently looks
like this:

------------------------------------------------------------------------------------
<top>
<product>
<Component>
<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>
<Component>
<Component/>
</Component>
</Component>
</product>
</top>
------------------------------------------------------------------------------------

And the result looks like:

------------------------------------------------------------------------------------
<top>
<product>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="2">
<Component level="4" number="4"/>
</Component>
<Component level="3" number="3"/>
</Component>
<Component level="2" number="3">
<Component level="3" number="5"/>
</Component>
</Component>
</product>
</top>
------------------------------------------------------------------------------------
 
Reply With Quote
 
Pavel Lepin
Guest
Posts: n/a
 
      12-11-2007

Once again, this post contains a good deal of critique. If
you find that somehow offensive, please just ignore it.

Christian Rühl <(E-Mail Removed)> wrote in
<(E-Mail Removed)>:
> On 11 Dez., 14:34, Pavel Lepin <(E-Mail Removed)>
> wrote:
>> Using libxslt or Xalan-C++ yields the same results.
>> Please post minimal example that reproducibly
>> demonstrates the problem, and mention the transformation
>> engine you're using. Might be an XPath precedence
>> problem, now that I think about it. If that is the case,
>> it's likely a problem with your XSLT processor, although
>> it is marginally possible that three major engines would
>> all get it wrong.

>
> I'm transforming with javax.xml.transform.*; in Eclipse.


I strongly advise that you get a standalone XSLT processor
somewhere and use it for debugging your transformations.

I'm not a Java programmer by trade, so correct me if I'm
wrong, but I believe it's just a generic API to various
transformation engines.

> // set target location and xslt location
> m_result = new StreamResult( new
> FileOutputStream(m_prodTreeFile) );
> m_xsltSource = new StreamSource( new
> FileInputStream(m_prodTreeXslt) );
>
> // create transformer factory and transformer instance
> m_factory = TransformerFactory.newInstance();
> m_transformer = m_factory.newTransformer(m_xsltSource);


Once again, correct me if I'm wrong, but this gives no
indication what engine you're actually using. Could be
Saxon, Xalan-J or any other transformation engine you
happen to have registered with your factory.

> // set parameters (files to copy)
> m_transformer.setParameter("tree", m_prodTree);
> m_transformer.setParameter("archive", m_archiveFile);


Are those file names or parsed XML documents? I believe
passing anything but integral data to your stylesheet is
ill-specified (mmm... if not outright disallowed - can't be
bothered to look it up right now), so I wouldn't do that if
there was any way around it.

> <xsl:template name="calculate-level">
> <xsl:value-of select="count(ancestor::*)-1"/>
> </xsl:template>


Bad idea. Instead of tinkering with the total count, modify
the XPath expression so that it returns a nodeset
consisting solely of the nodes that you actually want to
count:

1+count(ancestor::Component)

> <xsl:template name="calculate-number">
> <xsl:variable name="level">
> <xsl:call-template name="calculate-level"/>
> </xsl:variable>
> <xsl:value-of select="1 +
> count(//*[count(ancestor::*)-1 &lt;
> $level]) + count(preceding::*[count(ancestor::*)-1 =
> $level])"/> </xsl:template>


Same here.

> <xsl:template name="top-level-component" match="/">


Generally bad idea, unless you have a very good reason to do
this.

> <xsl:for-each select="$treeDoc//PRODUCT_TREE">


for-each? $treeDoc?

> <xsl:if test="Component">
> <Component>
> <xsl:attribute name="mncl"><xsl:call-template
> name="calculate-level"/></xsl:attribute>
> <xsl:attribute name="mncn"><xsl:call-template
> name="calculate-number"/></xsl:attribute>


Wrong. The current node is in the nodeset resulting from
evaluating the XPath expression in for-each. <xsl:if> does
not affect the current node. So you're creating a Component
element, then invoke the named templates meant to calculate
the level and number in context of PRODUCT_TREE element.

> <xsl:if test="Component">
> <xsl:call-template name="component"/>
> </xsl:if>


You haven't defined a component named template, though...

> </Component>
> </xsl:if>
> </xsl:for-each>
> </xsl:template>
>
> <xsl:template match="/">
> <xsl:call-template name="top-level-component"/>
> </xsl:template>


This whole idea is wrong. That's what identity
transformation is for (google it, it's THE ultimate
essential technique in XSLT).

> My main problem here might be, that I can't put "$treeDoc"
> in the match-tag of a template.


*shrug* What is $treeDoc anyway? You seem to be using it,
you're talking about, but haven't defined it anywhere.

> Therefore I have one template for the top-level-component
> which calls an analog template "component" that then
> handles all following nodes. Maybe you can give me a hint
> here, for I really doubt that this is a good way to go.


Why do you want to process top Component element
differently, anyway?

> And the result looks like:
>
> <top>
> <product>
> <Component level="1" number="1">
> <Component level="2" number="2">
> <Component level="3" number="2">
> <Component level="4" number="4"/>
> </Component>
> <Component level="3" number="3"/>
> </Component>
> <Component level="2" number="3">
> <Component level="3" number="5"/>
> </Component>
> </Component>
> </product>
> </top>


Well, I cannot run your version of it. But everything works
just fine in my version:

pavel@debian:~/dev/xslt$ xmllint comp2.xml
<?xml version="1.0"?>
<top>
<product>
<Component>
<Component>
<Component>
<Component/>
</Component>
<Component/>
</Component>
<Component>
<Component/>
</Component>
</Component>
</product>
</top>
pavel@debian:~/dev/xslt$ xmllint comp2.xsl
<?xml version="1.0"?>
<xsl:stylesheet
xmlnssl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xslutput method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Component">
<xsl:copy>
<xsl:attribute name="level">
<xsl:call-template name="calc-level"/>
</xsl:attribute>
<xsl:attribute name="number">
<xsl:call-template name="calc-number"/>
</xsl:attribute>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template name="calc-level">
<xsl:value-of select="1+count(ancestor::Component)"/>
</xsl:template>
<xsl:template name="calc-number">
<xsl:variable name="level">
<xsl:call-template name="calc-level"/>
</xsl:variable>
<xsl:value-of select=" 1 +count
( //Component
[1+count(ancestor::Component) &lt; $level] )
+count ( preceding::Component
[1+count(ancestor::Component)=$level] ) "/>
</xsl:template>
</xsl:stylesheet>
pavel@debian:~/dev/xslt$ saxon -t comp2.xml comp2.xsl
Saxon 8.8J from Saxonica
Java version 1.5.0
Warning: at xsl:stylesheet on line 2 of
file:///var/www/dev/xslt/comp2.xsl:
Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
Stylesheet compilation time: 754 milliseconds
Processing file:/var/www/dev/xslt/comp2.xml
Building tree for file:/var/www/dev/xslt/comp2.xml using
class net.sf.saxon.tinytree.TinyBuilder
Tree built in 10 milliseconds
Tree size: 25 nodes, 0 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?>
<top>
<product>
<Component level="1" number="1">
<Component level="2" number="2">
<Component level="3" number="4">
<Component level="4" number="7"/>
</Component>
<Component level="3" number="5"/>
</Component>
<Component level="2" number="3">
<Component level="3" number="6"/>
</Component>
</Component>
</product>
</top>Execution time: 145 milliseconds
Memory used: 14512128
NamePool contents: 20 entries in 19 chains. 7 prefixes, 8
URIs
pavel@debian:~/dev/xslt$

--
....also, I submit that we all must honourably commit seppuku
right now rather than serve the Dark Side by producing the
HTML 5 spec.
 
Reply With Quote
 
Joseph Kesselman
Guest
Posts: n/a
 
      12-11-2007
I'd suggest a worklist algorithm.

--
Joe Kesselman / Beware the fury of a patient man. -- John Dryden
 
Reply With Quote
 
Christian Rühl
Guest
Posts: n/a
 
      12-11-2007
On 11 Dez., 15:48, Pavel Lepin <(E-Mail Removed)> wrote:
> Once again, this post contains a good deal of critique. If
> you find that somehow offensive, please just ignore it.


Don't worry. I'm glad you're showing me my mistakes and helping me
making it better.

> I strongly advise that you get a standalone XSLT processor
> somewhere and use it for debugging your transformations.
>
> I'm not a Java programmer by trade, so correct me if I'm
> wrong, but I believe it's just a generic API to various
> transformation engines.


You are right:
"[...] This package defines the generic APIs for processing
transformation instructions, and performing a transformation from
source to result. These interfaces have no dependencies on SAX or the
DOM standard, and try to make as few assumptions as possible about the
details of the source and result of a transformation. It achieves this
by defining Source and Result interfaces. [...]"
<http://java.sun.com/j2se/1.4.2/docs/...sform/package-
summary.html>

> Once again, correct me if I'm wrong, but this gives no
> indication what engine you're actually using. Could be
> Saxon, Xalan-J or any other transformation engine you
> happen to have registered with your factory.
>
> > // set parameters (files to copy)
> > m_transformer.setParameter("tree", m_prodTree);
> > m_transformer.setParameter("archive", m_archiveFile);

>
> Are those file names or parsed XML documents? I believe
> passing anything but integral data to your stylesheet is
> ill-specified (mmm... if not outright disallowed - can't be
> bothered to look it up right now), so I wouldn't do that if
> there was any way around it.


These parameters carry the full paths of the files I want to work
with. The first one doesn't matter here. The second one (String
m_prodTree) holds the path of my input product tree.

> 1+count(ancestor::Component)


Okay, that's what I did. Thanks, works fine!

> > <xsl:template name="calculate-number">
> > <xsl:variable name="level">
> > <xsl:call-template name="calculate-level"/>
> > </xsl:variable>
> > <xsl:value-of select="1 +
> > count(//*[count(ancestor::*)-1 &lt;
> > $level]) + count(preceding::*[count(ancestor::*)-1 =
> > $level])"/> </xsl:template>

>
> Same here.
>
> > <xsl:template name="top-level-component" match="/">

>
> Generally bad idea, unless you have a very good reason to do
> this.


Why is this bad and what would be a good reason? I can't follow you
here. How would you do this?

> > <xsl:for-each select="$treeDoc//PRODUCT_TREE">

>
> for-each? $treeDoc?


Yeah, I know that's a bad one. But I didn't find a better way... I've
been searching a couple of days now...
And as I said, i can't put "$treeDoc" in the match-tag of a template.

> > <xsl:if test="Component">
> > <Component>
> > <xsl:attribute name="mncl"><xsl:call-template
> > name="calculate-level"/></xsl:attribute>
> > <xsl:attribute name="mncn"><xsl:call-template
> > name="calculate-number"/></xsl:attribute>

>
> Wrong. The current node is in the nodeset resulting from
> evaluating the XPath expression in for-each. <xsl:if> does
> not affect the current node. So you're creating a Component
> element, then invoke the named templates meant to calculate
> the level and number in context of PRODUCT_TREE element.


Hm, that makes sense.
Ooops, and sorry: I forgot to add "/Component" to the select-tag. So
the whole thing actually looks like <xsl:for-each select="$treeDoc//
PRODUCT_TREE/Component"> which then makes the <xsl:if> even more
useless.

> > <xsl:if test="Component">
> > <xsl:call-template name="component"/>
> > </xsl:if>

>
> You haven't defined a component named template, though...


Oh, I did. I just didn't post it here, because it looks exactly like
the "top-level-component" template. Except the <xsl:for-each
select="Component"> (Here I got rid of the "$treeDoc" part).

> This whole idea is wrong. That's what identity
> transformation is for (google it, it's THE ultimate
> essential technique in XSLT).


Sorry, but I don't really know how to work with that. I think Joe
Kesselman postet a Link to it last time. Looks "good" to me, but I
have no idea how to work with that.

> *shrug* What is $treeDoc anyway? You seem to be using it,
> you're talking about, but haven't defined it anywhere.


That's a DOM of the input file:
<xsl:variable name="treeDoc" select="document($tree)"/>

> > Therefore I have one template for the top-level-component
> > which calls an analog template "component" that then
> > handles all following nodes. Maybe you can give me a hint
> > here, for I really doubt that this is a good way to go.

>
> Why do you want to process top Component element
> differently, anyway?


Actually I don't, I just didn't find a "workaround" As said above.
The problem is, that I need to do this "level-order component
numbering" within another transformation. Therefore I have the second
(String m_archiveFile) parameter. If you want, I can upload my
stylesheet and my input files for you. So maybe then you are able to
run my version.

And btw: what would be a good stand-alone XSLT processor? I'm
currently editing my files with Altova XMLSpy 2006, but I'm not quite
sure if there's a XSLT processor integrated. Haven't tried yet.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
XML traversal in levelorder C. Rühl Java 2 12-06-2007 12:38 PM
Problem to insert an XML-element by XSLT-converting from one XML-file into another XML-file jkflens XML 2 05-30-2006 09:41 AM
NEWB: reverse traversal of xml file manstey Python 3 05-24-2006 04:45 AM
Including XSLT/XML document within a XSLT document dar_imiro@hotmail.com XML 4 12-13-2005 02:26 AM
ANN: New low-cost XML Editor, XSLT Editor, XSLT Debugger, DTD/Schema Editor Stylus Studio Java 0 08-03-2004 03:53 PM



Advertisments