Velocity Reviews - Computer Hardware Reviews

Has anyone solved the problem of lists in WordML (Word 2003)?

 
 
Clifford W. Racz
Guest
Posts: n/a
 
      05-13-2004
Has anyone solved the issue of translating lists in Word 2003 (WordML)
into xHTML? I have been trying to get the nested table code for my XSLT
to work for a while now, with no way to get the collection that I need.

To begin, I am using the xsltproc that comes with Cygwin as my processor.
I have no particular affinity to this processor except that it is open
source and standards compliant. I don't like M$, but if using a M$
processing program will fix this transformation, then I will use it.
xsltproc for the Windows platform can be downloaded here:
http://www.zlatkovic.com/libxml.en.html
ftp://ftp.zlatkovic.com/pub/libxml/
(This is a Windows port of libxslt, which comes with GNOME.)

The problem is this:

As those of you who have worked with this type of problem know, the WordML
structure is a flat structure where the focus is on visual formatting.
So, instead of a nicely nested list structure like HTML has, WordML has
a linear collection of <w:p> elements, each with a <w:pPr> child whose
<w:listPr> element holds the list information. A typical Word paragraph that
represents a list item is shown here:

<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/>
</w:listPr>
</w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t>
</w:r>
</w:p>

The item <w:ilvl w:val="0"/> tells me that the level of nesting for this
item is "0", i.e. the first level (zero based counting).

My model for processing this list was this: When I encounter the first
<w:p> that is a list item, represented by the XPath
match="w:p[descendant-or-self::w:pPr/w:listPr][1]", I grab the
entire collection of following-sibling elements that are paragraphs with
listPr children. This is "grabbing the list". I call a template and
pass this list to the template.

The template itself is a recursive template. Whenever I encounter a
"transitional list item" (one that is at a level greater than the
current level being processed by the template), I want to grab the
sub-collection of list elements above my current level, enclose them in
<ol></ol> and then call the template again with the new collection.

So... what is my problem? Let us pretend that my list looks like this:

* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6


When I am processing level 0, I don't have any issues until I grab the
items on level 1. When I do, I not only get the items 2-1 and 2-2, but
also 4-1, 4-2, 4-3, and 4-4. I have tried tweaking the xPath for this
list, but to no avail. My output looks like this with my method:

* Bulleted item 1
* Bulleted item 2
o First level nesting, bulleted item 2-1
o First level nesting, bulleted item 2-2
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 3
* Bulleted item 4
o First level nesting, bulleted item 4-1
o First level nesting, bulleted item 4-2
o First level nesting, bulleted item 4-3
o First level nesting, bulleted item 4-4
* Bulleted item 5
* Bulleted item 6

The following two items are the stripped-down WordML and the stripped-down
XSLT for this transformation, to keep this posting from being insanely long.
If anyone can contribute to this problem or has already solved it, I
would be most grateful for feedback.

Cliff

************************************************** ******************************
XSLT for processing the WordML
************************************************** ******************************
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY tab "&#x9;">
<!ENTITY sp "&#x20;">
<!ENTITY crlf "&#xD;&#xA;">
<!ENTITY nbsp "&#xA0;">
<!ENTITY bullet "&#x2022;">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:st1="urn:schemas-microsoft-com:office:smarttags" version="1.0"
exclude-result-prefixes="w v w10 sl aml wx o dt st1">
<!-- START stylesheet commands -->
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"
doctype-system="fubar.dtd" />
<xsl:strip-space elements="*" />
<xsl:preserve-space elements="w:binData w:tab" />
<!-- End stylesheet commands -->
<!-- START variable declarations -->
<!-- null value for text comparisons -->
<xsl:variable name="null"></xsl:variable>
<!-- single space for text comparisons -->
<xsl:variable name="space">&sp;</xsl:variable>
<!-- bullet character for text comparisons -->
<xsl:variable name="bullet">À·</xsl:variable>
<!-- END variable declarations -->
<!-- START template declarations -->

<xsl:template match="/w:wordDocument">
<html>
<!-- Process the head information -->
<xsl:apply-templates select="//o:DocumentProperties" mode="head" />
<!-- Process the body information -->
<xsl:apply-templates select="//w:body" mode="body" />
</html>
</xsl:template>

<xsl:template match="w:body" mode="body">
<body>
<xsl:apply-templates select="*" mode="body" />
</body>
</xsl:template>

<xsl:template match="wx:sect" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="wx:sub-section" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="w:p[descendant-or-self::w:pPr/w:listPr][1]"
mode="body">
<!-- <xsl:comment> w:p[1] template match found... </xsl:comment> -->
<xsl:call-template name="listProcessor" mode="list">
<xsl:with-param name="myCollectionOfSiblingListItems"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr]" />
</xsl:call-template>
</xsl:template>

<xsl:template name="listProcessor" mode="list">
<xsl:param name="myCollectionOfSiblingListItems" />

<xsl:variable name="myCurrentListLevel"
select="$myCollectionOfSiblingListItems[1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<ul>
<xsl:for-each select="$myCollectionOfSiblingListItems">

<xsl:variable name="previousSiblingListLevel"
select="preceding-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="myOwnCurrentListLevel"
select="descendant-or-self::w:p/w:pPr/w:listPr/w:ilvl/@w:val" />
<xsl:variable name="nextSiblingListLevel"
select="following-sibling::w:p[position() =
1]/w:pPr/w:listPr/w:ilvl/@w:val" />

<xsl:variable name="attempToGetTheRightSetIntoAVariable"
select="following-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val][generate-id(preceding-sibling::w:p[child::w:pPr/w:listPr/w:ilvl/@w:val
= 0]) = generate-id(current())]" />

<!-- <xsl:comment> current contents: <xsl:value-of
select="current()" /><xsl:text> </xsl:text></xsl:comment> -->
<!-- <xsl:comment><xsl:text> *****Found a collection of this many
items: </xsl:text><xsl:value-of
select="count($attempToGetTheRightSetIntoAVariable )" /><xsl:text>
</xsl:text></xsl:comment> -->

<xsl:choose>
<xsl:when
test="number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) =
number($myCurrentListLevel)">
<li>
<xsl:call-template name="processParagraphAsListItemContents"
mode="list" />
</li>
</xsl:when>
<xsl:when test="(
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) &gt;
number($myCurrentListLevel) ) and (
number(descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val) &gt;
number($previousSiblingListLevel))">
<xsl:variable name="nextListItemIndexOnOrBelowMyLevel"
select="following-sibling::w:p[w:pPr/w:listPr/w:ilvl/@w:val &lt;=
number($myCurrentListLevel)]" />
<xsl:variable name="subCollection"
select=".|following-sibling::w:p[descendant-or-self::w:pPr/w:listPr/w:ilvl/@w:val
&gt; number($myCurrentListLevel)]"></xsl:variable>

<!-- <xsl:comment> My current list level for recursive call:
<xsl:value-of select="number($myCurrentListLevel)" /> , with current
contents: <xsl:value-of select="." /><xsl:text>
</xsl:text></xsl:comment> -->
<li>
<xsl:call-template name="listProcessor" mode="list" >
<xsl:with-param name="myCollectionOfSiblingListItems"
select="$subCollection" />
</xsl:call-template>
</li>

</xsl:when>
<xsl:otherwise>
<!-- Do nothing! -->
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</ul>
</xsl:template>

<xsl:template name="processParagraphAsListItemContents" mode="list">
<xsl:if test="descendant-or-self::text()">
<xsl:apply-templates mode="body" />
</xsl:if>
</xsl:template>

<xsl:template match="w:t" mode="body">
<xsl:value-of select="." />
</xsl:template>

<xsl:template match="w:r|w:b|w:u|w:i" mode="body">
<xsl:apply-templates mode="body" />
</xsl:template>

<xsl:template match="*" mode="body">
<!-- Do nothing... drop content here... -->
</xsl:template>
<!-- END template declarations -->
</xsl:stylesheet>


************************************************** ******************************
Sample stripped down WordML
************************************************** ******************************
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
xml:space="preserve">
<w:body>
<wx:sect>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #9</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is a bulleted test list with 2 levels deep
nesting:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 2-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-2</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-3</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="1"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="o" wx:wTabBefore="1080" wx:wTabAfter="210"/>
<wx:font wx:val="Courier New"/></w:listPr></w:pPr>
<w:r>
<w:t>First level nesting, bulleted item 4-4</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 5</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Bulleted item 6</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is some following text...</w:t></w:r></w:p></wx:sub-section>
<wx:sub-section>
<w:p>
<w:pPr>
<w:pStyle w:val="Heading1"/></w:pPr>
<w:r>
<w:t>Test #10</w:t></w:r></w:p>
<w:p>
<w:r>
<w:t>Here is another bulleted test list for testing
purposes:</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 1</w:t></w:r></w:p>
<w:p>
<w:pPr>
<w:listPr>
<w:ilvl w:val="0"/>
<w:ilfo w:val="2"/>
<wx:t wx:val="" wx:wTabBefore="360" wx:wTabAfter="240"/>
<wx:font wx:val="Symbol"/></w:listPr></w:pPr>
<w:r>
<w:t>Another list entirely, Bulleted item 2</w:t></w:r></w:p>
<w:p/>
<w:sectPr>
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid
w:line-pitch="360"/></w:sectPr></wx:sub-section></wx:sect></w:body></w:wordDocument>
 
 
 
 
 
Mary McRae
Guest
Posts: n/a
 
      05-14-2004
Check Oleg Tkachenko's site; he created a WordML -> HTML XSLT app.
http://blog.tkachenko.com

--
Mary McRae
blogs: http://blogs.officezealot.com/mary
web: http://www.office-xml.com


 
 
 
 
 
Ben Edgington
Guest
Posts: n/a
 
      05-14-2004
"Clifford W. Racz" <(E-Mail Removed)> writes:
<snip/>
> This following 2 items are the stripped down WordML and stripped down
> XSLT for this transformation, to make this posting not insanely
> long. If anyone can contribute to this problem or has already solved
> it, I would be most grateful for feedback.


I don't think you are going to be able to solve this by tinkering with
the XPath - the XML just doesn't have enough structure. The problem
is that you need to track transitions between list levels, and they
are not easily accessible with XPath in this flat XML structure.

Here's a radically simplified version that simulates your problem that
you should be able to adapt to your code easily enough.

It again uses a recursive template to keep track of the list level,
but now the list items are considered sequentially rather than in
groups of the same level. Using the recursion we can detect when the
list level changes and insert some markup accordingly (this has to be
done "by hand" using disable-output-escaping, which is ugly. But it
works).

[Note by the way: you can't use the mode attribute on
xsl:call-template, but that's not the problem here]
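
To make the note above concrete, here is a minimal sketch of a named template called without any mode attribute. In XSLT 1.0, mode belongs only on xsl:apply-templates and on templates that have a match attribute; a template with only a name must not carry one. The element names (item, text) and the parameter name "items" are assumptions borrowed from the simplified XML used later in this post, not from the original WordML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- xsl:call-template takes a name only; there is no mode here -->
    <xsl:call-template name="listProcessor">
      <xsl:with-param name="items" select="//item"/>
    </xsl:call-template>
  </xsl:template>

  <!-- A named template without match, and therefore without mode -->
  <xsl:template name="listProcessor">
    <xsl:param name="items"/>
    <ul>
      <xsl:for-each select="$items">
        <li><xsl:value-of select="text"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

Some processors tolerate the stray mode attributes, but a strictly conforming one may reject the stylesheet, so it is safer to drop them.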

This transformation

- - -
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="list">
<ul>
<xsl:call-template name="process-list-items">
<xsl:with-param name="item-number" select="1"/>
<xsl:with-param name="level" select="0"/>
</xsl:call-template>
</ul>
</xsl:template>

<xsl:template name="process-list-items">
<xsl:param name="item-number"/>
<xsl:param name="level"/>

<xsl:variable name="current-item" select="./item[$item-number]"/>

<!-- If the list level has increased we start a sublist -->
<xsl:if test="$level &lt; $current-item/level/@val">
<xsl:text disable-output-escaping="yes">&lt;ul></xsl:text>
</xsl:if>

<!-- If the list level has decreased we end the sublist -->
<xsl:if test="$level &gt; $current-item/level/@val">
<xsl:text disable-output-escaping="yes">&lt;/ul></xsl:text>
</xsl:if>

<!-- Output the list item -->
<li><xsl:value-of select="$current-item/text"/></li>

<!-- Process the next list item -->
<xsl:if test="./item[$item-number+1]">
<xsl:call-template name="process-list-items">
<xsl:with-param name="item-number" select="$item-number+1"/>
<xsl:with-param name="level" select="$current-item/level/@val"/>
</xsl:call-template>
</xsl:if>

</xsl:template>

</xsl:stylesheet>
- - -

with this XML

- - -
<list>
<item>
<level val="0"/>
<text>Item 1</text>
</item>
<item>
<level val="0"/>
<text>Item 2</text>
</item>
<item>
<level val="1"/>
<text>Item 2-1</text>
</item>
<item>
<level val="1"/>
<text>Item 2-2</text>
</item>
<item>
<level val="0"/>
<text>Item 3</text>
</item>
<item>
<level val="0"/>
<text>Item 4</text>
</item>
<item>
<level val="1"/>
<text>Item 4-1</text>
</item>
<item>
<level val="1"/>
<text>Item 4-2</text>
</item>
<item>
<level val="0"/>
<text>Item 5</text>
</item>
</list>
- - -

gives this output (after reformatting)

- - -
<?xml version="1.0"?>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<ul>
<li>Item 2-1</li>
<li>Item 2-2</li>
</ul>
<li>Item 3</li>
<li>Item 4</li>
<ul>
<li>Item 4-1</li>
<li>Item 4-2</li>
</ul>
<li>Item 5</li>
</ul>
- - -
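
For comparison, here is a sketch of an alternative that avoids disable-output-escaping by letting each item select its own children. It is written against the simplified list/item/level XML above, and it assumes that levels only ever increase one step at a time (an item that jumps from level 0 straight to level 2 would be dropped). The variable names lvl, stop and children are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="list">
    <ul>
      <!-- Start from the top-level items only -->
      <xsl:apply-templates select="item[level/@val = 0]"/>
    </ul>
  </xsl:template>

  <xsl:template match="item">
    <xsl:variable name="lvl" select="number(level/@val)"/>
    <!-- The next item at the same or a shallower level ends this
         item's run of sub-items -->
    <xsl:variable name="stop"
        select="following-sibling::item[level/@val &lt;= $lvl][1]"/>
    <!-- Children: later items exactly one level deeper whose own
         "next shallower-or-equal item" is that same stop node, i.e.
         they sit between this item and the stop node -->
    <xsl:variable name="children"
        select="following-sibling::item[level/@val = $lvl + 1]
                [not($stop) or
                 generate-id(following-sibling::item[level/@val &lt;= $lvl][1])
                 = generate-id($stop)]"/>
    <li>
      <xsl:value-of select="text"/>
      <xsl:if test="$children">
        <ul>
          <xsl:apply-templates select="$children"/>
        </ul>
      </xsl:if>
    </li>
  </xsl:template>
</xsl:stylesheet>
```

With this approach each sub-list ends up inside the li of its parent item, and items 4-1 and 4-2 are no longer picked up under item 2. To adapt it to WordML, item would presumably become w:p[w:pPr/w:listPr] and level/@val would become w:pPr/w:listPr/w:ilvl/@w:val. The repeated sibling scans make it quadratic in the number of list items, which should be fine for ordinary documents.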

--
Ben Edgington
Mail to the address above is discarded.
Mail to ben at that address might be read.
http://www.edginet.org/
 
 
Oleg Tkachenko [MVP]
Guest
Posts: n/a
 
      05-16-2004
Clifford W. Racz wrote:

> Has anyone solved the issue of translating lists in Word 2003 (WordML)
> into xHTML? I have been trying to get the nested table code for my XSLT
> to work for a while now, with no way to get the collection that I need.


You may want to download Microsoft's WordML viewer and take a look at
their XSLT stylesheet.

--
Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
 
 
Clifford W. Racz
Guest
Posts: n/a
 
      05-17-2004
I have looked at the M$ WordML viewer... of course, that was one of the first things I did.

What that thing spits out is a paragraph that is styled to sort of look like a list item, just as Word handles it in WordML.

For example, here is the first list item when transformed by the word2html.xsl:

<p class="Normal-P" style="margin-left:36pt;text-indent:-18pt;">
<span class="Normal-H"><span style="font-family:Symbol;font-style:normal;text-decoration:none;font-weight:normal;"><span style="padding-left:12pt;"></span></span>Bulleted item 1</span>
</p>

And so, that is useless when trying to export this to HTML. Word must handle it differently internally, because the "Save as..." HTML option does export the list properly. However, I do not want HTML output, only XML that is compatible with the HTML list model.

Clifford


Oleg Tkachenko [MVP] wrote:
> Clifford W. Racz wrote:
>
>> Has anyone solved the issue of translating lists in Word 2003 (WordML)
>> into xHTML? I have been trying to get the nested table code for my XSLT
>> to work for a while now, with no way to get the collection that I need.

>
>
> You may want to download Microsoft's WordML viewer and take a look at
> their XSLT stylesheet.
>

 
 
Clifford W. Racz
Guest
Posts: n/a
 
      05-20-2004
I am trying to write a simple XSLT to "beautify" any arbitrary xml, i.e. to indent it for readability and convert it to UTF-8 for use in some scripts that I authored.

If I use something like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" encoding="UTF-8" />
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

It does the trick nicely. I want to accept any arbitrary xml language, so I don't specify a default namespace. Not a problem.

Problem: I want to output the proper DOCTYPE declaration for the input file, so that I can validate it.

So, does anyone know a way to access the public and system dtd names for an input filetype?
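
For context, the XPath 1.0 data model does not include the document type declaration, so a stylesheet has no node from which to read the input's public or system identifier. In XSLT 1.0 the doctype-public and doctype-system attributes of xsl:output are also fixed strings rather than attribute value templates. A sketch of what can be done is therefore one stylesheet per known document type with the identifiers written out; the XHTML 1.0 Strict identifiers below are only an assumed example of such a type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The DOCTYPE is hard-coded: it cannot be copied from the input -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

For truly arbitrary input, the identifiers would have to be read outside XSLT (for example by the calling script) and used to pick the matching stylesheet.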


Clifford
 
 
 
 