Velocity Reviews - Computer Hardware Reviews


Split element with another one
Eivind
Guest
Posts: n/a
 
      04-29-2005
Hi,

I'm creating XML files from printed documents. According to the DTD I
have to use, there have to be pagebreaks in the XML file. These
pagebreaks must be placed wherever a new page begins in the printed
version. This is fairly simple to accomplish.
The problem, however, is that the DTD states that a pagebreak cannot occur
inside a paragraph element, but must be between paragraphs.
Is it possible, using XSLT, to end the paragraph-element before the
pagebreak, and start a new one after it?

To illustrate:

Illegal text block:
<para>Blah blah
<pagebreak/>
more blah blah</para>

Must become:
<para>Blah blah</para>
<pagebreak/>
<para>more blah blah</para>
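Before transforming anything, it can help to list where the DTD rule is actually broken. This is a minimal sketch using Python's standard xml.etree.ElementTree (not something from the thread); it assumes the elements are literally named para and pagebreak, with no namespace, and the function name misplaced_pagebreaks is hypothetical. It reports every pagebreak that has a para ancestor, at any depth.

```python
import xml.etree.ElementTree as ET


def misplaced_pagebreaks(root):
    """Return every <pagebreak> element that sits anywhere inside a <para>.

    Hypothetical helper; assumes un-namespaced "para" and "pagebreak".
    """
    found = []
    for para in root.iter("para"):
        # iter("pagebreak") walks all descendants, so breaks nested in
        # <italic> or <bold> are reported too.
        found.extend(para.iter("pagebreak"))
    return found


bad = ET.fromstring("<body><para>Blah blah<pagebreak/>more blah blah</para></body>")
good = ET.fromstring(
    "<body><para>Blah blah</para><pagebreak/><para>more blah blah</para></body>"
)
print(len(misplaced_pagebreaks(bad)))   # 1
print(len(misplaced_pagebreaks(good)))  # 0
```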

I'm grateful for any help!

regards,
Eivind Andersen

 
David Carlisle
Guest
Posts: n/a
 
      04-29-2005


XSLT can do essentially arbitrary tree transformations, so the answer is
yes, but in this case the transformation may be more or less hard
depending on where pagebreak can occur. Do you know that it is always at
the top level of para (this makes it fairly easy), or can it be nested
anywhere, as in

<para>Blah blah <italic> xxx <bold> zzz</bold>
<pagebreak/> yyy</italic>
more blah blah</para>

In the latter case things are "interesting", as you have to close an
arbitrary number of elements, and things get even more interesting if
the pagebreak appears in table markup and you have to correctly close all
the elements and open up everything needed for a new table...

Assuming the simple case, this is a grouping problem: you just want to
group all children of para according to their position relative to the
pagebreaks. Searching for "xslt grouping" on Google will show lots of
possibilities.

For example:

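<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- identity template: copy everything not matched below unchanged -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<!-- a para without a pagebreak is copied as is -->
<xsl:template match="para">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- a para with top-level pagebreaks becomes para, pagebreak, para, ... -->
<xsl:template match="para[pagebreak]">
  <para>
    <xsl:copy-of select="@*|pagebreak[1]/preceding-sibling::node()"/>
  </para>
  <xsl:for-each select="pagebreak">
    <xsl:copy-of select="."/>
    <para>
      <xsl:copy-of select="../@*"/><!-- re-copy attributes, you might not want that -->
      <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
    </para>
  </xsl:for-each>
</xsl:template>

<!-- copy one node, then move on to the next sibling... -->
<xsl:template match="node()" mode="p">
  <xsl:copy-of select="."/>
  <xsl:apply-templates select="following-sibling::node()[1]" mode="p"/>
</xsl:template>

<!-- ...and stop at the next pagebreak -->
<xsl:template match="pagebreak" mode="p"/>

</xsl:stylesheet>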
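To check the grouping logic outside an XSLT processor, here is a rough Python analogue of the templates above, using the standard xml.etree.ElementTree module. It is only a sketch, not David's code: like the XSLT, it handles pagebreak elements that are direct children of para, and the names split_para and fix_document are made up for illustration. ElementTree keeps text before the first child in .text and text after an element in .tail, so a pagebreak's tail becomes the text of the next para.

```python
import copy
import xml.etree.ElementTree as ET


def split_para(para):
    """Split a <para> at its direct <pagebreak> children (hypothetical helper).

    Returns [para, pagebreak, para, ...]. Like the XSLT, every new para
    re-copies the original para's attributes.
    """
    pieces = []
    current = ET.Element(para.tag, dict(para.attrib))
    current.text = para.text  # text before the first child element
    for child in para:
        if child.tag == "pagebreak":
            brk = copy.deepcopy(child)  # keeps attributes and content such as "116"
            brk.tail = None             # text after the break moves into the next para
            pieces += [current, brk]
            current = ET.Element(para.tag, dict(para.attrib))
            current.text = child.tail
        else:
            current.append(copy.deepcopy(child))  # deepcopy also copies child.tail
    pieces.append(current)
    return pieces


def fix_document(root):
    """Replace every <para> that has a direct <pagebreak> child by its pieces."""
    for parent in list(root.iter()):
        new_children = []
        for child in parent:
            if child.tag == "para" and child.find("pagebreak") is not None:
                pieces = split_para(child)
                pieces[-1].tail = child.tail  # keep text that followed the old para
                new_children.extend(pieces)
            else:
                new_children.append(child)
        parent[:] = new_children


doc = ET.fromstring(
    '<body><para n="1">Blah blah<pagebreak>116</pagebreak>more blah blah</para></body>'
)
fix_document(doc)
print(ET.tostring(doc, encoding="unicode"))
```

As with the XSLT, a pagebreak that is the first or last child of a para leaves an empty para behind; whether that is acceptable depends on the DTD.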

David
 
Eivind
Guest
Posts: n/a
 
      04-29-2005
Wow! Thank you!

Fortunately the pagebreaks won't occur inside a table, but it's possible
to have one inside an <italic> or <bold> element.

I haven't been able to test this code yet, but I'll get on it first thing
Monday morning, and I'll report back a little bit later.

Again, thank you for the incredibly quick and helpful reply!

Eivind

 
David Carlisle
Guest
Posts: n/a
 
      04-29-2005

> Fortunately the pagebreaks won't occur inside a table, but it's possible
> to have one inside an <italic> or <bold> element.


It's really a lot harder if that can happen.
The general case, where you have to close an arbitrary number of elements,
would need a completely different approach: essentially walking over
the whole tree one node at a time, building up a data structure of
currently open elements as you go along, i.e. implementing a parser in
XSLT. This is certainly possible but probably not a lot of fun (it would
be a bit more fun in XSLT 2.0 than in XSLT 1.0). But if you can tie down a
specific list of bad things that can happen, in practice most cases can be
done fairly easily in XSLT, usually, on a good day...
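For the nested case, the "data structure of currently open elements" can be kept implicitly on the call stack of a recursive function. The following is a Python sketch of that idea (xml.etree.ElementTree again, not XSLT; the names reopen, split_at_breaks and fix_paragraphs are hypothetical). It closes every ancestor between the para and a nested pagebreak, lifts the pagebreak up to sit between paras, and reopens the ancestors afterwards. It does not handle tables, and every reopened element simply gets a copy of the original attributes.

```python
import copy
import xml.etree.ElementTree as ET


def reopen(elem):
    """Empty element with the same tag and attributes (used to reopen elem)."""
    return ET.Element(elem.tag, dict(elem.attrib))


def split_at_breaks(elem):
    """Split elem at every descendant <pagebreak>.

    Returns [frag, pagebreak, frag, ..., frag]; each frag is a copy of elem
    holding the content between two breaks. The recursion tracks which
    ancestors are open, so they are closed before a break and reopened after.
    """
    pieces = []
    current = reopen(elem)
    current.text = elem.text
    for child in elem:
        if child.tag == "pagebreak":
            brk = copy.deepcopy(child)
            brk.tail = None
            pieces += [current, brk]
            current = reopen(elem)
            current.text = child.tail
        elif child.find(".//pagebreak") is not None:
            # The break is deeper down: split the child first, then lift
            # its pagebreaks up to this level.
            sub = split_at_breaks(child)
            sub[-1].tail = child.tail
            current.append(sub[0])
            for i in range(1, len(sub), 2):
                pieces += [current, sub[i]]
                current = reopen(elem)
                current.append(sub[i + 1])
        else:
            current.append(copy.deepcopy(child))
    pieces.append(current)
    return pieces


def fix_paragraphs(parent):
    """Replace each <para> below parent that contains a pagebreak at any depth."""
    new_children = []
    for child in parent:
        if child.tag == "para" and child.find(".//pagebreak") is not None:
            pieces = split_at_breaks(child)
            pieces[-1].tail = child.tail
            new_children.extend(pieces)
        else:
            fix_paragraphs(child)  # look for paras inside sections, lists, ...
            new_children.append(child)
    parent[:] = new_children


doc = ET.fromstring(
    "<body><para>Blah <italic>xxx <bold>zzz</bold><pagebreak/>yyy</italic>"
    " more</para></body>"
)
fix_paragraphs(doc)
for piece in doc:
    print(ET.tostring(piece, encoding="unicode"))
```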

David
 
Eivind
Guest
Posts: n/a
 
      05-04-2005
Hi,

I've tried using the xsl templates you provided, and they seem to work
quite well. However, the templates insert some new attributes into the
para and pagebreak elements:

<pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>

How can I remove these? (I must admit I don't entirely understand
what's going on in the templates you gave me, so I don't see where the
new attributes are inserted, or how to remove them.)

Eivind

 
David Carlisle
Guest
Posts: n/a
 
      05-04-2005

> I've tried using the xsl templates you provided, and they seem to work
> quite well. However, the templates insert some new attributes into the
> para and pagebreak elements:
>
> <pagebreak xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:mml="http://www.w3.org/1998/Math/MathML">116</pagebreak>


These namespace declarations do not come from the templates I provided in
this thread; they must be declared either elsewhere in your stylesheet
or in your source file. How to get rid of them depends on where they
came from.

They may have come from me originally (I quite often use mml as the
MathML namespace prefix), but MathML hasn't been mentioned so far in this
thread, has it?

David
 
Eivind
Guest
Posts: n/a
 
      05-04-2005
It seems they come from the root-element of the source file.
(I tried to delete them from the source file, and then run the xslt
again. Result: no namespace declarations throughout the resulting
xml-file)

Thank you for all your help!

Eivind

 
David Carlisle
Guest
Posts: n/a
 
      05-04-2005

In general, of course, removing namespace declarations from the input will
break the input. If your document has any MathML in it, then you
can't remove the MathML declaration.
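To see why, this small Python snippet (standard xml.etree.ElementTree; the two tiny documents and the helper parses are made up for illustration) parses the same content with and without the MathML declaration. Once an mml: prefix is used, dropping its declaration leaves an unbound prefix, so the file is no longer namespace-well-formed and the parser rejects it.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"

with_decl = f'<book xmlns:mml="{MATHML}"><para><mml:math/></para></book>'
without_decl = "<book><para><mml:math/></para></book>"


def parses(text):
    """True if ElementTree accepts text as namespace-well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:  # e.g. "unbound prefix" when a declaration is missing
        return False


print(parses(with_decl))     # True
print(parses(without_decl))  # False
# With the declaration, the mml: prefix resolves to the MathML namespace URI.
print(ET.fromstring(with_decl).find(f".//{{{MATHML}}}math").tag)
```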

To avoid copying the namespace nodes, just don't use copy-of.

I think I originally said something like:


<xsl:for-each select="pagebreak">
<xsl:copy-of select="."/>

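Doing


<xsl:for-each select="pagebreak">
<pagebreak><xsl:copy-of select="@*|node()"/></pagebreak>

would generate a new pagebreak element rather than copying one from the
source, so it wouldn't copy any namespace nodes from the source
(but it would use any in-scope namespaces from the stylesheet). The
xsl:copy-of inside copies only the source pagebreak's attributes and
content (such as the page number 116), not its namespace nodes; an empty
<pagebreak/> would lose that page number.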





<xsl:for-each select="pagebreak">
<xsl:element name="pagebreak"><xsl:copy-of select="@*|node()"/></xsl:element>


is similar, but wouldn't use any namespaces from the stylesheet either
(other than the default namespace, if one has been declared).

David
 