Inlining or de-xmlifying xml using XSLT

 
 
Simon Brooke
 
      01-17-2011
I'm trying to write an XSL template to generate Google's data input
format for blogger, which is documented here: http://goo.gl/tydYA

As you can see, it has the particularly delightful property that the
content of the 'content' element actually /is/ markup, but has been
inlined or entified so that it appears as raw text. Yes, I know this is
bizarre and ugly, but I don't control it - Google do. And as there's no
documentation on how to import normally-well-formed XML, this is what I
need to generate.

I've been trying to write a template which does this awful mangling,
and what I've come up with seems to work:

<xsl:template name="mangle-xml">
  <xsl:param name="content"/>
  <xsl:choose>
    <xsl:when test="$content/*">
      &lt;<xsl:value-of select="local-name()"/>
      <xsl:for-each select="$content/@*">
        <xsl:value-of select="concat(' ', local-name(),
                                     '=&quot;', ., '&quot;')"/>
      </xsl:for-each>&gt;
      <xsl:for-each select="$content/node()">
        <xsl:call-template name="mangle-xml">
          <xsl:with-param name="content" select="."/>
        </xsl:call-template>
      </xsl:for-each>
      &lt;/<xsl:value-of select="local-name()"/>&gt;
    </xsl:when>
    <xsl:otherwise>?text?
      <xsl:value-of select="$content"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

This mangles the XML correctly(!), but it's pretty ugly, dark and
mysterious. Is there a cleaner way of doing this?

--
http://www.journeyman.cc/~simon/ :: PGP public key on home page

;; USER ERROR: replace user and press any key to continue



 
Mayeul
 
      01-17-2011
On 17/01/2011 15:58, Simon Brooke wrote:
> I'm trying to write an XSL template to generate Google's data input
> format for blogger, which is documented here: http://goo.gl/tydYA
>
> As you can see, it has the particularly delightful property that the
> content of the 'content' element actually /is/ markup, but has been
> inlined or entified so that it appears as raw text. Yes, I know this is
> bizarre and ugly, but I don't control it - Google do. And as there's no
> documentation on how to import normally-well-formed XML, this is what I
> need to generate.
>
> I've been trying to write a template which does this awful mangling,
> and what I've come up with seems to work:
>
> <xsl:template name="mangle-xml">
> <xsl:param name="content"/>
> <xsl:choose>
> <xsl:when test="$content/*">
> &lt;<xsl:value-of select="local-name()"/>
> <xsl:for-each select="$content/@*">
> <xsl:value-of select="concat(' ', local-name(),
> '=&quot;', ., '&quot;')"/>
> </xsl:for-each>&gt;
> <xsl:for-each select="$content/node()">
> <xsl:call-template name="mangle-xml">
> <xsl:with-param name="content" select="."/>
> </xsl:call-template>
> </xsl:for-each>
> &lt;/<xsl:value-of select="local-name()"/>&gt;
> </xsl:when>
> <xsl:otherwise>?text?
> <xsl:value-of select="$content"/>
> </xsl:otherwise>
> </xsl:choose>
> </xsl:template>
>
> This mangles the XML correctly(!), but it's pretty ugly, dark and
> mysterious. Is there a cleaner way of doing this?
>


Ugh. Personally, I'd give up on doing it in one integrated XSL
transformation. I'd write one transformation for the <content>, another
for the whole document, and feed the result of the first as a string
parameter to the second.

In principle, this is probably how the format was meant to be produced.
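
A minimal sketch of the second, outer transformation in such a two-pass setup. The parameter name content-markup, the Atom-style entry/title/content layout and the /*/title path are assumptions made for illustration; none of them comes from Google's documentation. Because the first pass's output arrives as a plain string, the XML serializer escapes its markup characters on output, so no hand-written mangling should be needed.

```xml
<?xml version="1.0"?>
<!-- Second pass (sketch): wraps markup that a first transformation produced.
     Parameter name and element layout are assumptions, not a known schema. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2005/Atom">

  <xsl:output method="xml" indent="yes"/>

  <!-- Serialized result of the first pass, supplied as a string
       by whatever program or script runs the two passes -->
  <xsl:param name="content-markup" select="''"/>

  <xsl:template match="/">
    <entry>
      <!-- Hypothetical source path for the post title -->
      <title><xsl:value-of select="/*/title"/></title>
      <content type="html">
        <!-- value-of emits a text node, so "<" and "&" in the string
             are escaped by the serializer -->
        <xsl:value-of select="$content-markup"/>
      </content>
    </entry>
  </xsl:template>

</xsl:stylesheet>
```

How the string is passed in depends on the processor; most let the caller set a top-level parameter before running the stylesheet.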

--
Mayeul
 
Martin Honnen
 
      01-17-2011
Simon Brooke wrote:
> I'm trying to write an XSL template to generate Google's data input
> format for blogger, which is documented here: http://goo.gl/tydYA
>
> As you can see, it has the particularly delightful property that the
> content of the 'content' element actually /is/ markup, but has been
> inlined or entified so that it appears as raw text. Yes, I know this is
> bizarre and ugly, but I don't control it - Google do. And as there's no
> documentation on how to import normally-well-formed XML, this is what I
> need to generate.
>
> I've been trying to write a template which does this awful mangling,
> and what I've come up with seems to work:
>
> <xsl:template name="mangle-xml">
> <xsl:param name="content"/>
> <xsl:choose>
> <xsl:when test="$content/*">
> &lt;<xsl:value-of select="local-name()"/>
> <xsl:for-each select="$content/@*">
> <xsl:value-of select="concat(' ', local-name(),
> '=&quot;', ., '&quot;')"/>
> </xsl:for-each>&gt;
> <xsl:for-each select="$content/node()">
> <xsl:call-template name="mangle-xml">
> <xsl:with-param name="content" select="."/>
> </xsl:call-template>
> </xsl:for-each>
> &lt;/<xsl:value-of select="local-name()"/>&gt;
> </xsl:when>
> <xsl:otherwise>?text?
> <xsl:value-of select="$content"/>
> </xsl:otherwise>
> </xsl:choose>
> </xsl:template>
>
> This mangles the XML correctly(!), but it's pretty ugly, dark and
> mysterious. Is there a cleaner way of doing this?


Some XSLT processors supply extension functions to serialize nodes as
XML; for instance, Saxon has
http://www.saxonica.com/documentatio.../serialize.xml. I
would use that if it is available. If it has to be done in XSLT itself,
I would probably do it with templates in a dedicated mode rather than
with a single named template. More sophisticated approaches exist, such as
http://lenzconsulting.com/xml-to-str...-to-string.xsl, which also deal
with harder problems like namespaces. But I am not sure you need
that; the markup is probably escaped because it is supposed to be
text/html tag soup and not clean XML.
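
As a rough illustration of the mode-based approach, here is an XSLT 1.0 sketch. The mode name escape and the wrapping content element are hypothetical. It matches on node type instead of testing a parameter, so leaf elements keep their tags. It is deliberately simple: namespaces and prefixes are dropped, empty elements come out as a start tag plus an end tag, and text or attribute values that themselves contain markup characters get only the serializer's single level of escaping.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <!-- Elements: write start tag, attributes, children and end tag as text.
       The serializer turns each "<" written here into an escaped "&lt;". -->
  <xsl:template match="*" mode="escape">
    <xsl:text>&lt;</xsl:text>
    <xsl:value-of select="local-name()"/>
    <xsl:apply-templates select="@*" mode="escape"/>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates select="node()" mode="escape"/>
    <xsl:text>&lt;/</xsl:text>
    <xsl:value-of select="local-name()"/>
    <xsl:text>&gt;</xsl:text>
  </xsl:template>

  <!-- Attributes: name="value", preceded by a space -->
  <xsl:template match="@*" mode="escape">
    <xsl:value-of select="concat(' ', local-name(), '=&quot;', ., '&quot;')"/>
  </xsl:template>

  <!-- Text nodes: copied through as they are -->
  <xsl:template match="text()" mode="escape">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Entry point (assumed): escape the whole input document
       into a single content element -->
  <xsl:template match="/">
    <content type="html">
      <xsl:apply-templates mode="escape"/>
    </content>
  </xsl:template>

</xsl:stylesheet>
```

Comments and processing instructions have no template in this mode, so the built-in rules silently drop them.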

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
Simon Brooke
 
      01-17-2011
On Mon, 17 Jan 2011 18:00:14 +0100
Martin Honnen wrote:

> But I am not sure you need
> that, the markup is probably escaped as it is supposed to be some
> text/html tag soup and not clean XML.


Yes, probably the only reason they adopted this horrible kluge in the
first place was to deal with tag soup.

[fx: a look of ineffable disgust and derision plays across his features]

--
http://www.journeyman.cc/~simon/ :: PGP public key on home page

;; USER ERROR: replace user and press any key to continue



 
Joe Kesselman
 
      01-19-2011
On 1/17/2011 2:17 PM, Simon Brooke wrote:
> Yes, probably the only reason they adopted this horrible kluge in the
> first place was to deal with tag soup.


... It would accomplish that, I suppose. I've also seen this sort of
thing done simply because people didn't understand that a single tree --
especially if namespaced -- is actually _easier_ to handle than
reparsing the content.

I second the suggestion that this be generated via a mode. Much of it
could be handled by rewriting the identity transform to output text
rather than XML. The really hideous part is handling the case of nested
<![CDATA[ ]]> sections; detecting and handling those requires string
manipulation on the contained text, which is not XSLT's strongest point.
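
The kind of string manipulation involved can be sketched in XSLT 1.0 with a recursive named template. This assumes one particular reading of the problem: the escaped markup is written inside a CDATA section using the text output method, so any "]]>" in the content has to be split to stop it closing the section early. The template name replace and the mode name cdata-safe are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text output: nothing is escaped for us, every character is ours -->
  <xsl:output method="text"/>

  <!-- Hypothetical helper: replace every occurrence of $from in $text
       with $to. XSLT 1.0 has no replace() function, hence the recursion. -->
  <xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <!-- Guard against an empty $from, which would recurse forever -->
      <xsl:when test="$from != '' and contains($text, $from)">
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <xsl:call-template name="replace">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Text nodes: end the CDATA section after "]]" and reopen it before ">" -->
  <xsl:template match="text()" mode="cdata-safe">
    <xsl:call-template name="replace">
      <xsl:with-param name="text" select="."/>
      <xsl:with-param name="from" select="']]&gt;'"/>
      <xsl:with-param name="to" select="']]]]&gt;&lt;![CDATA[&gt;'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Entry point (assumed): wrap all text of the input in one CDATA section -->
  <xsl:template match="/">
    <xsl:text>&lt;![CDATA[</xsl:text>
    <xsl:apply-templates mode="cdata-safe"/>
    <xsl:text>]]&gt;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The same helper could be called with other from/to pairs, for example to add a second level of escaping to "&" and "<" in text nodes.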


--
Joe Kesselman,
http://www.love-song-productions.com...lam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Peter Flynn
 
      01-20-2011
On 17/01/11 19:17, Simon Brooke wrote:
> On Mon, 17 Jan 2011 18:00:14 +0100
> Martin Honnen wrote:
>
>> But I am not sure you need
>> that, the markup is probably escaped as it is supposed to be some
>> text/html tag soup and not clean XML.

>
> Yes, probably the only reason they adopted this horrible kluge in the
> first place was to deal with tag soup.
>
> [fx: a look of ineffable disgust and derision plays across his features]


Cheer up, the recent announcements about HTML5 will only make things
worse.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
 
Joe Kesselman
 
      01-21-2011
> Cheer up, the recent announcements about HTML5 will only make things
> worse


Yeah, it seems the term "HTML5" has been hijacked from its original
intent, which was to bring HTML back to being well-formed, make it
XML-based rather than SGML-based, and make it extendable via namespaces.
Sigh. That's the web for ya -- never mind what would be useful and
robust, just let me make it pretty.

--
Joe Kesselman,
http://www.love-song-productions.com...lam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 
Peter Flynn
 
      01-21-2011
On 21/01/11 01:56, Joe Kesselman wrote:
>> Cheer up, the recent announcements about HTML5 will only make things
>> worse

>
> Yeah, it seems the term "HTML5" has been hijacked from its original
> intent, which was to bring HTML back to being well-formed, make it
> XML-based rather than SGML-based, and make it extendable via namespaces.
> Sigh. That's the web for ya -- never mind what would be useful and
> robust, just let me make it pretty.


http://blog.whatwg.org/html-is-the-new-html5#comments

///Peter
--
XML FAQ: http://xml.silmaril.ie/
 
Joe Kesselman
 
      01-25-2011
On 1/24/2011 6:53 PM, William F Hammond wrote:
> For example, <p> ... <a href="..."> ... </p><p> ... </p><p> ... </a> ... </p>


Ill-formed in either XML or SGML, right?


--
Joe Kesselman,
http://www.love-song-productions.com...lam/index.html

{} ASCII Ribbon Campaign | "may'ron DaroQbe'chugh vaj bIrIQbej" --
/\ Stamp out HTML mail! | "Put down the squeezebox & nobody gets hurt."
 