Re: xsl/xslt pipeline: missing core concept?

 
 
Mike Brown
 
      08-03-2003
"James A. Robinson" <(E-Mail Removed)> wrote
> What do you do
> when you need to take your unicode rich document and transform it to a
> format which needs something other than numeric character references?


I'm having trouble parsing that question.

> For example, what would I do if I wanted to output an XML document to a
> format where I needed to replace the greek delta character with a very
> specific string


As you probably discovered, XPath 1.0 / XSLT 1.0 only has one character
substitution function, translate(), which can only replace one character
with one character throughout a string. However, there are some substring
functions which can be applied in a recursive template to replace a
character with a string. See the XSLT FAQ at http://www.dpawson.co.uk/.
Other options: Dimitre's FXSL library has a string map template that is
probably more efficient -- see
http://sources.redhat.com/ml/xsl-lis.../msg01172.html. And some
processors may support EXSLT's str:replace() as documented at
http://exslt.org/str/functions/replace/index.html (the page says none do
natively, but I know of at least one that does, and there are some templates
available for download that simulate native support).
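
For anyone who thinks more easily in a general-purpose language, here is a Python sketch of the logic such a recursive template follows. It is an illustration of the technique, not the FAQ's or FXSL's code: `substring_before` and `substring_after` are hypothetical stand-ins written to mimic the XPath 1.0 functions of the same names (both return an empty string when the search string is absent), and `replace_all` mirrors a named template that calls itself on the rest of the string.

```python
def substring_before(s: str, sep: str) -> str:
    # Mimics XPath 1.0 substring-before(): "" when sep does not occur.
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s: str, sep: str) -> str:
    # Mimics XPath 1.0 substring-after(): "" when sep does not occur.
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def replace_all(text: str, search: str, replacement: str) -> str:
    # Same shape as a recursive XSLT named template: if the search string
    # occurs, emit the part before it, then the replacement, then recurse on
    # the part after it; otherwise emit the text unchanged.
    # The empty-search guard avoids the infinite recursion a naive template
    # would hit, since contains(x, '') is always true in XPath.
    if search and search in text:
        return (substring_before(text, search)
                + replacement
                + replace_all(substring_after(text, search), search, replacement))
    return text


print(replace_all("\u0394x = \u0394y", "\u0394", r"$\Delta$"))
```

translate(), by contrast, could only turn each delta into one other single character.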

> (say to something in TeX, or just 'delta' if outputting
> to a plain text us-ascii encoded file)? I think I understand how core
> xml parsing (input) handles unicode (be it native encoding or NCR), but
> I don't understand what's supposed to happen when transforming the XML
> to an output which does NOT support unicode or SGML style character
> entities.


When XML is parsed, the bytes of the encoded document, the NCRs, the entity
refs, and all other lexical hoo-hah go away and you're left with a
structured arrangement of Unicode strings representing the essential parts
of the information in the document (elements, attributes, character data,
etc.). XSLT does its business on this information, constructing a new tree.
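
You can see the lexical details disappear with any conforming parser. This sketch uses Python's standard-library `xml.etree.ElementTree` rather than an XSLT processor: three documents that spell the Greek capital delta three different ways all parse to the same one-character Unicode string.

```python
import xml.etree.ElementTree as ET

# Three lexically different spellings of U+0394 GREEK CAPITAL LETTER DELTA:
# a hexadecimal NCR, a decimal NCR, and the literal character as UTF-8 bytes.
docs = [
    b"<p>&#x394;</p>",
    b"<p>&#916;</p>",
    "<p>\u0394</p>".encode("utf-8"),
]

# After parsing, only the character data remains; how it was written is gone.
texts = [ET.fromstring(d).text for d in docs]
for t in texts:
    print(repr(t))
```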

This new tree is (optionally) output somehow. Processors all support
serialization in "text", "xml", or "html" formats, each of which spits out
bytes in some encoding, depending on what you asked for in the xsl:output
instruction. The text output method just emits the character data from text
nodes, no others. Unencodable characters (e.g., you wanted to output a
Chinese character in ASCII) are typically replaced with '?', omitted
entirely, or an error is raised, depending on the implementation (the spec
doesn't mandate what should be done). XML and HTML methods emit all nodes
according to the syntax rules of XML or HTML, so for example an element node
gets a start tag and end tag, or empty element tag if it has no content. In
these output methods, unencodable characters occurring in character data or
attribute values are replaced with an entity reference or NCR, as would be
most appropriate.
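
The different fates of an unencodable character can be imitated with Python's codec error handlers. This is only an analogy, based on Python's standard `str.encode()` and `ElementTree` behavior; it says nothing about what any particular XSLT processor does. The text-method outcomes ('?', omission, error) correspond to the `replace`, `ignore`, and `strict` handlers, while ElementTree's XML serializer falls back to a numeric character reference, much as the XML and HTML output methods do.

```python
import xml.etree.ElementTree as ET

delta = "\u0394"  # GREEK CAPITAL LETTER DELTA, not representable in US-ASCII

# Text-method style outcomes: substitute '?', drop the character, or fail.
as_question_mark = delta.encode("ascii", errors="replace")  # b'?'
omitted = delta.encode("ascii", errors="ignore")            # b''
try:
    delta.encode("ascii")  # the default errors="strict" raises
    raised = False
except UnicodeEncodeError:
    raised = True

# XML-method style outcome: the serializer writes an NCR for the delta.
p = ET.Element("p")
p.text = delta
xml_bytes = ET.tostring(p, encoding="us-ascii")

print(as_question_mark, omitted, raised)
print(xml_bytes)
```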

> I am aware of the possibility of using an XSLT function which does a
> substring search and replace for characters, but surely there is a
> better option? I'm also aware I can do a final stage processing with a
> tool of my own design (something which walks through and replaces
> unicode characters with something else from a mapping). But is there a
> core XSLT or other XML technology which I am missing that is intended
> to solve this type of problem? Which lets me say that, if I'm
> outputting a US-ASCII encoded document that instead of &#x00394; I want
> to dump out 'Delta' or '$\Delta$' or '<IMG SRC="/images/Delta.gif">'?
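
The "final stage" tool described in the question (walk the output and swap each Unicode character for a string from a mapping) is short to write outside XSLT. Here is a minimal Python sketch; the table `TEX_MAP` and the function `to_ascii` are hypothetical, and replacing any leftover non-ASCII character with '?' is just one possible policy. Unlike XPath's translate(), Python's `str.translate()` accepts a table whose values are whole strings.

```python
from typing import Dict

# Hypothetical mapping from characters to replacement strings for a TeX target.
TEX_MAP: Dict[str, str] = {
    "\u0394": r"$\Delta$",  # GREEK CAPITAL LETTER DELTA
    "\u03b4": r"$\delta$",  # GREEK SMALL LETTER DELTA
}


def to_ascii(text: str, mapping: Dict[str, str]) -> str:
    # str.maketrans() turns the character keys into code points; the values
    # may be strings, so one character can become many.
    table = str.maketrans(dict(mapping))
    mapped = text.translate(table)
    # Any non-ASCII character the mapping did not cover becomes '?'.
    return mapped.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("\u0394x + \u00e9", TEX_MAP))
```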


The general idea of the recursive template I mentioned above is demonstrated
here for substring-to-substring replacement:
http://skew.org/xml/stylesheets/replace/

and here for substring-to-element replacement:
http://skew.org/xml/stylesheets/linefeed2br/

They should get you going in the right direction.
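
Replacing a substring with an element rather than a string means splitting the text and interleaving new nodes. The Python sketch below shows the equivalent tree operation with `xml.etree.ElementTree`. It is not the skew.org stylesheet, only the same idea, and the function name `substring_to_element` is hypothetical. ElementTree keeps text that follows a child element in that child's `tail`, so each new element carries the text segment after it. The second call covers the IMG case from the question.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def substring_to_element(parent: ET.Element, text: str, search: str,
                         tag: str, attrib: Optional[Dict[str, str]] = None) -> None:
    # Assumes `parent` starts empty and `search` is non-empty.
    # The first segment becomes the parent's text; every occurrence of
    # `search` becomes a new child element whose tail holds the next segment.
    first, *rest = text.split(search)
    parent.text = first
    for segment in rest:
        child = ET.SubElement(parent, tag, attrib or {})
        child.tail = segment


p = ET.Element("p")
substring_to_element(p, "line one\nline two", "\n", "br")
print(ET.tostring(p, encoding="unicode"))

q = ET.Element("p")
substring_to_element(q, "\u0394x", "\u0394", "img", {"src": "/images/Delta.gif"})
print(ET.tostring(q, encoding="unicode"))
```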

