"James A. Robinson" wrote:
> What do you do
> when you need to take your unicode rich document and transform it to a
> format which which needs something other numeric character references?
I'm having trouble parsing that question.
> For example, what would I do if I wanted to output an XML document to a
> format where I needed to replace the greek delta character with a very
> specific string
As you probably discovered, XPath 1.0 / XSLT 1.0 only has one character
substitution function, translate(), which can only replace one character
with one character throughout a string. However, there are some substring
functions which can be applied in a recursive template to replace a
character with a string. See the XSLT FAQ at
http://www.dpawson.co.uk/.
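
To make the recursive idea concrete without committing to any particular stylesheet, here is a minimal sketch in Python rather than XSLT. The helper names substring_before, substring_after and replace are mine, chosen to mirror the XPath 1.0 functions substring-before() and substring-after(). A real template would do the same thing with xsl:call-template calling itself on the remainder of the string.

```python
def substring_before(s, sep):
    """Mimic XPath substring-before(): '' when sep does not occur in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    """Mimic XPath substring-after(): '' when sep does not occur in s."""
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def replace(s, search, repl):
    """Replace every occurrence of search in s by recursing on the tail."""
    # Base case: nothing to replace (also guards against an empty search string).
    if not search or search not in s:
        return s
    # Emit the text before the first match, then the replacement,
    # then process whatever follows the match the same way.
    head = substring_before(s, search)
    tail = substring_after(s, search)
    return head + repl + replace(tail, search, repl)


if __name__ == "__main__":
    print(replace("rate of change \u0394x over \u0394t", "\u0394", r"$\Delta$"))
```

Each match costs one level of recursion, so a very long string with many matches can run into a depth limit, both in this sketch and in some XSLT processors.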
Other options: Dimitre's FXSL library has a string map template that is
probably more efficient -- see
http://sources.redhat.com/ml/xsl-lis.../msg01172.html. And some
processors may support EXSLT's str:replace() as documented at
http://exslt.org/str/functions/replace/index.html (the page says none do
natively, but I know of at least one that does, and there are some templates
available for download that simulate native support).
> (say to something in TeX, or just 'delta' if outputting
> to a plain text us-ascii encoded file)? I think I understand how core
> xml parsing (input) handles unicode (be it native encoding or NCR), but
> I don't understand what's supposed to happen when transforming the XML
> to an output which does NOT support unicode or SGML style character
> entities.
When XML is parsed, the bytes of the encoded document, the NCRs, the entity
refs, and all other lexical hoo-hah go away and you're left with a
structured arrangement of Unicode strings representing the essential parts
of the information in the document (elements, attributes, character data,
etc.). XSLT does its business on this information, constructing a new tree.
This new tree is (optionally) output somehow. Processors all support
serialization in "text", "xml", or "html" formats, each of which spits out
bytes in some encoding, depending on what you asked for in the xsl:output
instruction. The text output method just emits the character data from text
nodes, no others. Unencodable characters (e.g., you wanted to output a
Chinese character in ASCII) are typically replaced with '?' or omitted
entirely, or an error is raised, depending on the implementation (the spec
doesn't mandate what should be done). XML and HTML methods emit all nodes
according to the syntax rules of XML or HTML, so for example an element node
gets a start tag and end tag, or empty element tag if it has no content. In
these output methods, unencodable characters occurring in character data or
attribute values are replaced with an entity reference or NCR, as would be
most appropriate.
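
The serializer behaviours described above have close analogues in the error handlers of Python's str.encode(), which makes them easy to try out. This is only an illustration in Python, not what any given XSLT processor does internally. The variable names are mine.

```python
text = "\u0394x"  # GREEK CAPITAL LETTER DELTA followed by an ASCII 'x'

# Like a text-method serializer that substitutes a placeholder.
replaced = text.encode("ascii", errors="replace")

# Like a serializer that silently drops what it cannot encode.
omitted = text.encode("ascii", errors="ignore")

# Like the xml/html methods: fall back to a numeric character reference.
ncr = text.encode("ascii", errors="xmlcharrefreplace")

# Like a serializer that treats the unencodable character as an error.
try:
    text.encode("ascii")  # the default handler is "strict"
    raised = False
except UnicodeEncodeError:
    raised = True

if __name__ == "__main__":
    print(replaced, omitted, ncr, raised)
```

None of these built-in handlers gives you 'Delta' or '$\Delta$', which is why the substitution has to happen on the string before it is serialized.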
> I am aware of the possibility of using an XSLT function which does a
> substring search and replace for characters, but surely there is a
> better option? I'm also aware I can do a final stage processing with a
> tool of my own design (something which walks through and replaces
> unicode characters with something else from a mapping). But is there a
> core XSLT or other XML technology which I am missing that is intended
> to solve this type of problem? Which lets me say that, if I'm
> outputting a US-ASCII encoded document that instead of Δ I want
> to dump out 'Delta' or '$\Delta$' or '<IMG SRC="/images/Delta.gif">'?
The general idea of the recursive template I mentioned above is demonstrated
here for substring-to-substring replacement:
http://skew.org/xml/stylesheets/replace/
and here for substring-to-element replacement:
http://skew.org/xml/stylesheets/linefeed2br/
They should get you going in the right direction.
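
If you do end up with the post-processing stage you mentioned, the mapping approach is simple enough to sketch. The following is a minimal Python example under my own assumptions: TEX_MAP and map_chars are hypothetical names, and the replacement strings are just the ones from your question.

```python
# Hypothetical mapping from single characters to target-format strings.
TEX_MAP = {
    "\u0394": r"$\Delta$",  # GREEK CAPITAL LETTER DELTA
    "\u03b4": r"$\delta$",  # GREEK SMALL LETTER DELTA
}


def map_chars(text, mapping):
    """Replace each character found in mapping; pass all others through."""
    return "".join(mapping.get(ch, ch) for ch in text)


if __name__ == "__main__":
    converted = map_chars("\u0394t = 5", TEX_MAP)
    # A strict encode raises if any unmapped non-ASCII character is left over.
    print(converted.encode("ascii"))
```

Encoding strictly at the end is a deliberate choice here: a character missing from the mapping shows up as an error instead of slipping through as '?'.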