XML - Re: Emitting to HTML Output

#1

In article <bd71va$1fc4$>, says...
> You may notice that people are reluctant to give you a straight
> answer. This is because it's not really in the spirit of XSLT to do
> such a thing. XSLT stylesheets transform XML documents as trees, not
> as text. Messing with the text output can produce documents that are
> not well-formed or (as in this case) refer to entities that may not be
> defined.
>
> But if you really want to do it,
>
>     <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

This information was helpful to me. I've been trying to include a ©
character in my output, but when I include "&copy;" in my XSL style sheet
I get errors about the undefined entity. Based on your posting I did

    <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>

and with Xalan it works just fine. Thanks!

Interestingly, with Mozilla v1.3 it does not work. I get "&copy;"
displayed. So apparently Mozilla v1.3 does not know what to do with the
"disable-output-escaping" attribute. I haven't tried it with IE yet, but
I may do so later.

Peter

Peter C. Chapin
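A minimal sketch of the two parsing outcomes described above, written in Python with only the standard library (Python is our choice for illustration; nobody in the thread used it). A lone `xsl:text` element stands in for a full stylesheet, and `parse_or_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace declaration shared by the two stylesheet fragments below.
XSL = 'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"'


def parse_or_error(source):
    """Parse an XML string; return the root element, or the ParseError raised."""
    try:
        return ET.fromstring(source)
    except ET.ParseError as err:
        return err


# A bare &copy; fails: XML predefines only amp, lt, gt, quot and apos.
bare = f'<xsl:text {XSL} disable-output-escaping="yes">&copy;</xsl:text>'

# Escaping the ampersand keeps the stylesheet well-formed; the text node then
# holds the six characters "&copy;", which d-o-e asks the serializer to write
# out without re-escaping.
escaped = f'<xsl:text {XSL} disable-output-escaping="yes">&amp;copy;</xsl:text>'

print(parse_or_error(bare))          # an "undefined entity" ParseError
print(parse_or_error(escaped).text)  # &copy;
```

The second form is what the XSLT processor actually sees; whether the `&` reaches the output unescaped is then up to the processor's optional d-o-e support.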

#2

"Peter C. Chapin" <pchapin-> wrote in message
news:...
>
> In article <bd71va$1fc4$>,
> says...
>
> > You may notice that people are reluctant to give you a straight
> > answer. This is because it's not really in the spirit of XSLT to do
> > such a thing. XSLT stylesheets transform XML documents as trees, not
> > as text. Messing with the text output can produce documents that are
> > not well-formed or (as in this case) refer to entities that may not be
> > defined.
> >
> > But if you really want to do it,
> >
> >     <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
>
> This information was helpful to me. I've been trying to include a ©

Why? Just use "©"

> character in my output but when I include "&copy;" in my XSL style sheet
> I get errors about the undefined entity. Based on your posting I did
>
>     <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>
>
> and with Xalan it works just fine. Thanks!
>
> Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
> displayed. So apparently Mozilla v1.3 does not know what to do with the
> "disable-output-escaping" attribute. I haven't tried it with IE yet, but
> I may do so later.

d-o-e is an *optional* XSLT feature. Some engines do not support it at
all (Mozilla/TransforMiiX). Others only support it in specific cases. Do
not rely on it.
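Julian's alternative can be checked with a small Python sketch (our choice of language; standard library only): a numeric character reference is resolved by the XML parser itself, with no DTD, in either decimal or hexadecimal form. The hexadecimal form needs the `x`; something like `&#a9;` is not a character reference at all.

```python
import xml.etree.ElementTree as ET

# Numeric character references need no entity declarations.
decimal = ET.fromstring("<p>&#169;</p>")
hexadecimal = ET.fromstring("<p>&#xA9;</p>")

print(decimal.text == hexadecimal.text == "\u00a9")  # True
print(ord(decimal.text))                             # 169

# Without the "x", the parser rejects the reference as malformed.
try:
    ET.fromstring("<p>&#a9;</p>")
    hex_without_x_ok = True
except ET.ParseError:
    hex_without_x_ok = False

print(hex_without_x_ok)                              # False
```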

#3

"Peter C. Chapin" <pchapin-> wrote in message
news:...
> In article <bde4vg$sbp68$>,
> says...
> > >
> > > This information was helpful to me. I've been trying to include a ©
> >
> > Why? Just use "©"
>
> That's less readable. Also, since &copy; is more abstract, if a new
> symbol was widely adopted for copyright the meaning of &copy; could be
> changed accordingly in the specification and my documents would
> automatically be upgraded. I admit that's probably not too likely to be

Alas, this won't happen, because these code mappings are standardized.

> an issue in this case. However, it seems a shame for HTML to have all
> sorts of nice character entities and yet not be able to use them in an
> XSLT style sheet without redefining them all. This seems like a
> deficiency of XSLT to me.
>
> > d-o-e is an *optional* XSLT feature.
>
> Good to know. Thanks.
>
> Peter

#4

Peter C. Chapin wrote:
> In article <bde4vg$sbp68$>,
> says...
>
>>>This information was helpful to me. I've been trying to include a ©
>>
>>Why? Just use "©"
>
> That's less readable.

If you want to _use_ &copy; in XSLT, you have to define it in an internal
DTD subset, as someone has already posted in this thread.

--
Johannes Koch
O Lord, in thee have I trusted; let me never be confounded.
(Te Deum, 4th cent.)
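Johannes's suggestion, sketched with Python's standard-library parser (our choice; any conforming XML parser should behave the same way). The stylesheet is a hypothetical minimal example: its internal subset declares `copy` as `&#169;`, so the parser can expand `&copy;` on its own.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# A minimal stylesheet whose internal DTD subset declares the entity.
STYLESHEET = """<!DOCTYPE xsl:stylesheet [
  <!ENTITY copy "&#169;">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <p>&copy; Example</p>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(STYLESHEET)
para = root.find(f"{{{XSL_NS}}}template/p")

# The parser has already replaced &copy; with U+00A9.
print(para.text)  # © Example
```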

#5

In article <bdencq$rn0r8$>,
says...
> > > Why? Just use "©"
> >
> > That's less readable. Also, since &copy; is more abstract, if a new
> > symbol was widely adopted for copyright the meaning of &copy; could be
> > changed accordingly in the specification and my documents would
> > automatically be upgraded. I admit that's probably not too likely to be
>
> Alas, this won't happen, because these code mappings are standardized.

Well, suppose the publishing industry decided that, for whatever reason,
they wanted to use a different symbol for copyright. Imagine a symbol
resembling "-c-" instead of "(c)". Precisely because the current code
mappings are standardized, it probably wouldn't be a great idea to change
the "normal" appearance of the character that currently looks like "(c)".
Thus one might be tempted to introduce a new character with a different
mapping for the new symbol (wasn't something like this done for the
Euro?). One imagines that in this case the HTML specification would
eventually be revised so that the entity &copy; would refer to the new
symbol. However, a numeric reference to © would not follow the new
convention (it would still be the old symbol, of course), and documents
using it would need to be edited.

I'm not saying that this is a likely scenario. In fact, I'd say it's
pretty darn unlikely in this case. However, the point is this: named
entities offer a layer of abstraction over the numeric characters they
represent. Such a layer allows, in general, for more robust documents in
the face of changing standards. It's exactly the same issue as occurs in,
for example, C programming:

    #define MAX_BUFFER_SIZE 1024 // Might want to change this later.

    ...

    if (index >= MAX_BUFFER_SIZE) error();

Using 1024 in the body of the program is not recommended because, if that
value is changed, the program must (in general) be updated by hand. If
many programs depend on this parameter, the work involved could be
considerable.

Defining the entities in the document, as has been suggested, doesn't
really address this matter. If in my XSLT stylesheet I define &copy; to
be &#xA9;, then the character U+00A9 will be inserted into my HTML.
However, if that character stops being appropriate, I'll have to modify
my stylesheet... exactly as if I had used &#xA9; directly (I can see that
the modification would be somewhat easier to make, however).

It seems to me like the "right" solution would be for XSLT to pass
undefined entities directly to the target document literally somehow. I
shouldn't have to know how HTML (or any other target markup) has defined
a character entity to use it in a style sheet. The solution involving
disable-output-escaping meets the requirements... but if it's optional in
XSLT, then it isn't as good a solution as it might be.

Peter
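Peter's point, that declaring the entity only moves the hard-coded character into the stylesheet, shows up in a short Python sketch (our choice; standard library only). The entity is expanded while parsing, so whatever the serializer writes contains U+00A9, either raw or as a numeric reference, and never the name `copy`.

```python
import xml.etree.ElementTree as ET

SOURCE = '<!DOCTYPE p [<!ENTITY copy "&#169;">]><p>&copy; Example</p>'

tree = ET.fromstring(SOURCE)

# The tree no longer records that an entity was used; it only holds U+00A9.
as_text = ET.tostring(tree, encoding="unicode")
# An ASCII serializer has to fall back to a numeric character reference.
as_ascii = ET.tostring(tree, encoding="us-ascii")

print(as_text)   # <p>© Example</p>
print(as_ascii)  # b'<p>&#169; Example</p>'
```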

#6

In article <>,
Peter C. Chapin <pchapin-> wrote:
>Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
>displayed. So apparently Mozilla v1.3 does not know what to do with the
>"disable-output-escaping" attribute.

Output escaping (and not escaping) only makes sense if you're outputting
the data as XML (or HTML). In a browser, the transformed tree is not
output in that sense, but displayed.

Or to put it another way, disabling output escaping is a trick that lets
you output something which will have a different syntactic significance
when read in again; since it never gets read again in a browser, that
significance never applies.

-- Richard
--
Spam filter: to mail me from a .com/.net site, put my surname in the headers.
FreeBSD rules!
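Richard's distinction between serializing and displaying, as a Python sketch (our choice; standard library only). `serialize_unescaped` is a hypothetical stand-in for a processor that honours d-o-e, not a real ElementTree feature. The result-tree text node holds the literal characters `&copy;`: a normal serializer escapes the ampersand, a d-o-e serializer writes it raw, and a browser that displays the tree directly shows the six characters as they are.

```python
import xml.etree.ElementTree as ET

# Result-tree text node produced from <xsl:text>&amp;copy;</xsl:text>.
p = ET.Element("p")
p.text = "&copy;"


def serialize_unescaped(elem):
    """Hypothetical d-o-e serializer: write the element's text verbatim."""
    return f"<{elem.tag}>{elem.text}</{elem.tag}>"


escaped = ET.tostring(p, encoding="unicode")  # normal XML output
raw = serialize_unescaped(p)                  # what d-o-e asks for
displayed = p.text                            # what a tree-rendering browser shows

print(escaped)    # <p>&amp;copy;</p>
print(raw)        # <p>&copy;</p>
print(displayed)  # &copy;
```

Only `raw` gains a new meaning when read in again as HTML, which is exactly the re-reading step a browser rendering the tree never performs.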

#7

Peter C. Chapin <pchapin-> writes:

> In article <bde4vg$sbp68$>,
> says...
> > >
> > > This information was helpful to me. I've been trying to include a ©
> >
> > Why? Just use "©"
>
> That's less readable. Also, since &copy; is more abstract, if a new
> symbol was widely adopted for copyright the meaning of &copy; could be
> changed accordingly in the specification and my documents would
> automatically be upgraded. I admit that's probably not too likely to be
> an issue in this case. However, it seems a shame for HTML to have all
> sorts of nice character entities and yet not be able to use them in an
> XSLT style sheet without redefining them all. This seems like a
> deficiency of XSLT to me.

Nonsense. You don't need to *explicitly* redefine them all. Just alter
the DTD to include them. My stylesheets typically start with something
like:

    <!DOCTYPE xsl:stylesheet [
      <!ENTITY % HTMLlat1 PUBLIC
        "-//W3C//ENTITIES Latin 1 for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
      %HTMLlat1;
      <!ENTITY % HTMLsymbol PUBLIC
        "-//W3C//ENTITIES Symbols for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
      %HTMLsymbol;
      <!ENTITY % HTMLspecial PUBLIC
        "-//W3C//ENTITIES Special for XHTML//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
      %HTMLspecial;
    ]>

This should shut up any problems you have using HTML entities in your
XSLT source. Of course, the DTD above won't make your XSLT source valid
XML (as opposed to well-formed), but it wasn't valid anyway. (Making XSLT
source valid XML is *way* too much work to be worth it.)

-Micah
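Micah's DTD works only if the XML parser actually fetches the external W3C entity files, and not every parser does (Python's ElementTree, for one, does not load external entities). A self-contained alternative, sketched in Python under that assumption, is to generate the declarations straight into the internal subset from Python's `html.entities.name2codepoint` table of HTML 4 entity names. `internal_subset` is a hypothetical helper, and the five names XML already predefines are skipped.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines these; leave them out of the generated subset.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def internal_subset(root_name="xsl:stylesheet"):
    """Hypothetical helper: a DOCTYPE declaring every HTML 4 entity name."""
    decls = [
        f'  <!ENTITY {name} "&#{code};">'
        for name, code in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    ]
    return f"<!DOCTYPE {root_name} [\n" + "\n".join(decls) + "\n]>"


doc = internal_subset("p") + "\n<p>&copy;&nbsp;&eacute;</p>"
print(ET.fromstring(doc).text == "\u00a9\u00a0\u00e9")  # True
```

The trade-off is the one Peter raises in #5: the values are frozen into each stylesheet instead of being looked up from the W3C files.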

#8

"Julian F. Reschke" <> writes:

> "Peter C. Chapin" <pchapin-> wrote in message
> news:...
> > In article <bde4vg$sbp68$>,
> > says...
> > > >
> > > > This information was helpful to me. I've been trying to include a ©
> > >
> > > Why? Just use "©"
> >
> > That's less readable. Also, since &copy; is more abstract, if a new
> > symbol was widely adopted for copyright the meaning of &copy; could be
> > changed accordingly in the specification and my documents would
> > automatically be upgraded. I admit that's probably not too likely to be
>
> Alas, this won't happen, because these code mappings are standardized.

And it shouldn't. If a new symbol were widely adopted with the same
semantics, it really ought to be implemented as a font change rather than
getting its own code point.

-Micah
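Micah's point that a new look for the same symbol is a font matter fits how the character is defined. A tiny Python check (our choice of language; standard library only) shows that HTML's `copy` entity and Unicode both identify the character by meaning, U+00A9 COPYRIGHT SIGN, and say nothing about its glyph.

```python
import unicodedata
from html.entities import name2codepoint

# HTML's &copy; entity maps to a single Unicode code point...
code_point = name2codepoint["copy"]

# ...whose Unicode name describes what the character means, not how it looks.
print(f"U+{code_point:04X}")              # U+00A9
print(unicodedata.name(chr(code_point)))  # COPYRIGHT SIGN
```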

#9

Peter C. Chapin <pchapin-> writes:

> In article <bdencq$rn0r8$>,
> says...
> > > > Why? Just use "©"
> > >
> > > That's less readable. Also, since &copy; is more abstract, if a new
> > > symbol was widely adopted for copyright the meaning of &copy; could be
> > > changed accordingly in the specification and my documents would
> > > automatically be upgraded. I admit that's probably not too likely to be
> >
> > Alas, this won't happen, because these code mappings are standardized.
>
> Well, suppose the publishing industry decided that, for whatever reason,
> they wanted to use a different symbol for copyright. Imagine a symbol
> resembling "-c-" instead of "(c)".

There is absolutely no reason why © couldn't be used for that as well:
Unicode (in general) does not specify how the glyph should look; it only
decrees that U+00A9 corresponds to a character with the semantic of
representing "a copyright symbol", whatever that may mean. That symbol
can look like anything the font designer wants, provided it conveys the
intended meaning.

> Thus one might be tempted to introduce a new character with a different
> mapping for the new symbol (wasn't something like this done for the
> Euro?).

Ah, but the euro is a completely new currency symbol -- a completely new
semantic meaning. Or did you mean an alternative glyph?

Unicode is supposed to avoid having different code points for a single
semantic meaning, but in practice it was unavoidable that many alternate
versions be encoded, in order to support round-trip conversion for as
many encodings as possible (i.e., so that a document in a non-Unicode
encoding could be transliterated into Unicode and back again without
change). However, I believe they would avoid it in all cases that do not
affect compatibility with other encodings.

> I'm not saying that this is a likely scenario. In fact, I'd say it's
> pretty darn unlikely in this case. However, the point is this: named
> entities offer a layer of abstraction over the numeric characters they
> represent. Such a layer allows, in general, for more robust documents in
> the face of changing standards. It's exactly the same issue as occurs in,
> for example, C programming:
>
>     #define MAX_BUFFER_SIZE 1024 // Might want to change this later.
>
>     ...
>
>     if (index >= MAX_BUFFER_SIZE) error();

In general, yes; but in the case of the HTML character entities, not
really. They serve more as a mnemonic than anything else: I doubt very
much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
once released; otherwise, they'd have said so.

> It seems to me like the "right" solution would be for XSLT to pass
> undefined entities directly to the target document literally
> somehow.

It can't do this and still produce well-formed XML, which is a reasonable
expectation.

> I shouldn't have to know how HTML (or any other target markup) has
> defined a character entity to use it in a style sheet. The solution
> involving disable-output-escaping meets the requirements... but if it's
> optional in XSLT then it isn't as good a solution as it might be.

Include the appropriate external entities in your DTD instead (see my
other post).

-Micah
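Micah's well-formedness objection, sketched in Python (our choice; standard library only; `is_well_formed` is a hypothetical helper). Output that carries an undeclared `&copy;` straight through is not well-formed XML, even though an HTML consumer, approximated here by `html.unescape`, still resolves the reference.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as standalone XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# What passing the undefined entity straight through would produce.
passed_through = "<p>&copy; Example</p>"

print(is_well_formed(passed_through))           # False
print(html.unescape(passed_through))            # <p>© Example</p>
print(is_well_formed("<p>&#169; Example</p>"))  # True
```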

#10

In article <>, says...
> > Well, suppose the publishing industry decided that, for whatever reason,
> > they wanted to use a different symbol for copyright. Imagine a symbol
> > resembling "-c-" instead of "(c)".
>
> There is absolutely no reason why © couldn't be used for that as
> well: Unicode (in general) does not specify how the glyph should look,
> it only decrees that U+00A9 corresponds to a character with the
> semantic of representing "a copyright symbol", whatever that may
> mean.

I understand that. However, in the event of a character changing its
traditional glyph, it seems more likely to me that a new code point would
be allocated. Some documents might specifically want to continue using
the old form of the character for historical or compatibility reasons.
Yet other documents would, one assumes, want to use the new version of
the character instead. Thus both glyphs would probably have to be
available. I don't know if this situation has ever really come up, but if
it does (or has), I can imagine an argument like this being made by those
dealing with the relevant standards.

> Ah, but the euro is a completely new currency symbol -- a completely
> new semantic meaning. Or did you mean an alternative glyph?

My example of the euro was not totally accurate. I was only referring to
the idea that new symbols (not necessarily for existing things) do get
introduced now and then. Thus the idea of a new symbol for copyright
coming along isn't as crazy as it might at first seem.

> In general, yes; but in the case of the HTML character entities, not
> really. They serve more as a mnemonic than anything else: I doubt very
> much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
> once released; otherwise, they'd have said so.

Perhaps, but what of other markups besides HTML? I could imagine a DTD
author defining entities specifically to hide their representations so
that later changes to the spec could be made without requiring documents
to be edited. It seems like a powerful and useful feature of entities in
general, and one that the community should endeavor to support.

> Include the appropriate external entities in your DTD instead (see my
> other post).

Yes, this seems like a reasonable solution. I've made a note of your
other posting for future reference. Thanks!

I still think it's less than ideal to require that the XSLT engine expand
all entities before writing them into the output tree (or output
document). For example, suppose one uses XSLT to produce a large
collection of documents and then later the expansion of an entity
changes. I'd have to reprocess the original XML to make new documents;
the documents I produced before won't contain the entity references and
thus won't be automatically updated by the change in the entity
expansion.

In a different post Richard Tobin pointed out that output escaping only
makes sense when one is outputting a document and not when one is acting
directly on the output tree. However, I dispute that. For example, Xalan
must internally mark in the tree somehow which text regions are to be
free of escaping when it outputs the final document. (Recall that Xalan
seems to implement the disable-output-escaping mechanism.) Thus the
information about what is and is not escaped needs to be stored in the
tree in some sort of implementation-defined way. It thus seems reasonable
that a program like Mozilla could also store that information and then
act on it accordingly if it chose to do so. In particular, if I get the
text &copy; into the output tree marked in such a way as to indicate that
no output escaping should take place, I'd like to think that Mozilla
could treat the '&' literally and then notice that '&copy;' is a valid
HTML entity reference. Of course it doesn't currently do that (or so it
appears), and if disable-output-escaping is optional then it is within
its rights to ignore the feature. However, I think it is at least
*meaningful* to talk about implementing that feature.

I can see that there might be problems with doing what I'm talking about.
Right now neither Xalan nor Mozilla needs to interpret the character
nodes in the output tree. For Mozilla to recognize HTML entities in the
tree directly, it would have to look for entities in the character nodes
of the tree. I'm guessing that doing so would introduce some serious
issues, but I'm not really sure. I seem to recall reading someplace that
the DOM does not contain entities... so this would be a violation of that
policy. Right?

I suppose what all this means, in general, is that entities don't really
work as well as one might like. Could this be a fundamental problem with
using an XML format to control styling? Or is it a limitation of the
DOM? Perhaps my real issue is that I'm trying to make a *transformation*
standard do things that don't really make sense for it to do.

Hmm. I'm rambling. I'll stop now.

Peter
|