Velocity Reviews


Peter C. Chapin 06-26-2003 03:21 AM

Re: Emitting &nbsp; to HTML Output
 

In article <bd71va$1fc4$1@pc-news.cogsci.ed.ac.uk>,
richard@cogsci.ed.ac.uk says...

> You may notice that people are reluctant to give you a straight
> answer. This is because it's not really in the spirit of XSLT to do
> such a thing. XSLT stylesheets transform XML documents as trees, not
> as text. Messing with the text output can produce documents that are
> not well-formed or (as in this case) refer to entities that may not be
> defined.
>
> But if you really want to do it,
>
> <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>


This information was helpful to me. I've been trying to include a &copy;
character in my output but when I include "&copy;" in my XSL style sheet
I get errors about the undefined entity. Based on your posting I did

<xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>

and with Xalan it works just fine. Thanks!

Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
displayed. So apparently Mozilla v1.3 does not know what to do with the
"disable-output-escaping" attribute. I haven't tried it with IE yet, but
I may do so later.

Peter


Julian F. Reschke 06-26-2003 06:48 AM

Re: Emitting &nbsp; to HTML Output
 
"Peter C. Chapin" <pchapin-news1@ecet.vtc.edu> wrote in message
news:MPG.1963705e6e156d10989697@news.sover.net...
>
> In article <bd71va$1fc4$1@pc-news.cogsci.ed.ac.uk>,
> richard@cogsci.ed.ac.uk says...
>
> > You may notice that people are reluctant to give you a straight
> > answer. This is because it's not really in the spirit of XSLT to do
> > such a thing. XSLT stylesheets transform XML documents as trees, not
> > as text. Messing with the text output can produce documents that are
> > not well-formed or (as in this case) refer to entities that may not be
> > defined.
> >
> > But if you really want to do it,
> >
> > <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>

>
> This information was helpful to me. I've been trying to include a &copy;


Why? Just use "&#xa9;"

> character in my output but when I include "&copy;" in my XSL style sheet
> I get errors about the undefined entity. Based on your posting I did
>
> <xsl:text disable-output-escaping="yes">&amp;copy;</xsl:text>
>
> and with Xalan it works just fine. Thanks!
>
> Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
> displayed. So apparently Mozilla v1.3 does not know what to do with the
> "disable-output-escaping" attribute. I haven't tried it with IE yet, but
> I may do so later.


d-o-e is an *optional* XSLT feature. Some engines do not support it at all
(Mozilla/TransforMiiX). Others only support it in specific cases. Do not rely
on it.
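
To illustrate the "&#xa9;" suggestion above, here is a minimal sketch in Python. It assumes the standard-library xml.etree.ElementTree parser behaves like the XML parser that reads a stylesheet; the helper name parses and the sample text are made up for the example. A numeric character reference is resolved without any declaration, while an undeclared named entity is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# A numeric character reference is resolved by the parser itself,
# so it needs no declaration anywhere.
numeric = ET.fromstring("<p>&#xa9; 2003</p>")


def parses(source: str) -> bool:
    """Return True if this parser accepts the XML text as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(repr(numeric.text))            # the text already contains U+00A9
print(parses("<p>&copy; 2003</p>"))  # False: "copy" was never declared
```

This is the same kind of "undefined entity" complaint reported earlier in the thread, reproduced outside of any XSLT engine.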



Julian F. Reschke 06-26-2003 12:02 PM

Re: Emitting &nbsp; to HTML Output
 
"Peter C. Chapin" <pchapin-news1@ecet.vtc.edu> wrote in message
news:MPG.1964a46bc299a0cb989698@news.sover.net...
> In article <bde4vg$sbp68$1@ID-98527.news.dfncis.de>, reschke@muenster.de
> says...
>
> > > This information was helpful to me. I've been trying to include a &copy;
> >
> > Why? Just use "&#xa9;"

>
> That's less readable. Also, since &copy; is more abstract, if a new
> symbol was widely adopted for copyright the meaning of &copy; could be
> changed accordingly in the specification and my documents would
> automatically be upgraded. I admit that's probably not too likely to be


Alas, this won't happen, because these code mappings are standardized.

> an issue in this case. However, it seems a shame for HTML to have all
> sorts of nice character entities and yet not be able to use them in an
> XSLT style sheet without redefining them all. This seems like a
> deficiency of XSLT to me.
>
> > d-o-e is an *optional* XSLT feature.

>
> Good to know. Thanks.
>
> Peter
>




Johannes Koch 06-26-2003 01:40 PM

Re: Emitting &nbsp; to HTML Output
 
Peter C. Chapin wrote:
> In article <bde4vg$sbp68$1@ID-98527.news.dfncis.de>, reschke@muenster.de
> says...
>
>
>>>This information was helpful to me. I've been trying to include a &copy;

>>
>>Why? Just use "&#xa9;"

>
>
> That's less readable.


If you want to _use_ &copy; in XSLT, you have to define it in an
internal DTD subset, as someone has already posted in this thread.
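
As a rough sketch of what that declaration looks like, the Python snippet below parses a small document whose internal DTD subset defines "copy". The element name note and the sample text are invented for illustration, and xml.etree.ElementTree stands in for whatever parser your XSLT processor uses. The same DOCTYPE form can be placed at the top of a stylesheet with xsl:stylesheet as the root name.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares "copy" before it is used.
# The root element name and the text are made up for this example.
SOURCE = """<!DOCTYPE note [
  <!ENTITY copy "&#xa9;">
]>
<note>&copy; 2003 Example Author</note>"""

note = ET.fromstring(SOURCE)

# The parser expands the reference, so the tree holds U+00A9 itself,
# not the text "&copy;".
print(repr(note.text))
```

Note that the entity is gone by the time the tree exists: only the character it expands to is left.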
--
Johannes Koch
In te domine speravi; non confundar in aeternum.
(Te Deum, 4th cent.)


Peter C. Chapin 06-26-2003 08:31 PM

Re: Emitting &nbsp; to HTML Output
 
In article <bdencq$rn0r8$1@ID-98527.news.dfncis.de>, reschke@muenster.de
says...

> > > Why? Just use "&#xa9;"

> >
> > That's less readable. Also, since &copy; is more abstract, if a new
> > symbol was widely adopted for copyright the meaning of &copy; could be
> > changed accordingly in the specification and my documents would
> > automatically be upgraded. I admit that's probably not too likely to be

>
> Alas, this won't happen, because these code mappings are standardized.


Well, suppose the publishing industry decided that, for whatever reason,
they wanted to use a different symbol for copyright. Imagine a symbol
resembling "-c-" instead of "(c)". Precisely because the current code
mappings are standardized it probably wouldn't be a great idea to change
the "normal" appearance of the character that currently looks like "(c)".
Thus one might be tempted to introduce a new character with a different
mapping for the new symbol (wasn't something like this done for the
Euro?). One imagines that in this case the HTML specification would be
eventually revised so that the entity &copy; would refer to the new
symbol. However a reference to &#xa9; would not follow the new convention
(it would still be the old symbol, of course) and documents using it
would need to be edited.

I'm not saying that this is a likely scenario. In fact, I'd say it's
pretty darn unlikely in this case. However, the point is this: named
entities offer a layer of abstraction over the numeric characters they
represent. Such a layer allows, in general, for more robust documents in
the face of changing standards. It's exactly the same issue as occurs in,
for example, C programming:

#define MAX_BUFFER_SIZE 1024 // Might want to change this later.

...

if (index >= MAX_BUFFER_SIZE) error();

Using 1024 in the body of the program is not recommended because if a
change to that value is made the program must be (in general) manually
updated. If many programs depend on this parameter the work involved
could be considerable.

Defining the entities in the document, as has been suggested, doesn't
really address this matter. If in my XSLT stylesheet I define &copy; to
be &#xa9; then the character U+00A9 will be inserted into my HTML.
However, if that character stops being appropriate I'll have to modify my
stylesheet... exactly as if I had used &#xa9; directly (I can see that the
modification would be somewhat easier to make, however).

It seems to me like the "right" solution would be for XSLT to pass
undefined entities directly to the target document literally somehow. I
shouldn't have to know how HTML (or any other target markup) has defined
a character entity to use it in a style sheet. The solution involving
disable-output-escaping meets the requirements... but if it's optional in
XSLT then it isn't as good a solution as it might be.

Peter


Richard Tobin 06-26-2003 10:25 PM

Re: Emitting &nbsp; to HTML Output
 
In article <MPG.1963705e6e156d10989697@news.sover.net>,
Peter C. Chapin <pchapin-news1@ecet.vtc.edu> wrote:

>Interestingly with Mozilla v1.3 it does not work. I get "&copy;"
>displayed. So apparently Mozilla v1.3 does not know what to do with the
>"disable-output-escaping" attribute.


Output escaping - and not escaping - only makes sense if you're
outputting the data as XML (or HTML). In a browser, the transformed
tree is not output in that sense, but displayed.

Or to put it another way, disabling output escaping is a trick that
lets you output something which will have a different syntactic
significance when read in again; since it never gets read again
in a browser, that significance never applies.
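
A small sketch of that distinction, in Python. The assumptions: xml.etree.ElementTree plays the role of the result tree and serializer, html.unescape stands in for an HTML reader, and the "unescaped" string is built by hand because ElementTree has no equivalent of disable-output-escaping.

```python
import html
import xml.etree.ElementTree as ET

# In the result tree the text node is just six characters: & c o p y ;
p = ET.Element("p")
p.text = "&copy;"

# Normal serialization escapes the ampersand so the text survives a re-parse.
escaped = ET.tostring(p, encoding="unicode")

# Emulate disabled escaping by writing the characters out untouched.
# (Hand-built string; ElementTree itself offers no such option.)
unescaped = "<p>" + p.text + "</p>"

print(escaped)                       # <p>&amp;copy;</p>
print(unescaped)                     # <p>&copy;</p>

# Only a reader of the serialized text can tell the two apart:
print(html.unescape("&amp;copy;"))   # the literal text &copy;
print(html.unescape("&copy;"))       # the copyright sign
```

If nothing ever serializes and re-reads the tree, the difference between the two forms never comes into play.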

-- Richard
--
Spam filter: to mail me from a .com/.net site, put my surname in the headers.

FreeBSD rules!

Micah Cowan 06-27-2003 09:09 AM

Re: Emitting &nbsp; to HTML Output
 
Peter C. Chapin <pchapin-news1@ecet.vtc.edu> writes:

> In article <bde4vg$sbp68$1@ID-98527.news.dfncis.de>, reschke@muenster.de
> says...
>
> > > This information was helpful to me. I've been trying to include a &copy;

> >
> > Why? Just use "&#xa9;"

>
> That's less readable. Also, since &copy; is more abstract, if a new
> symbol was widely adopted for copyright the meaning of &copy; could be
> changed accordingly in the specification and my documents would
> automatically be upgraded. I admit that's probably not too likely to be
> an issue in this case. However, it seems a shame for HTML to have all
> sorts of nice character entities and yet not be able to use them in an
> XSLT style sheet without redefining them all. This seems like a
> deficiency of XSLT to me.


Nonsense. You don't need to *explicitly* redefine them all. Just alter
the DTD to include them. My stylesheets typically start with something
like:

<!DOCTYPE xsl:stylesheet [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;

<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;

<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
]>

This should shut up any problems you have using HTML entities in your
XSLT source. Of course, the DTD above won't make your XSLT source
valid XML (as opposed to well-formed), but it wasn't anyway. (Making
XSLT source valid XML is *way* too much work to be worth it.)

-Micah

Micah Cowan 06-27-2003 09:11 AM

Re: Emitting &nbsp; to HTML Output
 
"Julian F. Reschke" <reschke@muenster.de> writes:

> "Peter C. Chapin" <pchapin-news1@ecet.vtc.edu> wrote in message
> news:MPG.1964a46bc299a0cb989698@news.sover.net...
> > In article <bde4vg$sbp68$1@ID-98527.news.dfncis.de>, reschke@muenster.de
> > says...
> >
> > > > This information was helpful to me. I've been trying to include a &copy;
> > >
> > > Why? Just use "&#xa9;"

> >
> > That's less readable. Also, since &copy; is more abstract, if a new
> > symbol was widely adopted for copyright the meaning of &copy; could be
> > changed accordingly in the specification and my documents would
> > automatically be upgraded. I admit that's probably not too likely to be

>
> Alas, this won't happen, because these code mappings are standardized.


And it shouldn't. If a new symbol were widely adopted with the same
semantics, it really ought to be implemented as a font change, rather
than getting its own code-point.

-Micah


Micah Cowan 06-27-2003 09:28 AM

Re: Emitting &nbsp; to HTML Output
 
Peter C. Chapin <pchapin-news1@ecet.vtc.edu> writes:

> In article <bdencq$rn0r8$1@ID-98527.news.dfncis.de>, reschke@muenster.de
> says...
>
> > > > Why? Just use "&#xa9;"
> > >
> > > That's less readable. Also, since &copy; is more abstract, if a new
> > > symbol was widely adopted for copyright the meaning of &copy; could be
> > > changed accordingly in the specification and my documents would
> > > automatically be upgraded. I admit that's probably not too likely to be

> >
> > Alas, this won't happen, because these code mappings are standardized.

>
> Well, suppose the publishing industry decided that, for whatever reason,
> they wanted to use a different symbol for copyright. Imagine a symbol
> resembling "-c-" instead of "(c)".


There is absolutely no reason why &#xA9; couldn't be used for that as
well: Unicode (in general) does not specify how the glyph should look,
it only decrees that U+00A9 corresponds to a character with the
semantic of representing "a copyright symbol", whatever that may
mean. That symbol can look like anything the font-writer wants,
provided it conveys the intended meaning.
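
For what it's worth, the fixed name-to-code-point mapping can be inspected with a few lines of Python. This is only a sketch: it assumes the html.entities table in the standard library mirrors the HTML entity definitions, and it prints the Unicode character name, which describes the meaning and says nothing about the glyph.

```python
import unicodedata
from html.entities import name2codepoint

# Each HTML entity name maps to one fixed code point; the Unicode name
# describes what the character means, not how a font draws it.
for name in ("copy", "euro", "nbsp"):
    code_point = name2codepoint[name]
    print(f"&{name}; -> U+{code_point:04X} {unicodedata.name(chr(code_point))}")
```

The first line printed is "&copy; -> U+00A9 COPYRIGHT SIGN".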

> Thus one might be tempted to introduce a new character with a different
> mapping for the new symbol (wasn't something like this done for the
> Euro?).


Ah, but the euro is a completely new currency symbol -- a completely
new semantic meaning. Or did you mean an alternative glyph?

Unicode is supposed to avoid having different code-points for a single
semantic meaning, but in practice it was unavoidable that many
alternate versions be encoded, in order to support round-trip encoding
for as many encodings as possible (i.e., that a document in a
non-Unicode encoding could be transliterated into Unicode and back
again without change).

However, I believe they would avoid it in all cases which do not
affect compatibility with other encodings.

> I'm not saying that this is a likely scenario. In fact, I'd say it's
> pretty darn unlikely in this case. However, the point is this: named
> entities offer a layer of abstraction over the numeric characters they
> represent. Such a layer allows, in general, for more robust documents in
> the face of changing standards. It's exactly the same issue as occurs in,
> for example, C programming:
>
> #define MAX_BUFFER_SIZE 1024 // Might want to change this later.
>
> ...
>
> if (index >= MAX_BUFFER_SIZE) error();


In general, yes; but in the case of the HTML character entities, not
really. They serve more as a mnemonic than anything else: I doubt very
much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
once released; otherwise, they'd have said so.

> It seems to me like the "right" solution would be for XSLT to pass
> undefined entities directly to the target document literally
> somehow.


It can't do this and still produce well-formed XML, which is a
reasonable expectation.

> shouldn't have to know how HTML (or any other target markup) has defined
> a character entity to use it in a style sheet. The solution involving
> disable-output-escaping meets the requirements... but if it's optional in
> XSLT then it isn't as good a solution as it might be.


Include the appropriate external entities in your DTD instead (see my
other post).

-Micah

Peter C. Chapin 06-27-2003 03:17 PM

Re: Emitting &nbsp; to HTML Output
 

In article <m3of0jswcr.fsf@localhost.localdomain>, micah@cowan.name
says...

> > Well, suppose the publishing industry decided that, for whatever reason,
> > they wanted to use a different symbol for copyright. Imagine a symbol
> > resembling "-c-" instead of "(c)".

>
> There is absolutely no reason why &#xA9; couldn't be used for that as
> well: Unicode (in general) does not specify how the glyph should look,
> it only decrees that U+00A9 corresponds to a character with the
> semantic of representing "a copyright symbol", whatever that may
> mean.


I understand that. However, in the event of a character changing its
traditional glyph it seems more likely to me that a new code point would
be allocated. Some documents might specifically want to continue using
the old form of the character for historical or compatibility reasons.
Yet other documents would, one assumes, want to use the new version of
the character instead. Thus both glyphs would probably have to be
available. I don't know if this situation has ever really come up but if
it does (or has), I can imagine an argument like this being made by those
dealing with the relevant standards.

> Ah, but the euro is a completely new currency symbol -- a completely
> new semantic meaning. Or did you mean an alternative glyph?


My example of the euro was not totally accurate. I was only referring to
the idea that new symbols (not necessarily existing things) do get
introduced now and then. Thus the idea of a new symbol for copyright
coming along isn't as crazy as it might at first seem.

> In general, yes; but in the case of the HTML character entities, not
> really. They serve more as a mnemonic than anything else: I doubt very
> much that ISO/W3C/Mr. Berners-Lee had any intentions of changing these
> once released; otherwise, they'd have said so.


Perhaps, but what of other markups besides HTML? I could imagine a DTD
author defining entities specifically to hide their representations so
that later changes to the spec could be made without requiring documents
to be edited. It seems like a powerful and useful feature of entities in
general and one that the community should endeavor to support.

> Include the appropriate external entities in your DTD instead (see my
> other post).


Yes, this seems like a reasonable solution. I've made a note of your
other posting for future reference. Thanks! I still think it's less than
ideal to require that the XSLT engine expand all entities before writing
them into the output tree (or output document). For example, suppose one
uses XSLT to produce a large collection of documents and then later the
expansion of an entity changes. I'd have to reprocess the original XML
again to make new documents; the documents I produced before won't
contain the entity references and thus won't be automatically updated by
the change in the entity expansion.

In a different post Richard Tobin pointed out that output escaping only
makes sense when one is outputting a document and not when one is acting
directly on the output tree. However, I dispute that. For example, Xalan
must internally mark in the tree somehow which text regions are to be
free of escaping when it outputs the final document. (Recall that Xalan
seems to implement the disable-output-escaping mechanism). Thus the
information about what is and is not escaped needs to be stored in the
tree in some sort of implementation-defined way. It thus seems reasonable
that a program like Mozilla could also store that information and then
act on it accordingly if it chose to do so. In particular if I get the
text

&copy;

into the output tree marked in such a way as to indicate that no output
escaping should take place, I'd like to think that Mozilla could treat
the '&' literally and then notice that '&copy;' is a valid HTML
entity reference. Of course it doesn't currently do that (or so it
appears) and if disable-output-escaping is optional then it is within its
rights to ignore the feature. However, I think it is at least
*meaningful* to talk about implementing that feature.

I can see that there might be problems with doing what I'm talking about.
Right now neither Xalan nor Mozilla need to interpret the character nodes
in the output tree. For Mozilla to recognize HTML entities in the tree
directly it would have to look for entities in the character nodes of the
tree. I'm guessing that doing so would introduce some serious issues, but
I'm not really sure. I seem to recall reading someplace that the DOM does
not contain entities... so this would be a violation of that policy.
Right?

I suppose what all this means, in general, is that entities don't really
work as well as one might like. Could this be a fundamental problem with
using an XML format to control styling? Or is it a limitation with the
DOM? Perhaps my real issue is that I'm trying to make a *transformation*
standard do things that don't really make sense for it to do. Hmmm.

I'm rambling. I'll stop now. :-)

Peter



All times are GMT.