How to differentiate between <XX></XX> and <XX/> with SAX

 
 
dpj5754@yahoo.fr
Guest
Posts: n/a
 
      07-26-2004
Is there a simple, deterministic way to tell the difference
between these two sequences:

<XX></XX>

and

<XX/>

The EndElement callback does not provide this information.

Thanks,
Pascal.
 
 
 
 
 
Rolf Magnus
Guest
Posts: n/a
 
      07-26-2004
(E-Mail Removed) wrote:

> Is there a simple, deterministic way to tell the difference
> between these two sequences:
>
> <XX></XX>
>
> and
>
> <XX/>


No. Their meaning is exactly the same. Why do you think you need that?

> The EndElement callback does not provide this information.



 
 
 
 
 
Franck Guillaud
Guest
Posts: n/a
 
      07-26-2004
Rolf Magnus wrote:

> (E-Mail Removed) wrote:
>
>
>>Is there a simple, deterministic way to tell the difference
>>between these two sequences:
>>
>><XX></XX>
>>
>>and
>>
>><XX/>

>
>
> No. Their meaning is exactly the same. Why do you think you need that?


Doesn't the first sample have an empty text() node as its first child,
while the second doesn't?

Franck,e-


 
 
Richard Tobin
Guest
Posts: n/a
 
      07-26-2004
In article <4104fbf8$0$15283$(E-Mail Removed)>,
Franck Guillaud <(E-Mail Removed)> wrote:

>>><XX></XX>
>>>
>>><XX/>


> Doesn't the first sample have an empty text() node as its first child,
> while the second doesn't?


No.

(XML itself doesn't define any such thing as a "text node". The
Infoset has character information items, and there aren't any of them
in either case. The XPath data model doesn't have a text node in either
case, and SAX parsers do not call the characters method.)

-- Richard
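
The point about SAX can be checked directly. The following is a minimal sketch using Python's standard xml.sax module (the thread does not say which SAX implementation the original poster uses, so the language and the names EventRecorder and sax_events are assumptions made for illustration). It records every callback and shows that both spellings produce the same event sequence, with no characters call in either case.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collects the SAX callbacks fired while parsing a document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Never reached for <XX></XX> or <XX/>: there is no character data.
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(document: bytes):
    """Parse the document and return the list of recorded SAX events."""
    recorder = EventRecorder()
    xml.sax.parseString(document, recorder)
    return recorder.events


print(sax_events(b"<XX></XX>"))  # [('start', 'XX'), ('end', 'XX')]
print(sax_events(b"<XX/>"))      # [('start', 'XX'), ('end', 'XX')]
```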
 
 
Pascal Dufour
Guest
Posts: n/a
 
      07-26-2004
Rolf Magnus wrote:

>
> No. Their meaning is exactly the same. Why do you think you need that?
>


OK, everywhere I read that they are the same.
But this is only true for XML, not for HTML, and even if it were
true for HTML, it would still not hold in practice because of the way
browsers interpret it.

What I need is to parse hand-written HTML.
In HTML, <BR/> is interpreted differently from <BR></BR>.

So I have two basic reasons to do this:

- I need it: the parser must tell the two forms apart, because
it must output the tags it does not process exactly as they were
entered, so that the output is interpreted correctly.

- Even if it were not needed for a technical reason, if the
developer who wrote the HTML page decided on <XX/>, I
prefer to output <XX/> rather than the other form, so that the
developer can easily read the output of my program and does not have
to wonder about some "strange" conversion.

Summary:

We do not live in a perfect world, with perfect standards perfectly
implemented by perfect developers. So we need a "stable" way to
tell the difference. I like standards very much (I have a networking
background, you know: ISO, IETF, IEEE, ATM Forum, FR Forum, EIA, etc.),
but I live in a non-standard world. I must adapt to survive.

Thanks for your help.
Pascal.
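
For XML input, the distinction requested here is not available through the SAX callbacks themselves, but some parsers expose enough position information to recover it. The following is a rough sketch, not a SAX feature: it uses Python's xml.parsers.expat binding (an assumption, since the thread does not name the parser) and compares the byte offsets reported in the start and end handlers against the raw input. Expat's C reference notes that XML_GetCurrentByteCount returns 0 for the end-tag event of an empty-element tag; the Python binding does not expose that call, so this sketch inspects the source bytes instead. The function name classify_empty_elements is hypothetical, and the exact offsets are an implementation detail of Expat.

```python
import xml.parsers.expat


def classify_empty_elements(data: bytes):
    """Return (name, form) for each element that has no content.

    form is "empty-tag" for <XX/> and "start-end" for <XX></XX>.
    """
    parser = xml.parsers.expat.ParserCreate()
    stack = []   # one [start_offset, has_content] entry per open element
    found = []

    def start(name, attrs):
        if stack:
            stack[-1][1] = True  # the parent contains a child element
        stack.append([parser.CurrentByteIndex, False])

    def chars(text):
        if stack:
            stack[-1][1] = True  # the current element contains text

    def end(name):
        start_offset, has_content = stack.pop()
        if has_content:
            return
        end_offset = parser.CurrentByteIndex
        raw = data[start_offset:end_offset]
        # Assumption: for <XX/> the end event is reported either at the
        # same offset as the start event or just past the "/>", whereas
        # for <XX></XX> it is reported at the "</XX>" token.
        if end_offset == start_offset or raw.endswith(b"/>"):
            found.append((name, "empty-tag"))
        else:
            found.append((name, "start-end"))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return found


print(classify_empty_elements(b"<r><XX></XX><XX/></r>"))
```

This only helps when the input is well-formed XML; hand-written HTML will usually be rejected by the parser before any comparison can be made.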

 
 
Philippe Poulard
Guest
Posts: n/a
 
      07-27-2004
Pascal Dufour wrote:
> Rolf Magnus wrote:
>
> >
> > No. Their meaning is exactly the same. Why do you think you need that?
> >

>
> OK, everywhere I read that they are the same.
> But this is only true for XML, not for HTML, and even if it were
> true for HTML, it would still not hold in practice because of the way
> browsers interpret it.
>
> What I need is to parse hand-written HTML.
> In HTML, <BR/> is interpreted differently from <BR></BR>.


You can't parse HTML with an XML parser; however, you can parse HTML
with an SGML parser. Additionally, you can use a tool that converts
HTML to XML (on a best-effort basis), such as the CyberNeko HTML Parser:
http://www.apache.org/%7Eandyc/neko/doc/html/
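
As an illustration of what an HTML-aware parser can offer, here is a minimal sketch using Python's standard html.parser module. This is not one of the tools named above; it is swapped in only because it is easy to run, and it is a lenient tag scanner rather than a conforming SGML parser. It reports a tag written as <BR/> through a separate handle_startendtag callback, so the two spellings can be told apart. The names TagFormRecorder and tag_events are hypothetical.

```python
from html.parser import HTMLParser


class TagFormRecorder(HTMLParser):
    """Records which syntactic form each tag used in the source."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written in the XHTML-style "<tag/>" form.
        self.events.append(("startend", tag))


def tag_events(markup: str):
    """Feed the markup to the recorder and return its tag events."""
    recorder = TagFormRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


print(tag_events("<BR/>"))      # [('startend', 'br')]
print(tag_events("<BR></BR>"))  # [('start', 'br'), ('end', 'br')]
```

Note that this parser lowercases tag names, so a tool that must reproduce the input exactly would also need the original text of each tag.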

>
> So I have two basic reasons to do this:
>
> - I need it: the parser must tell the two forms apart, because
> it must output the tags it does not process exactly as they were
> entered, so that the output is interpreted correctly.


There's something quite confusing here: you're talking about parsing as
if it were outputting. These two processes are opposites: parsing gives
access to a data model, and serializing (I prefer this term) renders
this data model in XML character form (file, character stream...).

You can't act on the XML data model, because it is governed by a set of
stable specifications, but you can act on the serialization. For this
purpose, formatting tools often provide a set of options that let you
tune the output; you can also write your own formatter.
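
The split between data model and serialization can be sketched as follows, assuming Python's xml.etree.ElementTree as the formatter (the thread does not name one, and the helper name serialize is hypothetical). Once parsed, both spellings yield the same tree; the spelling of empty elements in the output is chosen by a serializer option, here short_empty_elements, and applies to every empty element alike.

```python
import xml.etree.ElementTree as ET


def serialize(document: str, short_empty_elements: bool) -> str:
    """Parse the document, then write it back with the chosen empty form."""
    root = ET.fromstring(document)
    return ET.tostring(
        root,
        encoding="unicode",
        short_empty_elements=short_empty_elements,
    )


source = "<r><XX></XX><XX/></r>"
print(serialize(source, True))   # <r><XX /><XX /></r>
print(serialize(source, False))  # <r><XX></XX><XX></XX></r>
```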



--
Cordialement,

///
(. .)
-----ooO--(_)--Ooo-----
| Philippe Poulard |
-----------------------
 
 
David Carlisle
Guest
Posts: n/a
 
      07-27-2004

> OK, everywhere I read that they are the same.
> But this is only true for XML, not for HTML, and even if it were
> true for HTML, it would still not hold in practice because of the way
> browsers interpret it.


Well, for HTML (but this is, after all, an XML newsgroup) the situation is
completely different.
<BR/> and <BR></BR>
are _both_ syntax errors (/> is always a syntax error in HTML, and BR
has no end tag, as it is declared EMPTY in the HTML DTD, so </BR> is also
an error).

Of course a browser may or may not have some lax, silent error recovery
from either of these situations, but in any case the behaviour will be
browser-specific.


> - Even if it were not needed for a technical reason, if the
> developer who wrote the HTML page decided on <XX/>, I
> prefer to output <XX/> rather than the other form.


So long as you are clearly writing HTML rather than XML, there's nothing
wrong with you doing that. XSLT, for example, when writing HTML cannot
distinguish the inputs <BR/> and <BR></BR>, as the input is XML and
these are the same; but in either case an "identity" transform will
produce the HTML syntax
<BR>
if the html output method is being used (which it is by default if the
top-level output element is <html>).

David
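
The same effect can be reproduced outside XSLT. The sketch below swaps in Python's xml.etree.ElementTree with method="html" in place of an XSLT processor, purely as an analogous HTML serializer (the helper name to_html is hypothetical). Both XML spellings of BR are parsed to the same tree and written back as a bare <BR>, while an element that HTML does not declare empty, such as p, gets an explicit end tag.

```python
import xml.etree.ElementTree as ET


def to_html(document: str) -> str:
    """Parse XML input and serialize it with HTML output rules."""
    root = ET.fromstring(document)
    return ET.tostring(root, encoding="unicode", method="html")


source = "<html><body>a<BR/>b<BR></BR>c<p/></body></html>"
print(to_html(source))  # <html><body>a<BR>b<BR>c<p></p></body></html>
```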

 
 
Stefan Ram
Guest
Posts: n/a
 
      07-27-2004
David Carlisle <(E-Mail Removed)> writes:
> XSLT for example, if writing html can not
>distinguish the inputs of <BR/> and <BR></BR> as the input is XML and
>these are the same,


Actually, in XML, the notion "element" is not an abstract one,
but a concrete non-terminal symbol of the syntax.

Therefore, as elements, the element "<br/>" and the element
"<br></br>" are two /different/ elements, just as "<br/>" also
is a different element than "<br />".

You might say that they have the same element type, the same
content and the same number, names and values of attributes
(here: none). Or, possibly, that they have the same
"infoset", but the Infoset specification is not part of the
XML specification.


 
 
Patrick TJ McPhee
Guest
Posts: n/a
 
      07-27-2004
In article <(E-Mail Removed)>,
David Carlisle <(E-Mail Removed)> wrote:

% are _both_ syntax errors ( /> is always a syntax error in HTML, and BR

Actually, it's not, although its meaning is not the same as in XML. <br />
means the same as <br>>.
--

Patrick TJ McPhee
East York Canada
(E-Mail Removed)
 
 
David Carlisle
Guest
Posts: n/a
 
      07-28-2004

> Actually, it's not, although its meaning is not the same as in XML. <br />
> means the same as <br>>.


Oops, sorry, I was thinking that was turned off in HTML's SGML
declaration, but apparently not. Still, most of my point holds; in fact
that means that the situation is worse than I indicated: if you rely on
<br/> working in the browser after sending the file with an HTML MIME
type, you are not just relying on lax error recovery, you are relying on
non-conformant HTML parsing.


David
 
 
 
 