On 11/07/2011 01:38, lbrt chx _ gemale kom wrote:
> Space between ending and starting tag not ignorable in a XML document? ...
> ~
> I am parsing some XML document, which is validated by a schema, using apache Xerces
> ~
> What I don't get is that spaces between ending and starting tags is reported by:
> ~
> characters(char[] ch, int start, int length)
> ~
> instead of:
> ~
> ignorableWhitespace(char[] ch, int start, int length)
> ~
> I thought one of the aspects of well-formedness is that such spaces are not relevant (not even accessible) in an XML document.
> ~
> Have I forgotten to set some flag or something? In case this behavior is per spec. how do you tell apart such textual sequences?
For starters, well-formedness does not make anything irrelevant or
inaccessible. Anything between an end-tag and a start-tag is necessarily
a direct child of the parent of the elements surrounding it. It is just
a neighbour of these elements. Remember XHTML? Spaces between inline
tags are rather relevant, aren't they?
Bottom line: whitespace between tags is the same kind of whitespace as
any other text inside an element, because it is indeed inside an
element: the parent of the tags around it.
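To make that concrete, here is a minimal sketch using the JDK's built-in DOM parser rather than the SAX callbacks from the question. The class name SiblingWhitespace, the helper childrenOfRoot and the sample markup are made up for illustration. With no DTD involved, the single space between </b> and <i> should show up as an ordinary text node that is a child of <p>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Xerces or of the original question.
class SiblingWhitespace {

    // Parses the given XML string and returns the children of its root element.
    static NodeList childrenOfRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes();
    }

    public static void main(String[] args) throws Exception {
        // Inline-style markup: the space between </b> and <i> matters to a reader.
        NodeList kids = childrenOfRoot("<p><b>bold</b> <i>italic</i></p>");
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            String kind = (n.getNodeType() == Node.TEXT_NODE) ? "text" : "element";
            // Text nodes carry their characters in getNodeValue(); elements return null.
            System.out.println(i + ": " + kind + " " + n.getNodeName()
                    + " [" + n.getNodeValue() + "]");
        }
    }
}
```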
Which leaves us with the question of ignorable versus non-ignorable
whitespace. Sometimes whitespace is not supposed to be ignorable: think
again of inline XHTML, or of any mixed content. Think also of PRE-like
behaviour.
In practice, whitespace between elements is usually meant to be
ignorable. If it should be preserved, the xml:space attribute should be
set to "preserve", which signals that intent to whatever processes the
document.
So, by default it is ignorable, but the default is enforced only when
the XML parser is /able at all/ to distinguish between ignorable and not
ignorable whitespace. Otherwise some important whitespace might be lost.
A validating parser has to be able to distinguish. A non-validating
parser may, but does not have to.
Conclusion: Most likely you need to configure your XML parser so that
it performs DTD validation. Or, if such a thing exists, set it up so
that it honours xml:space (and the lack thereof) where it finds it.
Remember though that a DTD may declare it as a defaulted attribute on
some elements, so the parser should still read the DTD when it finds one.
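As a rough sketch of the difference, the following uses the JDK's bundled SAX parser (not a standalone Xerces build, so behaviour may differ in detail) and counts how many characters arrive through each callback. The class name WhitespaceDemo, the helper count, and the sample document with its internal DTD subset are all assumptions made for illustration; the question's real document is validated by a schema, which this sketch does not cover.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the sample document and DTD are invented.
class WhitespaceDemo {

    // Element-only content for <root>, so whitespace inside it can be ignorable.
    static final String DTD =
            "<!DOCTYPE root [\n"
          + "<!ELEMENT root (a,b)>\n"
          + "<!ELEMENT a (#PCDATA)>\n"
          + "<!ELEMENT b (#PCDATA)>\n"
          + "]>\n";

    static final String BODY = "<root>\n  <a>x</a>\n  <b>y</b>\n</root>";

    // Returns { chars seen by characters(), chars seen by ignorableWhitespace() }.
    static int[] count(String xml, boolean validating) throws Exception {
        final int[] counts = new int[2];
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validating); // DTD validation on or off
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                counts[0] += length;
            }
            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                counts[1] += length;
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // keeps validity errors off stderr
        reader.parse(new InputSource(new StringReader(xml)));
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] noDtd = count(BODY, false);
        int[] withDtd = count(DTD + BODY, true);
        System.out.println("no DTD:   characters=" + noDtd[0]
                + " ignorable=" + noDtd[1]);
        System.out.println("with DTD: characters=" + withDtd[0]
                + " ignorable=" + withDtd[1]);
    }
}
```

Without the DTD the parser has no way to know that <root> holds only elements, so the line breaks and indentation should be counted under characters(). With the DTD in place, the same whitespace should move to ignorableWhitespace().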
--
Mayeul