Mark wrote:
> I've been working on processing xml from an external source (who I can
> contact to change things if they turn out to be incorrect) and in their
> xml documents empty elements span multiple lines e.g.
>
> <element>
> </element>
>
> rather than <element /> or <element></element>.
As you observe, that isn't an empty element. It contains at least a
newline character.
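
To see the difference concretely, here is a small sketch. It assumes Python's standard-library xml.etree.ElementTree parser, which is my choice for illustration only; other parsers may report empty content as an empty string rather than None. The sample names are made up.

```python
import xml.etree.ElementTree as ET

# Three ways the supplier might write what they think of as "empty".
samples = {
    "multi-line": "<element>\n</element>",
    "self-closing": "<element />",
    "start/end pair": "<element></element>",
}

# Collect the character data the parser reports for each form.
texts = {name: ET.fromstring(xml).text for name, xml in samples.items()}

for name, text in texts.items():
    print(f"{name}: {text!r}")
```

Only the last two forms are empty as far as this parser is concerned; the multi-line form carries a newline as its content.
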
> When I'm parsing the xml
> the text of the element is '\n' which is what I would expect, but not
> what they intend - it should be null or an empty string or whatever. Is
> what they're doing wrong in expressing an empty element, or is it simply
> ugly (but not wrong)?
Not ugly, IMHO, but certainly wrong if they believe it means "empty",
and a very common misunderstanding by people who haven't understood
markup. It's easily overcome in the processing, but it's bad practice,
and symptomatic of the file being generated by someone who is just doing
what their imagination wants to see, rather than what is required. I
find this occurs rather frequently among programmers and database
engineers moved in from unrelated projects, because of the assumption
that XML contains "fields". A similar hallmark to watch out for is the
pretty-printing of elements containing text, e.g.
<element>
This is the content.
</element>
which contains superfluous newlines and other white-space.
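
Both cases can be handled the same way when processing. A minimal sketch, again assuming Python's xml.etree.ElementTree; clean_text is a hypothetical helper name, and stripping is only safe where white-space is not significant (i.e. not in mixed content, and not where xml:space="preserve" applies):

```python
import xml.etree.ElementTree as ET

def clean_text(elem):
    """Return the element's text with surrounding white-space stripped,
    or None if it holds nothing but white-space (or nothing at all)."""
    text = (elem.text or "").strip()
    return text or None

# The supplier's "empty" element and a pretty-printed text element.
empty = ET.fromstring("<element>\n</element>")
padded = ET.fromstring("<element>\nThis is the content.\n</element>")

print(repr(clean_text(empty)))
print(repr(clean_text(padded)))
```

That is a workaround on the receiving side; the better fix is still to get the supplier to emit what they actually mean.
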
///Peter