If DTD is unspecified XML should not parse

 
 
Mithil
Guest
Posts: n/a
 
      08-01-2007
Hello everyone,

I have a question about DTDs and XML. Is there any way to stop the
parser from parsing an XML file, and throw an error, if no DTD is
specified in the file's DOCTYPE? I am using Java, by the way. Any help
is greatly appreciated.

Regards,
Mithil

 
 
 
 
 
Joe Kesselman
Guest
Posts: n/a
 
      08-01-2007
Mithil wrote:
> I have a question about DTDs and XML. Is there any way to stop the
> parser from parsing an XML file, and throw an error, if no DTD is
> specified in the file's DOCTYPE? I am using Java, by the way. Any help
> is greatly appreciated.


If the DTD is not specified by the document type, validation is not
performed and parsing runs normally.

If you really insist on rejecting these documents... Depending on the
parser and API you're using, you may be able to detect that no DTD has
been specified and have your program do something appropriate. If you're
using a SAX parser which presents this information, your handler may be
able to abort the parse by throwing an exception. Hope that helps.
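
For anyone wanting to try the SAX route, here is a minimal sketch. It
assumes the parser bundled with the JDK and its support for the standard
SAX lexical-handler property. The class name RequireDoctype and the
accepts helper are made up for illustration. The handler records whether
startDTD was called and throws from the first startElement if it was not:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper class; the name is not from any library.
class RequireDoctype {

    // DefaultHandler2 implements both ContentHandler and LexicalHandler.
    static class Handler extends DefaultHandler2 {
        private boolean sawDoctype = false;
        private boolean sawRoot = false;

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            // Reported by the parser when a DOCTYPE declaration is present.
            sawDoctype = true;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (!sawRoot) {
                sawRoot = true;
                // A DOCTYPE can only appear before the root element,
                // so by now we know whether the document has one.
                if (!sawDoctype) {
                    throw new SAXException(
                        "No DOCTYPE declaration before <" + qName + ">");
                }
            }
        }
    }

    static void parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Handler handler = new Handler();
        // Standard SAX2 property for registering a LexicalHandler.
        parser.setProperty(
            "http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
    }

    // Convenience wrapper: true if the document parsed without an exception.
    static boolean accepts(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a/>"));
        System.out.println(accepts("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>"));
    }
}
```

This treats any DOCTYPE as good enough, including one with only an
internal subset. To insist on an external DTD you could also check that
the systemId passed to startDTD is non-null.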



--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
 
 
 
Richard Tobin
Guest
Posts: n/a
 
      08-02-2007
In article <(E-Mail Removed)>,
Joe Kesselman <(E-Mail Removed)> wrote:

>> I have a question about DTDs and XML. Is there any way to stop the
>> parser from parsing an XML file, and throw an error, if no DTD is
>> specified in the file's DOCTYPE? I am using Java, by the way. Any help
>> is greatly appreciated.


>If the DTD is not specified by the document type, validation is not
>performed and parsing runs normally.


But presumably the "invalid" indicator will be set (whatever that is
for the parser in question), so if you want to reject invalid documents
as well as ones without a DTD you can use that.
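
In Java the "invalid" indicator is the ErrorHandler's error() callback.
A minimal sketch, assuming the JAXP parser bundled with the JDK (the
class name StrictValidatingParse and the accepts helper are made up for
illustration): turn validation on and rethrow every validity error. The
default DefaultHandler.error() does nothing, so without the override
validity errors would pass silently.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name is not from any library.
class StrictValidatingParse {

    // Turns every reported validity error into a thrown exception.
    static class Strict extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    static void parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // request DTD validation
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new Strict());
    }

    // Convenience wrapper: true if the document parsed without an exception.
    static boolean accepts(String xml) {
        try {
            parse(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<a/>"));
        System.out.println(accepts("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>"));
        System.out.println(accepts("<!DOCTYPE a [<!ELEMENT a EMPTY>]><b/>"));
    }
}
```

With the JDK's parser a document with no DOCTYPE should be rejected
here, since its root element cannot match a declared document type.
Other parsers may report the missing DTD differently, so it is worth
checking with whichever one you use.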

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
 
 
RedGrittyBrick
Guest
Posts: n/a
 
      08-03-2007
Richard Tobin wrote:
> In article <(E-Mail Removed)>,
> Joe Kesselman <(E-Mail Removed)> wrote:
>
>>> I have a question about DTDs and XML. Is there any way to stop the
>>> parser from parsing an XML file, and throw an error, if no DTD is
>>> specified in the file's DOCTYPE? I am using Java, by the way. Any help
>>> is greatly appreciated.
>
>> If the DTD is not specified by the document type, validation is not
>> performed and parsing runs normally.
>
> But presumably the "invalid" indicator will be set (whatever that is
> for the parser in question), so if you want to reject invalid documents
> as well as ones without a DTD you can use that.
>


Just because an XML document lacks a DTD doesn't mean it is invalid,
does it? It might conform to an external XSD or an external DTD?
 
 
Richard Tobin
Guest
Posts: n/a
 
      08-03-2007
In article <(E-Mail Removed)>,
RedGrittyBrick <(E-Mail Removed)> wrote:
>Just because an XML document lacks a DTD doesn't mean it is invalid,
>does it? It might conform to an external XSD or an external DTD?


The word "valid" is used in various ways, but the XML spec uses it to
mean valid with respect to the DTD referred to in the document. If it
doesn't refer to a DTD, it's invalid.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
 
 
Joe Kesselman
Guest
Posts: n/a
 
      08-04-2007
> The word "valid" is used in various ways, but the XML spec uses it to
> mean valid with respect to the DTD referred to in the document. If it
> doesn't refer to a DTD, it's invalid.


There are arguably multiple states: Not validated (well-formed only, not
tested), invalid (DTD validation attempted and failed), valid (DTD
validation attempted and succeeded), schema-invalid and schema-valid.
(The latter two are distinguished only in the Post-Schema-Validation
infoset, not in the basic infoset.)

As far as I can tell, the basic XML Infoset doesn't actually include
any indication of these states as part of its information content. There
are pieces of information which are only available when a document is
valid, or when it was at least processed with a validating parser, but
that's the closest I can find. Apparently detecting validation success
or failure was left to whatever mechanism you use to invoke the parser
and/or validator.
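
To make the DTD-related states concrete, here is a rough JAXP/DOM
sketch. It assumes the parser bundled with the JDK; the class
ValidityCheck, the State enum, and the classify method are names
invented for this example. The sketch keeps "no DTD at all" separate
from "invalid" so the caller can decide how to treat it:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are not from any library.
class ValidityCheck {

    enum State { NO_DTD, INVALID, VALID, UNPARSEABLE }

    static State classify(String xml) {
        List<SAXParseException> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // request DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e); // collect validity errors, keep parsing
                }
            });
            Document doc =
                builder.parse(new InputSource(new StringReader(xml)));
            if (doc.getDoctype() == null) {
                return State.NO_DTD; // no DOCTYPE declaration at all
            }
            return errors.isEmpty() ? State.VALID : State.INVALID;
        } catch (Exception e) {
            // Fatal (well-formedness) errors, I/O or configuration problems.
            return State.UNPARSEABLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("<a/>"));
        System.out.println(classify("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>"));
    }
}
```

The schema-valid and schema-invalid states are left out; they would
need a separate schema validation step.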


--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
Richard Tobin
Guest
Posts: n/a
 
      08-04-2007
In article <(E-Mail Removed)>,
Joe Kesselman <(E-Mail Removed)> wrote:

>There are arguably multiple states: Not validated (well-formed only, not
>tested), invalid (DTD validation attempted and failed), valid (DTD
>validation attempted and succeeded),


True, but the XML spec says that validating parsers must report
violations of validity constraints, and a document without a DTD
will violate at least one.

>As far as I can tell, the basic XML Infoset doesn't actually include
>any indication of these states as part of its information content.


Yes, the Infoset doesn't address validity except in the cases where
invalidity prevents an item from having a value (notably the
[references] property of attributes).

>Apparently detecting validation success
>or failure was left to whatever mechanism you use to invoke the parser
>and/or validator.


All that's required is that there must be such a mechanism for a
validating parser.

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
 
 
Joe Kesselman
Guest
Posts: n/a
 
      08-04-2007
>> Apparently detecting validation success
>> or failure was left to whatever mechanism you use to invoke the parser
>> and/or validator.
>
> All that's required is that there must be such a mechanism for a
> validating parser.


Yep. And certainly the various parser APIs (SAX, JAXP, the DOM3 document
load operations) do report this.

I just would have been a bit happier, from an architectural point of
view, if this had been made one of the properties of the Infoset.

Oh well. In an ideal world we would have developed the Infoset first,
including all the afterthoughts like namespaces, then developed the
schema language and XML markup syntax from that. Maybe if/when XML ever
graduates from Recommendation to Standard (the semi-mythical XML 2.0?)
we'll have the luxury of being able to do it that way. Meanwhile, the
advantage of developing from the syntax forward was that we were able to
put XML into use immediately; the disadvantage is that it has a bunch of
minor warts.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
 
Mithil
Guest
Posts: n/a
 
      08-06-2007
Wow, thanks guys. I think this discussion gave me insight into a lot
more. I really appreciate it. Thanks again.

 
 
 
 