
Updating DTD to agree with its use in doc's

 
 
christopher.c.brewster@lmco.com
 
      01-25-2005
A few years ago my department defined a DTD for a projected class of
documents. Like the US Constitution, this DTD has details that are
never actually used, so I want to clean it up. Is there any tool that
looks at existing documents and compares with the DTD they use?

[I can think of other possible uses for such a tool, so I thought
someone might have invented it. I have XML Spy but do not see a feature
that would do this.]

Christopher Brewster

 
 
 
 
 
Jürgen Kahrs
 
      01-25-2005
(E-Mail Removed) wrote:

> A few years ago my department defined a DTD for a projected class of
> documents. Like the US Constitution, this DTD has details that are
> never actually used, so I want to clean it up. Is there any tool that
> looks at existing documents and compares with the DTD they use?


I have written a tool that reads an XML file
and produces a DTD. The DTD covers only those
parts that are actually used in the original
XML file.

http://home.vrweb.de/~juergen.kahrs/...-a-sample-file

It should not be too hard to change the script
so that it reads an arbitrary number of example
files and accumulates knowledge about their structure,
finally producing a DTD that covers all files.

If you don't find another tool and you really
need one, I could write the script for
you. But you should be aware that the language
it is written in (XMLgawk) is still
experimental.
 
 
 
 
 
christopher.c.brewster@lmco.com
 
      01-25-2005
Juergen --

A script to do this would be amazing, if you're interested in doing it.
Here is a further question: I followed the link from the gawk page to
Saxon's site, which led me to a front-end for the program at HiT
Software:

http://www.hitsw.com/xml_utilites/

This utility does not work, however, for a reason that seems to
contradict what it's for: it wants to open the file's DTD! One would
think that this utility, of all utilities, would not need the DTD. It
also wants to pull in all the external entities, but again this seems
pointless for the utility's purpose. Any idea how to get around this?
Thanks for your information.

Chris Brewster

 
 
christopher.c.brewster@lmco.com
 
      01-25-2005
OK, I got this working by omitting the reference to the DTD, deleting
entity references, and deleting strings such as &text. But maybe this
utility should ignore these things. Thanks very much for the
information.
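For anyone repeating this workaround on many files, the manual cleanup could be scripted. The following is only a minimal Python sketch, not part of the HiT Software utility: the function name strip_dtd_dependencies is hypothetical, and the regular expressions assume a simple DOCTYPE declaration (no ">" inside quoted identifiers, no "]" followed by ">" inside an internal subset). It keeps the five entities that XML predefines and leaves numeric character references alone.

```python
import re

# Entities that XML itself predefines; these must survive the cleanup.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# A DOCTYPE declaration, with an optional internal subset in [...].
# Assumption: no '>' inside the quoted public/system identifiers.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+[^\[>]*(\[.*?\])?\s*>", re.DOTALL)

# Named entity references such as &text; (numeric &#...; are not matched).
ENTITY_RE = re.compile(r"&([A-Za-z_][\w.\-]*);")


def strip_dtd_dependencies(xml_text: str) -> str:
    """Remove the DOCTYPE and every non-predefined named entity reference."""
    without_doctype = DOCTYPE_RE.sub("", xml_text, count=1)

    def drop_unknown(match: re.Match) -> str:
        # Keep &amp; and friends, delete everything else.
        return match.group(0) if match.group(1) in PREDEFINED else ""

    return ENTITY_RE.sub(drop_unknown, without_doctype)


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE doc SYSTEM "doc.dtd">\n'
        "<doc>a &text; b &amp; c</doc>"
    )
    print(strip_dtd_dependencies(sample))
```

Deleting entity references changes the text content, which does not matter when only the element structure is of interest, but the cleaned files should not replace the originals.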

Other utilities that would help (which I might make my own versions
of): printing DTDs in structured formats for analysis (such as in table
form), and ways to compare and/or combine related DTDs.
Thanks again...

Chris Brewster

 
 
Jürgen Kahrs
 
      01-26-2005
(E-Mail Removed) wrote:

> A script to do this would be amazing, if you're interested in doing it.


I just had another look at the DTD generator script.
It looks like the script already does what you want.
On my Red Hat Linux system, for example, I ran this to generate
a DTD that covers all the files whose names are passed
on the command line:

gawk -f dtd_generator.awk /usr/share/doc/libxml2-devel-2.6.10/examples/test*.xml

<!ELEMENT doc ( dest | src | parent )* >
<!ELEMENT dest ( #PCDATA ) >
<!ATTLIST dest id CDATA #REQUIRED>
<!ELEMENT src ( #PCDATA ) >
<!ATTLIST src ref CDATA #REQUIRED>
<!ELEMENT parent ( discarded | preserved )* >
<!ELEMENT discarded ( discarded )* >
<!ELEMENT preserved ( child2 | preserved | child1 )* >
<!ELEMENT child2 ( #PCDATA ) >
<!ELEMENT child1 ( #PCDATA ) >

I guess that's what you wanted.
Such a DTD is far from perfect of course.
You should take it as a starting point, rearrange
the sequence of lines and insert comments from your
original (much larger) DTD.
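For readers without XMLgawk, the general idea of accumulating structure across several sample documents can be sketched in Python. This is not a port of dtd_generator.awk, whose internals are not shown in this thread; the function name infer_dtd and the rules it applies are assumptions: every content model is a loose repeatable choice of the children seen, an attribute is #REQUIRED only if it appeared on every occurrence of its element, all attributes are typed CDATA, and the documents use no namespaces.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(xml_documents):
    """Infer a loose DTD from an iterable of XML document strings."""
    children = defaultdict(list)      # element -> child names, first-seen order
    has_text = defaultdict(bool)      # element -> saw non-whitespace text
    attr_counts = defaultdict(lambda: defaultdict(int))
    occurrences = defaultdict(int)    # element -> number of times seen
    order = []                        # element names in first-seen order

    for text in xml_documents:
        for elem in ET.fromstring(text).iter():
            name = elem.tag
            if name not in occurrences:
                order.append(name)
            occurrences[name] += 1
            if elem.text and elem.text.strip():
                has_text[name] = True
            for child in elem:
                if child.tag not in children[name]:
                    children[name].append(child.tag)
                # Text after a child still belongs to the parent element.
                if child.tail and child.tail.strip():
                    has_text[name] = True
            for attr in elem.attrib:
                attr_counts[name][attr] += 1

    lines = []
    for name in order:
        kids = children[name]
        if kids and has_text[name]:
            model = "( #PCDATA | " + " | ".join(kids) + " )*"
        elif kids:
            model = "( " + " | ".join(kids) + " )*"
        elif has_text[name]:
            model = "( #PCDATA )"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model} >")
        for attr, count in attr_counts[name].items():
            default = "#REQUIRED" if count == occurrences[name] else "#IMPLIED"
            lines.append(f"<!ATTLIST {name} {attr} CDATA {default}>")
    return "\n".join(lines)


if __name__ == "__main__":
    docs = [
        '<doc><dest id="1">x</dest><src ref="1">y</src></doc>',
        '<doc><dest id="2">z</dest><note/></doc>',
    ]
    print(infer_dtd(docs))
```

To run it over files, read each file into a string and pass the list to infer_dtd. As with the generated DTD above, the result is a starting point: it records which elements occur inside which, not their order or cardinality.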
 
 
Peter Flynn
 
      01-26-2005
(E-Mail Removed) wrote:

> Juergen --
>
> A script to do this would be amazing, if you're interested in doing it.


I did this as part of a migration from TEI SGML to XML. Basically:

a) run nsgmls over the documents and produce ESIS
b) use awk to extract the element type names
c) sort and uniq them
d) use Perl::SGML to read the DTD and list the element type names
e) sort them
f) do a caseless join of the two lists with -a to spit out the non-matches

If you're not using a Unix-based system, I think Cygwin can run these tools.
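The same comparison of declared versus used element type names can be sketched for XML documents with nothing but the Python standard library. This swaps in different tools from the ones listed above (no nsgmls, awk, or Perl), so treat it as an illustration of the idea only. The function names are hypothetical, the DTD is scanned with a simple regular expression that skips declarations whose name is a parameter entity or a parenthesised name group, and names are compared case-sensitively because XML names are case-sensitive. The documents must parse without their DTD.

```python
import re
import xml.etree.ElementTree as ET

# Matches the element type name in declarations like <!ELEMENT name ...>.
# Declarations that start with '%' or '(' are deliberately not matched.
ELEMENT_DECL_RE = re.compile(r"<!ELEMENT\s+([^\s>()%]+)")
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def declared_elements(dtd_text):
    """Element type names declared in the DTD text."""
    # Drop comments first so commented-out declarations are not counted.
    return set(ELEMENT_DECL_RE.findall(COMMENT_RE.sub("", dtd_text)))


def used_elements(xml_documents):
    """Element type names that occur in any of the document strings."""
    used = set()
    for text in xml_documents:
        for elem in ET.fromstring(text).iter():
            used.add(elem.tag)
    return used


def unused_declarations(dtd_text, xml_documents):
    """Declared element type names that no document uses, sorted."""
    return sorted(declared_elements(dtd_text) - used_elements(xml_documents))


if __name__ == "__main__":
    dtd = (
        "<!ELEMENT doc (dest | src | note)*>\n"
        "<!ELEMENT dest (#PCDATA)>\n"
        "<!ELEMENT src (#PCDATA)>\n"
        "<!ELEMENT note EMPTY>\n"
    )
    print(unused_declarations(dtd, ["<doc><dest>x</dest></doc>"]))
```

The sorted list it returns names the candidates for removal from the DTD; attributes and entities would need a similar pass of their own.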

///Peter
--
"The cat in the box is both a wave and a particle"
-- Terry Pratchett, introducing quantum physics in _The Authentic Cat_
 