Velocity Reviews

Wolfgang Lipp 01-27-2004 11:33 AM

container elements for repeating elements ('element farms') needed?
 
my question is: do we need container elements for
repeating elements in data-centric xml documents? or is
it for some reason very advisable to introduce
containers in xml documents even where not strictly
needed? what would a recommendation on this look like in
the light of existing tools like w3c xml schema and
relaxng as well as established practice? i would greatly
appreciate any words, pointers, and links.

the exposition of the problem has become a rather long
one, done partly to make the matter clear to myself, and
most people will probably not have to read all of it.

to ease the discussion, let me introduce a very simple
data schema, one that describes a library with books,
employees, and readers. it looks like this:

#=====================================================
library
    address
    *book
    *employee
    *reader

book
    *author
    title
    isbn

author extends person

employee extends person

reader extends person
    card-id

person
    name
        last
        first
#=====================================================

the star is to be read in the usual way as 'zero or more
instances of'. i believe the above structure, where
repeating elements are introduced without explicit
container elements, to be sufficient and extensible: in
case i plan to describe individual employees in more
detail, i can always amend the schema of <employee>
(which presently only holds first and last name) and
leave the schema of the <library> element untouched. (i
also believe that mixed content and order between
elements should be eschewed in most data-centric xml, so
i do not make an effort to express mixed content or
order between sibling elements in the above.)
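
A minimal sketch of an instance document under the implicit model, using Python's standard xml.etree.ElementTree. The names, address, isbn and card id are made up for illustration, and the helper employee_last_names is hypothetical. It shows that individual employees are found directly under <library> with no <employees> container:

```python
import xml.etree.ElementTree as ET

# Made-up instance of the implicit model: repeated <book>, <employee> and
# <reader> elements sit directly under <library>, with no containers.
IMPLICIT = """
<library>
  <address>1 Example Street</address>
  <book>
    <author><name><last>Doe</last><first>Jane</first></name></author>
    <title>An Example Title</title>
    <isbn>0000000000</isbn>
  </book>
  <employee><name><last>Smith</last><first>Ann</first></name></employee>
  <employee><name><last>Brown</last><first>Bob</first></name></employee>
  <reader>
    <name><last>Green</last><first>Cy</first></name>
    <card-id>R-1</card-id>
  </reader>
</library>
"""


def employee_last_names(library):
    """Hypothetical helper: last names of all employees of a library."""
    # findall() with a bare tag name matches direct children only, so the
    # individual employees are found without any <employees> wrapper.
    return [e.findtext("name/last") for e in library.findall("employee")]


library = ET.fromstring(IMPLICIT)
print(employee_last_names(library))  # ['Smith', 'Brown']
```

Adding more detail inside <employee> later would leave both this query and the content of <library> unchanged.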

now, there are people who do not agree with this kind of
schema (let's call it the implicit model) and insist on
container elements for repeatables. this means we have
to explicitly introduce <books>, <employees>, and
<readers>, so the library schema will look like this:

#=====================================================
library
    books
    employees
    readers

books
    *book

employees
    *employee

readers
    *reader

book
    authors
    title
    isbn

authors
    *author

author extends person

employee extends person

reader extends person
    card-id

person
    name
        last
        first
#=====================================================

the argument, if i understand correctly, goes that in
case i want to change the structure of a contained
element, only in the explicit model can i do so by
redefining e.g. <employee> (and perhaps <employees>)
without touching the <library> element. it is also
claimed that only then will i be able to use typing and
have employees as an entity that i can change later on,
and have it changed in all the places it appears. third,
it is claimed that for reasons of object-oriented
mapping, container elements are desirable.
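
For comparison, a sketch of the same made-up data under the explicit model, again assuming Python's ElementTree. The container is now a node of its own and the members are reached one level further down, which is roughly what an object mapper would turn into a list-valued attribute:

```python
import xml.etree.ElementTree as ET

# The same made-up data under the explicit model: every repeatable element
# is wrapped in its own container (an "element farm").
EXPLICIT = """
<library>
  <books>
    <book>
      <authors>
        <author><name><last>Doe</last><first>Jane</first></name></author>
      </authors>
      <title>An Example Title</title>
      <isbn>0000000000</isbn>
    </book>
  </books>
  <employees>
    <employee><name><last>Smith</last><first>Ann</first></name></employee>
    <employee><name><last>Brown</last><first>Bob</first></name></employee>
  </employees>
  <readers>
    <reader>
      <name><last>Green</last><first>Cy</first></name>
      <card-id>R-1</card-id>
    </reader>
  </readers>
</library>
"""

library = ET.fromstring(EXPLICIT)

# The container is an addressable node of its own ...
employees = library.find("employees")
# ... and individual members are reached one level further down.
names = [e.findtext("name/last") for e in employees.findall("employee")]
print(len(employees), names)  # 2 ['Smith', 'Brown']
```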

i would like to dub explicit container elements 'element
farms' (think of server farms -- many of the same
bundled) for short, and call the above set of claims the
'element farm constraint', which in essence says that
you should introduce a container element (a farm)
whenever you allow the repetition of elements in data-
centric xml.

now, the second argument is obviously correct in so far
as i can *only* in the explicit model modify an element
<employees> and have that change propagate everywhere,
for the simple reason that there is no such element in the
implicit model. the question is, why should i want to do
such a thing? i think it is a design decision whether or
not a given entity or set of entities is modelled
explicitly or not. i do not have <books>, <readers>, or
<employees> in the implicit model since i have nothing
to say about these groups in general, only about each
individual. this could be different: for example, at
some point we discover that all readers are subject to
the same fee and may take at most a fixed number of
books out of the library. then, the set of readers
becomes more tangible,
and i will have to change the implicit model like this:

#=====================================================
library
    address
    *book
    *employee
    readers

readers
    fee
    maximum-number-of-books
    *reader

reader extends person
    card-id
#=====================================================

this is in fact a change in the model that did not
automatically percolate through all tiers -- i had to
modify my definition of <library>. so what? new facts
are in town, and we make space for them. we did not
build a complete, all-embracing, all-extensible data
model with the first shot, but who ever will? sure, the
explicit model would have made it easier, but it is also
somewhat bulkier. second, what do you do when you find
you have something new to say about the library itself?
you will have to change the <library> element, in both
models. but third and devastatingly, we are faced, in
both models, with the situation that not all repeated
elements are covered by container elements -- the
<readers> element above has two other children besides
the <reader> elements. that's all right for the implicit
model, but in order to satisfy the element farm
constraint, we must introduce one more container <xxx>,
like so:

#=====================================================
readers
    fee
    maximum-number-of-books
    xxx

xxx
    *reader
#=====================================================

at this juncture, it becomes clear that

* explicit containers for repeated elements will under
* the element farm constraint never be truly useful
* entities in the sense of data modelling, since they
* are never allowed to hold any data pertaining to
* them per se.

by the way, i do not see any strict reason not to add
an element <readers> without necessarily making it the
container for the <reader> elements -- sounds strange?
well:

#=====================================================
library
    address
    *book
    *employee
    *reader
    readers

readers
    fee
    maximum-number-of-books

reader extends person
    card-id

#=====================================================

this structure allows you to query for a collective
'readers' and to scan for individual instances of
'reader' -- in a way the collective is independent of
its members, since we can still say that there is a fee
to pay and a maximum number of books to take home even
with zero readers.
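
A minimal sketch of that variant with zero readers, assuming Python's ElementTree and an invented fee and limit: the collective facts in <readers> remain available even though there are no <reader> elements at all.

```python
import xml.etree.ElementTree as ET

# Made-up instance of the last variant: <readers> carries only the
# collective facts, while <reader> elements would be direct children of
# <library>. Here there are no readers at all.
DOC = """
<library>
  <address>1 Example Street</address>
  <readers>
    <fee>10</fee>
    <maximum-number-of-books>5</maximum-number-of-books>
  </readers>
</library>
"""

library = ET.fromstring(DOC)
readers = library.find("readers")

fee = int(readers.findtext("fee"))
max_books = int(readers.findtext("maximum-number-of-books"))
members = library.findall("reader")  # individual readers, none in this document

print(fee, max_books, len(members))  # 10 5 0
```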

lastly, it is possible to model employees and readers
alike as sets of generic persons. in that case, we must
have both collective elements:

#=====================================================
library
    ...
    employees
    readers

employees
    *person

readers
    *person

#=====================================================

however, since it is easy to subclass and quite
foreseeable that employees and readers do differ from
generic persons in the eyes of a library's data
administration, this approach is perhaps not to be
recommended.

sorry again for the longish mail,

_wolfgang lipp
w.lipp at bgbm dot org

Patrick TJ McPhee 01-30-2004 04:09 PM

Re: container elements for repeating elements ('element farms') needed?
 
In article <f16e2eb2.0401270333.5af4198a@posting.google.com>,
Wolfgang Lipp <w.lipp@bgbm.org> wrote:

% my question is: do we need container elements for
% repeating elements in data-centric xml documents?

You can often get away without them, but you may find that limits you
in unexpected ways. For instance, if you wanted to move the lists of
employees and readers from your example to external documents, then you
would need a containing element for each of them. If you wanted to
include those documents as external parsed entities, then your library
schema would have to allow for the containing element.
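
The single-root rule behind the first point can be checked with Python's ElementTree; this sketch uses invented employee data and a hypothetical parses helper. A bare run of sibling <employee> elements is rejected as a document, while the same run wrapped in <employees> is accepted:

```python
import xml.etree.ElementTree as ET

# Two sibling <employee> elements with no common parent (made-up data).
BARE = (
    "<employee><name><last>Smith</last></name></employee>"
    "<employee><name><last>Brown</last></name></employee>"
)


def parses(text):
    """Hypothetical helper: True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses(BARE))                                   # False: two root elements
print(parses("<employees>" + BARE + "</employees>"))  # True: one root element
```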

There are certainly cases where people have elected to leave off
containers and it has made the data more difficult to process. If your
book element didn't exist, and you just had a list of titles, authors,
and isbns, the data could still be unambiguous, but more complicated to
handle. I'm inclined to think that it's not worth spending the effort to
decide whether any given container might turn out not to be useful, and
would simply put it in if it represents some identifiable entity
(the library's collection and its subscriber base can each be thought
of as distinct entities).
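
To illustrate the point about a missing book element, a sketch assuming a hypothetical flat layout in which each book's <author>, <title> and <isbn> siblings appear in that order and each <isbn> closes a book (authors are reduced to plain text for brevity). Book boundaries then have to be inferred from element order:

```python
import xml.etree.ElementTree as ET

# Hypothetical flat variant: no <book> element, just a run of
# <author>, <title> and <isbn> siblings under <library> (made-up data).
FLAT = """
<library>
  <author>Doe</author><author>Roe</author><title>First</title><isbn>111</isbn>
  <title>Second</title><isbn>222</isbn>
</library>
"""


def group_books(library):
    """Rebuild books from flat siblings, assuming each <isbn> ends a book."""
    books, current = [], {"authors": []}
    for child in library:
        if child.tag == "author":
            current["authors"].append(child.text)
        elif child.tag == "title":
            current["title"] = child.text
        elif child.tag == "isbn":
            current["isbn"] = child.text
            books.append(current)
            current = {"authors": []}  # start collecting the next book
    return books


books = group_books(ET.fromstring(FLAT))
print(len(books), books[0]["authors"], books[1]["title"])  # 2 ['Doe', 'Roe'] Second
```

With a <book> container the same grouping is a single library.findall("book"); without it, the code silently depends on sibling order, which the first post would rather not rely on in data-centric xml.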
--

Patrick TJ McPhee
East York Canada
ptjm@interlog.com

