#1

Hello. OK, I know this is the most asked question in XML (so says some
tutorial), but still, please give me your insight on this (as I'm a newbie).

I want to store parameters for a program in an XML file. I can see 3
intelligent ways to do this.

1)
<?xml version="1.0" ?>
<PARAMETERS>
  <LATTICE>
    <LPAR name="Coverage" unit="ML">0.1</LPAR>
    <LPAR name="Frequency" unit="Hz">10^3</LPAR>
  </LATTICE>
  ... (other params in other elements)
</PARAMETERS>

2)
<?xml version="1.0" ?>
<PARAMETERS>
  <LATTICE>
    <Coverage unit="ML">0.1</Coverage>
    <Frequency unit="Hz">10^3</Frequency>
  </LATTICE>
  ... (other params in other elements)
</PARAMETERS>

3)
<?xml version="1.0" ?>
<PARAMETERS>
  <LATTICE>
    <LPAR name="Coverage" unit="ML" value="0.1"/>
    <LPAR name="Frequency" unit="Hz" value="10^3"/>
  </LATTICE>
  ... (other params in other elements)
</PARAMETERS>

As far as I can see, all three are valid:
- no multiple attributes with the same name
- the "value" is atomic, so it can be stored in an attribute
- I cannot think of any parameter (Coverage/Freq) that would need to be
  extensible later on.

But:
- one attribute (unit) modifies another (value) in version (3), which
  seems to be bad practice
- I would like to address the parameters directly (using tinyXML), so it's
  easier (is it?) in version 2, where the element name reflects the
  purpose. So you can write something like (pseudocode)

    doc.getElement("Coverage")

  and not

    if (doc.getElement("LPAR").getAttribute("name") == "Coverage") {...}

What do you think? Any good advice on which is better?

Thanks
Philipp
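
A minimal sketch of the three layouts, assuming Python's standard `xml.etree.ElementTree` as a stand-in for tinyXML so that it runs without extra libraries. The function name `read_lattice` and the `{name: (value, unit)}` result shape are illustrative choices, not anything from the thread. Reading each layout into the same mapping shows that all three carry the same information; they differ only in whether the parameter name is markup (an element type) or data (an attribute value). Layout 3 is written with `/>`, the XML empty-element syntax.

```python
import xml.etree.ElementTree as ET

# The three layouts from the question (XML declaration omitted for brevity).
LAYOUT_1 = """<PARAMETERS><LATTICE>
  <LPAR name="Coverage" unit="ML">0.1</LPAR>
  <LPAR name="Frequency" unit="Hz">10^3</LPAR>
</LATTICE></PARAMETERS>"""

LAYOUT_2 = """<PARAMETERS><LATTICE>
  <Coverage unit="ML">0.1</Coverage>
  <Frequency unit="Hz">10^3</Frequency>
</LATTICE></PARAMETERS>"""

LAYOUT_3 = """<PARAMETERS><LATTICE>
  <LPAR name="Coverage" unit="ML" value="0.1"/>
  <LPAR name="Frequency" unit="Hz" value="10^3"/>
</LATTICE></PARAMETERS>"""


def read_lattice(xml_text):
    """Return {name: (value, unit)} for every parameter under LATTICE."""
    root = ET.fromstring(xml_text)
    params = {}
    for elem in root.find("LATTICE"):
        if elem.tag == "LPAR":
            # Layouts 1 and 3: the name is an attribute; the value is either
            # the "value" attribute (3) or the element's text content (1).
            name = elem.get("name")
            value = elem.get("value", elem.text)
        else:
            # Layout 2: the element type itself is the parameter name.
            name = elem.tag
            value = elem.text
        params[name] = (value.strip(), elem.get("unit"))
    return params
```

Only the lookup branch differs between layouts; code that consumes the returned mapping does not need to know which layout the file used.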

#2

Philipp wrote:
> Hello. OK I know this is the most asked question in XML (it says in some
> tutorial), but still. Please give me your insight on this (as I'm a newbie).

You've actually asked a rarer variant of it.

> I want to store parameters for a programm in an XML file. I can see 3
> intelligent ways to this.
>
> 1)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <LPAR name="Coverage" unit="ML">0.1</LPAR>
>     <LPAR name="Frequency" unit="Hz">10^3</LPAR>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>
>
> 2)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <Coverage unit="ML">0.1</Coverage>
>     <Frequency unit="Hz">10^3</Frequency>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>
>
> 3)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <LPAR name="Coverage" unit="ML" value="0.1"/>
>     <LPAR name="Frequency" unit="Hz" value="10^3"/>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>

There is no real difference between #1 and #3. Personally I'd go with #3 if
it's strictly a config file, because it handles name and value consistently
(an entirely trivial human-friendliness issue). If there's a risk of the file
being "viewed", though, then I'd favour #1, although this is also a very minor
issue. There's a vague de facto standard (from the way HTML gets processed)
that "unknown" elements in the most anonymous default contexts are given a
default human rendering by showing their text content and hiding their
attributes.

#2 is interesting though, and quite different. If we assume for the moment
that <LATTICE> and <LPAR> have some generic meaning as "config files", then
you've also introduced the concepts of "Coverage" and "Frequency" into the XML
DTD, and they're obviously very application-specific. Neither of these is
either good or bad, but they are different: a DTD that contains "Frequency" is
now application-specific, not just a generic one for doing config files.

That has significant implications for project design. In general, XML doesn't
work well unless the entire DTD is mapped out before implementing the data and
code that use it. Getting this aspect wrong is a regular source of problems,
especially on big projects. There are project techniques to work round it, or
you may even find yourself avoiding XML in favour of a more up-to-date
technique. In the extreme case this becomes the "Nominals" problem, which is a
classic "hard problem" from the AI world.

So swap between elements and attributes without too much thought. In the
simple case they're both simple atomic structures that are visible in the XML
Infoset, and they really are interchangeable (cardinality or internal
structure might force one into becoming an element). When you start moving
application concepts from XML values to XML names, though, that's when it gets
interesting.

I'm a little more concerned about the "10^3" markup for exponents within the
value itself. Although this is certainly a reasonable way of representing such
values, it's not mainstream. I'd use a more common floating-point notation
such as "1E3" or "1.0E3" instead.

PS: UPPER CASE tag names get tiring to read after a while. I'd suggest you use
lower case (mixed case is a pain).

> As far as I see, all three are valid
> - no multiple attributes with same name
> - the "value" is atomic, so can be stored in an attribute

Good basic rules to follow.

> - I cannot think of any parameter (Coverage/Freq) which should need to
> be extensible later on.

That's where your <Coverage> element would bite you later, if one ever did!

> - one attribute (unit) modifies another (value) in version (3), which
> seems to be bad practice

That's fine. It's a reasonable and relevant qualification of the value (giving
it dimensions).

> - I would like to address the parameters directly (using tinyXML), so
> it's easier (is-it?) in version 2 where the Element name,

No. Any "useful" query language makes this almost transparent to you. If it's
hard, get another XML query platform. The last statement isn't strictly
accurate in complex cases involving Reasoners, but it's actually
<LPAR name="Coverage" ...> that's the easier case to process!
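
A sketch of the "query language makes it transparent" point, again using Python's `xml.etree.ElementTree` rather than tinyXML (chosen only so the example runs with the standard library; tinyXML's own API is not shown). ElementTree's limited XPath support accepts an attribute predicate such as `LPAR[@name='Coverage']`, so a lookup in layout 1 or 3 is one path expression, just like `Coverage` in layout 2. The helpers `get_param` and `parses_as_float` are hypothetical. The sketch also checks the exponent remark: Python's `float()` accepts `1E3` but rejects `10^3`.

```python
import xml.etree.ElementTree as ET

# Layout 1 (name as attribute), with the exponent written as "1E3".
LPAR_ROOT = ET.fromstring("""<PARAMETERS><LATTICE>
  <LPAR name="Coverage" unit="ML">0.1</LPAR>
  <LPAR name="Frequency" unit="Hz">1E3</LPAR>
</LATTICE></PARAMETERS>""")

# Layout 2 (name as element type).
NAMED_ROOT = ET.fromstring("""<PARAMETERS><LATTICE>
  <Coverage unit="ML">0.1</Coverage>
  <Frequency unit="Hz">1E3</Frequency>
</LATTICE></PARAMETERS>""")


def get_param(root, name):
    """Return (value as float, unit) for `name` in any of the layouts.

    Assumes `name` is a plain identifier (no quotes), because it is
    pasted into the path expression.
    """
    # Layouts 1/3: one attribute predicate replaces the hand-written
    # if (getAttribute("name") == ...) test from the question.
    elem = root.find(f"LATTICE/LPAR[@name='{name}']")
    if elem is None:
        # Layout 2: the element type is the parameter name.
        elem = root.find(f"LATTICE/{name}")
    if elem is None:
        raise KeyError(name)
    # Layout 3 keeps the value in an attribute, layouts 1/2 in the text.
    return float(elem.get("value", elem.text)), elem.get("unit")


def parses_as_float(text):
    """True if float() accepts `text` (e.g. "1E3"), False otherwise."""
    try:
        float(text)
    except ValueError:
        return False
    return True
```

Here `get_param(LPAR_ROOT, "Frequency")` returns `(1000.0, "Hz")` and `get_param(NAMED_ROOT, "Coverage")` returns `(0.1, "ML")`, so the calling code looks the same whichever layout is chosen.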

#3

Philipp <> writes:
>Hello. OK I know this is the most asked question in XML (it says in some
>tutorial), but still. Please give me your insight on this (as I'm a newbie).

When a new document type is to be defined, when should one choose child
elements and when attributes?

The criterion that makes sense with regard to meaning cannot be used in XML
because of syntactic restrictions.

An element describes something. A description is an assertion. An assertion
might contain unary predicates or binary relations.

Comparing this structure of assertions with the structure of XML, it seems
natural to represent unary predicates with types and binary relations with
attributes.

Say "x" is a rose and belongs to Jack. The assertion is:

    rose( x ) ^ owner( x, "Jack" )

This is written in XML as:

    <rose owner="Jack" />

Thus, my answer would be: use element types for unary predicates and
attributes for binary relations.

Unfortunately, in XML, this is not always possible, because in XML:

- there may be at most one type per element,
- there may be at most one attribute value per attribute name, and
- attribute values are not allowed to be structured.

Therefore, the designers of XML document types are forced to abuse element
/types/ in order to describe the /relation/ of an element to its parent
element.

This /is/ an abuse, because the designation "element type" obviously is
supposed to give the /type of an element/, i.e., a property which is intrinsic
to the element alone and has nothing to do with its relation to other
elements.

The document type designers, however, are forced to commit this abuse, to
poorly reinvent the missing structured attribute values with the means XML
offers.

If a rose has two owners, the following element is not allowed in XML:

    <rose owner="Jack" owner="Jill" />

One is made to use representations such as the following:

    <rose><owner>Jack</owner><owner>Jill</owner></rose>

Here the notion "element type" suggests that Jack is marked as "an owner", in
the sense that "owner" is supposed to be the type (the kind) of Jack. The
intention of the author, however, is that "owner" gives the /relation/ to the
containing element "rose". This is the natural field of application for
attributes, as the meaning of the word "attribute" outside of XML clearly
indicates, but it is not always possible to use attributes for this purpose
in XML.

An alternative solution might be the following notation:

    <rose owner="Jack Jill" />

Here a /new/ mini-language (no longer XML) is used within an attribute value,
which, of course, XML validators can no longer check. This is actually done,
for example, in XHTML, where classes are written this way. So in its most
prominent XML application, XHTML, the W3C has to abandon XML even to write
class attributes. This is not much of an accomplishment, given that the W3C
was able to draw on the experience gained with SGML and HTML when designing
XML.

The needless restrictions of XML inhibit the meaningful use of syntax. This
leaves many document type designers wondering when to use attributes and when
to use elements, which is actually evidence of a failure in the design of XML:
XML does not have many more notations than these two, attributes and
elements. And yet the W3C failed to give even these two notations a clear and
meaningful purpose!

Without the restrictions described, XML alone would have nearly the
expressive power of RDF/XML, which has to painfully repair some of the errors
made in the design of XML.

Now, some "experts" recommend /always/ using subelements, because one can
never know whether an attribute value that seems unstructured today might need
to become structured tomorrow. Other experts recommend using attributes only
when one is quite confident that they will never need to be structured. This
recommendation does not even try to make sense of attributes, but just
explains how to circumvent the obstacles the W3C has built into XML.

Others recommend using attributes for something they call "metadata". They
ignore that this limits "metadata" to unstructured values.

Others use an XML editor that happens to make entering attributes more
comfortable than entering elements, and therefore seriously suggest using as
many attributes as possible.

Still others have studied how to use CSS to format XML documents and use this
to recommend when to use attributes and when to use subelements (so that the
resulting document can be formatted most easily with CSS).

Of course, mixing all these criteria (structured vs. unstructured, data vs.
"metadata", by CSS, by ease of editing, ...) will often give conflicting
recommendations.

Notations other than XML have solved the problem either by omitting
attributes altogether or by allowing structured attributes. I believe that
notations with structured attributes, which also allow multiple element types
and multiple attribute values for the same attribute name, are helpful.
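
A sketch of the mapping described in this post, in Python with the standard `xml.etree.ElementTree` (the function `assertion_to_xml` and its dict-of-lists input are hypothetical, chosen for illustration). The unary predicate becomes the element type; a relation with one value becomes an attribute, while a relation with several values has to fall back to child elements, because XML allows at most one value per attribute name.

```python
import xml.etree.ElementTree as ET


def assertion_to_xml(type_name, relations):
    """Encode an assertion such as rose(x) ^ owner(x, "Jack") as XML.

    type_name: the unary predicate, used as the element type.
    relations: {relation name: list of values}. Single-valued relations
               become attributes; multi-valued ones become repeated child
               elements, since an attribute name may appear only once.
    """
    elem = ET.Element(type_name)
    for relation, values in relations.items():
        if len(values) == 1:
            elem.set(relation, values[0])
        else:
            for value in values:
                ET.SubElement(elem, relation).text = value
    return ET.tostring(elem, encoding="unicode")
```

For one owner this yields `<rose owner="Jack" />`; for two owners it yields `<rose><owner>Jack</owner><owner>Jill</owner></rose>`, the element-type-as-relation representation the post criticizes.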

#4

Stefan Ram wrote:
> Thus, my answer would be: use element types for unary
> predicates and attributes for binary relations.
>
> Unfortunately, in XML, this is not always possible, [...]
> Therefore, the designers of XML document types are forced to
> abuse element /types/ in order to describe the /relation/
> of an element to its parent element.

I disagree almost entirely with your fascinating analysis.

The history of XML is that it's "SGML lite". It's a document syntax that only
had some semblance of a data model added to it 3 years later. Any
understanding of "how it got to be this way" has to remember that it's a
document format that was defined formally, not a formal logic that had a
serialization defined for it. Any interpretation, no matter how attractive it
appears, has to be viewed from this perspective.

If you want a format built from the other direction (define the data model,
then the serialization), then look at RDF.

#5

Thank you for your answers.
Philipp

#6

Hi Philipp,

Philipp <> writes:
> 1)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <LPAR name="Coverage" unit="ML">0.1</LPAR>
>     <LPAR name="Frequency" unit="Hz">10^3</LPAR>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>
>
> 2)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <Coverage unit="ML">0.1</Coverage>
>     <Frequency unit="Hz">10^3</Frequency>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>
>
> 3)
> <?xml version="1.0" ?>
> <PARAMETERS>
>   <LATTICE>
>     <LPAR name="Coverage" unit="ML" value="0.1"/>
>     <LPAR name="Frequency" unit="Hz" value="10^3"/>
>   </LATTICE>
>   ... (other params in other elements)
> </PARAMETERS>

I would go with (2) since it is the least verbose and the most
straightforward (KISS). It is also the easiest to use with data binding,
should you decide to go that way one day:

  Lattice l = ...
  Coverage c = l.Coverage ();
  Frequency f = l.Frequency ();

hth,
-boris

--
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
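
A rough Python analogue of the data-binding idea, assuming hand-written `dataclasses` in place of generated code. The names `Quantity`, `Lattice` and `from_element` are hypothetical and do not reflect the API of any particular binding tool. With layout (2), each element type maps directly onto a typed field. The sample document uses `1E3` because Python's `float()` cannot parse `10^3`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Quantity:
    """A numeric parameter value together with its unit."""
    value: float
    unit: str


@dataclass
class Lattice:
    coverage: Quantity
    frequency: Quantity

    @classmethod
    def from_element(cls, elem):
        """Bind a <LATTICE> element written in layout (2)."""
        def quantity(tag):
            child = elem.find(tag)
            if child is None:
                raise ValueError(f"missing <{tag}> in <LATTICE>")
            return Quantity(float(child.text), child.get("unit"))
        return cls(coverage=quantity("Coverage"),
                   frequency=quantity("Frequency"))


lattice = Lattice.from_element(ET.fromstring(
    '<LATTICE><Coverage unit="ML">0.1</Coverage>'
    '<Frequency unit="Hz">1E3</Frequency></LATTICE>'))
```

After binding, application code reads `lattice.coverage` and `lattice.frequency` directly, which mirrors the `l.Coverage ()` / `l.Frequency ()` calls above.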

#7

"Andy Dingley" <> writes:
>Any understanding of "How it got to be this way" has to
>remember it's a document format that was defined formally

The example that I showed, where the class attribute had to be augmented
with an additional mini-syntax (beyond XML), was exactly from this scope: a
document format (XHTML).
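
A small illustration of the point, in Python with `xml.etree.ElementTree` (the helper `class_tokens` is hypothetical): the XML layer hands back the `class` value as one opaque string, and splitting it into individual class names is left to the application, outside anything an XML validator checks.

```python
import xml.etree.ElementTree as ET


def class_tokens(elem):
    """Split a class attribute into names; to XML it is a single string."""
    # str.split() with no argument splits on runs of whitespace, which is
    # close enough to the XHTML/HTML token-list rule for this sketch.
    return elem.get("class", "").split()


P = ET.fromstring('<p class="alpha beta">example</p>')
```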

#8

This is a religious debate. We aren't going to settle it here.
Re "mini-languages": remember that XML is raw syntax. As soon as you start getting into semantics, you *do* tend to wind up with structure in the data itself. That doesn't invalidate the concept of structuring at the XML level; no tool is equally appropriate at all levels of detail.

#9

Joe Kesselman <keshlam-> writes:
>This is a religious debate. We aren't going to settle it here.

Feel free to ignore it and not take part in it.

>Re "mini-languages": remember that XML is raw syntax. As soon
>as you start getting into semantics, you *do* tend to wind up
>with structure in the data itself.

Still, one can imagine a »raw syntax« allowing for multiple attributes, as
in:

<p class="alpha" class="beta">example</p>

If this is too much freedom, it could be forbidden in the DTD (Schema) for
all attributes or for individual ones.
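
For reference, XML 1.0 makes the repeated-attribute form a well-formedness error (the "Unique Att Spec" constraint), so a conforming parser rejects it before any DTD or schema is consulted. A quick check with Python's `xml.etree.ElementTree` (the helper `is_well_formed` is hypothetical):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the standard-library XML parser accepts `text`."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

The parser rejects `<p class="alpha" class="beta">example</p>` (duplicate attribute) and accepts the XHTML-style `<p class="alpha beta">example</p>`.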

#10

On 23 May 2006 21:53:08 GMT, (Stefan Ram) wrote:
> Still, one can imagine a »raw syntax« allowing for multiple
> attributes as in:
>
><p class="alpha" class="beta">example</p>

Just look at the trouble RDF/XML got into going down that route! <rdf:li> was
fine; doing the same thing in attributes was anything but.