Velocity Reviews - Computer Hardware Reviews


XML and Ontologies

 
 
Alex Fawcett
      07-10-2003
I am interested in XML mediation and the use of ontologies to link
similar but different element names in XML schemas. Am I correct in my
understanding that an ontology is a language or set of commands that
is agreed upon, thus making mediation between XML element names
unnecessary? Also, is this the best method of mediation between XML
files?
Thanks for any help.
Alex
 
Andy Dingley
      07-12-2003
On 10 Jul 2003 05:35:27 -0700, (E-Mail Removed) (Alex
Fawcett) wrote:

>I am interested in XML mediation and the use of ontologies to link
>similar but different element names in XML schemas.


XML is a bit of an unhappy fit with ontologies - you start to
appreciate the differences between RDF and XML.

I suggest giving Protégé a whirl
http://protege.stanford.edu

It's an environment for editing both ontologies and instance data, in
a very approachable style. Certainly worth a look.

I spent much of last week here:
http://protege.stanford.edu/workshop_vi/schedule.html
and blogged a brief trip report here:
http://www.livejournal.com/users/quercus/20830.html


It's a frames-based approach, rather than a description logics
approach. This makes big differences, but you need to get a little
hands-on with both (and frames is perhaps simpler to start with). We
don't know where we'll end up finally, and we might need to combine
both approaches.

Take a look at the W3C's OWL (Web Ontology Language) and the older
work (SHOE, OIL, DAML+OIL) too. These are generally DL-based (take a
look at Manchester's OilEd if you want a contrast to Protégé).


>Am I correct in my
>understanding that an ontology is a language or set of commands that
>is agreed upon, thus making mediation between XML element names
>unnecessary?


What's an ontology ? I've written the "30 second elevator pitch" on
this about a dozen times over the last few years. It's very hard to
give one simple definition that meets all needs. Everyone who comes to
ontologies (and it's almost a stampede now) approaches from a
different angle.

Natasha Noy's classic paper "Ontology 101" is a good place to start:
http://protege.stanford.edu/publicat...cguinness.html

Broadly, I'd say that it is a definition of a set of entities and
their related properties, expressed in a style that can be understood
by other systems.

It may also describe their metaphysical "meanings", which is the
difference between an ontology and a schema (or between DAML and
DAML+OIL)

An ontology does not describe mappings or mediation between two XML
schemas. Depending on your meaning of "mediation" this might be easy
(if you know they're ontologically identical, but you just need to
match up the names), but mapping is generally speaking a fiendishly
difficult problem.
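
To make the "easy" case concrete, here is a minimal Python sketch of
pure name matching. It assumes the two schemas really are ontologically
identical and differ only in element names; the names in NAME_MAP and
the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical name mapping between two schemas that model the same thing.
NAME_MAP = {"client": "customer", "surname": "family-name"}

def rename_elements(xml_text, name_map):
    """Return xml_text with element names replaced according to name_map."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Names missing from the map are left unchanged.
        elem.tag = name_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

source = "<client><surname>Fawcett</surname></client>"
print(rename_elements(source, NAME_MAP))
```

Nothing here needs an ontology: the lookup table is the whole mapping.
It stops working as soon as the two models differ in structure rather
than just in names.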

You can approach it with ontologies. You use two ontologies,
describing both the source and target. Then you apply some form of
complex reasoning to identify commonality and as much "mapping" as is
possible. From this you then generate (or auto-generate) code to do
the mapping. Easy.

The problem is that any ontology beyond the trivial has no simple
mapping between entities. Does an employee have a "works-for"
relationship with their boss, or a "works-in" with their department
and a "manages" relationship between boss and department ? This stuff
just doesn't overlay cleanly, so an improved description technique
alone isn't going to fix things.
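
A small Python sketch of why this doesn't overlay cleanly, using
invented data: getting from the "works-in" plus "manages" model to the
"works-for" model is a join across two relationships, not a rename.

```python
# Source model: employee -> department, and boss -> department.
# All names are made up for illustration.
works_in = {"alice": "sales", "carol": "support"}
manages = {"bob": "sales", "dave": "support"}

def derive_works_for(works_in, manages):
    """Derive employee -> boss pairs by joining on the shared department."""
    boss_of_dept = {dept: boss for boss, dept in manages.items()}
    return {
        emp: boss_of_dept[dept]
        for emp, dept in works_in.items()
        if dept in boss_of_dept  # employees in unmanaged departments drop out
    }

print(derive_works_for(works_in, manages))
```

Going the other way is worse: a bare "works-for" relationship doesn't
mention departments at all, so the reverse mapping has nothing to
rebuild them from.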

>Also, is this the best method of mediation between XML
>files?


Depends on the scale of your problem. What's an "XML file" ? Are
these the same two schemas you see every day, or is it a dynamic
problem with every new message ? How different are the two models ?

Incidentally, the same problem between one XML document and an RDBMS
is also common.

There's a lot of very rudimentary work being passed off around this
problem (Oracle 9i being a case in point) where people in suits with a
product to sell are pushing very simple (often XSLT-based) solutions
as a panacea. Those who are seriously in the field know it's not so
easy.


There's also the problem of meta-languages. Many people are already
encountering this with database output, and it has a huge effect on
the use of XSLT.

Consider an RDBMS with a generic XML export filter. What should the
output look like ?

<order>
<order-item>
<a>1</a><b>2</b>
</order-item>
<order-item>
<b>3</b><c>4</c>
</order-item>
</order>


<query name="order" >
<row name="order-item" >
<column name="a" >1</column><column name="b" >2</column>
</row>
<row name="order-item" >
<column name="b" >3</column><column name="c" >4</column>
</row>
</query>


The first of these maps column names onto element names. It generates
compact XML that's probably how most XML coders would write it by hand.
The trouble is that it's a new DTD for every query.

The second is a meta-format. The DTD is the same for every query
output and only the name="" metadata changes. It's verbose (but we
don't care, because our computers deal with that for us).

Ontologically these are _identical_ (they ought to be, or our export
filter is broken). In terms of ease of use though, they're quite
different. The first is unstable and somewhat unpredictable
(although you can easily auto-export a DTD or even ontology at the
same time), the second is hard to process (with XSLT).
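
As a rough check of the "identical" claim, this Python sketch converts
the first format into the second and back again using the standard
library's ElementTree. The function names are mine, and it assumes the
simple three-level shape of the example above (no attributes, no mixed
content).

```python
import xml.etree.ElementTree as ET

def to_meta_format(compact_root):
    """Convert the compact ("type 1") export into the meta ("type 2") format."""
    query = ET.Element("query", {"name": compact_root.tag})
    for item in compact_root:
        row = ET.SubElement(query, "row", {"name": item.tag})
        for field in item:
            column = ET.SubElement(row, "column", {"name": field.tag})
            column.text = field.text
    return query

def to_compact_format(query_root):
    """Convert the meta format back into column-named elements."""
    root = ET.Element(query_root.get("name"))
    for row in query_root:
        item = ET.SubElement(root, row.get("name"))
        for column in row:
            field = ET.SubElement(item, column.get("name"))
            field.text = column.text
    return root

compact = ET.fromstring(
    "<order>"
    "<order-item><a>1</a><b>2</b></order-item>"
    "<order-item><b>3</b><c>4</c></order-item>"
    "</order>"
)
meta = to_meta_format(compact)
print(ET.tostring(meta, encoding="unicode"))
print(ET.tostring(to_compact_format(meta), encoding="unicode"))
```

The round trip loses nothing, which is the sense in which the two
carry the same information. Note that the conversion code has to read
names out of attribute values and turn them into element names, which
is exactly the step a purely structural tool finds awkward.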

XSLT is a language for transformations of XML data at the structural
level. This works fine for our "type 1" data above, or for much XML,
because XML's data model is inferred from the structure (go read
XML-Infoset). A structural transformation _is_ a transformation at the
level of the data-model.

The second one becomes much harder. We've now separated the structural
level (and the data model of our consistent "generic export format")
from the data model of our "real" data. An XSLT transform still
operates at the structural level (it has to - that's what XSLT does)
and so it's now divorced from the level the interesting data is
residing at. Using XSLT to make real "data-level" transformations
like this becomes a real PITA. In some formats it's straightforward
but long-winded; in others (like RDF) it becomes well-nigh impossible.
Schematron can sometimes help.
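
The same gap shows up with any path-based tool, not only XSLT. As a
rough illustration (Python with ElementTree's limited XPath subset
rather than real XSLT), here is "all the b values" asked of both export
formats:

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring(
    "<order>"
    "<order-item><a>1</a><b>2</b></order-item>"
    "<order-item><b>3</b><c>4</c></order-item>"
    "</order>"
)
meta = ET.fromstring(
    '<query name="order">'
    '<row name="order-item">'
    '<column name="a">1</column><column name="b">2</column>'
    '</row>'
    '<row name="order-item">'
    '<column name="b">3</column><column name="c">4</column>'
    '</row>'
    '</query>'
)

# Type 1: the data's own names are element names, so the path reads
# like the data model.
b_compact = [e.text for e in compact.findall("order-item/b")]

# Type 2: the path only ever names row and column; the real names are
# buried in attribute predicates.
b_meta = [
    e.text
    for e in meta.findall("row[@name='order-item']/column[@name='b']")
]

print(b_compact, b_meta)
```

For a flat table the second query is merely long-winded. The pain
grows when the real data model is nested or graph-shaped and every
step has to go through a predicate of this kind.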

RDF is a bit like "type 2" data, with a "generic export format" that's
already defined by the RDF/XML standards. You can't work with
non-trivial RDF in XSLT, because of just this problem. That's why RDF
is manipulated by tools such as Jena, that work at the data model
level.
 
Richard Light
      07-14-2003
In message <(E-Mail Removed)>, Andy Dingley
<(E-Mail Removed)> writes
>On 10 Jul 2003 05:35:27 -0700, (E-Mail Removed) (Alex
>Fawcett) wrote:
>
>>I am interested in XML mediation and the use of ontologies to link
>>similar but different element names in XML schemas.

>
>>Am I correct in my
>>understanding that an ontology is a language or set of commands that
>>is agreed upon, thus making mediation between XML element names
>>unnecessary?


Further to Andy's excellent thoughts on this issue, I would add the
suggestion that you could look into using Topic Maps
(http://www.topicmaps.org/) to represent equivalences between concepts
in schemas. As it happens, I was doing exactly this only last week, as
preparation for a data mapping exercise.

I took the two schemas I wanted to compare, and used XSLT to convert
them to Topic Maps. I then wrote a "links" document containing
relationships between individual concepts. As it happens, I wrote this
in the sort of "compact" style Andy described, e.g.:

<link type="exact">
<member schema="nt" id="condition-check"/>
<member schema="spectrum" id="check"/>
</link>

but I could easily use XSLT to convert this to a proper Topic Map
(containing nothing but Associations).

What I actually did was to convert this "links" document into an HTML
table of links between equivalent concepts in the two schemas. This was
sufficient for the task at hand.
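
For anyone without an XSLT processor to hand, that conversion step can
be sketched in Python instead. This is not the stylesheet described
above, just an illustration of the same idea; the <links> wrapper
element, the function name and the table layout are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# The <links> root element is assumed; only <link>/<member> appear above.
LINKS_XML = """
<links>
<link type="exact">
<member schema="nt" id="condition-check"/>
<member schema="spectrum" id="check"/>
</link>
</links>
"""

def links_to_html_table(xml_text, left="nt", right="spectrum"):
    """Render each <link> as one table row: left id, link type, right id."""
    root = ET.fromstring(xml_text)
    header = "<tr><th>{}</th><th>type</th><th>{}</th></tr>".format(
        escape(left), escape(right)
    )
    rows = [header]
    for link in root.iter("link"):
        # Index this link's members by the schema they belong to.
        ids = {m.get("schema"): m.get("id") for m in link.findall("member")}
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                escape(ids.get(left, "")),
                escape(link.get("type", "")),
                escape(ids.get(right, "")),
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(links_to_html_table(LINKS_XML))
```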

In principle I could instead have made my "links" document into a TM in
its own right, and then used it to merge the two schemas into a single
TM with all the correspondences expressed as TM Associations. This sort
of approach lets you work at a higher level of abstraction than the raw
XML (i.e. at a "Topic Map concepts" level). Conversely, TM XML is
pretty simple (if verbose) in its structure, so you may get more mileage
using XSLT than Andy suggests you would with RDF.

Richard Light
--
Richard Light
SGML/XML and Museum Information Consultancy
(E-Mail Removed)
