| Home | Forums | Reviews | Guides | Newsgroups | Register | Search |
![]() |
| Thread Tools |
| Juliano |
|
|
|
| |
|
Himanshu
Guest
Posts: n/a
|
2009/11/14 Juliano <>:
> Hello, everybody. > > I'm a linguist with practical skills on computers/programming. > > We've been working with an ontology at my department, and now I need > to create a GUI viewer for the flat file we have. > I tried to write an Ontology class which manages the data read and > parsed from the flat file, but it takes a relatively long time. > Besides, we have plans to set up a website for online display of said > ontology. So, I have been being pushed towards changing the basic plan > and build a DB so that data access will be faster and easier for both > the desktop GUI and the web app. Right now, I'm trying to work with > sqlite, since it can be used as a separate file for the GUI and as a > DB for Django (which may be the choice for the web interface). > > I have been redaing some books on DBs but I kind of get stuck when it > comes to the normalization and the planning of the tables. The problem > is that we have basically four fields that can be arranged in a tree- > like structure. Eg: > > Concept > Â*|----- Slot > Â*| Â* Â* Â* Â*`------ Facet > Â*| Â* Â* Â* Â* Â* Â* Â* Â* Â*`------ Filler > Â*`----- Slot > Â* Â* Â* Â* Â* `------ Facet > Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* `------ Filler > Â* Â* Â* Â* Â* Â* Â* Â* Â* Â* `------ Filler > Â* Â*... > > So, for ONE *concept*, we have, usually, MANY *slots*, each *slot* has > ONE *facet*, and each *facet* can have MORE THAN ONE *filler*. > Besides, some *slots* and *fillers* are themselves *concepts*, > creating a sort of recursive reference. > > <begin table> > line_no concepts Â* Â* Â* Â*slots Â* facets Â*fillers > ------------------------------------------------------------------------------ > 00000 Â* ABANDON DEFINITION Â* Â* Â*VALUE Â* "to leave or desert something or > someone" > 00001 Â* ABANDON IS-A Â* Â*VALUE Â* EXIT > 00002 Â* ABANDON LEXE Â* Â*MAP-LEX "leave behind-V1" > 00003 Â* ABANDON LEXE Â* Â*MAP-LEX abandon-V1 > (...) > 97420 Â* ZULU Â* Â*DEFINITION Â* Â* Â*VALUE Â* "a language or dialect spoken in south > africa and others" > 97421 Â* ZULU Â* Â*INSTANCE-OF Â* Â* VALUE Â* OTHER-NIGER-KORDOFANIAN-LANGUAGE > 97422 Â* ZULU Â* Â*LANGUAGE-OF Â* Â* INV Â* Â* LESOTHO > 97423 Â* ZULU Â* Â*LANGUAGE-OF Â* Â* INV Â* Â* SOUTH-AFRICA > <end table> > > I tried to create index tables for concepts, slots, facets and > fillers, which gave me the following table: > > <begin table> > line_no concepts Â* Â* Â* Â*slots Â* facets Â*fillers > ------------------------------------------------------------------------------ > 00000 Â* cn_00000 Â* Â* Â* Â*sl_00048 Â* Â* Â* Â*fc_00007 Â* Â* Â* Â*fl_07349 > 00001 Â* cn_00000 Â* Â* Â* Â*cn_02605 Â* Â* Â* Â*fc_00007 Â* Â* Â* Â*cn_01768 > 00002 Â* cn_00000 Â* Â* Â* Â*sl_00121 Â* Â* Â* Â*fc_00002 Â* Â* Â* Â*fl_04329 > 00003 Â* cn_00000 Â* Â* Â* Â*sl_00121 Â* Â* Â* Â*fc_00002 Â* Â* Â* Â*fl_15009 > (...) > 97420 Â* cn_05429 Â* Â* Â* Â*sl_00048 Â* Â* Â* Â*fc_00007 Â* Â* Â* Â*fl_01340 > 97421 Â* cn_05429 Â* Â* Â* Â*cn_02493 Â* Â* Â* Â*fc_00007 Â* Â* Â* Â*cn_03526 > 97422 Â* cn_05429 Â* Â* Â* Â*cn_02750 Â* Â* Â* Â*fc_00001 Â* Â* Â* Â*cn_02816 > 97423 Â* cn_05429 Â* Â* Â* Â*cn_02750 Â* Â* Â* Â*fc_00001 Â* Â* Â* Â*cn_04580 > <end table> > > (cn_XXXXX from concept index, sl_XXXXX from slot index, > fc_XXXXX from facet index, fl_XXXXX from filler index.) > > As you can see, only concepts and facets are populated by their own > type of data. > Whereas slots and fillers can be populated by their own types or by > concepts. > > What would be a good way to create tables for this situation? > In fact, this is the first time I've ever tried to create a DB, so I'm > completely lost. > > I'm looking forward to a reply... > > Thank you very much, > Juliano If you have an ontology that doesn't run into GB's of data you could also consider this. Load it into an in memory data structure of your choice from the text file. Here are the arguments in favour :- 1) The structure doesn't lend itself nicely to tables. So a relational database may not be the best choice when you start traversing the data. Imagine writing a query to get the data and show it as a tree. 2) Changes to the ontology are infrequent so you don't use most of the ACID facilities the database offers. 3) cyc uses its own proprietory data format which is probably not a _relational_ database 4) With your own data structure you know where to fix if the performance is bad. The current poor performance could be due to some other problem which may not go away on its own on switching to db. 5) Keeping the data in a flat file has the advantage of making it easy to update and version control. Otherwise you need another program for editing it. I am no database expert so let's see if someone has a better table design suggestion. Thank You, ++imanshu |
|
|
|
|
|||
|
|||
| Himanshu |
|
|
|
| |
|
Ken Seehart
Guest
Posts: n/a
|
Good idea to use Django. I've just started using it and I really like
it. However, I should give you a heads-up: You will probably want to use a Django migration tool (I'm using South) because the alternative is basically to rebuild your database each time your model changes. Unfortunately, South can sometimes get confused when using anything less sophisticated than PostgreSQL (I switched from MySQL to PostgreSQL for this reason). I don't know if South or the other various Django migration tools work with MySQL. Applying the DRY (don't repeat yourself), you might even consider running the same code as a local web server instead of implementing a separate desktop version. But it is just a suggestion; there are various reasons why you might not want to do that. On to the DB design question... One approach would be to make a Generic class that can represent a concept, slot, or filler, which would have a type field to identify which of these to use. class Concept(models.Model): ... class Slot(models.Model): ... class Filler(models.Model): ... class Facet(models.Model): ... class Generic(models.Model): TYPE_CHOICES = ( (u'c', u'concept'), (u's', u'slot'), (u'f', u'filler'), } # Only one of the following is used. The other two are blank. concept = models.ForeignKey(Concept) slot = models.ForeignKey(Slot) filler = models.ForeignKey(Filler) class ConceptDef(models.Model): concept = models.ForeignKey(Concept) slot = models.ForeignKey(Generic) facet = models.ForeignKey(Facet) filler = models.ForeignKey(Generic) Juliano wrote: > Hello, everybody. > > I'm a linguist with practical skills on computers/programming. > > We've been working with an ontology at my department, and now I need > to create a GUI viewer for the flat file we have. > I tried to write an Ontology class which manages the data read and > parsed from the flat file, but it takes a relatively long time. > Besides, we have plans to set up a website for online display of said > ontology. So, I have been being pushed towards changing the basic plan > and build a DB so that data access will be faster and easier for both > the desktop GUI and the web app. Right now, I'm trying to work with > sqlite, since it can be used as a separate file for the GUI and as a > DB for Django (which may be the choice for the web interface). > > I have been redaing some books on DBs but I kind of get stuck when it > comes to the normalization and the planning of the tables. The problem > is that we have basically four fields that can be arranged in a tree- > like structure. Eg: > > Concept > |----- Slot > | `------ Facet > | `------ Filler > `----- Slot > `------ Facet > `------ Filler > `------ Filler > ... > > So, for ONE *concept*, we have, usually, MANY *slots*, each *slot* has > ONE *facet*, and each *facet* can have MORE THAN ONE *filler*. > Besides, some *slots* and *fillers* are themselves *concepts*, > creating a sort of recursive reference. > > <begin table> > line_no concepts slots facets fillers > ------------------------------------------------------------------------------ > 00000 ABANDON DEFINITION VALUE "to leave or desert something or > someone" > 00001 ABANDON IS-A VALUE EXIT > 00002 ABANDON LEXE MAP-LEX "leave behind-V1" > 00003 ABANDON LEXE MAP-LEX abandon-V1 > (...) > 97420 ZULU DEFINITION VALUE "a language or dialect spoken in south > africa and others" > 97421 ZULU INSTANCE-OF VALUE OTHER-NIGER-KORDOFANIAN-LANGUAGE > 97422 ZULU LANGUAGE-OF INV LESOTHO > 97423 ZULU LANGUAGE-OF INV SOUTH-AFRICA > <end table> > > I tried to create index tables for concepts, slots, facets and > fillers, which gave me the following table: > > <begin table> > line_no concepts slots facets fillers > ------------------------------------------------------------------------------ > 00000 cn_00000 sl_00048 fc_00007 fl_07349 > 00001 cn_00000 cn_02605 fc_00007 cn_01768 > 00002 cn_00000 sl_00121 fc_00002 fl_04329 > 00003 cn_00000 sl_00121 fc_00002 fl_15009 > (...) > 97420 cn_05429 sl_00048 fc_00007 fl_01340 > 97421 cn_05429 cn_02493 fc_00007 cn_03526 > 97422 cn_05429 cn_02750 fc_00001 cn_02816 > 97423 cn_05429 cn_02750 fc_00001 cn_04580 > <end table> > > (cn_XXXXX from concept index, sl_XXXXX from slot index, > fc_XXXXX from facet index, fl_XXXXX from filler index.) > > As you can see, only concepts and facets are populated by their own > type of data. > Whereas slots and fillers can be populated by their own types or by > concepts. > > What would be a good way to create tables for this situation? > In fact, this is the first time I've ever tried to create a DB, so I'm > completely lost. > > I'm looking forward to a reply... > > Thank you very much, > Juliano > |
|
|
|
|
|||
|
|||
| Ken Seehart |
|
Ken Seehart
Guest
Posts: n/a
|
Oops, forgot the blank arg. Anyway, this is of course untested code...
# Only one of the following is used. The other two are blank. concept = models.ForeignKey(Concept, blank=True) slot = models.ForeignKey(Slot, blank=True) filler = models.ForeignKey(Filler, blank=True) Ken Seehart wrote: > Good idea to use Django. I've just started using it and I really like > it. However, I should give you a heads-up: You will probably want to > use a Django migration tool (I'm using South) because the alternative > is basically to rebuild your database each time your model changes. > Unfortunately, South can sometimes get confused when using anything > less sophisticated than PostgreSQL (I switched from MySQL to > PostgreSQL for this reason). I don't know if South or the other > various Django migration tools work with MySQL. > > Applying the DRY (don't repeat yourself), you might even consider > running the same code as a local web server instead of implementing a > separate desktop version. But it is just a suggestion; there are > various reasons why you might not want to do that. > > On to the DB design question... > > One approach would be to make a Generic class that can represent a > concept, slot, or filler, which would have a type field to identify > which of these to use. > > class Concept(models.Model): > ... > > class Slot(models.Model): > ... > > class Filler(models.Model): > ... > > class Facet(models.Model): > ... > > class Generic(models.Model): > TYPE_CHOICES = ( > (u'c', u'concept'), > (u's', u'slot'), > (u'f', u'filler'), > } > > # Only one of the following is used. The other two are blank. > concept = models.ForeignKey(Concept) > slot = models.ForeignKey(Slot) > filler = models.ForeignKey(Filler) > > class ConceptDef(models.Model): > concept = models.ForeignKey(Concept) > slot = models.ForeignKey(Generic) > facet = models.ForeignKey(Facet) > filler = models.ForeignKey(Generic) > > Juliano wrote: >> Hello, everybody. >> >> I'm a linguist with practical skills on computers/programming. >> >> We've been working with an ontology at my department, and now I need >> to create a GUI viewer for the flat file we have. >> I tried to write an Ontology class which manages the data read and >> parsed from the flat file, but it takes a relatively long time. >> Besides, we have plans to set up a website for online display of said >> ontology. So, I have been being pushed towards changing the basic plan >> and build a DB so that data access will be faster and easier for both >> the desktop GUI and the web app. Right now, I'm trying to work with >> sqlite, since it can be used as a separate file for the GUI and as a >> DB for Django (which may be the choice for the web interface). >> >> I have been redaing some books on DBs but I kind of get stuck when it >> comes to the normalization and the planning of the tables. The problem >> is that we have basically four fields that can be arranged in a tree- >> like structure. Eg: >> >> Concept >> |----- Slot >> | `------ Facet >> | `------ Filler >> `----- Slot >> `------ Facet >> `------ Filler >> `------ Filler >> ... >> >> So, for ONE *concept*, we have, usually, MANY *slots*, each *slot* has >> ONE *facet*, and each *facet* can have MORE THAN ONE *filler*. >> Besides, some *slots* and *fillers* are themselves *concepts*, >> creating a sort of recursive reference. >> >> <begin table> >> line_no concepts slots facets fillers >> ------------------------------------------------------------------------------ >> >> 00000 ABANDON DEFINITION VALUE "to leave or desert >> something or >> someone" >> 00001 ABANDON IS-A VALUE EXIT >> 00002 ABANDON LEXE MAP-LEX "leave behind-V1" >> 00003 ABANDON LEXE MAP-LEX abandon-V1 >> (...) >> 97420 ZULU DEFINITION VALUE "a language or dialect spoken >> in south >> africa and others" >> 97421 ZULU INSTANCE-OF VALUE >> OTHER-NIGER-KORDOFANIAN-LANGUAGE >> 97422 ZULU LANGUAGE-OF INV LESOTHO >> 97423 ZULU LANGUAGE-OF INV SOUTH-AFRICA >> <end table> >> >> I tried to create index tables for concepts, slots, facets and >> fillers, which gave me the following table: >> >> <begin table> >> line_no concepts slots facets fillers >> ------------------------------------------------------------------------------ >> >> 00000 cn_00000 sl_00048 fc_00007 fl_07349 >> 00001 cn_00000 cn_02605 fc_00007 cn_01768 >> 00002 cn_00000 sl_00121 fc_00002 fl_04329 >> 00003 cn_00000 sl_00121 fc_00002 fl_15009 >> (...) >> 97420 cn_05429 sl_00048 fc_00007 fl_01340 >> 97421 cn_05429 cn_02493 fc_00007 cn_03526 >> 97422 cn_05429 cn_02750 fc_00001 cn_02816 >> 97423 cn_05429 cn_02750 fc_00001 cn_04580 >> <end table> >> >> (cn_XXXXX from concept index, sl_XXXXX from slot index, >> fc_XXXXX from facet index, fl_XXXXX from filler index.) >> >> As you can see, only concepts and facets are populated by their own >> type of data. >> Whereas slots and fillers can be populated by their own types or by >> concepts. >> >> What would be a good way to create tables for this situation? >> In fact, this is the first time I've ever tried to create a DB, so I'm >> completely lost. >> >> I'm looking forward to a reply... >> >> Thank you very much, >> Juliano >> > |
|
|
|
|
|||
|
|||
| Ken Seehart |
|
Dennis Lee Bieber
Guest
Posts: n/a
|
On Sat, 14 Nov 2009 02:37:31 -0800 (PST), Juliano <>
declaimed the following in gmane.comp.python.general: > Concept > |----- Slot > | `------ Facet > | `------ Filler > `----- Slot > `------ Facet > `------ Filler > `------ Filler > ... > > So, for ONE *concept*, we have, usually, MANY *slots*, each *slot* has > ONE *facet*, and each *facet* can have MORE THAN ONE *filler*. > Besides, some *slots* and *fillers* are themselves *concepts*, > creating a sort of recursive reference. > First thing that initially strikes my mind is that "each *slot* has ONE *facet*" implies they are not a hierarchy, but on the same level. And your example seems to follow that -- you show "LEXE" always linked to "MAP-LEX" and "LANGUAGE-OF" always linked to "INV". At my level of understanding, that almost makes "facet" a redundant piece of information; it is only meaningful if it serves to control how the "filler" is to be processed -- that is, both "IS-A" "EXIT" and "DEFINITION" "to leave..." are processed the same way, as some generic "VALUE" entity > <begin table> > line_no concepts slots facets fillers > ------------------------------------------------------------------------------ > 00000 ABANDON DEFINITION VALUE "to leave or desert something or > someone" > 00001 ABANDON IS-A VALUE EXIT > 00002 ABANDON LEXE MAP-LEX "leave behind-V1" > 00003 ABANDON LEXE MAP-LEX abandon-V1 > (...) > 97420 ZULU DEFINITION VALUE "a language or dialect spoken in south > africa and others" > 97421 ZULU INSTANCE-OF VALUE OTHER-NIGER-KORDOFANIAN-LANGUAGE > 97422 ZULU LANGUAGE-OF INV LESOTHO > 97423 ZULU LANGUAGE-OF INV SOUTH-AFRICA > <end table> > > I tried to create index tables for concepts, slots, facets and > fillers, which gave me the following table: > > <begin table> > line_no concepts slots facets fillers > ------------------------------------------------------------------------------ > 00000 cn_00000 sl_00048 fc_00007 fl_07349 > 00001 cn_00000 cn_02605 fc_00007 cn_01768 > 00002 cn_00000 sl_00121 fc_00002 fl_04329 > 00003 cn_00000 sl_00121 fc_00002 fl_15009 > (...) > 97420 cn_05429 sl_00048 fc_00007 fl_01340 > 97421 cn_05429 cn_02493 fc_00007 cn_03526 > 97422 cn_05429 cn_02750 fc_00001 cn_02816 > 97423 cn_05429 cn_02750 fc_00001 cn_04580 > <end table> > > (cn_XXXXX from concept index, sl_XXXXX from slot index, > fc_XXXXX from facet index, fl_XXXXX from filler index.) > > As you can see, only concepts and facets are populated by their own > type of data. > Whereas slots and fillers can be populated by their own types or by > concepts. That is going to be the tricky one... First off -- is "line_no" significant? {simplified syntax here: tablename(column, column...) where _stuff_ defines a primary [unique, autoincrement] key and t.c indicates a foreign key linkage} Concept(_ID_, Term) a primary key the concept term/keyword itself Slot(_ID_, Concept.ID as related, slotType, Concept.ID as slotconcept, Term, Facet.ID as facet) a primary key link to the concept entry this slot is part of an indicator that this slot is distinct type or references another concept link to the referenced concept -- if slotType is "concept", else Null the concept term/keyword -- if slotType is "distinct", else Null link to the facet Facet(_ID_, Term) primary key term or keyword for the facet Filler(_ID_, Slot.ID, fillerType, Concept.ID as fillerconcept, Text) primary key link to the slot filler is part of indicator if the filler is distinct (likely text string) or references another concept link to the referenced concept or Null, based on fillerType distinct data (text string) or Null, based upon fillerType Concept: 0 ABANDON 1768 EXIT 2493 INSTANCE-OF 2605 IS-A 2750 LANGUAGE-OF 2816 LESOTHO 3526 OTHER-NIGER-KORDOFANIAN-LANGUAGE 4580 SOUTH-AFRICA .... n ZULU Facet: 0 VALUE 1 MAP-LEX 2 INV Slot: 0 0 slot Null DEFINITION 0 1 0 concept 2605 Null 0 2 0 slot Null LEXE 1 .... m n slot Null DEFINITION 0 m+1 n concept 2493 Null 0 m+2 n concept 2750 Null 2 Filler: 0 0 text Null "to leave..." 1 1 concept 1768 Null 2 2 text Null "leave behind..." 3 2 text Null "abandon..." 4 m text Null "a language..." 5 m+1 concept 3526 Null 6 m+2 concept 2816 Null 7 m+2 concept 4580 Null The alternative is to split Filler and Slot into two tables each: conceptSlot(_ID_, Concept.ID as related, Concept.ID as slotconcept, Facet.ID as facet) conceptSlot: 0 0 2605 0 .... m+1 n 2493 0 m+2 n 2750 2 distinctSlot(_ID_, Concept.ID as related, Term, Facet.ID as facet) distinctSlot: Slot: 0 0 DEFINITION 0 1 0 LEXE 1 .... m n DEFINITION 0 conceptFiller(_ID_, conceptSlot.ID as related, Concept.ID as fillerconcept) conceptFiller: 1 1 1768 2 m+1 3526 3 m+2 2816 4 m+2 4580 textFiller(_ID_, distinctSlot.ID related, Text) textFiller: 0 0 "to leave..." 1 1 "leave behind..." 2 1 "abandon..." 3 m "a language..." Assuming all terms etc are text strings, and hence compatible in data type, recovering (less line numbers) the data content looks something like... select Concept.Term, distinctSlot.Term, Facet.Term, textFiller.Text from Concept inner join distinctSlot on Concept.ID = distinctSlot.related inner join Facet on distinctSlot.facet = Facet.ID inner join textFiller on textFiller.related = distinctSlot.ID UNION select Concept.Term, distinctSlot.Term, Facet.Term, c.Term from Concept inner join distinctSlot on Concept.ID = distinctSlot.related inner join Facet on distinctSlot.facet = Facet.ID inner join conceptFiller on conceptFiller.related = distinctSlot.ID inner join Concept as c on c.ID = conceptFiller.fillerconcept UNION select Concept.Term, c2.Term, Facet.Term, textFiller.Text from Concept inner join conceptSlot on Concept.ID = conceptSlot.related inner join Facet on conceptSlot.facet = Facet.ID inner join textFiller on textFiller.related = conceptSlot.ID inner join Concept as c2 on conceptSlot.slotconcept = c2.ID UNION select Concept.Term, c3.Term, Facet.Term, c4.Term from Concept inner join conceptSlot on Concept.ID = conceptSlot.related inner join Facet on conceptSlot.facet = Facet.ID inner join conceptFiller on conceptFiller.related = conceptSlot.ID inner join Concept as c4 on c4.ID = conceptFiller.fillerconcept inner join Concept as c3 on conceptSlot.slotconcept = c3.ID -- Wulfraed Dennis Lee Bieber KD6MOG HTTP://wlfraed.home.netcom.com/ |
|
|
|
|
|||
|
|||
| Dennis Lee Bieber |
|
|
|
| |
![]() |
| Thread Tools | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Database Database Database Database | scott93727@gmail.com | Computer Information | 0 | 09-27-2012 02:43 AM |
| DataBase DataBase DataBase DataBase | scott93727@gmail.com | Computer Information | 0 | 09-26-2012 09:40 AM |
| CALL MANAGER EXPRESS NETWORK PLANNING | cricket | Cisco | 1 | 03-31-2005 09:53 PM |
| beginner needs help planning website | Peter | HTML | 10 | 02-07-2004 01:53 AM |
| IP planning for 2 connections to the ISP | Gary | Cisco | 2 | 10-19-2003 12:50 PM |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc..
SEO by vBSEO ©2010, Crawlability, Inc. |




