UNL Knowledge Base

Latest revision as of 19:12, 16 August 2013

The UNL Knowledge Base, or simply UNLKB, is a network structure in which UWs are interconnected through the Universal Relations of UNL. Unlike the UNL Ontology, which deals only with hierarchical relations ("icl" and "iof"), the UNLKB comprises any relation necessary to define a given UW. In that sense, the UNLKB contains and extends the UNL Ontology, and it is expected to include all the information normally available in ordinary dictionaries and thesauri. The UNLKB is also part of the UNL Memory, a network structure that includes not only the essential (necessary) information about a concept but also the accidental information extracted from corpora.

The UNLKB may be provided in two different formats:

  • Extended, in XML; or
  • Simplified, as a set of network disambiguation rules.

Extended format

UNLKB entries in extended format must have the following structure:

<relation name="RNAME" type="RTYPE" frequency="RFREQ">
  <source id="SID" attribute="ATT" lang="UNL" frequency="SFREQ" class="SCLASS">SOURCE</source>
  <target id="TID" attribute="ATT" lang="UNL" frequency="TFREQ" class="TCLASS">TARGET</target>
</relation>

Where:
RNAME is the name of an existing UNL relation ("agt", "aoj", "obj", etc.);
RTYPE is the type of the relation;
RFREQ is the frequency of the relation between the SOURCE and the TARGET in the corpus;
SFREQ is the frequency of the SOURCE in the corpus;
TFREQ is the frequency of the TARGET in the corpus;
SID is a number used to identify the SOURCE;
TID is a number used to identify the TARGET;
ATT is one of the existing UNL attributes ("entry", "past", etc.);
SCLASS is the general class of the SOURCE;
TCLASS is the general class of the TARGET;
SOURCE is the source node of the UNL relation;
TARGET is the target node of the UNL relation.
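
As a minimal sketch of how the fields above fit together, the following Python snippet assembles one `<relation>` entry with the standard `xml.etree.ElementTree` module. The helper name `make_relation` and the dictionary keys (`id`, `uw`, and so on) are assumptions made for illustration; they are not part of the UNLKB specification.

```python
import xml.etree.ElementTree as ET

def make_relation(name, source, target, frequency=None, rtype=None):
    """Build one <relation> element (hypothetical helper, not a UNL tool).

    `source` and `target` are dicts with the required keys 'id' and 'uw'
    and the optional keys 'attribute', 'lang', 'frequency' and 'class'.
    """
    rel_attrs = {"name": name}            # RNAME is the only required attribute
    if rtype is not None:
        rel_attrs["type"] = rtype         # RTYPE
    if frequency is not None:
        rel_attrs["frequency"] = str(frequency)  # RFREQ
    rel = ET.Element("relation", rel_attrs)

    for tag, node in (("source", source), ("target", target)):
        attrs = {"id": str(node["id"])}   # SID / TID
        for key in ("attribute", "lang", "frequency", "class"):
            if key in node:               # optional attributes are simply omitted
                attrs[key] = str(node[key])
        child = ET.SubElement(rel, tag, attrs)
        child.text = node["uw"]           # SOURCE / TARGET universal word
    return rel

rel = make_relation(
    "mod",
    {"id": 410, "attribute": "entry", "lang": "UNL", "frequency": 20,
     "class": "nou", "uw": "book(icl>document)"},
    {"id": 2243, "lang": "UNL", "frequency": 2,
     "class": "nou", "uw": "republic(icl>form of government)"},
    frequency=2,
)
xml_text = ET.tostring(rel, encoding="unicode")
print(xml_text)
```

Note that ElementTree serializes the `>` inside a UW as `&gt;`, which is equivalent XML and is read back as `>` by any XML parser.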

XML Schema

<?xml version="1.0" encoding="utf-16"?>
<xsd:schema attributeFormDefault="unqualified" elementFormDefault="qualified" version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <xsd:element name="kb">
   <xsd:complexType>
     <xsd:sequence>
       <xsd:element maxOccurs="unbounded" name="relation">
         <xsd:complexType>
           <xsd:sequence>
             <xsd:element name="source">
               <xsd:complexType>
                 <!-- simpleContent lets the element carry the UW as text alongside its attributes -->
                 <xsd:simpleContent>
                   <xsd:extension base="xsd:string">
                     <xsd:attribute name="id" type="xsd:unsignedLong" use="required" />
                     <xsd:attribute name="attribute" type="xsd:string" use="optional" />
                     <xsd:attribute name="lang" type="xsd:string" use="optional" />
                     <xsd:attribute name="frequency" type="xsd:int" use="optional"/>
                     <xsd:attribute name="class" type="xsd:string" use="optional"/>
                   </xsd:extension>
                 </xsd:simpleContent>
               </xsd:complexType>
             </xsd:element>
             <xsd:element name="target">
               <xsd:complexType>
                 <xsd:simpleContent>
                   <xsd:extension base="xsd:string">
                     <xsd:attribute name="id" type="xsd:unsignedLong" use="required"/>
                     <xsd:attribute name="attribute" type="xsd:string" use="optional" />
                     <xsd:attribute name="lang" type="xsd:string" use="optional"/>
                     <xsd:attribute name="frequency" type="xsd:int" use="optional"/>
                     <xsd:attribute name="class" type="xsd:string" use="optional"/>
                   </xsd:extension>
                 </xsd:simpleContent>
               </xsd:complexType>
             </xsd:element>
           </xsd:sequence>
           <xsd:attribute name="name" type="xsd:string" use="required"/>
           <xsd:attribute name="type" type="xsd:string" use="optional"/>
           <xsd:attribute name="frequency" type="xsd:int" use="optional"/>
         </xsd:complexType>
       </xsd:element>
     </xsd:sequence>
   </xsd:complexType>
 </xsd:element>
</xsd:schema>

Example

<?xml version="1.0" encoding="utf-16"?>
<kb>
 <relation name="mod" frequency="2">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="2243" lang="UNL" frequency="2" class="nou">republic(icl>form of government)</target>
 </relation>
 <relation name="mod" frequency="1">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="466" lang="UNL" frequency="1" class="nou">certainty(icl>attribute)</target>
 </relation>
 <relation name="mod" frequency="1">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="583" lang="UNL" frequency="2" class="nou">creation(icl>action)</target>
 </relation>
 <relation name="mod" frequency="1">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="1539" lang="UNL" frequency="3" class="nou">lineage(icl>descendant)</target>
 </relation>
 <relation name="mod" frequency="1">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="1566" lang="UNL" frequency="11" class="nou">love(icl>emotion)</target>
 </relation>
</kb>
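
A file in this format can be read with any XML parser. The sketch below, which assumes Python's standard `xml.etree.ElementTree`, loads a shortened copy of the example and picks the relation with the highest frequency. The function name `load_relations` and the dictionary layout are illustrative choices, and the XML declaration is left out because the data is passed as an in-memory string rather than a UTF-16 file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above (two of the five relations).
KB_XML = """<kb>
 <relation name="mod" frequency="2">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="2243" lang="UNL" frequency="2" class="nou">republic(icl>form of government)</target>
 </relation>
 <relation name="mod" frequency="1">
  <source id="410" attribute="entry" lang="UNL" frequency="20" class="nou">book(icl>document)</source>
  <target id="1566" lang="UNL" frequency="11" class="nou">love(icl>emotion)</target>
 </relation>
</kb>"""

def load_relations(xml_text):
    """Return one dict per <relation> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for rel in root.findall("relation"):
        src, tgt = rel.find("source"), rel.find("target")
        rows.append({
            "name": rel.get("name"),
            # frequency is optional in the schema; default to 0 when absent
            "frequency": int(rel.get("frequency", "0")),
            "source": src.text,
            "source_id": int(src.get("id")),
            "target": tgt.text,
            "target_id": int(tgt.get("id")),
        })
    return rows

rows = load_relations(KB_XML)
best = max(rows, key=lambda r: r["frequency"])
print(best["name"], best["source"], best["target"], best["frequency"])
```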

Simplified format

UNLKB entries in simplified format must have the structure of network disambiguation rules, as follows:

RELATION(SOURCE;TARGET)=DC;

Where:
RELATION is the name of an existing UNL relation ("agt", "aoj", "obj", etc.);
SOURCE is the source node of the UNL relation;
TARGET is the target node of the UNL relation;
DC is the degree of certainty (i.e., the likelihood of the relation between the SOURCE and the TARGET), ranging from 0 (impossible) to 255 (necessary).
The SOURCE and the TARGET nodes may be expressed as:

  • constants (i.e., specific UWs), to be represented between double square brackets: [[103485997]]
  • a feature (attribute, value, or attribute-value pair) or set of features of a group of UWs: N, LEX=N, N&ABT
  • a relation or a set of relations: agt(N;V)
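
The rule structure described above can be checked mechanically. The following Python sketch splits a rule of the form RELATION(SOURCE;TARGET)=DC; into its four parts and enforces the 0 to 255 range for DC. It is only an illustration: the function name `parse_rule` is hypothetical, and the code assumes that any trailing explanatory comment has already been removed from the rule.

```python
def parse_rule(rule):
    """Split 'RELATION(SOURCE;TARGET)=DC;' into (relation, source, target, dc)."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("rule must end with ';'")
    # The last '=' separates the relation from DC; earlier ones belong to
    # attribute-value pairs such as LEX=N.
    body, sep, dc_text = rule[:-1].rpartition("=")
    if not sep or not dc_text.strip().isdigit():
        raise ValueError("missing '=DC'")
    dc = int(dc_text)
    if not 0 <= dc <= 255:
        raise ValueError("DC must be between 0 and 255")

    body = body.strip()
    open_idx = body.find("(")
    if open_idx <= 0 or not body.endswith(")"):
        raise ValueError("expected RELATION(SOURCE;TARGET)")
    relation = body[:open_idx]
    inner = body[open_idx + 1:-1]

    # Find the ';' at nesting depth 0, so that nodes which are themselves
    # relations, e.g. agt(;), are kept whole.
    depth = 0
    splits = []
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ";" and depth == 0:
            splits.append(i)
    if len(splits) != 1:
        raise ValueError("expected exactly one top-level ';'")
    i = splits[0]
    return relation, inner[:i].strip(), inner[i + 1:].strip(), dc

print(parse_rule("pof(LEX=N;LEX=V)=0;"))
print(parse_rule("and(agt(;);^agt(;))=0;"))
```

Tracking parenthesis depth, rather than splitting on the first semicolon, is what allows a relation to appear as a node of another relation.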

Examples

icl([[100001930]];[[100001740]])=1; (= a physical entity is a kind of entity)
pof(N;V)=0; (= a nominal concept cannot be a part of a verbal concept)
pof(LEX=N;LEX=V)=0; (= a nominal concept cannot be a part of a verbal concept)
icl(ABT;^ABT)=0; (= an abstract concept cannot be a kind of non-abstract concept)
and(agt(;);^agt(;))=0; (= an agent relation may not be coordinated with a non-agent relation)
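
The examples use three kinds of node expression (constants, features, and relations), optionally prefixed with `^`, which the glosses read as negation ("non-abstract", "non-agent"). A rough Python sketch of how a tool might tell them apart follows. The function name `describe_node` and the patterns are assumptions inferred from these examples only, not from the full grammar specification.

```python
import re

def describe_node(expr):
    """Classify a SOURCE/TARGET expression as (kind, negated)."""
    expr = expr.strip()
    negated = expr.startswith("^")      # '^' read as negation, per the glosses
    if negated:
        expr = expr[1:]
    if re.fullmatch(r"\[\[\d+\]\]", expr):
        kind = "constant"               # a specific UW, e.g. [[100001930]]
    elif re.fullmatch(r"[a-z]+\(.*\)", expr):
        kind = "relation"               # a nested relation, e.g. agt(;)
    else:
        kind = "feature"                # N, LEX=N, N&ABT, ...
    return kind, negated

for node in ("[[100001930]]", "LEX=N", "^ABT", "^agt(;)"):
    print(node, describe_node(node))
```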

Software