Grammar Specs

UNL-NL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the enconversion grammar (NL-to-UNL) is different from the deconversion grammar (UNL-to-NL), even though they share the same basic syntax. In order to standardize the language resources in the UNL framework, the UNDL Foundation recommends the adoption of the following specifications for both UNL-to-NL and NL-to-UNL grammars. This formalism, however, is not supported by the UNL Centre's tools, and is required only for those interested in using the UNDL Foundation's tools.
 
+
== Types of rules ==
+
 
+
In the UNL Grammar there are two basic types of rules:
+
 
+
;Transformation rules
+
:Used to generate natural language sentences out of UNL graphs and vice-versa.
+
;Disambiguation rules
+
:Used to improve the performance of transformation rules by constraining their applicability.
+
 
+
The Transformation Rules follow the very general formalism
+
 
+
α:=β;
+
 
+
where the left side α is a condition statement, and the right side β is an action to be performed over α.
+
 
+
The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:
+
 
+
α=P;
+
 
+
where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α. 
+
 
+
We present both types of rules and their role in the UNL System. We introduce, first, the basic symbols that are used both by transformation and disambiguation rules; next, we present the transformation rules and their several subtypes; and finally we present the disambiguation rules.
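
As a rough illustration of the two rule shapes described above, the following Python sketch tells a transformation rule (α:=β;) from a disambiguation rule (α=P;). The name <code>classify_rule</code> and the returned tuples are hypothetical and are not part of any UNDL Foundation tool. The sketch assumes only what is stated above: a transformation rule contains ":=", and a disambiguation rule ends with "=" followed by an integer from 0 to 255. It does not handle trailing comments or rules whose condition itself contains ":=".

```python
import re

# Hypothetical helper, for illustration only: it recognizes the two rule
# shapes described in the text, "condition:=action;" and "statement=P;".
_DISAMBIGUATION = re.compile(r"^(?P<statement>.+?)\s*=\s*(?P<p>\d+)\s*;$")


def classify_rule(rule: str):
    """Return ("transformation", condition, action) or ("disambiguation", statement, P)."""
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a rule must end with ';'")
    if ":=" in text:
        # Split at the first ":=": left side is the condition, right side the action.
        condition, action = text[:-1].split(":=", 1)
        return ("transformation", condition.strip(), action.strip())
    match = _DISAMBIGUATION.match(text)
    if match is None:
        raise ValueError("not a recognizable rule")
    p = int(match.group("p"))
    if not 0 <= p <= 255:
        raise ValueError("P must be an integer from 0 to 255")
    return ("disambiguation", match.group("statement").strip(), p)
```

For instance, under these assumptions <code>classify_rule("agt(VER;VER)=0;")</code> is classified as a disambiguation rule with P equal to 0.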
+
 
+
== Basic symbols ==
+
 
+
Both transformation and disambiguation rules use the same set of basic symbols:
+
 
+
<div align="center">
+
{| border="1" cellpadding="2"
+
|+
+
==== Basic symbols used in UNL grammar rules ====
+
!Symbol
+
!Definition
+
!Example
+
|-
+
|align=center|<nowiki>^</nowiki>
+
|not
+
|^a = not a
+
|-
+
|align=center|{ }
+
|or
+
|{a,b} = a or b
+
|-
+
|align=center|<nowiki>+</nowiki>
+
|add
+
|<nowiki>+</nowiki>a = add a
+
|-
+
|align=center|<nowiki>-</nowiki>
+
|remove
+
|<nowiki>-</nowiki>a = remove a
+
|-
+
|align=center|%
+
|placeholder for nodes
+
|%01
+
|-
+
|align=center|#
+
|placeholder for sub-NLWs
+
|#01
+
|-
+
|align=center|" "
+
|string
+
|"buy" = "buy"
+
|-
+
|align=center|[ ]
+
|NLWs or UWs
+
|[book]
+
|}
+
</div>
+
<br />
+
<br />
+
 
+
== Transformation rules ==
+
 
+
Natural language sentences and UNL graphs are supposed to convey the same amount of information in different structures: whereas the former arrange data as an ordered list of words, the latter organize it as a hypergraph. In that sense, translating from natural language into UNL and from UNL into natural language is ultimately a matter of transforming lists into networks and vice versa.
+
+
Both EUGENE and IAN, the UNDLF generation and analysis tools, assume that such transformation should be carried out progressively, i.e., through a transitional data structure: the tree, which could be used as an interface between lists and networks. Accordingly, the UNL Grammar states seven different types of rules (LL, TT, NN, LT, TL, TN, NT), as indicated below:
+
 
+
*'''ANALYSIS''' (NL-UNL)
+
**LL - List Processing (list-to-list)
+
**LT - Surface-Structure Formation (list-to-tree)
+
**TT - Syntactic Processing (tree-to-tree)
+
**TN - Deep-Structure Formation (tree-to-network)
+
**NN - Semantic Processing (network-to-network)
+
 
+
*'''GENERATION''' (UNL-NL)
+
**NN - Semantic Processing (network-to-network)
+
**NT - Deep-Structure Formation (network-to-tree)
+
**TT - Syntactic Processing (tree-to-tree)
+
**TL - Surface-Structure Formation (tree-to-list)
+
**LL - List Processing (list-to-list)
+
 
+
The '''original NL sentence''' is supposed to be preprocessed, by the LL rules, in order to become an ordered list. Next, the resulting '''list structure''' is parsed with the LT rules, so as to unveil its '''surface syntactic structure''', which is already a tree. The tree structure is further processed by the TT rules in order to expose its inner organization, the '''deep syntactic structure''', which is supposed to be more suitable for semantic interpretation. Then, this deep syntactic structure is projected into a semantic network by the TN rules. The resulting '''semantic network''' is then post-edited by the NN rules in order to comply with UNL standards and generate the '''UNL graph'''.
+
 
+
The reverse process is carried out during natural language generation. The '''UNL graph''' is preprocessed by the NN rules in order to become a more easily tractable semantic network. The resulting '''network structure''' is converted, by the NT rules, into a syntactic structure, which is still distant from the surface structure, as it is directly derived from the semantic arrangement. This '''deep syntactic structure''' is subsequently transformed into a '''surface syntactic structure''' by the TT rules. The surface syntactic structure undergoes many other changes according to the TL rules, which generate an NL-like '''list structure'''. This list structure is finally realized as a '''natural language sentence''' by the LL rules.
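
The symmetry between the two pipelines can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not the way EUGENE or IAN are implemented; the names <code>ANALYSIS</code>, <code>generation_order</code> and <code>is_chained</code> are hypothetical. Each stage is written as a two-letter code (input structure, output structure), with L for list, T for tree and N for network.

```python
# Stage codes as listed above: first letter = input structure,
# second letter = output structure (L = list, T = tree, N = network).
ANALYSIS = ["LL", "LT", "TT", "TN", "NN"]


def generation_order(analysis):
    """Reverse the analysis sequence and flip the direction of every stage."""
    return [stage[::-1] for stage in reversed(analysis)]


def is_chained(stages):
    """Check that each stage consumes the structure produced by the previous one."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(stages, stages[1:]))


GENERATION = generation_order(ANALYSIS)
```

Reversing the analysis sequence and flipping each code yields NN, NT, TT, TL, LL, which is the generation order listed above, and both sequences pass the chaining check.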
+
 
+
=== Network-to-Network Rules ===
+
 
+
The network-to-network rules (NN) are used for processing networks, both in analysis and in generation. During analysis, these rules are used for post-editing the semantic network structure derived from the syntactic module in order to generate the UNL graph; in generation, they are used for pre-editing the UNL graph, transforming it into a semantic network that would be more appropriate for sentence generation.
+
 
+
The NN rules follow the general syntax below:
+
 
+
<NN RULE>::= <SEM>(<SEM>)*":="[{"-","+"}]<SEM>([{"-","+"}]<SEM>)*";"
+
<SEM>::={agt,aoj,...,via}"("<NODE>";"<NODE>")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
There are 5 different subtypes of NN rules:
+
 
+
<div align="center">
+
{| cellpadding="5" border="1"
+
|+
+
==== NN rules ====
+
!ACTION
+
!RULE
+
|-
+
|ADD RELATION
+
|SEM(A;B):=+SEM(C;D);
+
|-
+
|DELETE RELATION
+
|SEM(A;B):=-SEM(A;B);
+
|-
+
|REPLACE RELATION
+
|SEM(A;B):=SEM(C;D);
+
|-
+
|MERGE RELATION
+
|SEM(A;B)SEM(C;D):=SEM(E;F);
+
|-
+
|DIVIDE RELATION
+
|SEM(A;B):=SEM(C;D)SEM(E;F);
+
|}
+
Where SEM is any of the existing UNL relations, and A, B, C, D, E and F are variables.
+
</div> 
+
 
+
=== Tree-to-Tree Rules ===
+
 
+
The tree-to-tree rules (TT) are used for processing trees, both in analysis and in generation. During analysis, these rules are used for revealing the deep structure out of the surface structure; in generation, they are used for transforming the deep into the surface syntactic structure.
+
The TT rules follow the general syntax below:
+
 
+
<nowiki><TT RULE></nowiki>::= <SYN>(<SYN>)*":="[{"-","+"}]<SYN>([{"-","+"}]<SYN>)*";"
+
<SYN>::={syntactic relation}"("<NODE>(";"<NODE>)*")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
Syntactic relations are n-ary: they can have as many arguments (nodes) as necessary.
+
 
+
There are 5 different subtypes of TT rules:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
==== TT rules ====
+
!ACTION
+
!RULE
+
|-
+
|ADD RELATION
+
|SYN(A;B):=+SYN(C;D);
+
|-
+
|DELETE RELATION
+
|SYN(A;B):=-SYN(A;B);
+
|-
+
|REPLACE RELATION
+
|SYN(A;B):=SYN(C;D);
+
|-
+
|MERGE RELATION
+
|SYN(A;B)SYN(C;D):=SYN(E;F);
+
|-
+
|DIVIDE RELATION
+
|SYN(A;B):=SYN(C;D)SYN(E;F);
+
|}
+
Where SYN is a syntactic relation, and A, B, C, D, E and F are variables.
+
 
+
 
+
As syntactic relations are n-ary, the REPLACE RELATION may also be used to ADD or DELETE nodes.
+
 
+
 
+
{|border="1" cellpadding="5"
+
|+
+
==== Special types of TT replace relations ====
+
!ACTION
+
!RULE
+
|-
+
|ADD NODE
+
|SYN(A;B):=SYN(A;B;C);
+
|-
+
|DELETE NODE
+
|SYN(A;B):=SYN(A);
+
|-
+
|}
+
 
+
Where SYN is a syntactic relation, and A, B and C are variables.
+
</div>
+
 
+
=== List-to-List Rules ===
+
 
+
The list-to-list (LL) rules are used for processing lists, both in analysis and in generation. In analysis, these rules are used for pre-editing the natural language sentence and preparing the input to the syntactic module; in generation, they are used for post-editing the output of the syntactic module and generating the natural language sentence.
+
 
+
The LL rules follow the syntax below:
+
<LL RULE>::= "("<NODE>")"("("<NODE>")")*":="[{"+","-"}]"("<NODE>")"("("<NODE>")")*";"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
There are 5 different subtypes of LL rules:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
====LL rules====
+
!ACTION
+
!RULE
+
|-
+
|ADD
+
|(A):=(A)(B); or (A):=(B)(A);
+
|-
+
|DELETE
+
|(A):=-(A);
+
|-
+
|REPLACE
+
|(A):=(B);
+
|-
+
|MERGE
+
|(A)(B):=(C);
+
|-
+
|DIVIDE
+
|(A):=(B)(C);
+
|}
+
Where A, B and C are variables.
+
</div>
+
 
+
=== List-to-Tree Rules ===
+
 
+
The list-to-tree (LT) rules are used to parse the list structure into a (surface) tree structure. They are used only in analysis, and all LT rules follow the syntax below:
+
 
+
<LT RULE>::= "("<NODE>")"("("<NODE>")")*":="<SYN>(<SYN>)*";"
+
<SYN>::={syntactic relation}"("<NODE>";"<NODE>")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
There is a single type of LT rule:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
==== LT rule ====
+
!ACTION
+
!RULE
+
|-
+
|REPLACE
+
|(A):=SYN(B;C);
+
|}
+
Where SYN is a syntactic relation, and A, B and C are variables.
+
</div>
+
 
+
=== Tree-to-List Rules ===
+
 
+
The tree-to-list (TL) rules are used to linearize the (surface) tree structure into a list structure. They are used only in generation, and all TL rules follow the syntax below:
+
 
+
<TL RULE>::= <SYN>(<SYN>)*":=""("<NODE>")"("("<NODE>")")*";"
+
<SYN>::={syntactic relation}"("<NODE>";"<NODE>")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
There is a single type of TL rule:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
==== TL rule ====
+
!ACTION
+
!RULE
+
|-
+
|REPLACE
+
|SYN(A;B):=(C);
+
|}
+
Where SYN is a syntactic relation, and A, B and C are variables.
+
</div>
+
 
+
=== Tree-to-Network Rules ===
+
 
+
The tree-to-network (TN) rules derive a semantic network out of a syntactic tree. They are used only in analysis, and all TN rules follow the syntax below:
+
 
+
<TN RULE>::= <SYN>(<SYN>)*":="<SEM>(<SEM>)*";"
+
<SYN>::={syntactic relation}"("<NODE>";"<NODE>")"
+
<SEM>::={agt,aoj,...,via}"("<NODE>";"<NODE>")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
 
+
There is a single type of TN rule:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
==== TN rule ====
+
!ACTION
+
!RULE
+
|-
+
|REPLACE
+
|SYN(A;B):=SEM(C;D);
+
|}
+
Where SYN is a syntactic relation, SEM is a semantic relation, and A, B, C and D are variables.
+
</div>
+
 
+
=== Network-to-Tree Rules ===
+
 
+
The network-to-tree (NT) rules reorganize the network structure as a deep tree structure. They are used only in generation, and all NT rules follow the syntax below:
+
<NT RULE>::= <SEM>(<SEM>)*":="<SYN>(<SYN>)*";"
+
<SEM>::={agt,aoj,...,via}"("<NODE>";"<NODE>")"
+
<SYN>::={syntactic relation}"("<NODE>";"<NODE>")"
+
<NODE>::= relations, placeholders, indexes, NLWs, UWs, attributes or values
+
There is a single type of NT rule:
+
 
+
<div align="center">
+
{|cellpadding="5" border="1"
+
|+
+
==== NT rule ====
+
!ACTION
+
!RULE
+
|-
+
|REPLACE
+
|SEM(A;B):=SYN(C;D);
+
|}
+
Where SYN is a syntactic relation, SEM is a semantic relation, and A, B, C and D are variables.
+
</div>
+
 
+
== General Properties of Transformation Rules ==
+
 
+
;PRIORITY
+
:Rules should be applied serially, according to the order defined in the grammar. The first rule will be the first to be applied, the second will be the second, and so on.
+
 
+
 
+
;RECURSIVENESS
+
:Rules should be applied recursively as long as their conditions are true.
+
 
+
 
+
;COMPREHENSIVENESS
+
:Grammars should be applied comprehensively as long as there is at least one applicable rule.
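
One possible reading of the three properties above (PRIORITY, RECURSIVENESS and COMPREHENSIVENESS) is sketched below in Python. The exact control strategy of the UNDL Foundation tools is not specified here, so this loop is an assumption: rules are tried in grammar order, each rule is re-applied while it still matches, and the whole grammar is re-run until no rule applies. The names <code>apply_grammar</code>, <code>merge_ab</code> and <code>delete_x</code> are hypothetical, and the state is simplified to a tuple of list nodes. The <code>max_steps</code> guard is an addition of the sketch to stop non-terminating grammars.

```python
from typing import Callable, Optional, Sequence, Tuple

State = Tuple[str, ...]
Rule = Callable[[State], Optional[State]]  # returns None when the rule does not apply


def apply_grammar(state: State, rules: Sequence[Rule], max_steps: int = 1000) -> State:
    steps = 0
    changed = True
    while changed:              # COMPREHENSIVENESS: repeat while some rule applied
        changed = False
        for rule in rules:      # PRIORITY: rules are tried in grammar order
            while True:         # RECURSIVENESS: re-apply while the condition holds
                result = rule(state)
                if result is None:
                    break
                state = result
                changed = True
                steps += 1
                if steps >= max_steps:
                    raise RuntimeError("rule application did not terminate")
    return state


def merge_ab(state: State) -> Optional[State]:
    """LL-style MERGE, in the spirit of (A)(B):=(C); for the nodes a, b, c."""
    for i in range(len(state) - 1):
        if state[i] == "a" and state[i + 1] == "b":
            return state[:i] + ("c",) + state[i + 2:]
    return None


def delete_x(state: State) -> Optional[State]:
    """LL-style DELETE, in the spirit of (A):=-(A); for the node x."""
    if "x" in state:
        i = state.index("x")
        return state[:i] + state[i + 1:]
    return None
```

With the input <code>("a", "x", "b", "x")</code>, the merge rule cannot apply in the first pass because "x" separates "a" and "b"; it applies only in the second pass, after the delete rule has removed both "x" nodes, which is why the outer loop is needed.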
+
 
+
 
+
;ACTION
+
:Rules may add values to, or delete values from, the source and target nodes, but only in the right-side items:
+
:::agt(a;b):=agt(+c;);
+
:::agt(a;b):=agt(;-b);
+
 
+
 
+
;CONSERVATION
+
:Rules affect only the information clearly specified. No relation, node or feature is deleted unless explicitly stated.<br />
+
:For instance, in the examples below, the source node of the “agt” relation preserves, in all cases, the value “a”. The only change concerns the feature “c”, which is added to the source node of the “agt” in the first two cases; and the feature “b”, which is deleted from the target node in the third case.
+
:::agt(a;b):=agt(c;);
+
:::agt(a;b):=agt(+c;);
+
:::agt(a;b):=agt(;-b);
+
:In any case, the ADD and DELETE rules (i.e., when the right side starts with “+” or “-”) preserve the items in the left side, except for the explicitly deleted ones:
+
:::INPUT: agt(a;b) obj(a;c) tim(a;d)
+
:::RULE: agt(a;b) ^mod(a;e):=+mod(a;e);
+
:::OUTPUT: agt(a;b) obj(a;c) tim(a;d) mod(a;e)
+
:or
+
:::INPUT: agt(a;b) obj(a;c) tim(a;d)
+
:::RULE: agt(a;b):=-agt(a;b);
+
:::OUTPUT: obj(a;c) tim(a;d)
+
:The REPLACE, MERGE and DIVIDE rules affect only their designated scopes. In that sense, NN rules may only replace, merge or divide semantic relations; TT rules may only replace, merge or divide syntactic relations; and LL rules may only replace, merge or divide list nodes. All other information is preserved unless explicitly stated otherwise.
+
:::INPUT: agt(a;b) cob(a;c)
+
:::RULE: cob(;):=obj(;);
+
:::OUTPUT: agt(a;b) obj(a;c)
+
:or
+
:::INPUT: agt(a;b) cob(a;c)
+
:::RULE: cob(a;):=obj(-a,+d;);
+
:::OUTPUT: agt(a;b) obj(d;c)
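
The CONSERVATION examples above can be reproduced with a small Python sketch. It is a simplified model and not the behaviour of any actual tool: a graph is a list of <code>(name, source, target)</code> triples, and the three functions (all hypothetical names) hard-code one rule shape each instead of parsing rule text. The point it illustrates is that every relation not named by the rule is carried over unchanged.

```python
from typing import List, Tuple

Relation = Tuple[str, str, str]  # (relation name, source node, target node)


def add_relation_if(graph: List[Relation], required: Relation, new: Relation) -> List[Relation]:
    """Model of an ADD rule: if `required` is present and `new` is absent, append `new`."""
    if required in graph and new not in graph:
        return graph + [new]
    return list(graph)


def delete_relation(graph: List[Relation], target: Relation) -> List[Relation]:
    """Model of a DELETE rule: drop `target`, keep every other relation."""
    return [relation for relation in graph if relation != target]


def rename_relation(graph: List[Relation], old: str, new: str) -> List[Relation]:
    """Model of a REPLACE rule such as cob(;):=obj(;); nodes are left untouched."""
    return [(new, s, t) if name == old else (name, s, t) for name, s, t in graph]
```

For example, deleting <code>("agt", "a", "b")</code> from a graph that also holds the "obj" and "tim" relations returns only those two relations, mirroring the second INPUT/RULE/OUTPUT example above.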
+
 
+
 
+
;CONJUNCTION
+
:Both the left and the right side of the rule may have as many items as necessary, as exemplified below:<br />
+
:::SEM(A;B)SEM(C;D)SEM(E;F):=SEM(G;H)SEM(I;J)SEM(K;L);
+
 
+
 
+
;DISJUNCTION
+
:The left side of the rules may contain disjunctions, but the right side may not.
+
:::{SEM(A;B),SEM(C;D)},^SEM(E;F):=+SEM(E;F); 
+
:::SEM(A;B),{SEM(C;D),SEM(E;F)}:=-SEM(A;B); 
+
:::agt(VER,{V01,V02};NOU,^SNG):=;
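
How a node condition that combines conjunction (","), disjunction ("{ }") and negation ("^") might be evaluated is sketched below in Python. The functions <code>split_top_level</code> and <code>matches</code> are hypothetical, the feature set of a node is modelled as a plain Python set, and the feature "PLR" used in the usage note is an invented example value. Negation inside braces and nested braces are not handled.

```python
def split_top_level(condition: str):
    """Split a condition on commas that are not inside { }."""
    parts, depth, current = [], 0, ""
    for ch in condition:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current.strip())
    return parts


def matches(condition: str, features: set) -> bool:
    """Check a node's features against a condition such as 'VER,{V01,V02}' or 'NOU,^SNG'."""
    for item in split_top_level(condition):
        if item.startswith("^"):                      # ^a = not a
            if item[1:] in features:
                return False
        elif item.startswith("{") and item.endswith("}"):
            alternatives = [a.strip() for a in item[1:-1].split(",")]
            if not any(a in features for a in alternatives):   # {a,b} = a or b
                return False
        elif item not in features:                    # plain feature must be present
            return False
    return True
```

Under these assumptions, the source condition <code>VER,{V01,V02}</code> accepts a node with the features VER and V02, and the target condition <code>NOU,^SNG</code> accepts a node with NOU and PLR but rejects one with NOU and SNG.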
+
 
+
 
+
;EXTENDIBILITY
+
:The left side of the rules may contain wildcard characters, such as the ones indicated in the table of Basic Symbols.
+
:::"cit?(??)":=<nowiki>[[city(icl>metropolis)]]</nowiki>;
+
:::"cit*":=<nowiki>[[city(icl>metropolis)]]</nowiki>;
+
:::" #### " :=+YEAR;
+
 
+
 
+
;CONCISION
+
:Rules should be as small as possible. In that sense, the source and the target nodes may be simple placeholders or indexes:
+
:::cob(;):=obj(;);
+
:::tim(%01;<nowiki>[[in]]</nowiki>),obj(<nowiki>[[in]]</nowiki>;%02):=tim(%01;%02);
+
:::tim(VER,%01;<nowiki>[[in]]</nowiki>),obj(<nowiki>[[in]]</nowiki>;NOU,%02):=tim(%01;%02);
+
:By default, the first node to appear in the left side is %01, the second is %02, and so on. The same applies to the right side. Therefore, the last rule above may be rewritten as:
+
:::tim(VER;<nowiki>[[in]]</nowiki>),obj(<nowiki>[[in]]</nowiki>;NOU):=tim(;%04);
+
:In the DELETE rules, the right side may be omitted in case of deletion of the entire left side:
+
:::obj(PRE;):=;
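
The default numbering of nodes described under CONCISION can be made concrete with a short Python sketch. The function <code>default_indexes</code> is hypothetical. It assumes a rule side made only of flat binary relations of the form NAME(node;node), so hypernodes (relations nested inside relations) are not handled.

```python
import re

# Matches one flat relation such as tim(VER;[[in]]) and captures name and arguments.
_RELATION = re.compile(r"(\w+)\(([^()]*)\)")


def default_indexes(side: str):
    """Map %01, %02, ... to the nodes of one rule side, in order of appearance."""
    nodes = []
    for _name, args in _RELATION.findall(side):
        nodes.extend(arg.strip() for arg in args.split(";"))
    return {f"%{i:02d}": node for i, node in enumerate(nodes, start=1)}
```

Applied to the left side <code>tim(VER;[[in]]),obj([[in]];NOU)</code>, the sketch numbers VER as %01, the two occurrences of [[in]] as %02 and %03, and NOU as %04, which is why the rewritten rule above can refer to the noun as %04.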
+
 
+
 
+
;READABILITY
+
:There can be blank spaces between variables and symbols. Comments can be added after the “;”.
+
:::obj ( ; ) := ; this rule deletes every “obj” relation.
+
 
+
 
+
;SCOPE GENERATION
+
:Inside a relation, nodes can be relations (i.e., hypernodes) as well.
+
:::SEM(A;SEM(C;D)):=SEM(A;C),SEM(C;D);
+
:::SEM(A;C),SEM(C;D):=SEM(A;SEM(C;D));
+
 
+
 
+
;COMMUTATIVITY
+
:Inside the same side of the rule, the order of the factors does not affect the end result, except for list-processing rules (LL, LT and TL).
+
:::SEM(A;B):=SEM(C;D)SEM(E;F);    =    SEM(A;B):= SEM(E;F)SEM(C;D);
+
:::SYN(A;B):=SYN(C;D)SYN(E;F);    =    SYN(A;B):= SYN(E;F)SYN(C;D);
+
:But:
+
:::(A):=(B)(C);      IS DIFFERENT FROM  (A):=(C)(B);   
+
:::SYN(A;B):=(C)(D);  IS DIFFERENT FROM  SYN(A;B):=(D)(C);
+
:::(C)(D):=SYN(A;B);  IS DIFFERENT FROM  (D)(C):=SYN(A;B);
+
:Additionally, the order of the features inside a relation does not affect the end result, but the order of the nodes is non-commutative.
+
:::SEM( VER,TRA ; NOU,MCL )      =    SEM( TRA,VER ; MCL,NOU )
+
:But:
+
:::SEM( VER,TRA ; NOU,MCL)    IS DIFFERENT FROM  SEM( NOU,MCL ; VER,TRA )
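
The two kinds of ordering discussed under COMMUTATIVITY can be illustrated by normalizing a relation before comparison. In the Python sketch below, the hypothetical function <code>normalize</code> keeps the nodes in an ordered tuple (node order matters) and stores the features of each node in a <code>frozenset</code> (feature order does not matter). It assumes flat relations whose features are separated by commas and whose nodes are separated by semicolons.

```python
def normalize(relation: str):
    """Return (name, nodes), where nodes is an ordered tuple of unordered feature sets."""
    name, _, rest = relation.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError("expected NAME(node;node)")
    nodes = rest[:-1].split(";")
    return (
        name.strip(),
        tuple(frozenset(feature.strip() for feature in node.split(",")) for node in nodes),
    )
```

Two relations are then treated as equivalent exactly when their normalized forms are equal, so swapping features inside a node leaves the result unchanged, while swapping the source and target nodes does not.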
+
 
+
 
+
;DICTIONARY ATTRIBUTES
+
:Dictionary attributes can be used as variables.
+
:::SYN( A,^num ; B,num ):=SYN( A,num=%02; %02 );
+
 
+
 
+
;DICTIONARY RULES
+
:Dictionary rules can be triggered by “!”.
+
:::(@pl, pl:="feet"):=(!pl,-@pl);
+
:::(@pl, pl:="oo":"ee"):=(!pl,-@pl);
+
:::(@pl, pl:="y">"ies"):=(!pl,-@pl);
+
:::(@pl, pl:=1>"ies"):=(!pl,-@pl);
+
:::(@pl, pl=*):=(!pl,-@pl); 
+
+
 
+
;NLW SPLITTING
+
:Natural language words can be split at “#” breakpoints.
+
:::(A):=( A#01,+ver ) , ( A#02,+phv );
+
:::(A) , (B) , (C) := (B#01) , (A) , (B#02) , (C) , (B#03);
+
 
+
== Disambiguation Rules ==
+
 
+
Apart from the Transformation Rules, the UNL Grammar also comprises Disambiguation Rules, which are optional and may be used to:
+
*Prevent wrong lexical choices;
+
*Provoke best matches;
+
*Check the consistency of the graphs, trees and lists.
+
The formalism presented here is directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base. The structure of the rule is as follows:
+
+
STATEMENT=P;
+
 
+
Where<br />
+
STATEMENT is any network, tree or list relation; and<br />
+
P, which can range from 0 (impossible) to 255 (possible), is the probability of occurrence of the STATEMENT.<br />
+
 
+
Examples:<br />
+
:agt(VER;VER)=0;
+
:<PRES,@past>=0;
+
 
+
Disambiguation Rules are used, both in natural language analysis and generation, to induce (or prohibit) some rules or lexical choices. In what follows, we present an example of a Disambiguation Rule and its role in the UNL System.
+
 
+
:UNL DICTIONARY
+
::[the] {} "" (pos:DEF) <EN,0,0>;
+
::[book] {} "book(icl>document)" (pos:NOU) <EN,0,0>;
+
::[book] {} "to book(icl>to reserve)" (pos:VER) <EN,0,0>;
+
:INPUT
+
::The book is on the table.
+
:DISAMBIGUATION RULE
+
::<DEF> & <VER> = 0;
+
:FUNCTIONING
+
::Before the Disambiguation Rule is applied, there are two candidate UWs for the NLW “book”. The Disambiguation Rule prevents the second alternative, which has the value “VER”, from applying.
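
The example above can be mimicked with a small Python sketch. It is a toy model under stated assumptions, not the lookup procedure of any UNL tool: the dictionary is reduced to <code>(NLW, UW, pos)</code> triples, the rule <code><DEF> & <VER> = 0</code> is read as "a VER entry is impossible immediately after a DEF entry", and the names <code>ENTRIES</code>, <code>RULES</code>, <code>candidates</code> and <code>disambiguate</code> are hypothetical.

```python
# Dictionary entries from the example, reduced to (NLW, UW, part of speech).
ENTRIES = [
    ("the", "", "DEF"),
    ("book", "book(icl>document)", "NOU"),
    ("book", "to book(icl>to reserve)", "VER"),
]

# Assumed reading of "<DEF> & <VER> = 0;": the pair (DEF, VER) has probability 0.
RULES = {("DEF", "VER"): 0}


def candidates(word: str):
    """All dictionary entries whose NLW matches the word (case-insensitive)."""
    return [entry for entry in ENTRIES if entry[0] == word.lower()]


def disambiguate(previous_pos: str, word: str):
    """Drop candidates whose (previous pos, own pos) pair is ruled out with probability 0."""
    return [
        entry
        for entry in candidates(word)
        if RULES.get((previous_pos, entry[2])) != 0
    ]
```

Without the rule, <code>candidates("book")</code> returns both entries; after "the" (DEF), <code>disambiguate("DEF", "book")</code> keeps only the noun reading.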
+

Latest revision as of 18:11, 19 August 2013

  1. REDIRECT Grammar