|  | UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax. 
|  |   |  | 
|  | == Basic symbols ==
 |  | 
|  |   |  | 
|  | {| border="1" cellpadding="2" align=center
 |  | 
|  | |+Basic symbols used in UNL grammar rules
 |  | 
|  | !Symbol
 |  | 
|  | !Definition
 |  | 
|  | !Example
 |  | 
|  | |-
 |  | 
|  | |align=center|<nowiki>^</nowiki>
 |  | 
|  | |not
 |  | 
|  | |^a = not a
 |  | 
|  | |-
 |  | 
|  | |align=center|{ | }
 |  | 
|  | |or
 |  | 
|  | |<nowiki>{a|b}</nowiki> = a or b
 |  | 
|  | |-
 |  | 
|  | |align=center|%
 |  | 
|  | |index for nodes, attributes and values
 |  | 
|  | |%x (see [[#Indexes|below]])
 |  | 
|  | |-
 |  | 
|  | |align=center|#
 |  | 
|  | |index for sub-NLWs
 |  | 
|  | |#01 (see [[#Indexes|below]])
 |  | 
|  | |-
 |  | 
|  | |align=center|=
 |  | 
|  | |attribute-value assignment
 |  | 
|  | |POS=NOU
 |  | 
|  | |-
 |  | 
|  | |align=center|!
 |  | 
|  | |rule trigger
 |  | 
|  | |!PLR
 |  | 
|  | |-
 |  | 
|  | |align=center|&
 |  | 
|  | |merge operator
 |  | 
|  | |%x&%y
 |  | 
|  | |-
 |  | 
|  | |align=center|?
 |  | 
|  | |dictionary lookup operator
 |  | 
|  | |?[a]
 |  | 
|  | |-
 |  | 
|  | |align=center|" "
 |  | 
|  | |string
 |  | 
|  | |"went"
 |  | 
|  | |-
 |  | 
|  | |align=center|[ ]
 |  | 
|  | |natural language entry (headword)
 |  | 
|  | |[go]
 |  | 
|  | |-
 |  | 
|  | |align=center|[[ ]]
 |  | 
|  | |UW
 |  | 
|  | |[[to go(icl>to move)]]
 |  | 
|  | |-
 |  | 
|  | |align=center|( )
 |  | 
|  | |node
 |  | 
|  | |(a)
 |  | 
|  | |-
 |  | 
|  | |align=center|//
 |  | 
|  | |regular expression
 |  | 
|  | |/a{2,3}/ = aa,aaa
 |  | 
|  | |}
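The regular-expression row of the table can be illustrated with a minimal Python sketch. It assumes that the pattern between the slashes behaves like a Python regular expression that must cover the whole element; the actual UNL regular-expression dialect may differ, and the helper name <code>matches_whole</code> is hypothetical.

```python
import re


def matches_whole(pattern: str, value: str) -> bool:
    """Return True if the whole value is matched by the pattern.

    Assumption: a UNL pattern such as /a{2,3}/ must match the entire
    element, which corresponds to re.fullmatch in Python.
    """
    return re.fullmatch(pattern, value) is not None


candidates = ["a", "aa", "aaa", "aaaa"]
# Keep only the candidates accepted by /a{2,3}/
matched = [c for c in candidates if matches_whole(r"a{2,3}", c)]
print(matched)
```

Under that assumption only "aa" and "aaa" are accepted, which agrees with the example given in the table.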
 |  | 
|  |   |  | 
|  | ;The differences between "", [] and [[]]
 |  | 
|  | :Double quotes are always used to represent strings: "a" will match only the string "a"
 |  | 
|  | :Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated to the entry [a] retrieved from the dictionary, no matter its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a])
 |  | 
|  | :Double square brackets are always used to represent UWs: <nowiki>[[a]]</nowiki> will match the node associated to the UW <nowiki>[[a]]</nowiki>
 |  | 
|  |   |  | 
|  | ;Predefined values (assigned by default)
 |  | 
|  | :SCOPE - Scope
 |  | 
|  | :SHEAD - Sentence head (the beginning of a sentence)
 |  | 
|  | :STAIL - Sentence tail (the end of a sentence)
 |  | 
|  | :CHEAD - Scope head (the beginning of a scope)
 |  | 
|  | :CTAIL - Scope tail (the end of a scope)
 |  | 
|  | :TEMP - Temporary entry (entry not found in the dictionary)
 |  | 
|  | :DIGIT - Any sequence of digits (i.e.: 0,1,2,3,4,5,6,7,8,9)
 |  | 
|  |   |  | 
|  | == Basic concepts ==
 |  | 
|  | === Nodes ===
 |  | 
|  | A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Any node is a vector (one-dimensional array) containing the following necessary elements:
 |  | 
|  | *a string, to be represented between "quotes", which expresses the actual state of the node;
 |  | 
|  | *a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
 |  | 
|  | *a UW, to be represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
 |  | 
|  | *a feature or set of features, which express the features of the node;
 |  | 
|  | *an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node.
 |  | 
|  | Examples of nodes are
 |  | 
|  | *("ing") (a node making reference only to its actual string value)
 |  | 
|  | *([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
 |  | 
|  | *([[book(icl>document)]]) (a node making reference only to its UW value)
 |  | 
|  | *(NUM) (a node making reference only to one of its features)
 |  | 
|  | *(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
 |  | 
|  | *(%x) (a node making reference only to its unique index)
 |  | 
|  | *("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
 |  | 
|  | ==== Properties of nodes ====
 |  | 
|  | ;Nodes are enclosed between (parentheses)
 |  | 
|  | :("a") is a node
 |  | 
|  | :"a" is not a node
 |  | 
|  | ;The elements of a node are separated by commas
 |  | 
|  | :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
 |  | 
|  | ;The order of elements inside a node is not relevant.
 |  | 
|  | :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
 |  | 
|  | ;Nodes may have one single string, headword, UW and index, but may have as many features as necessary
 |  | 
|  | :<strike>("a","b")</strike> (a node may not contain more than one string)
 |  | 
|  | :<strike>([a],[b])</strike> (a node may not contain more than one headword)
 |  | 
|  | :<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW) 
 |  | 
|  | :<strike>(%a,%b)</strike> (a node may not contain more than one index)
 |  | 
|  | :(A,B,C,D,...,Z) (a node may contain as many features as necessary)
 |  | 
|  | ;A node may be referred to by any of its elements
 |  | 
|  | :("a") refers to all nodes where actual string = "a"
 |  | 
|  | :([a]) refers to all nodes where headword = [a]
 |  | 
|  | :(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
 |  | 
|  | :(A) refers to all nodes having the feature A
 |  | 
|  | :("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
 |  | 
|  | ;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
 |  | 
|  | :("a")("b") is actually ("a",%01)("b",%02)
 |  | 
|  | ;[[Regular expressions]] may be used to make reference to any element of the node, except the index
 |  | 
|  | :("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
 |  | 
|  | :([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
 |  | 
|  | :([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
 |  | 
|  | :(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
 |  | 
|  | ;Nodes may contain alternative (disjoint) features enclosed between {braces} and separated by a vertical bar
 |  | 
|  | :({A|B}) refers to all nodes having the feature A OR B
 |  | 
|  | ;Node features may be expressed as simple attributes, or attribute-value pairs:
 |  | 
|  | :(MCL) - feature as an attribute: refers to all nodes having the feature MCL
 |  | 
|  | :(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
 |  | 
|  | Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
 |  | 
|  | :(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see [[#Indexes|Index]])
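The node properties above can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the class <code>Node</code> and the function <code>matches</code> are hypothetical names, features (including attribute-value pairs) are stored as plain strings, and regular expressions and disjunctions are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical model of a node: at most one string, headword, UW and index."""
    string: Optional[str] = None      # written between "quotes"
    headword: Optional[str] = None    # written between [square brackets]
    uw: Optional[str] = None          # written between [[double square brackets]]
    features: FrozenSet[str] = frozenset()  # any number of features, order irrelevant
    index: Optional[str] = None       # written after the symbol %


def matches(pattern: Node, node: Node) -> bool:
    """True if every element specified in the pattern is found in the node."""
    for attr in ("string", "headword", "uw", "index"):
        wanted = getattr(pattern, attr)
        if wanted is not None and wanted != getattr(node, attr):
            return False
    # All features required by the pattern must be present in the node
    return pattern.features <= node.features


# ("a",[a],[[a]],A,B,A=C,%a)
full = Node("a", "a", "a", frozenset({"A", "B", "A=C"}), "a")

print(matches(Node(string="a"), full))                  # pattern ("a")
print(matches(Node(features=frozenset({"A"})), full))   # pattern (A)
print(matches(Node(headword="b"), full))                # pattern ([b])
```

Because the features are kept in a set and the other elements are named fields, the order in which the elements are written does not affect the comparison.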
 |  | 
|  |   |  | 
|  | === Relations ===
 |  | 
|  | In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations:
 |  | 
|  | *the '''linear''' relation L expresses the surface structure of natural language sentences
 |  | 
|  | *'''syntactic''' relations express the deep (tree) structure of natural language sentences
 |  | 
|  | *'''semantic''' relations express the structure of UNL graphs
 |  | 
|  | ==== Properties of relations ====
 |  | 
|  | ;The linear relation is always binary and is represented in two possible formats:
 |  | 
|  | *L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
 |  | 
|  | *(%x)(%y)
 |  | 
|  | ;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
 |  | 
|  | ;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
 |  | 
|  | ;Syntactic and semantic relations are represented in the same way:
 |  | 
|  | *rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
 |  | 
|  | ;Arguments of linear, syntactic and semantic relations are not commutative.
 |  | 
|  | :The order of the elements in a relation affects the result:
 |  | 
|  | ::(%x)(%y) is different from (%y)(%x)
 |  | 
|  | ::relation(%x;%y) is different from relation(%y;%x)
 |  | 
|  | ;Linear and semantic relations are always binary; syntactic relations may be n-ary:
 |  | 
|  | :L(%x;%y) - linear relation
 |  | 
|  | :agt(%x;%y) - semantic relation
 |  | 
|  | :VH(%x) - unary syntactic relation
 |  | 
|  | :VC(%x;%y) - binary syntactic relation
 |  | 
|  | :XX(%x;%y;%z) - possible ternary syntactic relation
 |  | 
|  | ;Inside each relation, nodes are isolated by semicolon (;). 
 |  | 
|  | :VC(%x;%y)
 |  | 
|  | :<strike>VC(%x,%y)</strike>
 |  | 
|  | ;Inside each relation, nodes may be referenced by any of their elements, isolated by comma (,):
 |  | 
|  | :("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
 |  | 
|  | :L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
 |  | 
|  | :VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
 |  | 
|  | :agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki> AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
 |  | 
|  | ;Relations may be conjoined through juxtaposition:
 |  | 
|  | :("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
 |  | 
|  | :agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
 |  | 
|  | :<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be isolated by comma
 |  | 
|  | ;Relations may be disjoined through {braces}
 |  | 
|  | :{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
 |  | 
|  | :{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
 |  | 
|  | ;Syntactic and semantic relations may be replaced by regular expressions
 |  | 
|  | :/.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
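The disjunction of relations through {braces} can be read as shorthand for a list of alternative patterns. The Python sketch below expands such a pattern into its alternatives. It is an illustration only: the function name <code>expand</code> is hypothetical, nested braces are not handled, and a vertical bar or brace inside a quoted string or a regular expression would be misread.

```python
import itertools
import re
from typing import List

# A brace group without nested braces, e.g. {agt(%x;%y)|exp(%x;%y)}
GROUP = re.compile(r"\{([^{}]*)\}")


def expand(pattern: str) -> List[str]:
    """Expand every {a|b} group into the list of alternative patterns."""
    # With one capture group, re.split returns literal text at even
    # positions and the contents of the brace groups at odd positions.
    parts = GROUP.split(pattern)
    choices = [p.split("|") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand('{("a")|("b")}("c")'))
print(expand("{agt(%x;%y)|exp(%x;%y)}obj(%x;%z)"))
```

For the two examples above, the function returns the same pairs of alternatives that are listed in the text.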
 |  | 
|  |   |  | 
|  | === Hyper-nodes ===
 |  | 
|  | Nodes may contain one or more relations. In this case, they are said to be "hyper-nodes", and represent scopes or sub-graphs. As any node, hyper-nodes contain a string, a headword, a UW, an index and features, of which the internal relations are a special type. Examples of hyper-nodes are the following:
 |  | 
|  | *(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
 |  | 
|  | *(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
 |  | 
|  | *(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
 |  | 
|  | *(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
 |  | 
|  | *(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except for the fact that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
 |  | 
|  | Hyper-nodes may also contain internal hyper-nodes:
 |  | 
|  | *((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
 |  | 
|  | ==== Properties of hyper-nodes ====
 |  | 
|  | ;As any node, hyper-nodes are expressed between (parentheses)
 |  | 
|  | :(("a")("b"))
 |  | 
|  | ;As any node, hyper-nodes may have one single string, one single headword and one single UW, but may have as many features and internal relations as necessary
 |  | 
|  | :(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
 |  | 
|  | ;As any node, hyper-nodes may be referenced by any of their elements, including internal relations
 |  | 
|  | :(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
 |  | 
|  | :(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
 |  | 
|  | :(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
 |  | 
|  | ;When a hyper-node is deleted, all its internal relations are deleted as well
 |  | 
|  | :(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))
 |  | 
|  |   |  | 
|  | === Hyper-relations ===
 |  | 
|  | Relations may have relations as arguments. In this case, they are said to be "hyper-relations". Examples of hyper-relations are the following:
 |  | 
|  | *XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
 |  | 
|  | *and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
 |  | 
|  | ==== Properties of hyper-relations ====
 |  | 
|  | ;Each argument of a hyper-relation may contain at most one relation
 |  | 
|  | *XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
 |  | 
|  | *XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
 |  | 
|  | *XP(VC(%a;%b);VA(%a;%c)) - the source and the target argument of the hyper-relation XP are relations
 |  | 
|  | *<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as one single argument (in this case, the hyper-relation XP contained two relations as the source argument)
 |  | 
|  | ;Relations do not have strings, UWs, headwords or any features
 |  | 
|  | *<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
 |  | 
|  |   |  | 
|  | == Types of rules == 
 |  | 
|  |   |  | 
|  | In the UNL Grammar there are three basic types of rules:
 |  | 
|  |   |  | 
|  | === Normalization Rules ===
 |  | 
|  | (main article: [[N-Rule]]s)
 |  | 
|  | Used to normalize the natural language input and to segment natural language texts into sentences.
 |  | 
|  |   |  | 
|  | === Transformation rules ===
 |  | 
|  | (main article: [[T-Rule]]s)
 |  | 
|  | Used to generate natural language sentences out of UNL graphs and vice-versa. 
 |  | 
|  |   |  | 
|  | === Disambiguation rules ===
 |  | 
|  | (main article: [[D-rule]]s)
 |  | 
|  | Used to improve the performance of transformation rules by constraining their applicability.
 |  | 
|  |   |  | 
|  | The Normalization Rules and Transformation Rules follow the very general formalism 
 |  | 
|  |   |  | 
|  |  α:=β;
 |  | 
|  |   |  | 
|  | where the left side α is a condition statement, and the right side β is an action to be performed over α. 
 |  | 
|  |   |  | 
|  | The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism: 
 |  | 
|  |   |  | 
|  |  α=P;
 |  | 
|  |   |  | 
|  | where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
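The two formalisms can be told apart mechanically, as the following Python sketch suggests. It is a sketch under assumptions rather than the behaviour of any UNL tool: the names <code>Rule</code> and <code>parse_rule</code> are hypothetical, the sides of the rule are kept as unparsed text, and the D-rule used in the example is invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    kind: str                   # "condition-action" (α:=β;) or "disambiguation" (α=P;)
    condition: str              # the left side α, kept as raw text
    action: Optional[str]       # the right side β, only for α:=β;
    probability: Optional[int]  # the integer P (0 to 255), only for α=P;


def parse_rule(text: str) -> Rule:
    """Split a rule into its left and right sides according to its formalism."""
    body = text.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = body[:-1]
    if ":=" in body:
        condition, action = body.split(":=", 1)
        return Rule("condition-action", condition, action, None)
    # The greedy group leaves only the final '=<digits>' for P, so
    # attribute-value pairs such as POS=NOU stay inside the condition.
    m = re.fullmatch(r"(.*)=(\d+)", body, flags=re.S)
    if m and 0 <= int(m.group(2)) <= 255:
        return Rule("disambiguation", m.group(1), None, int(m.group(2)))
    raise ValueError("not a recognizable rule: " + text)


print(parse_rule("X(%a;%b):=Y(%b;%a);"))
print(parse_rule("(POS=NOU)(POS=VER)=255;"))  # hypothetical D-rule
```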
 |  | 
|  |   |  | 
|  | == Indexes ==
 |  | 
|  | ;Indexes (%) are used for co-indexing nodes, attributes and values inside and between the left and the right side of transformation rules. 
 |  | 
|  | :X(%a;)Y(%a;) (the first node of X is also the first node of Y)
 |  | 
|  | :X(%a;%b):=Y(%b;%a); (the first node of X becomes the second node of Y, and the second node of X becomes the first node of Y)
 |  | 
|  | :X(%a;)Y(%a;):=Z(%a); (if the first node of X is the first node of Y then make it the single node of Z)
 |  | 
|  | <blockquote>Any co-indexation is made by the use of indexes and not by the repetition of features. In that sense, '''X(A;)Y(A;)''' is different from '''X(%a;)Y(%a;)'''. In the former case, the first node of X is not necessarily the first node of Y, they only share the same feature A; in the latter case, the first node of X is necessarily the first node of Y.</blockquote>
 |  | 
|  | ;Indexes are made of any sequence of alphanumeric characters and underscore:
 |  | 
|  | :%index
 |  | 
|  | :%a
 |  | 
|  | :%first_index
 |  | 
|  | :%a1
 |  | 
|  | :<strike>%first index</strike> (no blank spaces are allowed)
 |  | 
|  | <blockquote>%01 (numbers are used for default indexation and must be avoided - see below)</blockquote>
 |  | 
|  | ;Default indexation 
 |  | 
|  | :If omitted, indexes are assigned by default, according to the following rules:
 |  | 
|  | :Default indexes are assigned from left to right in each side of the rule according to the position of the nodes:
 |  | 
|  | ::X(A;B)Y(C;D) is the same as X('''%01''',A;'''%02''',B)Y('''%03''',C;'''%04''',D)
 |  | 
|  | :Default indexation is done only for non-indexed nodes (i.e., user-defined indexes prevail over indexes assigned by default):
 |  | 
|  | ::X(A,%A;B)Y(C,%C;D) is the same as X(A,%A;B,'''%02''')Y(C,%C;'''%04''',D) 
 |  | 
|  | :::(Notice that the user-defined indexes %A and %C are preserved and not replaced by default indexes)
 |  | 
|  | :In default indexation, left-side nodes are automatically co-indexed with right-side nodes '''if and only if''' their position and number are the same:
 |  | 
|  | ::X(A;B):=Y(C;D); is the same as X('''%01''',A;'''%02''',B):=Y('''%01''',C;'''%02''',D);
 |  | 
|  | ::X(A;B):=Y(C;D;E); is the same as X('''%01''',A;'''%02''',B):=Y('''%03''',C;'''%04''',D;'''%05''',E);
 |  | 
|  | :::(there is no co-indexation between the left and the right side in the latter case, because the number of the nodes is not the same)
 |  | 
|  | :Default indexes are also assigned to hyper-nodes and sub-nodes
 |  | 
|  | ::(((A))):=(((B))); is the same as (%01(%01%01(%01%01%01,A))):=(%01(%01%01(%01%01%01,B)));
 |  | 
|  | :In default indexation, sub-nodes are identified by the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may be, itself, a sub-node:
 |  | 
|  | ::X(Y(A;B);C) is the same as X('''%01''',Y('''%01%01''',A;'''%01%02''',B);'''%02''')
 |  | 
|  | :::%01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
 |  | 
|  | ::X(Y(Z(A;B);C);D) is the same as X('''%01''',Y('''%01%01''',Z('''%01%01%01''',A;'''%01%01%02''',B);'''%01%02''',C);'''%02''',D)
 |  | 
|  | :::%01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
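The default indexation described above can be sketched in Python as follows. The sketch makes several simplifying assumptions: each side of a rule is given as a list of arguments in left-to-right order, a nested list stands for an argument that is itself a relation or hyper-node, and user-defined indexes (which would prevail over the default ones) are not modelled. The function names <code>assign</code> and <code>index_rule</code> are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Arg = Union[str, list]  # a simple node (its features) or a nested list of arguments


def assign(args: List[Arg], prefix: str = "", start: int = 1) -> Dict[str, Arg]:
    """Assign position-based default indexes, using <PARENT><CHILD> for sub-nodes."""
    indexes: Dict[str, Arg] = {}
    for position, arg in enumerate(args, start=start):
        index = f"{prefix}%{position:02d}"
        indexes[index] = arg
        if isinstance(arg, list):
            # Sub-nodes restart at 01 and carry the index of their parent
            indexes.update(assign(arg, prefix=index))
    return indexes


def index_rule(left: List[Arg], right: List[Arg]) -> Tuple[Dict[str, Arg], Dict[str, Arg]]:
    """Co-index both sides only if they have the same number of nodes."""
    start = 1 if len(left) == len(right) else len(left) + 1
    return assign(left), assign(right, start=start)


# X(A;B):=Y(C;D);   -> both sides use %01 and %02
print(index_rule(["A", "B"], ["C", "D"]))
# X(A;B):=Y(C;D;E); -> the right side continues with %03, %04, %05
print(index_rule(["A", "B"], ["C", "D", "E"]))
# X(Y(Z(A;B);C);D)  -> %01%01%01 = A, %01%01%02 = B, %01%02 = C, %02 = D
print(assign([[["A", "B"], "C"], "D"]))
```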
 |  | 
|  | ;Non-indexed nodes in the right side mean ADDITION, whereas left-side nodes that are not referred to in the right side mean DELETION
 |  | 
|  | :X(%a;%b):=Y(%a;X;%b); is the same as X(%a;%b):=Y(%a;'''%02''',X;%b); (it means that a new node with the feature X will be created for the relation Y)
 |  | 
|  | :X(%a;%b;%c):=Y(%a;%c); (it means that the second node of X will be deleted from the relation Y)
 |  | 
|  | ;Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:
 |  | 
|  | :X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
 |  | 
|  | ;Special indexes (#) are used to make reference to the internal structure of the field <NLW> in the dictionary
 |  | 
|  | :(X)(Y):=(X,#02)(Y)(X,#01);
 |  | 
|  | ::The rule above is used for complex dictionary entries such as:
 |  | 
|  | :::[[A][B]] "uw" (X, #01(ATT=AAA), #02(ATT=BBB)) <flg,fre,pri>;
 |  | 
|  | ::It means that, given (X)(Y), the output should be (B)(Y)(A).
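The effect of the special indexes can be traced with a short Python sketch. It assumes that #01 and #02 simply select the first and second sub-NLW of the dictionary entry; the function <code>realize</code> and the data layout are hypothetical and stand in for the dictionary lookup.

```python
from typing import Dict, List, Optional, Tuple


def realize(output: List[Tuple[str, Optional[str]]],
            sub_nlws: Dict[str, List[str]]) -> List[str]:
    """Replace each (node, #NN) pair by the NN-th sub-NLW of that node's entry."""
    result = []
    for node, special in output:
        if special is None:
            result.append(node)          # no special index: keep the node as it is
        else:
            position = int(special.lstrip("#"))
            result.append(sub_nlws[node][position - 1])
    return result


# The entry [[A][B]] with the feature X: #01 is A and #02 is B
sub_nlws = {"X": ["A", "B"]}
# Right side of (X)(Y):=(X,#02)(Y)(X,#01);
right_side = [("X", "#02"), ("Y", None), ("X", "#01")]
print(realize(right_side, sub_nlws))
```

The result is the sequence B, Y, A, as stated above.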
 |  | 
|  |   |  | 
|  | == Notes ==
 |  | 
|  | <references />
 |  |