|
|
Line 1: |
Line 1: |
− | UNL grammars are sets of rules for translating UNL expressions into natural language (NL) sentences and NL sentences into UNL expressions. They are normally unidirectional, i.e., the [[UNL-ization]] grammar (NL-to-UNL) is different from the [[NL-ization]] grammar (UNL-to-NL), even though they share the same basic syntax.
| + | #REDIRECT [[Grammar]] |
− | | + | |
− | == Basic symbols ==
| + | |
− | | + | |
− | {| border="1" cellpadding="2" align=center
| + | |
− | |+Basic symbols used in UNL grammar rules
| + | |
− | !Symbol
| + | |
− | !Definition
| + | |
− | !Example
| + | |
− | |-
| + | |
− | |align=center|<nowiki>^</nowiki>
| + | |
− | |not
| + | |
− | |^a = not a
| + | |
− | |-
| + | |
− | |align=center|{ | }
| + | |
− | |or
| + | |
− | |<nowiki>{a|b}</nowiki> = a or b
| + | |
− | |-
| + | |
− | |align=center|%
| + | |
− | |index for nodes, attributes and values
| + | |
− | |%x (see [[#Indexes|below]])
| + | |
− | |-
| + | |
− | |align=center|#
| + | |
− | |index for sub-NLWs
| + | |
− | |#01 (see [[#Indexes|below]])
| + | |
− | |-
| + | |
− | |align=center|=
| + | |
− | |attribute-value assignment
| + | |
− | |POS=NOU
| + | |
− | |-
| + | |
− | |align=center|!
| + | |
− | |rule trigger
| + | |
− | |!PLR
| + | |
− | |-
| + | |
− | |align=center|&
| + | |
− | |merge operator
| + | |
− | |%x&%y
| + | |
− | |-
| + | |
− | |align=center|?
| + | |
− | |dictionary lookup operator
| + | |
− | |?[a]
| + | |
− | |-
| + | |
− | |align=center|" "
| + | |
− | |string
| + | |
− | |"went"
| + | |
− | |-
| + | |
− | |align=center|[ ]
| + | |
− | |natural language entry (headword)
| + | |
− | |[go]
| + | |
− | |-
| + | |
− | |align=center|[[ ]]
| + | |
− | |UW
| + | |
− | |[[to go(icl>to move)]]
| + | |
− | |-
| + | |
− | |align=center|( )
| + | |
− | |node
| + | |
− | |(a)
| + | |
− | |-
| + | |
− | |align=center|/ /
| + | |
− | |regular expression
| + | |
− | |/a{2,3}/ = aa,aaa
| + | |
− | |}
| + | |
− | | + | |
− | ;The differences between "", [] and [[]]
| + | |
− | :Double quotes are always used to represent strings: "a" will match only the string "a"
| + | |
− | :Simple square brackets are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated to the entry [a] retrieved from the dictionary, no matter its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a])
| + | |
− | :Double square brackets are always used to represent UWs: <nowiki>[[a]]</nowiki> will match the node associated to the UW <nowiki>[[a]]</nowiki>
| + | |
− | | + | |
− | ;Predefined values (assigned by default)
| + | |
− | :SCOPE - Scope
| + | |
− | :SHEAD - Sentence head (the beginning of a sentence)
| + | |
− | :STAIL - Sentence tail (the end of a sentence)
| + | |
− | :CHEAD - Scope head (the beginning of a scope)
| + | |
− | :CTAIL - Scope tail (the end of a scope)
| + | |
− | :TEMP - Temporary entry (entry not found in the dictionary)
| + | |
− | :DIGIT - Any sequence of digits (i.e.: 0,1,2,3,4,5,6,7,8,9)
| + | |
− | | + | |
− | == Basic concepts ==
| + | |
− | === Nodes ===
| + | |
− | A node is the most elementary unit in the grammar. It is the result of the [[tokenization]] process, and corresponds to the notion of "lexical item", to be represented by dictionary entries. At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes. Any node is a vector (one-dimensional array) containing the following necessary elements:
| + | |
− | *a string, to be represented between "quotes", which expresses the actual state of the node;
| + | |
− | *a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
| + | |
− | *a UW, to be represented between <nowiki>[[double square brackets]]</nowiki>, which expresses the UW value of the node;
| + | |
− | *a feature or set of features, which express the features of the node;
| + | |
− | *an [[#Indexes|index]], preceded by the symbol %, which is used to reference the node;
| + | |
− | Examples of nodes are
| + | |
− | *("ing") (a node making reference only to its actual string value)
| + | |
− | *([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
| + | |
− | *([[book(icl>document)]]) (a node making reference only to its UW value)
| + | |
− | *(NUM) (a node making reference only to one of its features)
| + | |
− | *(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
| + | |
− | *(%x) (a node making reference only to its unique index)
| + | |
− | *("string",[headword],<nowiki>[[UW]]</nowiki>,feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
| + | |
− | ==== Properties of nodes ====
| + | |
− | ;Nodes are enclosed between (parentheses)
| + | |
− | :("a") is a node
| + | |
− | :"a" is not a node
| + | |
− | ;The elements of a node are separated by commas
| + | |
− | :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a)
| + | |
− | ;The order of elements inside a node is not relevant.
| + | |
− | :("a",[a],<nowiki>[[a]]</nowiki>,A,B,A=C,%a) is the same as (<nowiki>[[a]]</nowiki>,B,A,"a",[a],A=C,%a)
| + | |
− | ;Nodes may have one single string, headword, UW and index, but may have as many features as necessary
| + | |
− | :<strike>("a","b")</strike> (a node may not contain more than one string)
| + | |
− | :<strike>([a],[b])</strike> (a node may not contain more than one headword)
| + | |
− | :<strike>(<nowiki>[[a]]</nowiki>,<nowiki>[[b]]</nowiki>)</strike> (a node may not contain more than one UW)
| + | |
− | :<strike>(%a,%b)</strike> (a node may not contain more than one index)
| + | |
− | :(A,B,C,D,...,Z) (a node may contain as many features as necessary)
| + | |
− | ;A node may be referred to by any of its elements
| + | |
− | :("a") refers to all nodes where actual string = "a"
| + | |
− | :([a]) refers to all nodes where headword = [a]
| + | |
− | :(<nowiki>[[a]]</nowiki>) refers to all nodes where UW = <nowiki>[[a]]</nowiki>
| + | |
− | :(A) refers to all nodes having the feature A
| + | |
− | :("a",[a],<nowiki>[[a]]</nowiki>,A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = <nowiki>[[a]]</nowiki>
| + | |
− | ;Nodes are automatically indexed according to a position-based system if no explicit index is provided (see [[#Indexes|Index]])
| + | |
− | :("a")("b") is actually ("a",%01)("b",%02)
| + | |
− | ;[[Regular expressions]] may be used to make reference to any element of the node, except the index
| + | |
− | :("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
| + | |
− | :([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
| + | |
− | :([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
| + | |
− | :(/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
| + | |
− | ;Nodes may contain alternative (disjoined) features enclosed between {braces} and separated by a vertical bar
| + | |
− | :({A|B}) refers to all nodes having the feature A OR B
| + | |
− | ;Node features may be expressed as simple attributes, or attribute-value pairs:
| + | |
− | :(MCL) - feature as an attribute: refers to all nodes having the feature MCL
| + | |
− | :(GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
| + | |
− | Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
| + | |
− | :(%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see [[#Indexes|Index]])
| + | |
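The node properties listed above can be illustrated with a minimal Python sketch. It is not part of any UNL tool: the names <code>Node</code> and <code>parse_node</code> are hypothetical, and the sketch assumes that no element contains a comma (so regular expressions such as /a{2,3}/ and UWs with internal commas are out of scope). Following the rule stated above, an attribute-value pair such as GEN=MCL is stored as the two features GEN and MCL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical container for the elements of a node."""
    string: Optional[str] = None
    headword: Optional[str] = None
    uw: Optional[str] = None
    index: Optional[str] = None
    features: frozenset = frozenset()


def parse_node(text: str) -> Node:
    """Classify the comma-separated elements of a node such as ("a",[a],A,%x)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed between parentheses")
    slots = {}
    features = set()
    # Assumption: no element contains a comma of its own.
    for element in text[1:-1].split(","):
        element = element.strip()
        if element.startswith('"') and element.endswith('"'):
            kind, value = "string", element[1:-1]
        elif element.startswith("[[") and element.endswith("]]"):
            kind, value = "uw", element[2:-2]
        elif element.startswith("[") and element.endswith("]"):
            kind, value = "headword", element[1:-1]
        elif element.startswith("%"):
            kind, value = "index", element[1:]
        else:
            # Plain feature, or attribute=value stored as two features.
            features.update(part for part in element.split("=") if part)
            continue
        if kind in slots:
            raise ValueError(f"a node may not contain more than one {kind}")
        slots[kind] = value
    return Node(features=frozenset(features), **slots)
```

Because the elements are stored by kind rather than by position, two nodes written with their elements in a different order compare as equal, and a second string, headword, UW or index is rejected.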
− | | + | |
− | === Relations ===
| + | |
− | In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there can be three different types of relations:
| + | |
− | *the '''linear''' relation L expresses the surface structure of natural language sentences
| + | |
− | *'''syntactic''' relations express the deep (tree) structure of natural language sentences
| + | |
− | *'''semantic''' relations express the structure of UNL graphs
| + | |
− | ==== Properties of relations ====
| + | |
− | ;The linear relation is always binary and is represented in two possible formats:
| + | |
− | *L(%x;%y), where L is the invariant name of the linear relation, and %x and %y are nodes; or
| + | |
− | *(%x)(%y)
| + | |
− | ;Syntactic relations are not predefined, although we have been using a set of binary relations based on the [[X-bar theory]].
| + | |
− | ;Semantic relations constitute a predefined and closed set that can be found [[relations|here]].
| + | |
− | ;Syntactic and semantic relations are represented in the same way:
| + | |
− | *rel(%x;%y), where "rel" is the name of the relation, %x is the source node, and %y is the target node
| + | |
− | ;Arguments of linear, syntactic and semantic relations are not commutative.
| + | |
− | :The order of the elements in a relation affects the result:
| + | |
− | ::(%x)(%y) is different from (%y)(%x)
| + | |
− | ::relation(%x;%y) is different from relation(%y;%x)
| + | |
− | ;Linear and semantic relations are always binary; syntactic relations may be n-ary:
| + | |
− | :L(%x;%y) - linear relation
| + | |
− | :agt(%x;%y) - semantic relation
| + | |
− | :VH(%x) - unary syntactic relation
| + | |
− | :VC(%x;%y) - binary syntactic relation
| + | |
− | :XX(%x;%y;%z) - possible ternary syntactic relation
| + | |
− | ;Inside each relation, nodes are isolated by semicolon (;).
| + | |
− | :VC(%x;%y)
| + | |
− | :<strike>VC(%x,%y)</strike>
| + | |
− | ;Inside each relation, nodes may be referenced by any of their elements, isolated by comma (,):
| + | |
− | :("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
| + | |
− | :L(<nowiki>[[c]]</nowiki>;D) - linear relation between a node where UW = <nowiki>[[c]]</nowiki> and another node having the feature D
| + | |
− | :VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
| + | |
− | :agt("a",[a],<nowiki>[[a]]</nowiki>,A;"b",[b],<nowiki>[[b]]</nowiki>,B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = <nowiki>[[a]]</nowiki> AND another node having the feature B where string = "b" AND headword = [b] AND UW = <nowiki>[[b]]</nowiki>
| + | |
− | ;Relations may be conjoined through juxtaposition:
| + | |
− | :("a")("b")("c") - two linear relations: one between ("a") and ("b") AND another between ("b") and ("c")
| + | |
− | :agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND another between (%x) and (%z)
| + | |
− | :<strike>VC([a];[b]),VC([a];[c])</strike> - conjoined relations must not be isolated by comma
| + | |
− | ;Relations may be disjoined through {braces}
| + | |
− | :{("a")|("b")}("c") - either ("a")("c") or ("b")("c")
| + | |
− | :{agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
| + | |
− | ;Syntactic and semantic relations may be replaced by regular expressions
| + | |
− | :/.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
| + | |
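The reading of juxtaposed nodes as a chain of binary linear relations can be sketched in Python as follows. The function name <code>linear_relations</code> is hypothetical and nodes are kept as plain strings; the sketch only pairs each node with its right-hand neighbour and writes the pair in the L(%x;%y) format described above.

```python
from typing import List


def linear_relations(nodes: List[str]) -> List[str]:
    """Return one L(x;y) relation for every pair of adjacent nodes."""
    return [f"L({left};{right})" for left, right in zip(nodes, nodes[1:])]
```

For the nodes "a", "b" and "c" this yields two relations, and reversing the input yields different relations, since the arguments are not commutative.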
− | | + | |
− | === Hyper-nodes ===
| + | |
− | Nodes may contain one or more relations. In this case, they are said to be "hyper-nodes", and represent scopes or sub-graphs. Like any node, hyper-nodes contain a string, a headword, a UW, an index and features, of which the internal relations are a special type. Examples of hyper-nodes are the following:
| + | |
− | *(("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
| + | |
− | *(VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
| + | |
− | *(agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
| + | |
− | *(([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
| + | |
− | *(([kick],V)([the],D)([bucket],N),"kick the bucket",<nowiki>[[die]]</nowiki>,V,NTST) - the same as before, except for the fact that the hyper-node has string = "kick the bucket" and UW = <nowiki>[[die]]</nowiki>
| + | |
− | Hyper-nodes may also contain internal hyper-nodes:
| + | |
− | *((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")
| + | |
− | ==== Properties of hyper-nodes ====
| + | |
− | ;Like any node, hyper-nodes are expressed between (parentheses)
| + | |
− | :(("a")("b"))
| + | |
− | ;Like any node, hyper-nodes may have one single string, one single headword and one single UW, but may have as many features and internal relations as necessary
| + | |
− | :(([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],<nowiki>[[die]]</nowiki>,V,NTST)
| + | |
− | ;Like any node, hyper-nodes may be referenced by any of their elements, including internal relations
| + | |
− | :(([kick],V)) - refers to any hyper-node containing the node ([kick],V)
| + | |
− | :(([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
| + | |
− | :(([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
| + | |
− | ;When a hyper-node is deleted, all its internal relations are deleted as well
| + | |
− | :(([kick],V)([the],D)([bucket],N)):=; (the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N))
| + | |
− | | + | |
− | === Hyper-relations ===
| + | |
− | Relations may have relations as arguments. In this case, they are said to be "hyper-relations". Examples of hyper-relations are the following:
| + | |
− | *XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
| + | |
− | *and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])
| + | |
− | ==== Properties of hyper-relations ====
| + | |
− | ;A hyper-relation may have one single relation as each argument
| + | |
− | *XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
| + | |
− | *XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
| + | |
− | *XP(VC(%a;%b);VA(%a;%c)) - the source and the target argument of the hyper-relation XP are relations
| + | |
− | *<strike>XP(VC(%a;%b)VA(%a;%c);VS(%a;%d))</strike> - a hyper-relation may not have more than one relation as one single argument (in this case, the hyper-relation XP contained two relations as the source argument)
| + | |
− | ;Relations do not have strings, UWs, headwords or any features
| + | |
− | *<strike>XP(XB(%a;%b),"ab",[ab],<nowiki>[[ab]]</nowiki>,A,B;%c)</strike> (the relation XB(%a;%b) may not have strings, UWs, headwords or any features)
| + | |
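The constraint that each argument of a hyper-relation holds at most one relation can be checked with a small recursive parser. The Python sketch below is only an illustration under stated assumptions: <code>Relation</code>, <code>split_top_level</code> and <code>parse_argument</code> are hypothetical names, relation names are assumed to be alphanumeric, and any argument that is not of the form name(...) is kept as an unparsed node reference.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Relation:
    """Hypothetical representation of rel(arg;arg;...)."""
    name: str
    args: Tuple[Union[str, "Relation"], ...]


def split_top_level(body: str) -> List[str]:
    """Split on semicolons that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for char in body:
        depth += (char == "(") - (char == ")")
        if char == ";" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(char)
    parts.append("".join(current))
    return parts


def parse_argument(text: str) -> Union[str, Relation]:
    """Parse a relation recursively; return node references unchanged."""
    text = text.strip()
    head, paren, rest = text.partition("(")
    if not (paren and head.isalnum()):
        return text  # plain node reference such as %c
    depth = 1
    for position, char in enumerate(rest):
        depth += (char == "(") - (char == ")")
        if depth == 0:
            break
    else:
        raise ValueError("unbalanced parentheses")
    if position != len(rest) - 1:
        # Something follows the first relation, e.g. VC(...)VA(...).
        raise ValueError("an argument may hold only one relation")
    args = split_top_level(rest[:-1])
    return Relation(head, tuple(parse_argument(arg) for arg in args))
```

Parsing XP(XB(%a;%b);%c) gives a relation XP whose source argument is itself the relation XB, while the struck-out form with two conjoined relations in one argument raises an error.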
− | | + | |
− | == Types of rules ==
| + | |
− | | + | |
− | In the UNL Grammar there are three basic types of rules:
| + | |
− | | + | |
− | === Normalization Rules ===
| + | |
− | (main article: [[N-Rule]]s)
| + | |
− | Used to normalize the natural language input and to segment natural language texts into sentences.
| + | |
− | | + | |
− | === Transformation rules ===
| + | |
− | (main article: [[T-Rule]]s)
| + | |
− | Used to generate natural language sentences out of UNL graphs and vice-versa.
| + | |
− | | + | |
− | === Disambiguation rules ===
| + | |
− | (main article: [[D-rule]]s)
| + | |
− | Used to improve the performance of transformation rules by constraining their applicability.
| + | |
− | | + | |
− | The Normalization Rules (which include the segmentation of texts into sentences) and the Transformation Rules follow the very general formalism
| + | |
− | | + | |
− | α:=β;
| + | |
− | | + | |
− | where the left side α is a condition statement, and the right side β is an action to be performed over α.
| + | |
− | | + | |
− | The Disambiguation Rules, which were directly inspired by the UNL Centre's former co-occurrence dictionary and knowledge base, follow a slightly different formalism:
| + | |
− | | + | |
− | α=P;
| + | |
− | | + | |
− | where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
| + | |
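The two formalisms can be told apart mechanically. The following Python sketch is an assumption-laden illustration rather than the actual rule loader: the function name <code>parse_rule</code> and the labels "condition-action" and "disambiguation" are hypothetical, the rule is split on the first ":=" if there is one, and otherwise on the last "=" so that attribute-value pairs inside α are left intact.

```python
from typing import Tuple, Union


def parse_rule(line: str) -> Tuple[str, str, Union[str, int]]:
    """Split a rule into (kind, left side, right side)."""
    line = line.strip()
    if not line.endswith(";"):
        raise ValueError("a rule must end with a semicolon")
    body = line[:-1]
    if ":=" in body:
        # alpha:=beta; (condition and action; an empty action deletes alpha)
        condition, action = body.split(":=", 1)
        return ("condition-action", condition.strip(), action.strip())
    # alpha=P; (split on the last "=" because alpha may contain "=" itself)
    statement, separator, value = body.rpartition("=")
    value = value.strip()
    if not separator or not value.isdecimal() or not 0 <= int(value) <= 255:
        raise ValueError("P must be an integer from 0 to 255")
    return ("disambiguation", statement.strip(), int(value))
```

With the deletion example given earlier, <code>(([kick],V)([the],D)([bucket],N)):=;</code> is read as a condition with an empty action, whereas a made-up rule such as <code>(POS=NOU)=255;</code> is read as the statement (POS=NOU) with P = 255.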
− | | + | |
− | == Notes ==
| + | |
− | <references />
| + | |