Grammar Specs

From UNL Wiki
Revision as of 17:59, 19 August 2013

The following Grammar Specs are used for writing rules for the UNDL Foundation tools (IAN, EUGENE, SEAN, NORMA, etc.).

Basic Symbols

Basic symbols used in the UNL framework
Symbol     Definition                               Example
( )        node                                     (%a)
" "        string                                   "went"
[ ]        natural language entry (headword)        [go]
[[ ]]      UW                                       [[to go(icl>to move)]]
/ /        regular expression                       /a{2,3}/ = aa, aaa
rel(x;y)   relation                                 agt(kill;Peter)
^          not                                      ^a = not a
{ | }      or                                       {a|b} = a or b
%          index for nodes, attributes and values   %x
:          scope ID                                 :01
#          index for sub-NLWs                       #01
=          attribute-value assignment               POS=NOU
!          rule trigger                             !PLR
&          merge operator                           %x&%y
?          dictionary lookup operator               ?[a]
? dictionary lookup operator ?[a]

Basic Concepts

[Figure: Grammar.png]
Node
A node is the most elementary unit in the graph. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Relation
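To form a natural language sentence or a UNL graph, nodes are interrelated by relations. The UNL framework has three types of relations: the linear (list) relation, syntactic relations and semantic relations.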
Hyper-Node
A hyper-node is a sub-graph, i.e., a scope: a node containing relations between nodes.
Hyper-Relation
A hyper-relation is a relation between relations.

Node

A node is the most elementary unit in the grammar. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.

Notation

Nodes are enclosed in (parentheses).
Each node may have several elements, which are listed inside the parentheses and separated by commas.
Examples of nodes:

  • ("a")
  • ([a])
  • ([[a]])
  • (NOU)
  • (POS=NOU)
  • ("a",[a],[[a]],LEX=N,POS=NOU,GEN=MCL,NUM=SNG)

Nodes are linked by relations. Inside a relation, the nodes are separated by a semicolon (;):

  • rel("a";"b") (a relation rel between the nodes ("a") and ("b"))
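The following Python sketch shows how a node written in this notation can be split into its elements. It rests on a simplified reading of the syntax. The functions `split_elements` and `parse_node` are hypothetical helpers and are not part of the UNDL tools. Commas inside quotes, brackets or parentheses, such as the comma-free but parenthesised UW [[to go(icl>to move)]], are not treated as separators. Anything that is not a string, headword, UW or index is kept as a feature.

```python
def split_elements(body):
    """Split a node body at top-level commas (not inside quotes, [ ] or ( ))."""
    parts, current, depth, in_quotes = [], "", 0, False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch in "[(":
            depth += 1
        elif not in_quotes and ch in "])":
            depth -= 1
        if ch == "," and depth == 0 and not in_quotes:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return [p.strip() for p in parts if p.strip()]


def parse_node(text):
    """Parse a node such as ("a",[a],[[a]],LEX=N,POS=NOU) into a dict of elements."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("nodes are enclosed between parentheses")
    node = {"string": None, "headword": None, "uw": None, "index": None, "features": []}
    for element in split_elements(text[1:-1]):
        if element.startswith('"'):
            key, value = "string", element[1:-1]
        elif element.startswith("[["):          # check [[UW]] before [headword]
            key, value = "uw", element[2:-2]
        elif element.startswith("["):
            key, value = "headword", element[1:-1]
        elif element.startswith("%"):
            key, value = "index", element
        else:
            node["features"].append(element)   # any number of features is allowed
            continue
        if node[key] is not None:
            raise ValueError(f"a node may not contain more than one {key}")
        node[key] = value
    return node


print(parse_node('("apple",[apple],[[apple(icl>fruit)]],LEX=N,POS=NOU)'))
# {'string': 'apple', 'headword': 'apple', 'uw': 'apple(icl>fruit)', 'index': None, 'features': ['LEX=N', 'POS=NOU']}
```

The parser rejects a second string, headword, UW or index. This mirrors the restrictions listed under Properties below.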

Elements

Any node is a structure containing the following necessary elements:

  • a string, to be represented between "quotes", which expresses the actual state of the node;
  • a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
  • a UW, to be represented between [[double square brackets]], which expresses the UW value of the node;
  • a feature or set of features, which express the features of the node;
  • an index, preceded by the symbol %, which is used to reference the node.

The elements of a node can be:

  • native, if defined in the dictionary; or
  • non-native, if assigned by transformation rules.

Example

Consider the input string "an apple" and the dictionary[1] below:

[an]{111}""(LEX=D,POS=ART)<eng,0,0>;
[ ]{3333}""(LEX=O,POS=PUT,BLK)<eng,0,0>;
[apple]{222}"apple(icl>fruit)"(LEX=N,POS=NOU)<eng,0,0>;

In the tokenization process, the input string is segmented into nodes according to the dictionary. This means that the input string above is analyzed as a list of three nodes:

("an",[an],[[]],LEX=D,POS=ART)
(" ",[ ],[[]],LEX=O,POS=PUT,BLK)
("apple",[apple],[[apple(icl>fruit)]],LEX=N,POS=NOU)
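The segmentation step can be sketched in Python as follows. The text above does not say which segmentation strategy the tools use, so this sketch assumes greedy longest match from left to right. The dictionary is a hypothetical in-memory copy of the three entries above, mapping each headword to its UW and features. Each resulting node is a tuple (string, headword, UW, features).

```python
# Hypothetical in-memory dictionary: headword -> (UW, features)
DICTIONARY = {
    "an": ("", ["LEX=D", "POS=ART"]),
    " ": ("", ["LEX=O", "POS=PUT", "BLK"]),
    "apple": ("apple(icl>fruit)", ["LEX=N", "POS=NOU"]),
}


def tokenize(text, dictionary):
    """Greedy longest-match segmentation (an assumed strategy, for illustration)."""
    nodes = []
    pos = 0
    longest = max(len(hw) for hw in dictionary)
    while pos < len(text):
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if candidate in dictionary:
                uw, feats = dictionary[candidate]
                # Right after tokenization, the string equals the headword
                # and all elements are native (copied from the dictionary).
                nodes.append((candidate, candidate, uw, list(feats)))
                pos += size
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {pos}")
    return nodes


for node in tokenize("an apple", DICTIONARY):
    print(node)
```
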

Each node consists of:

  • a string, between quotes ("an", " ", "apple");
  • a headword, between brackets ([an],[ ],[apple]);
  • a UW, between double brackets ([[]],[[]],[[apple(icl>fruit)]]);
  • a set of features (LEX=D, POS=ART, LEX=O, ..., BLK)

These elements are said to be native because they are inherited from the dictionary.
During processing, any of these elements may be changed by T-rules. The resulting new elements are said to be non-native because they are assigned by rules.
Consider, for instance, the example below:

  • INITIAL STATE: ("an",[an],[[]],LEX=D,POS=ART)
  • RULE APPLIED: ("an"):=("a");[2]
  • FINAL STATE: ("a",[an],[[]],LEX=D,POS=ART)

Note, in the above, that the node changed its string value from "an" to "a". As this was the only change intended, the rule referred only to the string value of the node ("an").
In addition to changing the string, we could have changed any element of the node:

  • (ART):=(-ART); (delete the feature ART from the nodes having the feature ART)
  • (ART,^NDEF):=(+NDEF); (add the feature NDEF to the nodes having the feature ART and not having the feature NDEF)
  • ("an",ART,^NDEF):=("a",-ART,+NDEF); (set the string to "a", remove the feature ART and add the feature NDEF to the nodes having the feature ART and not having the feature NDEF whose string is "an").

Indexation

main article: Indexation

In most cases, we have to assign an index to the node. This happens when we want to perform operations over nodes instead of elements of nodes.
Consider, for instance, the need for reversing the order of the input string "an apple" in order to generate "apple an". If we write a rule as:

  • ("an")(" ")("apple"):=("apple")(" ")("an");

We would have the following output:

("apple",[an],[[]],LEX=D,POS=ART)
(" ",[ ],[[]],LEX=O,POS=PUT,BLK)
("an",[apple],[[apple(icl>fruit)]],LEX=N,POS=NOU)

Note, in the above, that we have simply replaced the string "an" by "apple", and "apple" by "an", preserving all the other features, which is not the intended behavior (after the rule, "apple" is ART and "an" is NOU).
In order to manipulate entire nodes (and not only some elements), we have to create indexes such as:

  • ("an",%index1)(" ",%index2)("apple",%index3):=(%index3)(%index2)(%index1);

In this case, the output would be the expected one:

("apple",[apple],[[apple(icl>fruit)]],LEX=N,POS=NOU)
(" ",[ ],[[]],LEX=O,POS=PUT,BLK)
("an",[an],[[]],LEX=D,POS=ART)

Indexes, which are introduced by the symbol %, are always temporary (they are valid only within rules using them) and are used for co-indexing nodes. For further information on indexes, see indexation.
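The difference between rewriting strings and permuting indexed nodes can be modelled in a few lines of Python. The `Node` dataclass here is a hypothetical stand-in for the tool's internal node structure. The first rule only replaces the string element and keeps everything else in place. The second rule binds whole nodes to indexes and reorders them.

```python
from dataclasses import dataclass, replace


@dataclass
class Node:  # hypothetical model of a node, for illustration only
    string: str
    headword: str
    uw: str
    features: tuple


nodes = [
    Node("an", "an", "", ("LEX=D", "POS=ART")),
    Node(" ", " ", "", ("LEX=O", "POS=PUT", "BLK")),
    Node("apple", "apple", "apple(icl>fruit)", ("LEX=N", "POS=NOU")),
]

# ("an")(" ")("apple"):=("apple")(" ")("an");
# only the string element of each position is rewritten
by_string = [replace(n, string=s) for n, s in zip(nodes, ["apple", " ", "an"])]

# ("an",%index1)(" ",%index2)("apple",%index3):=(%index3)(%index2)(%index1);
# indexes bind whole nodes, so the nodes themselves are reordered
index = {"%index1": nodes[0], "%index2": nodes[1], "%index3": nodes[2]}
by_index = [index["%index3"], index["%index2"], index["%index1"]]

print(by_string[0])  # string "apple", but still headword [an] and features LEX=D, POS=ART
print(by_index[0])   # the complete original "apple" node
```
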

Properties

  1. Nodes are enclosed between (parentheses)
    ("a") is a node
    "a" is not a node
  2. Inside relations, parentheses are not duplicated, except in case of hyper-nodes
    rel("a";"b") (relation rel between the nodes ("a") and ("b"))
    rel(("a");("b")) (incorrect: the parentheses of the nodes are duplicated)
  3. The elements of a node are separated by comma
    ("a",[a],[[a]],A,B,A=C,%a)
  4. The order of elements inside a node is not relevant.
    ("a",[a],[[a]],A,B,A=C,%a) is the same as ([[a]],B,A,"a",[a],A=C,%a)
  5. Nodes may have one single string, headword, UW and index, but may have as many features as necessary
    ("a","b") (a node may not contain more than one string)
    ([a],[b]) (a node may not contain more than one headword)
    ([[a]],[[b]]) (a node may not contain more than one UW)
    (%a,%b) (a node may not contain more than one index)
    (A,B,C,D,...,Z) (a node may contain as many features as necessary)
  6. A node may be referred to by any of its elements, but only the index makes it unique
    ("a") refers to all nodes where actual string = "a"
    ([a]) refers to all nodes where headword = [a]
    ([[a]]) refers to all nodes where UW = [[a]]
    (A) refers to all nodes having the feature A
    ("a",[a],[[a]],A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = [[a]]
    (%a) refers to the specific node with the index %a
  7. Nodes are automatically indexed according to a position-based system if no explicit index is provided (see Indexation)
    ("a")("b") is actually ("a",%01)("b",%02)
  8. Regular expressions may be used to make reference to any element of the node, except the index
    ("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
    ([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
    ([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
    (/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
  9. Nodes may contain disjunctive features, enclosed in {braces} and separated by a vertical bar
    ({A|B}) refers to all nodes having the feature A OR B
  10. Node features may be expressed as simple attributes, or attribute-value pairs:
    (MCL) - feature as an attribute: refers to all nodes having the feature MCL
    (GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
  11. Attribute-value pairs may be used to create co-reference between different nodes (as in agreement):
    (%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %x is the same as the value of the attribute GEN of the node %y (see Indexation)
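Properties 6, 8, 9 and 10 describe how a node is referred to through its elements. The Python sketch below models this matching. `Node` and `matches` are hypothetical names. Several rules here are assumptions that this page does not spell out:
  • regular expressions written as /.../ must match the whole element;
  • a bare feature condition such as (NOU) also matches the value or the attribute of a pair such as POS=NOU;
  • a pair condition such as (POS=NOU) matches only that pair.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical node: one string, headword, UW and index; any number of features."""
    string: str = ""
    headword: str = ""
    uw: str = ""
    features: list = field(default_factory=list)  # bare "A" or pairs "A=V"
    index: Optional[str] = None


def _value_matches(pattern, value):
    """Literal comparison, or full-match regular expression when written /.../."""
    if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value


def _has_feature(node, cond):
    if cond.startswith("^"):                         # negation: ^A
        return not _has_feature(node, cond[1:])
    if cond.startswith("{") and cond.endswith("}"):  # disjunction: {A|B}
        return any(_has_feature(node, alt) for alt in cond[1:-1].split("|"))
    for feat in node.features:
        if "=" in cond:
            if feat == cond:                         # pair condition: exact pair
                return True
        elif any(_value_matches(cond, part) for part in feat.split("=", 1)):
            return True                              # bare condition: either side
    return False


def matches(node, string=None, headword=None, uw=None, features=()):
    """True if the node satisfies every element given in the reference."""
    if string is not None and not _value_matches(string, node.string):
        return False
    if headword is not None and not _value_matches(headword, node.headword):
        return False
    if uw is not None and not _value_matches(uw, node.uw):
        return False
    return all(_has_feature(node, f) for f in features)


apple = Node("apple", "apple", "apple(icl>fruit)", ["LEX=N", "POS=NOU"])
print(matches(apple, features=["NOU"]))        # True: NOU is the value of POS=NOU
print(matches(apple, features=["{ART|NOU}"]))  # True
print(matches(apple, features=["^NOU"]))       # False
```

Indexes are deliberately left out of `matches`. Following property 6, an index identifies one specific node rather than describing a set of nodes.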

"Strings" or [headwords]?

  • "Double quotes" are always used to represent strings: "a" will match only the string "a"
  • [Simple square brackets] are always used to represent natural language entries (headwords) in the dictionary

In the initial state, i.e., right after tokenization, the "string" and the [headword] of a node are the same. During the processing, however, they can become different.
Consider, for instance, the case of "an apple" above. Note that, after applying the rule:

("an"):=("a");

The resulting node is:

("a",[an],[[]],LEX=D,POS=ART)

Note, in the above, that the string is now "a" whereas the headword is still [an]. We could have also changed the headword as in:

("an",[an]):=("a",[a]);

However, it is usually preferable to preserve the headword in order to keep track of the dictionary entries being used, regardless of changes to their strings.
In this case, if we want to make reference to the actual string value of the node, we use "quotes"; if we want to make reference to the string as it was retrieved in the dictionary, we use [brackets].

Transformations over nodes

Nodes are altered, replaced, created and deleted by T-rules:

Altering elements of nodes

Any changes to the elements of nodes must be stated in the right side of rules. Changes affect only the elements explicitly indicated in rules:

("a",[a],[[a]],A):=("b"); (only the string is affected by the rule; all other elements are preserved. The resulting node is ("b",[a],[[a]],A) )
"Strings"
Nodes may have one single string. This value is set through the operator "+", reset through the operator "-", or modified through A-rules. The operator "+" may be omitted. Changes to strings do not affect any other element (headwords, UWs and features)
  • (""):=(+"a"); (the string of the node is set from "" to "a")
  • (""):=("a"); (the same as above)
  • ():=("a"); (the same as above)
  • ("a"):=(+"b"); (the string of the node is set from "a" to "b")
  • ("a"):=("b"); (the same as above)
  • ("a"):=(-"a"); (the string of the node is reset, i.e., it changes from "a" to "")
  • ("a"):=(""); (the same as above)
  • ("a"):=("x"<0,0>"y"); (the string of the node is modified from "a" to "xay")
[headwords]
Nodes may have one single headword. This value is set through the operator "+" and reset through the operator "-". The operator "+" may be omitted. Headwords may not be modified through A-rules. Changes to headwords do not affect any other element (strings, UWs and features)
  • ([]):=(+[a]); (the headword of the node is set from [] to [a])
  • ([]):=([a]); (the same as above)
  • ():=([a]); (the same as above)
  • ([a]):=(+[b]); (the headword of the node is set from [a] to [b])
  • ([a]):=([b]); (the same as above)
  • ([a]):=(-[a]); (the headword of the node is reset, i.e., it changes from [a] to [])
  • ([a]):=([]); (the same as above)
  • ([a]):=("x"<0,0>"y"); (invalid: it is not possible to modify headwords through A-rules)
[[UWs]]
Nodes may have one single UW. This value is set through the operator "+" and reset through the operator "-". The operator "+" may be omitted. UWs may not be modified through A-rules. Changes to UWs do not affect any other element (strings, headwords and features)
  • ([[]]):=(+[[a]]); (the UW of the node is set from [[]] to [[a]])
  • ([[]]):=([[a]]); (the same as above)
  • ():=([[a]]); (the same as above)
  • ([[a]]):=(+[[b]]); (the UW of the node is set from [[a]] to [[b]])
  • ([[a]]):=([[b]]); (the same as above)
  • ([[a]]):=(-[[a]]); (the UW of the node is reset, i.e., it changes from [[a]] to [[]])
  • ([[a]]):=([[]]); (the same as above)
  • ([[a]]):=("x"<0,0>"y"); (invalid: it is not possible to modify UWs through A-rules)
Features
Nodes may have as many features as necessary. Features may come isolated in a list (POS,NOU,GEN,MCL,NUM,SNG) or as pairs of attribute=value (POS=NOU,GEN=MCL,NUM=SNG). Changes to features do not affect any other element (strings, headwords and UWs)
  • Adding features to nodes
    Features are added through the operator + (add). The operator "+" may be omitted.
    • ():=(+B); (add the feature B to the node)
    • ():=(B); (the same as above)
    Rules are recursive: the feature will be added to the node while the condition is true.
    • ():=(+B); (the resulting node is (B,B,B,...), i.e., this is an infinite loop)
    • (^B):=(+B); (the resulting node is (B), i.e., the feature B is added only if the node does not contain it yet)
    The operator + (add) does not create attribute=value pairs automatically (it simply adds features to the nodes)
    • (%x,^POS,^NOU):=(%x,+NOU); (the resulting node is (NOU) and not (POS=NOU) because POS has not been added)
    • (%x,^POS,^NOU):=(%x,+POS,+NOU); (the resulting node is (POS,NOU) and not (POS=NOU) because there was no assignment POS=NOU)
    • (%x,POS,^NOU):=(%x,+NOU); (the resulting node is (POS,NOU) because there was no assignment POS=NOU)
    • (%x,^POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS=NOU))
    • (%x,POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS, POS=NOU) because the feature POS has been duplicated)
  • Deleting features from nodes
    Features are deleted through the operator - (delete).
    • ():=(-B); (delete the feature B from the node)
    Rules are recursive: the feature will be deleted from the node while the condition is true.
    • ():=(-B); (the rule will delete all instances of the feature B from the node, i.e., the node (B,B,B,B,B) will become ())
    The operator "-" may also be used to reset attributes:
    • (%x,POS,NOU):=(%x,-NOU); (the resulting node is (POS) because the feature POS was not deleted)
    • (%x,POS,NOU):=(%x,-POS,-NOU); (the resulting node is () because both features POS and NOU were deleted)
    • (%x,POS=NOU):=(%x,-NOU); (the resulting node is (POS) because only the value of the attribute POS was deleted)
    • (%x,POS=NOU):=(%x,-POS); (the resulting node is () because the attribute POS was deleted with all its values)
  • Copying features
    Features can be copied from one node to another through indexes
    • (%x,GEN)(%y,^GEN):=(%x)(%y,GEN=%x); (the value of the attribute GEN is copied from the node %x to %y);
Indexes
Indexes are used to make reference to the whole node instead of its elements. Any change in the index means a completely new node, and no element is preserved.
  • (%x,"a"):=(%x,"b"); (the string of the node %x is set from "a" to "b"; all the other elements of the node %x are preserved)
  • (%x,"a"):=(%y,"b"); (the whole node %x is replaced by a new node %y whose string is "b"; no element from %x is copied to %y)
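Because feature rules are recursive, a rule keeps firing for as long as its condition holds. The Python sketch below models a single node's feature list, with each rule given as a condition and an action. The `max_steps` cap is a hypothetical safeguard added only so that the infinite loop of ():=(+B); becomes observable; it is not part of the rule syntax. The deletion rule ():=(-B); is modelled as firing while some B is present.

```python
def apply_until_fixpoint(features, condition, action, max_steps=100):
    """Apply a feature rule repeatedly while its condition holds."""
    features = list(features)
    for _ in range(max_steps):
        if not condition(features):
            return features
        features = action(features)
    raise RuntimeError("the rule is still applicable after max_steps: infinite loop")


def without_one(features, feat):
    """Return a copy of features with one occurrence of feat removed."""
    result = list(features)
    result.remove(feat)
    return result


# (^B):=(+B);  the condition ^B becomes false after one application
added = apply_until_fixpoint([], lambda f: "B" not in f, lambda f: f + ["B"])

# ():=(-B);  fires while some B is present, so every B is removed
cleaned = apply_until_fixpoint(["B", "B", "A", "B"], lambda f: "B" in f,
                               lambda f: without_one(f, "B"))

# ():=(+B);  the empty condition is always true, so the rule never stops
try:
    apply_until_fixpoint([], lambda f: True, lambda f: f + ["B"])
except RuntimeError as err:
    print("():=(+B); ->", err)

print(added, cleaned)  # ['B'] ['A']
```
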

Deleting nodes

In linear rules, nodes are deleted if they are not repeated (co-indexed) in the right side:

  • (%x)(%y):=(%x); (the node %y will be deleted)

In other rules, nodes are deleted if they are not repeated (co-indexed) in the right side and are not part of any other relation:

  • rel(%x;%y):=rel(%x); (the node %y will be deleted if, and only if, it is not part of any other relation)

Creating nodes

Nodes are created through the use of new indexes in the right side:

  • ("a",%x)("b",%y):=(%x)(%y)("c",%z); (the node %z will be created)
  • ("a",%x)("b",%y):=(%x)("c",%z); (the node %z will be created, and %y will be deleted)

Duplicating (cloning) nodes

Nodes may be duplicated by repeating indexes on the right side along with the command #CLONE:

  • ("a",^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
    ("a") becomes ("a")("a")

To avoid infinite recursion, the right side must change something that the condition tests. In the example above, the feature +CLONED is assigned to all instances of the clone, which prevents the rule from applying indefinitely.
Clones contain the same elements as the original nodes, unless they are explicitly altered during cloning:

  • ("a",[a],[[a]],A,^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
    ("a",[a],[[a]],A) becomes ("a",[a],[[a]],A,CLONED)("a",[a],[[a]],A,CLONED)
  • (A,^CLONED,%x):=(%x,-A,+B,+CLONED)(%x,-A,+C,+CLONED,#CLONE);
    (A) becomes (B,CLONED)(C,CLONED)
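The second cloning example can be replayed in Python, with each node reduced to its list of features. `clone_rule` is a hypothetical helper hard-coded for the rule (A,^CLONED,%x):=(%x,-A,+B,+CLONED)(%x,-A,+C,+CLONED,#CLONE);. It rescans the list after every application, so it stops only when the ^CLONED guard no longer matches any node.

```python
def clone_rule(nodes):
    """(A,^CLONED,%x):=(%x,-A,+B,+CLONED)(%x,-A,+C,+CLONED,#CLONE);
    Each matching node is replaced by itself and one clone, both altered."""
    changed = True
    while changed:
        changed = False
        for i, feats in enumerate(nodes):
            if "A" in feats and "CLONED" not in feats:
                base = [f for f in feats if f != "A"]     # -A on both copies
                original = base + ["B", "CLONED"]         # (%x,-A,+B,+CLONED)
                clone = base + ["C", "CLONED"]            # (%x,-A,+C,+CLONED,#CLONE)
                nodes = nodes[:i] + [original, clone] + nodes[i + 1:]
                changed = True
                break
    return nodes


print(clone_rule([["A"]]))  # [['B', 'CLONED'], ['C', 'CLONED']]
```

Without +CLONED on the right side, both copies would keep matching (A,^CLONED). The loop would then never terminate, just as the text warns.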

Splitting nodes

One node may be split into two or more nodes through the use of splitting rules. Consider, for instance, the cases below:

Splitting rules deal only with strings and apply only to nodes with the feature TEMP.
Original node: ("abc",TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",TEMP)("c",TEMP);
However, if the original node was ("abc"), without TEMP, the rule would not have been applied (i.e., it is necessary to assign the feature TEMP to the node before splitting it)
Splitting rules are conservative: the elements of the original node, except the string, will be preserved unless explicitly altered.
Original node: ("abc",[abc],[[abc]],A,B,C,TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",[abc],[[abc]],A,B,C,TEMP)("c",[abc],[[abc]],A,B,C,TEMP) (i.e., the elements of the original node will be copied to the new nodes)
However, if the rule was: ("abc"):=("ab",-A,-B,-C,-TEMP,+AB)("c",-TEMP);
The result would be: ("ab",[abc],[[abc]],AB)("c",[abc],[[abc]],A,B,C)
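The Python sketch below illustrates the two properties of splitting rules described above: they apply only to nodes with TEMP, and they preserve every element except the string. The `split` helper and its `(new_string, add, remove)` piece format are hypothetical conveniences for illustration.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Node:  # hypothetical node model
    string: str
    headword: str = ""
    uw: str = ""
    features: list = field(default_factory=list)


def split(node, old, pieces):
    """Sketch of ("old"):=("p1")("p2")...; pieces are (new_string, add, remove)."""
    if node.string != old or "TEMP" not in node.features:
        return [node]  # splitting rules apply only to nodes with TEMP
    result = []
    for new_string, add, remove in pieces:
        feats = [f for f in node.features if f not in remove] + list(add)
        # conservative: headword, UW and remaining features are copied
        result.append(replace(node, string=new_string, features=feats))
    return result


original = Node("abc", "abc", "abc", ["A", "B", "C", "TEMP"])

# ("abc"):=("ab")("c");
plain = split(original, "abc", [("ab", [], []), ("c", [], [])])

# ("abc"):=("ab",-A,-B,-C,-TEMP,+AB)("c",-TEMP);
altered = split(original, "abc", [("ab", ["AB"], ["A", "B", "C", "TEMP"]),
                                  ("c", [], ["TEMP"])])
print(altered)
```
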

Merging nodes (&)

Two or more nodes may be merged by the command &:

  • (%x)(%y)(%z):=(%x&%y&%z);

In the example above, ("a")("b")("c") becomes ("abc").

Merge operations concatenate strings, headwords and UWs, and join features:

("hw1",[[uw1]],F1,%x)("hw2",[[uw2]],F2,%y)("hw3",[[uw3]],F3,%z):=(%x&%y&%z);
The resulting node is ("hw1hw2hw3",[[uw1uw2uw3]],F1,F2,F3)

Compare the difference
  • (%x)(%y):=(%z); (the nodes %x and %y are replaced by %z, and their features are lost unless explicitly included in %z)
  • (%x)(%y):=(%x&%y); (the nodes %x and %y are merged)
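The merge operator can be sketched in Python as a function over hypothetical node objects. Following the description above, this sketch concatenates strings, headwords and UWs in order and joins the features. How the tools handle headwords during a merge is an assumption here, because the example does not show them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical node model
    string: str = ""
    headword: str = ""
    uw: str = ""
    features: list = field(default_factory=list)


def merge(*nodes):
    """Sketch of (%x&%y&%z): concatenate elements in order and join the features."""
    return Node(
        string="".join(n.string for n in nodes),
        headword="".join(n.headword for n in nodes),
        uw="".join(n.uw for n in nodes),
        features=[f for n in nodes for f in n.features],
    )


x = Node("hw1", uw="uw1", features=["F1"])
y = Node("hw2", uw="uw2", features=["F2"])
z = Node("hw3", uw="uw3", features=["F3"])
merged = merge(x, y, z)
print(merged)
# Node(string='hw1hw2hw3', headword='', uw='uw1uw2uw3', features=['F1', 'F2', 'F3'])
```

By contrast, (%x)(%y):=(%z); would create a brand-new node, and none of these elements would carry over.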

Retrieving entries in the dictionary after tokenization (?)

During transformation (i.e., after tokenization), dictionary entries may be accessed from transformation rules by the command "?"

  • (?[headword]) retrieves the first entry in the dictionary with the headword "headword"
  • (?[[uw]]) retrieves the first entry in the dictionary with the UW "uw"
  • (?[headword],?[[uw]],?feature) retrieves the first entry in the dictionary with the headword "headword", the UW "uw" and the feature "feature"

Regular expressions, variables and disjunction may also be used in dictionary search

  • (?[/abcd./]) retrieves the first entry in the dictionary whose headword has 5 characters and begins with "abcd" (this works only in natural language generation)
  • (?[[/abcd./]]) retrieves the first entry in the dictionary whose UW has 5 characters and begins with "abcd" (this works only in natural language analysis)
Obligatory parameters
Due to the indexation algorithm, the headword is obligatory in IAN and the UW is obligatory in EUGENE:
  • (?[headword]) will work only in IAN
  • (?[[uw]]) will work only in EUGENE
  • (?feature) will not work in IAN or EUGENE
Variables
In order to avoid repetition, dictionary look-up may use the values of indexed nodes in the left side
  • (?[%x]) retrieves the first entry in the dictionary with the same headword as the node %x
  • (?[[%x]]) retrieves the first entry in the dictionary with the same UW as the node %x
  • (?[%x],ATT=%x) retrieves the first entry in the dictionary with the same headword as the node %x and whose attribute ATT has the same value as the attribute ATT of the node %x
Example

Dictionary search is used mainly in natural language generation

  • (N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def);

If the node %noun has the features noun (N), number (NUM) and gender (GEN), as well as the attribute @def (definite), the rule does the following. It retrieves the first dictionary entry with the empty UW ("") and the features ART and DEF whose attributes NUM and GEN have the same values as those of %noun, and inserts it in front of the noun. It then removes @def from the noun in order to avoid an infinite loop.
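This generation example can be simulated in Python. The dictionary of Portuguese definite articles and the noun "livros" ("books") are hypothetical illustrative data, not entries from any real UNDL dictionary. The function `lookup` is a stand-in for the ? operator that returns the first entry satisfying every criterion. Features are split into bare features and attribute-value pairs.

```python
# Hypothetical dictionary: every entry has the empty UW and the features ART, DEF
DICTIONARY = [
    {"headword": "o", "uw": "", "bare": {"ART", "DEF"}, "attrs": {"NUM": "SNG", "GEN": "MCL"}},
    {"headword": "a", "uw": "", "bare": {"ART", "DEF"}, "attrs": {"NUM": "SNG", "GEN": "FEM"}},
    {"headword": "os", "uw": "", "bare": {"ART", "DEF"}, "attrs": {"NUM": "PLR", "GEN": "MCL"}},
    {"headword": "as", "uw": "", "bare": {"ART", "DEF"}, "attrs": {"NUM": "PLR", "GEN": "FEM"}},
]


def lookup(dictionary, uw=None, headword=None, features=(), **attrs):
    """Return the first entry satisfying every given criterion, or None."""
    for entry in dictionary:
        if uw is not None and entry["uw"] != uw:
            continue
        if headword is not None and entry["headword"] != headword:
            continue
        if not set(features) <= entry["bare"]:
            continue
        if any(entry["attrs"].get(name) != value for name, value in attrs.items()):
            continue
        return entry
    return None


# The node %noun: N, NUM=PLR, GEN=MCL and @def
noun = {"string": "livros", "bare": {"N", "@def"}, "attrs": {"NUM": "PLR", "GEN": "MCL"}}

# (N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def);
if {"N", "@def"} <= noun["bare"] and {"NUM", "GEN"} <= set(noun["attrs"]):
    # ?NUM=%noun and ?GEN=%noun reuse the values bound to the indexed node
    article = lookup(DICTIONARY, uw="", features=("ART", "DEF"),
                     NUM=noun["attrs"]["NUM"], GEN=noun["attrs"]["GEN"])
    noun["bare"].discard("@def")  # -@def stops the rule from firing again
    phrase = [article["headword"], noun["string"]]
    print(" ".join(phrase))  # os livros
```
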

Triggering rules (!)

Inflectional rules are triggered in the grammar by the command "!"<ATTRIBUTE>.
Given the dictionary entry:

  • [foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")) <eng,0,0>;

The rule NUM(PLR:="oo":"ee") is triggered by !NUM
For instance:

  • (NUM=PLR,^inflected):=(!NUM,+inflected); or
  • (PLR,^inflected):=(!NUM,+inflected); or
  • (NUM,^inflected):=(!NUM,+inflected);

In the first case (NUM=PLR), the system verifies if the attribute "NUM" is set and if it has the value "PLR". In the second and in the third case, the system simply verifies if the word has any feature (attribute or value) equal to "PLR" or "NUM".
It is important to stress that, because the features of the dictionary are defined by the user, there is no way of pre-assigning attribute-value pairs. It is therefore not possible to infer that "PLR" will be a value of the attribute "NUM" except through an assignment of the form "NUM=PLR" (i.e., given only "PLR" or "NUM", it is not possible to infer "NUM=PLR").
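The Python sketch below shows one way to read the [foot] entry. It assumes that NUM(PLR:="oo":"ee") means "when fired for PLR, replace the substring oo by ee", which turns "foot" into "feet". `parse_inflection` and `trigger` are hypothetical helpers that cover only this rule shape. The trigger covers the first two conditions above, NUM=PLR and bare PLR.

```python
import re

RULE = 'NUM(PLR:="oo":"ee")'  # the inflectional rule of the [foot] entry above


def parse_inflection(rule):
    """Split ATTRIBUTE(VALUE:="old":"new") into (attribute, value, old, new)."""
    match = re.fullmatch(r'(\w+)\((\w+):="([^"]*)":"([^"]*)"\)', rule)
    if match is None:
        raise ValueError(f"unsupported rule format: {rule!r}")
    return match.groups()


def trigger(string, features, rule):
    """Sketch of !ATTRIBUTE: apply the rule if the node carries the rule's value,
    either as an attribute-value pair (NUM=PLR) or as a bare feature (PLR)."""
    attribute, value, old, new = parse_inflection(rule)
    if f"{attribute}={value}" in features or value in features:
        return string.replace(old, new, 1)  # assumed reading: substitute "oo" by "ee"
    return string


print(trigger("foot", ["POS=NOU", "NUM=PLR"], RULE))  # feet
print(trigger("foot", ["POS=NOU", "NUM=SNG"], RULE))  # foot
```
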


Notes

  1. For the structure of the dictionary, please consult dictionary.
  2. This is a T-rule and means: replace the string value from "an" to "a". For further information on T-rules, please consult T-rule.

Relation

To form a natural language sentence or a UNL graph, nodes are interrelated by relations. In the UNL framework, there can be three different types of relations between nodes:

  • the linear relation L, which defines the order of the elements in a list
  • syntactic relations (such as adjunct of the noun phrase, complement of the verbal phrase, specifier of the adjective phrase, etc.)
  • semantic relations (such as agent, object, manner, instrument, etc.)

Notation

Relations are represented by the general syntax

rel:scope(arg1;arg2;...;argn)

Where

  • rel is the name of the relation;
  • scope is the scope of the relation;
  • arg1, arg2, ..., are the arguments of the relation, i.e., nodes.

The main scope is 00 and it is not shown, by default:

  • rel(arg1;arg2) is the same as rel:00(arg1;arg2) (i.e., the relation rel belongs to the main scope, i.e., the main graph)
  • rel:01(arg1;arg2) (the relation rel belongs to the scope :01, i.e., a sub-graph inside the main graph)
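The general syntax rel:scope(arg1;...;argn) can be parsed with a single regular expression. In this Python sketch, `Relation` and `parse_relation` are hypothetical names. An omitted scope defaults to 00, as stated above. The arguments are split naively at semicolons, so the sketch assumes there are no semicolons inside quoted strings and no nested hyper-nodes.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    name: str
    scope: str
    args: tuple


# name, optional ":scope" (two digits), then the argument list in parentheses
_RELATION = re.compile(r"(?P<name>[^:(]+)(?::(?P<scope>\d{2}))?\((?P<args>.*)\)")


def parse_relation(text):
    """Parse 'rel:scope(arg1;arg2;...;argn)'; an omitted scope means 00."""
    match = _RELATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a relation: {text!r}")
    args = tuple(arg.strip() for arg in match.group("args").split(";"))
    return Relation(match.group("name"), match.group("scope") or "00", args)


print(parse_relation("rel(arg1;arg2)"))     # Relation(name='rel', scope='00', args=('arg1', 'arg2'))
print(parse_relation("rel:01(arg1;arg2)"))  # Relation(name='rel', scope='01', args=('arg1', 'arg2'))
```
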

Types

In the UNL framework, there can be three different types of relations:

  • the linear relation L expresses the surface (list) structure of natural language sentences
  • syntactic relations express the syntactic (tree) structure of natural language sentences
  • semantic relations express the semantic (graph) structure of UNL graphs

Examples

Examples of relations:

  • ("a")("b") (a linear relation between two nodes: one having the string "a" and the other having the string "b")
  • L("a";"b") (the same as above)
  • VC(V;NP) (a syntactic relation VC between two nodes: one having the feature V and the other having the feature NP)
  • VC("a",V;"b",[[b]],LEX=N,NP) (a syntactic relation VC between two nodes: one having the string "a" and the feature V; and the other having the string "b", the UW b and the features LEX=N and NP)
  • agt("kill";N) (a semantic relation between two nodes: one having the string "kill" and the other having the feature N)

Properties

  1. The linear relation is always binary and is represented in two possible formats:
    • L(%x;%y) or
    • (%x)(%y)
    where L is the invariant name of the linear relation, and %x and %y are nodes.
  2. Syntactic relations are not predefined, although we have been using a set of binary relations based on the X-bar theory.
  3. Semantic relations constitute a predefined and closed set (see the list of UNL semantic relations).
  4. Arguments of relations are not commutative.
    The order of the elements in a relation affects the result:
    (%x)(%y) is different from (%y)(%x)
    relation(%x;%y) is different from relation(%y;%x)
  5. Linear and semantic relations are always binary; syntactic relations may be n-ary:
    L(%x;%y) - linear relation
    agt(%x;%y) - semantic relation
    VH(%x) - unary syntactic relation
    VC(%x;%y) - binary syntactic relation
    XX(%x;%y;%z) - possible ternary syntactic relation
  6. Inside each relation, nodes are separated by a semicolon (;):
    VC(%x;%y)
    VC(%x,%y) (incorrect: the nodes are separated by a comma)
  7. Inside each relation, nodes may be referenced by any of their elements, separated by commas (,):
    ("a")([b]) - linear relation between a node where string = "a" and another node where headword = [b]
    L([[c]];D) - linear relation between a node where UW = [[c]] and another node having the feature D
    VC(%a;%b) - syntactic relation between a node where index = %a and another node where index = %b
    agt("a",[a],[[a]],A;"b",[b],[[b]],B) - semantic relation between a node having the feature A where string = "a" AND headword = [a] AND UW = [[a]] AND another node having the feature B where string = "b" AND headword = [b] AND UW = [[b]]
  8. Relations may be conjoined through juxtaposition:
    ("a")("b")("c") - two linear relations: one between ("a") and ("b") AND other between ("b") and ("c")
    agt(%x;%y)obj(%x;%z) - two semantic relations: one between (%x) and (%y) AND other between (%x) and (%z)
    VC([a];[b]),VC([a];[c]) (incorrect: conjoined relations must not be separated by commas)
  9. Relations may be disjoined through {braces}
    {("a")|("b")}("c") - either ("a")("c") or ("b")("c")
    {agt(%x;%y)|exp(%x;%y)}obj(%x;%z) - either agt(%x;%y)obj(%x;%z) or exp(%x;%y)obj(%x;%z)
  10. Syntactic and semantic relations may be replaced by regular expressions
    /.{2,3}/(%x;%y) - any relation made of two or three characters between %x and %y
  11. Unlike nodes, relations do not have elements (strings, headwords, UWs, features and indexes)
    In rel("a",[a],[[a]],A;"b",[b],[[b]],B), the elements "a", "b", [a], [b], [[a]], [[b]], A and B belong to the arguments of the relation and not to the relation itself.

Transformations

Relations are altered, replaced, created and deleted by S-rules:

Altering nodes in a relation

Elements of nodes in relations are altered through the operators + (add) and - (delete). The operator + may be omitted.

  • rel(%x,A;%y,B):=rel(%x,+C;%y,+D); (add the feature C to %x and D to %y)
  • rel(%x,A;%y,B):=rel(%x,C;%y,D);(the same as above)
  • rel(%x,A;%y,B):=rel(%x,-A;%y); (delete the feature A from %x)

"strings", [headwords] and [[UWs]] are considered to be features (but a single node may have only one of each)

  • rel(%x;%y):=rel(%x,"a";%y); (replace the existing string in %x, if any, by "a")
  • rel(%x;%y):=rel(%x,[A];%y);(replace the existing headword in %x, if any, by [A])
  • rel(%x;%y):=rel(%x,[[A]];%y); (replace the existing UW in %x, if any, by [[A]])

Creating nodes in a relation

Nodes are created when they are not co-indexed to any node in the left side (see Indexation):

  • rel(%x,A;%y,B):=rel(%x;%y;%z,+A); (the node %z, with the feature A, is created as a new argument of the relation rel)

Deleting nodes in a relation

Nodes are deleted when they are not co-indexed to any node in the right side (see Indexation):

  • rel(%x,A;%y,B;%z,C):=rel(%x;%y); (the node %z is deleted as an argument of the relation rel)

Nodes are completely deleted if, and only if, they are not part of any other relation.

Creating relations

Relations are created by the operator + (add) before the relation to be created. This operator may not be omitted.

  • rel(%x;%y):=+rel2(%x;%z); (a new relation rel2 is created between the nodes %x and %z; the original relation is not altered)

Creating relations is a possible source of infinite loops. To prevent the rule from applying indefinitely, the condition must exclude the relation being created:

  • rel(%x;%y)^rel2(%x;%z):=+rel2(%x;%z);
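The Python sketch below compares the guarded rule with its unguarded form, rel(%x;%y):=+rel2(%x;%z);. The graph is a set of (name, arg1, arg2) triples. Each application creates a fresh node for %z, whose names n1, n2, ... are hypothetical. The `max_steps` cap is an illustrative safeguard, not part of the rule language.

```python
import itertools


def apply_creation(graph, guarded, max_steps=50):
    """Apply rel(%x;%y)[^rel2(%x;%z)]:=+rel2(%x;%z); until it stops firing."""
    graph = set(graph)
    fresh = (f"n{i}" for i in itertools.count(1))  # hypothetical names for new nodes
    for _ in range(max_steps):
        for name, x, _y in sorted(graph):
            if name != "rel":
                continue
            if guarded and any(r == "rel2" and a == x for r, a, _ in graph):
                continue  # ^rel2(%x;%z): %x already has a rel2, so no application
            graph.add(("rel2", x, next(fresh)))  # +rel2: the rel relation is kept
            break
        else:
            return graph  # no application possible: the rule has stopped
    raise RuntimeError("the rule kept applying: infinite loop")


print(sorted(apply_creation({("rel", "a", "b")}, guarded=True)))
# [('rel', 'a', 'b'), ('rel2', 'a', 'n1')]
```

Calling `apply_creation({("rel", "a", "b")}, guarded=False)` keeps adding rel2 relations until the cap raises RuntimeError. This is the infinite loop the guard prevents.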

Deleting relations

Relations are deleted when they are not repeated in the right side, except in case of +

  • rel(%x;%y):=; (the relation rel between the nodes %x and %y is deleted)
  • rel(%x;%y):=rel2(%x;%y); (the relation rel between %x and %y is deleted and a new relation rel2 is created in its place) (replacement)
  • rel(%x;%y):=+rel2(%x;%y); (the relation rel is preserved and a new relation rel2 is created) (creation)

Replacing relations

Relations in the left side are replaced by relations in the right side, except in case of +:

  • rel(%x;%y):=rel2(%x;%y); (the relation rel between %x and %y is deleted and a new relation rel2 is created in its place)
  • rel1(%x;%y)rel2(%y;%z):=rel3(%x;%z); (the relations rel1 and rel2 are deleted and a new relation rel3 is created in their place) (merge)
  • rel(%x;%y):=rel1(%x;%y)rel2(%y;%z); (the relation rel is deleted and two new relations rel1 and rel2 are created in its place) (divide)
  • (%x)(%y):=rel(%x;%y); (the linear relation between the nodes %x and %y is replaced by the non-linear relation rel between the same nodes)
  • L(%x;%y):=rel(%x;%y); (the same as above)
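Merging and dividing can be pictured in the same toy encoding of relations as (name, args) tuples. The helpers merge and divide are hypothetical, handle only binary relations, and take the new node %z of the divide rule as an explicit argument:

```python
def merge(graph, r1, r2, r3):
    """rel1(%x;%y)rel2(%y;%z):=rel3(%x;%z); as a toy rewrite of a relation set."""
    out = set(graph)
    for n1, (x, y) in graph:
        for n2, (y2, z) in graph:
            if n1 == r1 and n2 == r2 and y == y2:
                out.discard((n1, (x, y)))       # both matched relations are
                out.discard((n2, (y2, z)))      # replaced by a single one
                out.add((r3, (x, z)))
    return out


def divide(graph, rel, r1, r2, new_node):
    """rel(%x;%y):=rel1(%x;%y)rel2(%y;%z); with %z given as new_node."""
    out = set()
    for name, (x, y) in graph:
        if name == rel:
            out |= {(r1, (x, y)), (r2, (y, new_node))}
        else:
            out.add((name, (x, y)))
    return out
```

Under this encoding, the linear relation (%x)(%y) is just a relation named L, so L(%x;%y):=rel(%x;%y); is an ordinary replacement.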

Hyper-Node

Hyper-nodes are nodes containing relations between nodes. They represent scopes or sub-graphs.

Basic Symbols

Basic symbols used in the UNL framework
Symbol Definition Example
( ) node (%a)
" " string "went"
[ ] natural language entry (headword) [go]
[[ ]] UW [[to go(icl>to move)]]
// regular expression /a{2,3}/ = aa,aaa
rel(x;y) relation agt(kill;Peter)
^ not ^a = not a
{ | } or {a|b} = a or b
% index for nodes, attributes and values %x
: scope ID :01
# index for sub-NLWs #01
= attribute-value assignment POS=NOU
! rule trigger !PLR
& merge operator %x&%y
? dictionary lookup operator ?[a]

Basic Concepts

Grammar.png
Node
A node is the most elementary unit in the graph. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Relation
In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations. In the UNL framework, there are three different types of relations: the linear (list) relation, syntactic relations and semantic relations.
Hyper-Node
A hyper-node is a sub-graph, i.e., a scope: a node containing relations between nodes.
Hyper-Relation
A hyper-relation is a relation between relations.

Scopes

main article: Scope

A scope is a sub-graph inside a graph, i.e., a group of relations between nodes that works as a single entity. For instance, in the sentence "Mary saw Peter when John arrived", the dependent clause "when John arrived" describes the argument of a time relation and, therefore, should be represented as a hyper-node (i.e., as a sub-graph), as indicated below:

Scope.jpg

In the UNL table representation, hyper-nodes are indexed by ":XX", where XX is a two-digit hyper-node index. The index of the main scope is :00 and may be omitted. The hyper-node index must be attached to every relation inside the hyper-node.

Scope1.jpg

Notation

As any node, hyper-nodes are represented between (parentheses):

  • (("a")("b")) - a hyper-node containing a list relation between two nodes
  • (L("a";"b")) - the same as above
  • (VC("a";"b")) - a hyper-node containing a syntactic relation VC between two nodes
  • (agt("a";"b")) - a hyper-node containing a semantic relation agt between two nodes

For better readability, hyper-nodes are normally referenced by the SCOPE ID :XX, where XX is a two-character unique identifier for the hyper-node.

  • tim(saw;agt(arrived;John)) (= saw when John arrived) is represented as
    tim(saw;:01)
    agt:01(arrived;John)

All relations inside the same scope must share the same scope ID:

  • tim(saw;agt(killed;John)obj(killed;Peter)) (= saw when John killed Peter) is represented as
    tim(saw;:01)
    agt:01(killed;John)
    obj:01(killed;Peter)

The scope ID of the main scope is 00 and is omitted by default.

  • tim(saw;:01)agt:01(arrived;John) is the same as tim:00(saw;:01)agt:01(arrived;John)
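Converting a nested representation into the flat notation with scope IDs is mechanical. The following Python sketch assumes a hypothetical encoding in which a relation is a (name, args) tuple and a hyper-node argument is a list of relations; IDs are assigned in order of appearance, which is a choice of this sketch rather than a documented rule:

```python
from itertools import count


def flatten(relations, scope="00", ids=None, out=None):
    """Flatten nested relations into lines such as agt:01(killed;John).

    relations: list of (name, args) tuples; an argument that is itself a
    list of relations is a hyper-node and receives the next free scope ID.
    Relations of the main scope (00) are written without an ID.
    """
    ids = count(1) if ids is None else ids
    out = [] if out is None else out
    for name, args in relations:
        shown, pending = [], []
        for arg in args:
            if isinstance(arg, list):              # hyper-node argument
                sid = f"{next(ids):02d}"
                shown.append(":" + sid)
                pending.append((sid, arg))
            else:
                shown.append(arg)                  # plain node
        tag = "" if scope == "00" else ":" + scope
        out.append(f"{name}{tag}({';'.join(shown)})")
        for sid, inner in pending:                 # inner scopes after the outer line
            flatten(inner, sid, ids, out)
    return out
```

Applied to tim(saw;agt(killed;John)obj(killed;Peter)), this yields the three lines tim(saw;:01), agt:01(killed;John) and obj:01(killed;Peter).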

Elements

Like any node, hyper-nodes are vectors (one-dimensional arrays) containing the following elements:

  • a string, represented between "quotes"
  • a headword, represented between [simple square brackets]
  • a UW, represented between [[double square brackets]]
  • features, of which the internal relations are a special type
  • an index, preceded by %

Properties

Like any node, hyper-nodes are expressed between (parentheses):
  • (("a")("b"))
Elements of inner nodes are not elements of hyper-nodes:
  • (%a,"a",[a],[[a]],A,(%b,"b",[b],[[b]],B))
  • The elements of the hyper-node %a are "a", [a], [[a]], A and the node %b.
  • The elements of the node %b are "b", [b], [[b]] and B.
  • The whole inner node (%b,"b",[b],[[b]],B) is an element of %a and may be used to reference it: (("b")), (([b])), (([[b]])), ((B)).
  • However, "b", [b], [[b]] and B are not elements of %a (i.e., %a will not match ("b"), but only (("b"))).
Like any node, a hyper-node may have a single string, a single headword and a single UW, but as many features as necessary:
  • (([kick],V)([the],D)([bucket],N),"kick the bucket",[kick the bucket],[[die]],V,NTST)
  • The hyper-node above is formed of the following elements:
    • the string "kick the bucket"
    • the headword [kick the bucket]
    • the UW [[die]]
    • the features V and NTST
    • the list relations between the nodes ([kick],V)([the],D)([bucket],N)
  • Note that the string, headword and UW of the hyper-node need not coincide with the corresponding values of the inner nodes.
Internal nodes and relations work in the same way as features:
  • (("a"),("b")) - a hyper-node containing two nodes, ("a") and ("b"), with no necessary relation between them
  • (("a")("b")) - a hyper-node containing a list relation between the nodes ("a") and ("b")
  • ({("a")|("b")}) - a hyper-node containing either the node ("a") or the node ("b")
  • ({("a")("b")|("c")("d")}) - a hyper-node containing either a list relation between the nodes ("a") and ("b") or a list relation between the nodes ("c") and ("d")
  • (rel("a";"b"),rel("c";"d")) - a hyper-node containing the relations rel("a";"b") and rel("c";"d")
  • (rel("a";"b")rel("c";"d")) - the same as above
  • ({rel("a";"b")|rel("c";"d")}) - a hyper-node containing either the relation rel("a";"b") or the relation rel("c";"d")
Like any node, a hyper-node may be referenced by any of its elements, including internal nodes and relations:
  • (([kick],V)) - refers to any hyper-node containing the node ([kick],V)
  • (([the],D)([bucket],N)) - refers to any hyper-node containing a linear relation between ([the],D) AND ([bucket],N)
  • (([kick],V),([bucket],N)) - refers to any hyper-node containing the nodes ([kick],V) AND ([bucket],N)
A hyper-node can be the internal node of another hyper-node:
  • ((("a"))) - a hyper-node containing a hyper-node containing the node ("a")
Hyper-nodes and inner nodes are also indexed (see Indexation):
  • (%c,(%b,(%a,"a"))) - a hyper-node %c containing a hyper-node %b containing the node %a
  • (%c,LEX=%a,(%b,(%a,"a"))) - a hyper-node %c containing a hyper-node %b containing the node %a, where the value of LEX of %c is the same as the value of LEX of %a
When a hyper-node is deleted, all its internal relations are deleted as well, unless they are referenced in the rule:
  • (([kick],V)([the],D)([bucket],N)):=; - the hyper-node is deleted, as well as the relations ([kick],V)([the],D) AND ([the],D)([bucket],N)
  • (([kick],V)):=; - all hyper-nodes containing the node ([kick],V) are deleted, even if they contain several other inner nodes
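The difference between the elements of a hyper-node and those of its inner nodes can be modelled with a small data class. Node, matches and contains are hypothetical names; only the string element is tested here, on the assumption that headwords, UWs and features behave the same way:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: its own elements plus a list of inner nodes."""
    string: str = ""
    headword: str = ""
    uw: str = ""
    features: set = field(default_factory=set)
    inner: list = field(default_factory=list)


def matches(node, string):
    """Pattern ("string"): looks only at the node's own string."""
    return node.string == string


def contains(node, string):
    """Pattern (("string")): looks for an inner node with that string."""
    return any(matches(child, string) for child in node.inner)


b = Node("b", "b", "b", {"B"})
a = Node("a", "a", "a", {"A"}, [b])   # (%a,"a",[a],[[a]],A,(%b,"b",[b],[[b]],B))
```

Here matches(a, "b") is false while contains(a, "b") is true, mirroring the statement that %a matches (("b")) but not ("b").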

Examples

Examples of hyper-nodes are the following:

  • (("a"),("b")) - a hyper-node containing the nodes ("a") and ("b")
  • (("a")("b")) - a hyper-node containing a linear relation between the nodes ("a") and ("b")
  • (VC(%x;%y)VA(%x;%z)) - a hyper-node containing two syntactic relations: VC(%x;%y) AND VA(%x;%z)
  • (agt([a];[b])obj([a];[c])) - a hyper-node containing two semantic relations: agt([a];[b]) AND obj([a];[c])
  • (([kick],V)([the],D)([bucket],N),V,NTST) - a hyper-node having the features V and NTST and containing two linear relations: one between the nodes ([kick],V) and ([the],D), and another between ([the],D) and ([bucket],N)
  • (([kick],V)([the],D)([bucket],N),"kick the bucket",[[die]],V,NTST) - the same as before, except that the hyper-node has string = "kick the bucket" and UW = [[die]]

Hyper-nodes may also contain internal hyper-nodes:

  • ((("a")("b"))("c")) - a hyper-node containing a linear relation between the hyper-node (("a")("b")) and the node ("c")

Transformations

Changes

Hyper-nodes, as nodes, have elements, which may be altered by the use of the operators + (add) and - (delete). The operator + may be omitted. Changes affect only the scopes indicated.

Changes to the main scope:
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,"c"); (the string of the hyper-node is set to "c"; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,""); (the string of the hyper-node is set to ""; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,-"a"); (the same as above)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,[c]); (the headword of the hyper-node is set to [c]; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,[[c]]); (the UW of the hyper-node is set to [[c]]; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,^B,(%b,"b")):=(%a,+B); (add the feature B to the hyper-node %a; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,^B,(%b,"b")):=(%a,B); (the same as above: add the feature B to %a)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,-A); (delete the feature A from the hyper-node %a; the internal node %b is not affected)
Changes to inner scopes:
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,"c")); (the string of the inner node %b is set to "c"; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,"")); (the string of the inner node %b is set to ""; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,-"b")); (the same as above)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,[c])); (the headword of the inner node %b is set to [c]; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,[[c]])); (the UW of the inner node %b is set to [[c]]; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B,^C)):=(%a,(%b,+C)); (add the feature C to the inner node %b; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B,^C)):=(%a,(%b,C)); (the same as above: add the feature C to %b)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,-B)); (delete the feature B from the inner node %b; the hyper-node %a is not affected)
Rules must have as many parentheses as the depth of the inner scope to be altered:
  • (%a,(%b,(%c,(%d,(%e,"e",[e],[[e]],E))))):=(%a,(%b,(%c,(%d,(%e,"f"))))); (the string of inner node %e is set to "f"; the enclosing nodes %d, %c, %b and %a are not affected)
Hyper-nodes do not need to be represented if the changes apply to nodes in general, and not only to nodes inside hyper-nodes. For instance, the rule
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,"c")); (the string of the inner node %b is set to "c"; the hyper-node %a is not affected)

could be represented simply as

  • (%b,"b",[b],[[b]],B):=(%b,"c");

if the change should apply to all nodes ("b",[b],[[b]],B), and not only to those inside scopes.
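The one-level-per-parenthesis requirement can be pictured as following a path of indexes from the outermost hyper-node down to the target node. In this sketch nodes are plain dictionaries and make_node and set_string are hypothetical helpers; it models only the explicit form, with all enclosing hyper-nodes written out:

```python
def make_node(index, string, *inner):
    """Toy node: an index, a string and any number of inner nodes."""
    return {"index": index, "string": string, "inner": list(inner)}


def set_string(node, path, value):
    """Set the string of the node reached by path; return True on success.

    path lists one index per level of parentheses, from the outermost
    hyper-node to the target, e.g. ["%a", "%b", "%c"] for
    (%a,(%b,(%c,"f"))). Enclosing nodes are left unchanged.
    """
    if node["index"] != path[0]:
        return False
    if len(path) == 1:
        node["string"] = value
        return True
    return any(set_string(child, path[1:], value) for child in node["inner"])
```

Skipping a level, as in the path ["%a", "%c"], finds no match, which corresponds to a rule with too few parentheses.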

Deletion

Hyper-nodes, as any node, are deleted if they are not repeated (co-indexed) in the right side. In this case, all the inner nodes are deleted as well:

  • (REL(%x;%y),%z):=; (the hyper-node %z will be deleted, and all its internal nodes and relations as well)

Like features, inner nodes are conservative: they are not deleted even if they are not repeated (co-indexed) in the right side:

  • (%a,A,^B):=(%a,+B); (the feature A is not deleted from the node %a)
  • (%a,^B,(%b,"b")):=(%a,+B); (the node %b is not deleted from the hyper-node %a)

In order to delete inner nodes, the operator "-" must be used:

  • (%a,A,(%b,B)):=(%a,-(%b)); (the node %b is deleted from the hyper-node %a)
  • (%a,A,rel(%b;%c)):=(%a,-rel(%b;%c)); (the relation rel(%b;%c) is deleted from the hyper-node %a)
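The conservative behaviour of features and inner nodes can be sketched as follows, assuming a hyper-node is a dictionary of features and inner-node indexes; apply_changes is a hypothetical helper that removes only what the rule explicitly marks with "-":

```python
def apply_changes(node, add=(), remove=(), remove_inner=()):
    """Toy right-hand side of a hyper-node rule.

    Features and inner nodes that the rule does not mention are kept
    (they are conservative); only items explicitly marked with '-'
    (passed in remove or remove_inner) disappear.
    """
    return {
        "features": (set(node["features"]) | set(add)) - set(remove),
        "inner": [child for child in node["inner"] if child not in remove_inner],
    }
```

For example, apply_changes(a, add={"B"}) plays the role of (%a,^B,(%b,"b")):=(%a,+B); and keeps %b, whereas apply_changes(a, remove_inner={"%b"}) plays the role of (%a,A,(%b,B)):=(%a,-(%b));.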

Extraction

Nodes may be extracted from hyper-nodes by removing the corresponding parentheses. In this case, the hyper-node is deleted (along with its features), but the internal nodes and relations are preserved, if repeated on the right side.

  • ((%x),%y):=(%x); (the hyper-node %y is deleted, but its internal node %x is preserved; if %y has nodes other than %x, these nodes are deleted as well, because they are not repeated in the right side)
  • (REL(%x;%y),%z):=REL(%x;%y); (the hyper-node %z is deleted, but its internal relation REL(%x;%y) is preserved; if %z has relations other than REL(%x;%y), and nodes other than %x and %y, these are deleted as well, because they are not repeated in the right side)
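Extraction can be sketched in a similar dictionary encoding. The helper extract is hypothetical, and it assumes that repeating a relation on the right side implicitly repeats its argument nodes:

```python
def extract(hyper, keep_nodes=(), keep_relations=()):
    """Toy model of ((%x),%y):=(%x); and (REL(%x;%y),%z):=REL(%x;%y);.

    The hyper-node and its features disappear. Its inner relations survive
    only if repeated on the right side, and its inner nodes survive only if
    repeated directly or as arguments of a surviving relation.
    """
    relations = [r for r in hyper["relations"] if r in keep_relations]
    linked = {arg for _, args in relations for arg in args}
    nodes = [n for n in hyper["nodes"] if n in keep_nodes or n in linked]
    return nodes, relations
```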

Creation

Hyper-nodes are created through the encapsulation of existing nodes or relations:

  • (%x):=((%x),%y); (the hyper-node %y is created, with the node %x inside it)
  • REL(%x;%y):=(REL(%x;%y),%z); (the hyper-node %z is created, with the relation REL between the nodes %x and %y inside it)
  • (%x)(%y):=((%x)(%y),%z); (the hyper-node %z is created, with the linear relation between the nodes %x and %y inside it)
Attention
Relations and nodes must be repeated in the right side, or they will be deleted:
  • (%x):=(%y); (the node %x is simply replaced by %y; no hyper-node is created)
  • REL(%x;%y):=(%z); (the relation REL between the nodes %x and %y is replaced by the node %z; no hyper-node is created)

Hyper-Relation

A hyper-relation is a relation between relations, or between relations and nodes.

Notation

Hyper-relations and relations are represented in the same way, i.e.,

rel:scope(arg1;arg2;...;argn)

where:

  • rel is the name of the hyper-relation
  • scope is the scope of the hyper-relation
  • arg1, arg2, ... are the arguments of the hyper-relation

The only difference between relations and hyper-relations is that the latter have at least one relation as an argument, e.g.:

  • rel(rel1(arg1;arg2);arg3;...;argn)

For better readability, inner relations are normally replaced by relation IDs, represented as :XX, where XX is a two-digit string:

  • rel(rel1(arg1;arg2);arg3;...;argn) is the same as rel(:01;arg3;...;argn)rel1:01(arg1;arg2)

Properties

A hyper-relation may have at most one relation as each argument:
  • XP(XB(%a;%b);%c) - the source argument of the hyper-relation XP is a relation
  • XP(%a;XB(%b;%c)) - the target argument of the hyper-relation XP is a relation
  • XP(VC(%a;%b);VA(%a;%c)) - both the source and the target arguments of the hyper-relation XP are relations
  • XP(VC(%a;%b)VA(%a;%c);VS(%a;%d)) - invalid: a hyper-relation may not have more than one relation as a single argument (here, the source argument of XP contains two relations)
Unlike nodes, relations do not have elements (strings, headwords, UWs, features or indexes):
  • XP(XB(%a;%b),"ab",[ab],[[ab]],A,B;%c) - invalid: the relation XB(%a;%b) may not have strings, headwords, UWs or features
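The two properties above can be checked mechanically. In this hypothetical encoding a relation is a 2-tuple (name, args), an argument is either a node index (a string) or a single relation, and anything else, such as a list of relations or a relation carrying extra elements, is rejected by the illustrative is_valid function:

```python
def is_valid(rel):
    """Check the hyper-relation properties on a toy encoding.

    A relation is a 2-tuple (name, args). Each argument must be a node
    index (str) or exactly one relation; a list of relations in a single
    argument, or a relation carrying extra elements, is rejected.
    """
    if not (isinstance(rel, tuple) and len(rel) == 2):
        return False                      # e.g. a relation with a string or features
    _name, args = rel
    for arg in args:
        if isinstance(arg, str):
            continue                      # plain node index such as "%a"
        if isinstance(arg, list) or not is_valid(arg):
            return False                  # several relations in one argument
    return True
```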

Examples

Examples of hyper-relations

  • XP(XB(%a;%b);%c) - a syntactic relation XP between the syntactic relation XB(%a;%b) and the node %c
  • and(agt([a];[b]);agt([a];[c])) - a semantic relation "and" between the semantic relations agt([a];[b]) AND agt([a];[c])

Transformations

Hyper-relations are altered, replaced, created and deleted by T-rules:

Creating hyper-relations

Hyper-relations are created through encapsulating relations:

  • rel1(%x;%y)rel2(%x;%z):=rel1(rel2(%x;%z);%y); (the relation rel1 between %x and %y becomes a hyper-relation between the relation rel2(%x;%z) and the node %y.)

Transforming hyper-relations into simple relations

Hyper-relations are transformed into simple relations by removing their internal relations:

  • rel1(rel2(%x;%z);%y):=rel1(%x;%y)rel2(%x;%z); (the hyper-relation rel1 between the relation rel2(%x;%z) and the node %y is transformed into a simple relation between the nodes %x and %y; the relation rel2(%x;%z) is not affected.)
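Encapsulation and its inverse can be sketched over a set of (name, args) tuples, where an argument may itself be a relation. The helpers encapsulate and unfold are hypothetical, and the sketch assumes that after encapsulation rel2(%x;%z) exists only inside rel1:

```python
def encapsulate(graph, outer, inner):
    """rel1(%x;%y)rel2(%x;%z):=rel1(rel2(%x;%z);%y); on a toy relation set.

    Assumes the encapsulated relation rel2 then exists only inside rel1.
    """
    n1, (x, y) = outer
    _, (x2, _) = inner
    if x != x2:
        raise ValueError("both relations must share the source node %x")
    return (graph - {outer, inner}) | {(n1, (inner, y))}


def unfold(graph, hyper):
    """rel1(rel2(%x;%z);%y):=rel1(%x;%y)rel2(%x;%z); (inverse of encapsulate)."""
    n1, (inner, y) = hyper
    _, (x, _) = inner
    return (graph - {hyper}) | {(n1, (x, y)), inner}
```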

Rule

Grammars are sets of rules used to go from UNL into natural language, or from natural language into UNL. In the UNL framework, there can be two different types of rules:

  • T-rules, or transformation rules, are used to perform changes to nodes or relations
  • D-rules, or disambiguation rules, are used to control changes over nodes or relations

T-rules

main article: T-rule

T-rules are used to perform actions and follow the general formalism:

α:=β;

where the left side α is a condition statement, and the right side β is an action to be performed over α.

There are several special types of T-rules:

  • A-rule is a specific type of T-rule used for affixation (prefixation, infixation, suffixation)
  • C-rule is a specific type of T-rule used for composition (word formation in case of compounds and multiword expressions)
  • L-rule is a specific type of T-rule used for handling word order
  • N-rule is a specific type of T-rule used for segmenting sentences and normalizing the input text
  • S-rule is a specific type of T-rule used for handling syntactic structures

Examples of T-rules

  • PLR:=0>"s"; (A-rule: add "s" in case of plural, as in book>books)
  • MTW:=+VA("into account",PP); (C-rule: add the prepositional phrase "into account" as an adjunct to the verbal phrase (VA) in order to form the multiword expression, as in take>take into account)
  • (ART,%x)(QUA,%y):=(%y)(%x); (L-rule: reverse the order ART+QUA to QUA+ART, as in the all>all the)
  • ("don't"):=("do not"); (N-rule: replace the contraction "don't" by "do not")
  • (V,%x)(N,%y):=VC(%x;%y); (S-rule: replace the linear relation between a verb and a noun by the syntactic relation VC between them)
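As an illustration, the L-rule example above can be mimicked on a list of (word, tag) pairs. l_rule_swap is a hypothetical helper that scans left to right and swaps each matching pair once; it does not reproduce the actual matching strategy of the tools:

```python
def l_rule_swap(tokens, first, second):
    """(first,%x)(second,%y):=(%y)(%x); on a list of (word, tag) pairs."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == first and out[i + 1][1] == second:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2          # do not re-match the pair just produced
        else:
            i += 1
    return out
```

For instance, l_rule_swap([("the", "ART"), ("all", "QUA")], "ART", "QUA") reorders "the all" into "all the".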

D-rules

main article: D-rule

D-rules are used to control the action of T-rules. They are used to control the dictionary retrieval (in tokenization) and to prevent or to induce the application of rules in transformation.

D-rules follow the syntax:

α=P;

where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.

Examples of D-rules

  • (ART)(VER)=0; (there cannot be any article before a verb)
  • agt(^V,^J;)=0; (the source node of an agent relation must be either a verb or an adjective)
  • (D)(N)=1; (determiners may come before nouns)
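The two prohibitive D-rules above can be expressed as simple filters over candidate analyses. This sketch models only P = 0 as a hard prohibition; how the tools combine non-zero values is not described here and is not modelled. LINEAR_D_RULES, linear_allowed and agt_allowed are hypothetical names, and the tags are those used in the examples:

```python
# Hypothetical table for the linear D-rules above: (left, right) -> P.
LINEAR_D_RULES = {("ART", "VER"): 0, ("D", "N"): 1}


def linear_allowed(tags, rules=LINEAR_D_RULES):
    """Reject a tag sequence if an adjacent pair has a D-rule with P = 0."""
    return all(rules.get(pair) != 0 for pair in zip(tags, tags[1:]))


def agt_allowed(relations, pos):
    """agt(^V,^J;)=0; (the source of every agt must be a V or a J)."""
    return all(pos[src] in {"V", "J"}
               for name, (src, _tgt) in relations if name == "agt")
```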
Software