T-rule
T-rules, or transformation rules, are rules that alter the state of the nodes. They are used for normalization, for syntactic analysis and for semantic interpretation. The set of t-rules forms the '''Transformation grammar''', or '''T-Grammar'''.
+ | == Types of Transformation Rules == | ||
+ | |||
+ | Natural language sentences and UNL graphs are supposed to convey the same amount of information in different structures: whereas the former arranges data as an ordered list of words, the latter organizes it as a hypergraph. In that sense, translating from natural language into UNL and from UNL into natural language is ultimately a matter of transforming lists into networks and vice-versa. | ||
+ | |||
+ | The UNDLF generation and analysis tools assume that such transformation should be carried out progressively, i.e., through a transitional data structure: the tree, which could be used as an interface between lists and networks. Accordingly, the UNL Grammar states seven different types of rules (LL, TT, NN, LT, TL, TN, NT), as indicated below: | ||
+ | |||
+ | *'''ANALYSIS''' (NL-UNL) | ||
+ | **LL - List Processing (list-to-list) | ||
+ | **LT - Surface-Structure Formation (list-to-tree) | ||
+ | **TT - Syntactic Processing (tree-to-tree) | ||
+ | **TN - Deep-Structure Formation (tree-to-network) | ||
+ | **NN - Semantic Processing (network-to-network) | ||
+ | |||
+ | *'''GENERATION''' (UNL-NL) | ||
+ | **NN - Semantic Processing (network-to-network) | ||
+ | **NT - Deep-Structure Formation (network-to-tree) | ||
+ | **TT - Syntactic Processing (tree-to-tree) | ||
+ | **TL - Surface-Structure Formation (tree-to-list) | ||
+ | **LL - List Processing (list-to-list) | ||
+ | |||
+ | The '''NL original sentence''' is supposed to be preprocessed, by the LL rules, in order to become an ordered list. Next, the resulting '''list structure''' is parsed with the LT rules, so as to unveil its '''surface syntactic structure''', which is already a tree. The tree structure is further processed by the TT rules in order to expose its inner organization, the '''deep syntactic structure''', which is supposed to be more suitable to the semantic interpretation. Then, this deep syntactic structure is projected into a semantic network by the TN rules. The resultant '''semantic network''' is then post-edited by the NN rules in order to comply with UNL standards and generate the '''UNL Graph'''. | ||
+ | |||
+ | The reverse process is carried out during natural language generation. The '''UNL graph''' is preprocessed by the NN rules in order to become a more easily tractable semantic network. The resulting '''network structure''' is converted, by the NT rules, into a syntactic structure, which is still distant from the surface structure, as it is directly derived from the semantic arrangement. This '''deep syntactic structure''' is subsequently transformed into a '''surface syntactic structure''' by the TT rules. The surface syntactic structure undergoes many other changes according to the TL rules, which generate an NL-like '''list structure'''. This list structure is finally realized as a '''natural language sentence''' by the LL rules. | ||
+ | |||
+ | As sentences are complex structures that may contain nested or embedded phrases, both the analysis and the generation processes may be '''interleaved''' rather than pipelined. This means that the natural flow described above is only "normal" and not "necessary". During natural language generation, an LL rule may apply prior to a TT rule, or an NN rule may be applied after a TL rule. Rules are recursive and must be applied in the order defined in the grammar as long as their conditions are true, regardless of the state. | ||
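The default ordering of rule types described above can be summarized in a short Python sketch. The names <code>ANALYSIS_PIPELINE</code>, <code>GENERATION_PIPELINE</code>, <code>describe</code> and <code>is_well_chained</code> are illustrative only and are not part of any UNL tool. Since processing may be interleaved, the lists capture only the "normal" flow.

```python
# Default ("normal") order of rule types; actual processing may interleave them.
ANALYSIS_PIPELINE = ["LL", "LT", "TT", "TN", "NN"]    # natural language -> UNL
GENERATION_PIPELINE = ["NN", "NT", "TT", "TL", "LL"]  # UNL -> natural language

# Data structure denoted by each letter of a rule type.
STRUCTURES = {"L": "list", "T": "tree", "N": "network"}


def describe(rule_type):
    """Return (input structure, output structure) for a two-letter rule type."""
    source, target = rule_type
    return STRUCTURES[source], STRUCTURES[target]


def is_well_chained(pipeline):
    """Check that the output structure of each stage is the input of the next."""
    return all(a[1] == b[0] for a, b in zip(pipeline, pipeline[1:]))
```

For instance, <code>describe("LT")</code> yields the pair list/tree, and both pipelines pass the chaining check because every stage hands over the structure that the next stage expects.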
+ | |||
+ | === List-to-List Rules === | ||
+ | |||
+ | The list-to-list (LL) rules are used for processing lists, both in analysis and in generation. In analysis, these rules are used for pre-editing the natural language sentence and preparing the input to the syntactic module; in generation, they are used for post-editing the output of the syntactic module and generating the natural language sentence. | ||
+ | |||
+ | LL rules perform four different actions, expressed in six rule forms: | ||
+ | |||
+ | {|cellpadding="5" border="1" align="center" | ||
+ | |+LL rules | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |rowspan="2"|ADD | ||
+ | |(%x):=(%x)(%y); | ||
+ | |The node %y is added to the right of the node %x | ||
+ | |- | ||
+ | |(%x):=(%y)(%x); | ||
+ | |The node %y is added to the left of the node %x | ||
+ | |- | ||
+ | |rowspan="2"|DELETE | ||
+ | |(%x):=-(%x); | ||
+ | |rowspan="2"|The node %x is deleted. | ||
+ | |- | ||
+ | |(%x):=; | ||
+ | |- | ||
+ | |REPLACE | ||
+ | |(%x):=(%y); | ||
+ | |All the instances of the node %x will be replaced by the node %y | ||
+ | |- | ||
+ | |MERGE | ||
+ | |(%x)(%y):=(%x&%y); | ||
+ | |The nodes %x and %y will be merged | ||
+ | |- | ||
+ | |} | ||
+ | <div align="center">Where %x and %y are nodes.</div> | ||
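The LL actions in the table can be pictured as operations on a plain Python list. This is a minimal sketch under simplifying assumptions: nodes are reduced to strings, each function models a single application of the rule to the leftmost match, and the function names are hypothetical.

```python
def add_right(nodes, x, y):
    """(%x):=(%x)(%y); insert y to the right of the leftmost x."""
    i = nodes.index(x)
    return nodes[:i + 1] + [y] + nodes[i + 1:]


def add_left(nodes, x, y):
    """(%x):=(%y)(%x); insert y to the left of the leftmost x."""
    i = nodes.index(x)
    return nodes[:i] + [y] + nodes[i:]


def delete(nodes, x):
    """(%x):=; remove the leftmost x."""
    i = nodes.index(x)
    return nodes[:i] + nodes[i + 1:]


def replace(nodes, x, y):
    """(%x):=(%y); replace every instance of x by y."""
    return [y if node == x else node for node in nodes]


def merge(nodes, x, y):
    """(%x)(%y):=(%x&%y); merge the leftmost adjacent pair x, y."""
    for i in range(len(nodes) - 1):
        if nodes[i] == x and nodes[i + 1] == y:
            return nodes[:i] + [x + y] + nodes[i + 2:]
    return nodes  # no adjacent pair found: the rule does not apply
```

Every function returns a new list rather than mutating its input, which makes it easy to compare the state before and after a rule applies.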
+ | |||
+ | === Tree-to-Tree Rules === | ||
+ | |||
+ | The tree-to-tree rules (TT) are used for processing trees, both in analysis and in generation. During analysis, these rules are used for revealing the deep structure out of the surface structure; in generation, they are used for transforming the deep into the surface syntactic structure. | ||
+ | |||
+ | Syntactic relations are n-ary: they can have as many arguments (nodes) as necessary. | ||
+ | |||
+ | There are 3 different subtypes of TT rules: | ||
+ | |||
+ | {| cellpadding="5" border="1" align="center" | ||
+ | |+TT rules | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD RELATION | ||
+ | |SYN1(%x;%y):=+SYN2(%w;%z); | ||
+ | |The relation SYN2 between the nodes %w and %z will be added to the graph containing the relation SYN1 between the nodes %x and %y | ||
+ | |- | ||
+ | |rowspan="2"|DELETE RELATION | ||
+ | |SYN(%x;%y):=-SYN(%x;%y); | ||
+ | |rowspan="2"|The relation SYN between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) | ||
+ | |- | ||
+ | |SYN(%x;%y):=; | ||
+ | |- | ||
+ | |REPLACE RELATION | ||
+ | |SYN1(%x;%y):=SYN2(%w;%z); | ||
+ | |The relation SYN1 between the nodes %x and %y will be replaced by the relation SYN2 between the nodes %w and %z | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation, and %x, %y, %z and %w are nodes.</div> | ||
+ | |||
+ | |||
+ | As syntactic relations are n-ary, the REPLACE RELATION may also be used to ADD or DELETE nodes. | ||
+ | |||
+ | |||
+ | {|border="1" cellpadding="5" align="center" | ||
+ | |+Special types of TT replace relations | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD NODE | ||
+ | |SYN(%x;%y):=SYN(%x;%y;%z); | ||
+ | |The binary relation SYN between the nodes %x and %y is replaced by a ternary relation SYN between the nodes %x, %y and %z | ||
+ | |- | ||
+ | |DELETE NODE | ||
+ | |SYN(%x;%y):=SYN(%y); | ||
+ | |The binary relation SYN between the nodes %x and %y is replaced by a unary relation SYN with the node %y | ||
+ | |- | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation, and %x, %y and %z are nodes.</div> | ||
+ | |||
+ | === Network-to-Network Rules === | ||
+ | |||
+ | The network-to-network rules (NN) are used for processing networks, both in analysis and in generation. During analysis, these rules are used for post-editing the semantic network structure derived from the syntactic module in order to generate the UNL graph; in generation, they are used for pre-editing the UNL graph, transforming it into a semantic network that would be more appropriate for sentence generation. | ||
+ | |||
+ | There are 3 different subtypes of NN rules: | ||
+ | |||
+ | {| cellpadding="5" border="1" align="center" | ||
+ | |+NN rules | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD RELATION | ||
+ | |SEM1(%x;%y):=+SEM2(%w;%z); | ||
+ | |The relation SEM2 between the nodes %w and %z will be added to the graph containing the relation SEM1 between the nodes %x and %y | ||
+ | |- | ||
+ | |rowspan="2"|DELETE RELATION | ||
+ | |SEM(%x;%y):=-SEM(%x;%y); | ||
+ | |rowspan="2"|The relation SEM between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) | ||
+ | |- | ||
+ | |SEM(%x;%y):=; | ||
+ | |- | ||
+ | |REPLACE RELATION | ||
+ | |SEM1(%x;%y):=SEM2(%w;%z); | ||
+ | |The relation SEM1 between the nodes %x and %y will be replaced by the relation SEM2 between the nodes %w and %z | ||
+ | |} | ||
+ | <div align="center">Where SEM is any of the existing UNL relations, and %x, %y, %z and %w are nodes.</div> | ||
+ | |||
+ | === List-to-Tree Rules === | ||
+ | |||
+ | The list-to-tree (LT) rules are used to parse the list structure into a tree structure.<br /> | ||
+ | There are 2 different subtypes of LT rules: | ||
+ | |||
+ | {|cellpadding="5" border="1" align="center" | ||
+ | |+LT rule | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD | ||
+ | |(%x)(%y):=+SYN(%x;%y); | ||
+ | |The relation SYN is created between the nodes %x and %y if there is a linear relation between them (the linear relation is not deleted) | ||
+ | |- | ||
+ | |REPLACE | ||
+ | |(%x)(%y):=SYN(%x;%y); | ||
+ | |The linear relation between %x and %y is replaced by the relation SYN between the same nodes (i.e., the linear relation is deleted) | ||
+ | |- | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation, and %x and %y are nodes.</div> | ||
+ | |||
+ | === Tree-to-List Rules === | ||
+ | |||
+ | The tree-to-list (TL) rules are used to linearize the tree structure into a list structure. There is a single type of TL rule: | ||
+ | |||
+ | {|cellpadding="5" border="1" align="center" | ||
+ | |+TL rule | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |REPLACE | ||
+ | |SYN(%x;%y):=(%x)(%y); | ||
+ | |The relation SYN between %x and %y is replaced by a linear relation between %x and %y | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation and %x and %y are nodes.</div> | ||
+ | |||
+ | === Tree-to-Network Rules === | ||
+ | |||
+ | The tree-to-network (TN) rules derive a semantic network out of a syntactic tree. | ||
+ | |||
+ | There are 2 types of TN rules: | ||
+ | |||
+ | {|cellpadding="5" border="1" align="center" | ||
+ | |+TN rule | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD | ||
+ | |SYN(%x;%y):=+SEM(%w;%x); | ||
+ | |The semantic relation SEM between the nodes %w and %x is created if there is a syntactic relation SYN between the nodes %x and %y | ||
+ | |- | ||
+ | |REPLACE | ||
+ | |SYN(%x;%y):=SEM(%x;%y); | ||
+ | |The syntactic relation SYN between the nodes %x and %y is replaced by the semantic relation SEM between the same nodes. | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation, SEM is a semantic relation, and %x, %y and %w are nodes.</div> | ||
+ | |||
+ | === Network-to-Tree Rules === | ||
+ | |||
+ | The network-to-tree (NT) rules reorganize the network structure as a deep tree structure. | ||
+ | |||
+ | There are two types of NT rules: | ||
+ | |||
+ | {|cellpadding="5" border="1" align="center" | ||
+ | |+NT rule | ||
+ | !ACTION | ||
+ | !RULE | ||
+ | !DESCRIPTION | ||
+ | |- | ||
+ | |ADD | ||
+ | |SEM(%x;%y):=+SYN(%w;%x); | ||
+ | |The syntactic relation SYN between the nodes %w and %x is created if there is a semantic relation SEM between the nodes %x and %y | ||
+ | |- | ||
+ | |REPLACE | ||
+ | |SEM(%x;%y):=SYN(%x;%y); | ||
+ | |The semantic relation SEM between the nodes %x and %y is replaced by the syntactic relation SYN between the same nodes. | ||
+ | |} | ||
+ | <div align="center">Where SYN is a syntactic relation, SEM is a semantic relation, and %x, %y and %w are nodes.</div> | ||
+ | |||
+ | == Transformations over nodes == | ||
+ | |||
+ | === Altering nodes === | ||
+ | Nodes are altered by the use of the operators + (add) and - (delete). The operator + may be omitted. | ||
+ | *(%x,A):=(%x,+B); (add the feature B to %x) | ||
+ | *(%x,A):=(%x,B); (the same as above: add the feature B to %x) | ||
+ | *(%x,A):=(%x,-A); (delete the feature A from %x) | ||
+ | "strings", <nowiki>[headwords]</nowiki> and <nowiki>[[UWs]]</nowiki> are considered to be features (but a single node may have only one of each) | ||
+ | *(%x,A):=(%x,"a"); (replace the existing string in %x, if any, by "a") | ||
+ | *(%x,[A]):=(%x,[A]); (replace the existing headword in %x, if any, by [A]) | ||
+ | *(%x,<nowiki>[[A]]</nowiki>):=(%x,<nowiki>[[A]]</nowiki>); (replace the existing UW in %x, if any, by <nowiki>[[A]]</nowiki>) | ||
+ | Example: | ||
+ | *("a",[a],<nowiki>[[a]]</nowiki>,A,C,%x):=("b",[b],<nowiki>[[b]]</nowiki>,-A,+B,%x); (the original node ("a",[a],<nowiki>[[a]]</nowiki>,A,C) becomes ("b",[b],<nowiki>[[b]]</nowiki>,B,C). Note that the feature C is preserved, because it was not affected by the rule); | ||
+ | |||
+ | === Deleting nodes === | ||
+ | In LL and LT rules, nodes are deleted if they are not repeated (co-indexed) in the right side: | ||
+ | *(%x)(%y):=(%x); (the node %y will be deleted) | ||
+ | In other rules, nodes are deleted if they are not repeated (co-indexed) in the right side and are not part of any other relation: | ||
+ | *rel(%x;%y):=rel(%x); (the node %y will be deleted if, and only if, it is not part of any other relation) | ||
+ | |||
+ | === Creating nodes === | ||
+ | Nodes are created through the use of new indexes in the right side: | ||
+ | *("a",%x)("b",%y):=(%x)(%y)("c",%z); (the node %z will be created) | ||
+ | *("a",%x)("b",%y):=(%x)("c",%z); (the node %z will be created, and %y will be deleted) | ||
+ | |||
+ | === Duplicating (cloning) nodes === | ||
+ | Nodes may be duplicated by repeating indexes on the right side along with the command #CLONE: | ||
+ | *("a",%x)("b",%y):=(%x)(%y)(%x,#CLONE)(%y,#CLONE)(%y,#CLONE)(%x,#CLONE); | ||
+ | ("a")("b") becomes ("a")("b")("a")("b")("b")("a")<br /> | ||
+ | In order to avoid infinite recursion, it is important to alter conditions on the left side.<br /> | ||
+ | In order to avoid impossible graphs (a node cannot be a neighbor of itself) and assign different features to the different instances of the repeated nodes, the command #CLONE must be used. | ||
+ | |||
+ | === Merging nodes (&) === | ||
+ | Two or more nodes may be merged by the command &: | ||
+ | *(%x)(%y)(%z):=(%x&%y&%z); | ||
+ | In the example above, ("a")("b")("c") becomes ("abc") | ||
+ | ;Merge operations concatenate headwords and UWs, and join features | ||
+ | ("hw1",<nowiki>[[uw1]]</nowiki>,F1,%x)("hw2",<nowiki>[[uw2]]</nowiki>,F2,%y)("hw3",<nowiki>[[uw3]]</nowiki>,F3,%z):=(%x&%y&%z);<br /> | ||
+ | The resulting node is ("hw1hw2hw3",<nowiki>[[uw1uw2uw3]]</nowiki>,F1,F2,F3) | ||
+ | ;Compare the difference | ||
+ | *(%x)(%y):=(%z); (the nodes %x and %y are replaced by %z, and their features are lost unless explicitly included in %z) | ||
+ | *(%x)(%y):=(%x&%y); (the nodes %x and %y are merged, and their features are joined) | ||
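The merge behaviour described in this section (concatenated headwords and UWs, joined features) can be sketched as follows. The <code>Node</code> class and the <code>merge</code> function are hypothetical models, not part of the UNL tools. Treating "join" as an order-preserving union without duplicates is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    headword: str = ""
    uw: str = ""
    features: list = field(default_factory=list)


def merge(*nodes):
    """Model of (%x)(%y)(%z):=(%x&%y&%z); for any number of nodes."""
    features = []
    for node in nodes:
        for feature in node.features:
            if feature not in features:  # assumption: duplicates are dropped
                features.append(feature)
    return Node(
        headword="".join(node.headword for node in nodes),
        uw="".join(node.uw for node in nodes),
        features=features,
    )


merged = merge(
    Node("hw1", "uw1", ["F1"]),
    Node("hw2", "uw2", ["F2"]),
    Node("hw3", "uw3", ["F3"]),
)
```

Here <code>merged</code> carries the headword "hw1hw2hw3", the UW "uw1uw2uw3" and the features F1, F2 and F3, matching the resulting node given in the example of this section.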
+ | |||
+ | === Splitting nodes (retokenization) === | ||
+ | Temporary nodes (i.e., nodes having the feature TEMP) may be split, but the feature TEMP may be assigned to any node. <br /> | ||
+ | See [[Tokenization#Retokenization|Retokenization]] | ||
+ | |||
+ | == Transformations over hyper-nodes == | ||
+ | |||
+ | === Altering hyper-nodes === | ||
+ | Hyper-nodes, as nodes, have features, which may be altered by the use of the operators + (add) and - (delete). Changes in the hyper-node do not affect the internal nodes and relations. | ||
+ | The operator + may be omitted. | ||
+ | *(REL(%x;%y),%z):=(%z,+B); (add the feature B to the hyper-node %z; the internal nodes %x and %y are not affected) | ||
+ | *(REL(%x;%y),%z):=(%z,B); (the same as above: add the feature B to the hyper-node %z) | ||
+ | *(REL(%x;%y),%z,A):=(%z,-A);(delete the feature A from the hyper-node %z; the internal nodes %x and %y are not affected) | ||
+ | |||
+ | === Deleting hyper-nodes === | ||
+ | In LL and LT rules, hyper-nodes are deleted if they are not repeated (co-indexed) in the right side. In this case, all the inner nodes are deleted as well: | ||
+ | *(REL(%x;%y),%z):=; (the hyper-node %z will be deleted, and all its internal nodes and relations as well) | ||
+ | In order to preserve the internal nodes, see Extracting nodes out of hyper-nodes below. | ||
+ | |||
+ | === Creating hyper-nodes === | ||
+ | Hyper-nodes are created through the encapsulation of existing nodes: | ||
+ | *(%x):=((%x),%y); (the hyper-node %y is created, with the node %x there inside) | ||
+ | *REL(%x;%y):=(REL(%x;%y),%z); (the hyper-node %z is created, with the relation REL between the nodes %x and %y inside) | ||
+ | *(%x)(%y):=((%x)(%y),%z); (the hyper-node %z is created, with the linear relation between the nodes %x and %y there inside) | ||
+ | ;Attention: relations and nodes must be repeated in the right side or they will be deleted | ||
+ | *(%x):=(%y); (the node %x will be simply replaced by %y; no hyper-node will be created) | ||
+ | *REL(%x;%y):=(%z); (the relation REL between the nodes %x and %y will be replaced by the node %z; no hyper-node will be created) | ||
+ | |||
+ | === Extracting nodes out of hyper-nodes === | ||
+ | Nodes may be extracted from hyper-nodes by removing the hyper-node parentheses. In this case, the hyper-node is deleted (along with its features), but the internal nodes and relations are preserved, if repeated on the right side. | ||
+ | *((%x),%y):=(%x); (the hyper-node %y is deleted, but its internal node %x is preserved; in case %y has nodes other than %x, these nodes will be deleted as well, because they are not repeated in the right side) | ||
+ | *(REL(%x;%y),%z):=REL(%x;%y); (the hyper-node %z is deleted, but its internal relation REL(%x;%y) is preserved; in case %z has relations other than REL(%x;%y), and nodes other than %x and %y, these will be deleted as well, because they are not repeated in the right side) | ||
+ | |||
+ | == Transformations over relations and hyper-relations == | ||
+ | |||
+ | Relations and hyper-relations do not have features, and are replaced, created and deleted by NN, TT, NT, TN, TL and LT rules: | ||
+ | *REL1(%x;%y):=REL2(%x;%y); (replacement) | ||
+ | *REL(%x;%y):=; (deletion) | ||
+ | *REL1(%x;%y):=+REL2(%w;%z); (creation) | ||
+ | |||
+ | === Creating hyper-relations === | ||
+ | Hyper-relations are created through encapsulating relations: | ||
+ | *REL1(%x;%y)REL2(%x;%z):=REL1(REL2(%x;%z);%y); (the relation REL1 between %x and %y becomes a hyper-relation between the relation REL2(%x;%z) and the node %y.) | ||
+ | |||
+ | === Transforming hyper-relations into simple relations === | ||
+ | Hyper-relations are transformed into simple relations by removing their internal relations: | ||
+ | *REL1(REL2(%x;%z);%y):=REL1(%x;%y)REL2(%x;%z); (the hyper-relation REL1 between the relation REL2(%x;%z) and the node %y is transformed into a simple relation between the nodes %x and %y; the relation REL2(%x;%z) is not affected.) | ||
+ | |||
+ | == Special types of transformation rules == | ||
+ | === Retrieving entries in the dictionary (?) === | ||
+ | Dictionary entries may be accessed from transformation rules by the command "?" | ||
+ | *(?<nowiki>[headword]</nowiki>) retrieves the first entry in the dictionary with the headword "headword" | ||
+ | *(?<nowiki>[[uw]]</nowiki>) retrieves the first entry in the dictionary with the UW "uw" | ||
+ | *(?<nowiki>[headword]</nowiki>,?<nowiki>[[uw]]</nowiki>,?feature) retrieves the first entry in the dictionary with the headword "headword", the UW "uw" and the feature "feature" | ||
+ | Regular expressions, variables and disjunction may also be used in dictionary search | ||
+ | *(?<nowiki>[/abcd./]</nowiki>) retrieves the first entry in the dictionary whose headword has 5 characters and begins with "abcd" (this works only in natural language analysis) | ||
+ | *(?<nowiki>[[/abcd./]]</nowiki>) retrieves the first entry in the dictionary whose UW has 5 characters and begins with "abcd" (this works only in natural language generation) | ||
+ | ;Obligatory parameters | ||
+ | :Due to the indexation algorithm, the headword is obligatory in IAN and the UW is obligatory in EUGENE: | ||
+ | *(?<nowiki>[headword]</nowiki>) will work only in IAN | ||
+ | *(?<nowiki>[[uw]]</nowiki>) will work only in EUGENE | ||
+ | *(?<nowiki>feature</nowiki>) will not work in IAN or EUGENE | ||
+ | ;Variables | ||
+ | :In order to avoid repetition, dictionary look-up may use the values of indexed nodes in the left side | ||
+ | *(?<nowiki>[%x]</nowiki>) retrieves the first entry in the dictionary with the same headword as the node %x | ||
+ | *(?<nowiki>[[%x]]</nowiki>) retrieves the first entry in the dictionary with the same UW as the node %x | ||
+ | *(?<nowiki>[%x],ATT=%x</nowiki>) retrieves the first entry in the dictionary with the same headword as the node %x and whose attribute ATT has the same value as the attribute ATT of the node %x | ||
+ | ;Example | ||
+ | Dictionary search is used mainly in natural language generation | ||
+ | *(N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def); | ||
+ | In case of node %noun with the features noun (N), number (NUM) and gender (GEN), and with the attribute @def (definite), search the first entry in the dictionary associated with the UW "" (empty UW) with the features ART and DEF, and whose attributes NUM and GEN have the same values of the ones of the node %noun, and insert it in front of the noun. Remove @def from the noun in order to avoid an infinite loop. | ||
+ | |||
+ | === Triggering rules (!) === | ||
+ | [[Dictionary_Specs#Inflection_rules_inside_dictionary_entries.2A|Inflectional rules]] are triggered in the grammar by the command "!"<ATTRIBUTE>. | ||
+ | Given the dictionary entry: | ||
+ | *[foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")) <eng,0,0>; | ||
+ | The rule NUM(PLR:="oo":"ee") is triggered by !NUM<br /> | ||
+ | For instance: | ||
+ | *(NUM=PLR,^inflected):=(!NUM,+inflected); or | ||
+ | *(PLR,^inflected):=(!NUM,+inflected); or | ||
+ | *(NUM,^inflected):=(!NUM,+inflected); | ||
+ | In the first case (NUM=PLR), the system verifies if the attribute "NUM" is set and if it has the value "PLR". In the second and in the third case, the system simply verifies if the word has any feature (attribute or value) equal to "PLR" or "NUM".<br /> | ||
+ | It's important to stress that, as the features of the dictionary are defined by the user, there is no way of pre-assigning attribute-value pairs. In that sense, it's not possible to infer that "PLR" will be a value of the attribute "NUM" except through an assignment of the form "NUM=PLR" (i.e., given only "PLR" or "NUM", it is not possible to state "NUM=PLR"). | ||
+ | |||
+ | == General properties of transformation rules == | ||
+ | |||
+ | ;PRIORITY | ||
+ | :Rules are applied serially, according to the order defined in the grammar. The first rule will be the first to be applied, the second will be the second, and so on. In case of the same rule being applicable more than once, rules are applied from left to right (in case of lists) and top-down (in case of graphs). | ||
+ | ::For instance: | ||
+ | :::List structure | ||
+ | ::::INPUT = [a][ ][beautiful][ ][book] | ||
+ | ::::GRAMMAR = | ||
+ | :::::RULE#1: ([ ]):=; (delete the blank space) | ||
+ | :::::RULE#2: ([beautiful])([ ])([book]):=([book])([beautiful]); (replace "beautiful+blank+book" by "book+beautiful") | ||
+ | ::::RESULT: | ||
+ | :::::INITIAL STATE: [a][ ][beautiful][ ][book] | ||
+ | :::::STATE#1: [a][beautiful][ ][book] (the RULE#1 is the first applicable rule to appear in the grammar and, therefore, is the first one to be applied, and it will apply to the leftmost blank) | ||
+ | :::::STATE#2: [a][beautiful][book] (the RULE#1 applies a second time, because there is a second blank space in the input) | ||
+ | :::::FINAL STATE: [a][beautiful][book] (the RULE#2 never applies, because its condition is no longer true after the second application of RULE#1) | ||
+ | :::Graph structure | ||
+ | ::::INPUT = mod(book,beautiful)mod(book,new) | ||
+ | ::::GRAMMAR = | ||
+ | :::::RULE#1: mod(%x;%y):=NA(%x;%y); (replace the mod relation between the nodes %x and %y by a NA relation between the same nodes) | ||
+ | ::::RESULT: | ||
+ | :::::INITIAL STATE: mod(book,beautiful)mod(book,new) | ||
+ | :::::STATE#1: NA(book,beautiful)mod(book,new) (the RULE#1 is the first applicable rule to appear in the grammar and, therefore, is the first one to be applied, and it will apply to the topmost relation, i.e., the first one to appear in the graph) | ||
+ | :::::STATE#2: NA(book,beautiful)NA(book,new) (the RULE#1 applies a second time, because there is a second mod relation to be replaced) | ||
+ | :::::FINAL STATE: NA(book,beautiful)NA(book,new) | ||
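The list example above can be traced with a small Python sketch. Two assumptions are made here that the text does not state: after every successful application the grammar is scanned again from its first rule, and nodes are reduced to plain strings. The function names are hypothetical.

```python
def delete_blank(state):
    """RULE#1: ([ ]):=; delete the leftmost blank, or return None if none."""
    if " " in state:
        i = state.index(" ")
        return state[:i] + state[i + 1:]
    return None


def swap_adjective(state):
    """RULE#2: ([beautiful])([ ])([book]):=([book])([beautiful]);"""
    pattern = ["beautiful", " ", "book"]
    for i in range(len(state) - 2):
        if state[i:i + 3] == pattern:
            return state[:i] + ["book", "beautiful"] + state[i + 3:]
    return None


def apply_grammar(state, rules, max_steps=100):
    """Apply the first applicable rule repeatedly; return every state reached."""
    trace = [state]
    for _ in range(max_steps):
        for rule in rules:
            new_state = rule(state)
            if new_state is not None:
                state = new_state
                trace.append(state)
                break  # assumption: restart from the first rule of the grammar
        else:
            break  # no rule is applicable any longer
    return trace


trace = apply_grammar(
    ["a", " ", "beautiful", " ", "book"],
    [delete_blank, swap_adjective],
)
```

The trace holds the initial state, STATE#1 and STATE#2 from the example. RULE#2 never fires because both blanks are gone before it is ever considered.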
+ | |||
+ | ;RECURSIVENESS | ||
+ | :Rules are applied recursively as long as their conditions are true. Because of that, special attention should be paid to ADD rules: | ||
+ | *<strike>(%x,A):=(%x,+B);</strike> (creates an infinite loop, because the feature B will be added infinitely to the node %x)<br /> | ||
+ | In order to avoid the endless repetition, the condition side must be changed to (%x,A,^B):=(%x,+B); (the rule applies only once) | ||
+ | *<strike>REL1(%x;%y):=+REL2(%x;%z);</strike> (creates an infinite loop, because the REL2 will be added infinitely to the graph)<br /> | ||
+ | In order to avoid the endless repetition, the condition side must be changed to REL1(%x;%y)^REL2(%x;%z):=+REL2(%x;%z); OR | ||
+ | REL1(%x,^BREAK;%y):=+REL2(%x,+BREAK;%z); (the rule applies only once) | ||
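The effect of the negated condition can be illustrated with a sketch in which a node is reduced to a set of features. The function names are hypothetical, and the <code>limit</code> parameter exists only so that the unguarded rule terminates in the demonstration.

```python
def add_b_unguarded(features):
    """(%x,A):=(%x,+B); the condition is still true after the rule applies."""
    if "A" in features:
        return features | {"B"}
    return None


def add_b_guarded(features):
    """(%x,A,^B):=(%x,+B); the condition becomes false once B is present."""
    if "A" in features and "B" not in features:
        return features | {"B"}
    return None


def count_applications(rule, features, limit=1000):
    """Apply the rule while its condition holds, stopping at the safety limit."""
    count = 0
    while count < limit:
        result = rule(features)
        if result is None:
            break
        features = result
        count += 1
    return count
```

With the starting features <code>{"A"}</code>, the guarded rule applies exactly once, whereas the unguarded rule keeps applying until the safety limit is reached.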
+ | |||
+ | ;COMPREHENSIVENESS | ||
+ | :Grammars are applied comprehensively as long as there is at least one applicable rule. | ||
+ | |||
+ | ;CONSERVATION | ||
+ | :Rules affect only the information clearly specified. No relation, node or feature is deleted unless explicitly stated.<br /> | ||
+ | :For instance, in the examples below, the source node of the “agt” relation preserves, in all cases, the value “a”. The only change concerns the feature “c”, which is added to the source node of the “agt” in the first two cases; and the feature “b”, which is deleted from the target node in the third case. | ||
+ | :::agt(a;b):=agt(c;); | ||
+ | :::agt(a;b):=agt(+c;); | ||
+ | :::agt(a;b):=agt(;-b); | ||
+ | :In any case, the ADD and DELETE rules (i.e., when the right side starts with “+” or “-”) preserve the items in the left side, except for the explicitly deleted ones: | ||
+ | :::INPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w) | ||
+ | :::RULE: agt(%x;%y) ^mod(%x;%k):=+mod(%x;%k); | ||
+ | :::OUTPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w) mod(%x;%k) | ||
+ | :or | ||
+ | :::INPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w) | ||
+ | :::RULE: agt(%x;%y):=-agt(%x;%y); | ||
+ | :::OUTPUT: obj(%x;%z) tim(%x;%w) | ||
+ | |||
+ | ;SCOPE | ||
+ | :LL and LT rules apply over nodes, whereas NN, TT, NT, TN and TL rules apply over relations. | ||
+ | ::Nodes may be deleted only through LL and LT rules (i.e., when appearing in the left side of rules). | ||
+ | :::(A):=; (the node containing the feature "A" is deleted) | ||
+ | :::(%A,A):=-(%A); (the node containing the feature "A" is deleted - the same as above) | ||
+ | :::(%A,A)(%B,B):=(%A); (the node containing the feature "B" is deleted because not present in the right side - see [[#Indexes|indexes]]) | ||
+ | :::(%A,A)(%B,B)(%C,C):=REL(%A;%B); (the node containing the feature "C" is deleted, because not present in the right side - see [[#Indexes|indexes]]) | ||
+ | ::Nodes may not be deleted through NN, TT, NT, TN and TL rules (i.e., when not appearing in the left side of rules). | ||
+ | :::REL(%A,A;%B,B;%C,C):=(%A)(%B); (the relation between the nodes containing the features "A", "B" and "C" is replaced by a linear relation between the nodes containing the features "A" and "B". The node "C", however, is not deleted, even though absent from the right side - see [[#Indexes|indexes]]) | ||
+ | ::Relations may be deleted directly through NN, TT, NT, TN and TL rules, and indirectly through LL and LT rules (when their nodes are deleted). | ||
+ | :::REL(A;B):=; (The relation between the nodes containing the features "A" and "B" is deleted, but the nodes are preserved) | ||
+ | :::REL(A;B)REL(%C,C;%D,D):=REL(%C;%D); (The relation between the nodes containing the features "A" and "B" is deleted; its nodes and the relation between the nodes containing the features "C" and "D" are preserved) | ||
+ | :::(A):=; (The node containing the feature "A" is deleted, as well as all relations in which it figures as an argument) | ||
+ | |||
+ | ;INDEXATION | ||
+ | :All instances of the same node must be co-indexed (or they will be considered different nodes). See [[#Indexes|indexes]]. | ||
+ | |||
+ | ;ACTION | ||
+ | :Rules may add or delete values to the source and the target nodes, but only in the right side items: | ||
+ | :::agt(a;b):=agt(+c;); | ||
+ | :::agt(a;b):=agt(;-b); | ||
+ | |||
+ | ;CONJUNCTION | ||
+ | :Both the left and the right side of the rule may have as many items as necessary, as exemplified below:<br /> | ||
+ | :::SEM(A;B)SEM(C;D)SEM(E;F):=SEM(G;H)SEM(I;J)SEM(K;L); | ||
+ | |||
+ | ;DISJUNCTION | ||
+ | :The left side of the rules may bring disjuncts, but not the right side. Disjuncts must be represented between {braces} and must be separated by |. | ||
+ | :::{SEM(A;B)|SEM(C;D)}^SEM(E;F):=+SEM(E;F); | ||
+ | :::SEM(A;B){SEM(C;D)|SEM(E;F)}:=-SEM(A;B); | ||
+ | :::agt(VER,{V01|V02};NOU,^SNG):=; | ||
+ | |||
+ | ;REGULAR EXPRESSIONS | ||
+ | :The left side of the rules may contain [http://www.pcre.org/ Perl Compatible Regular Expressions] between "/", as indicated below: | ||
+ | ::/(agt|obj|aoj)/(A,%a;B,%b):=VS(%a;%b); | ||
+ | :::The rule above applies in case of agt(A;B), obj(A;B) and aoj(A;B) | ||
+ | ::/[a-z]{2,3}/(A,%a;B,%b):=VS(%a;%b); | ||
+ | :::The rule above applies in case of any sequence of two or three alphabetic characters in the position of relation of A and B | ||
+ | ::agt(/(VER|NOU)/,%a;%b):=VS(%a;%b); | ||
+ | :::The rule above applies in case of VER and NOU as features of the first node of the relation "agt" | ||
+ | ::agt(POS=/(VER|NOU)/,%a;%b):=VS(%a;%b); | ||
+ | :::The rule above applies in case of VER and NOU as values of the attribute POS of the first node of the relation "agt" | ||
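The first two patterns above can be tried out with Python's standard <code>re</code> module. Python does not use the PCRE engine, but it treats these simple patterns the same way. This sketch assumes that the expression must cover the whole relation name, and <code>relation_matches</code> is a hypothetical helper.

```python
import re


def relation_matches(pattern, relation_name):
    """Return True if the pattern covers the entire relation name."""
    return re.fullmatch(pattern, relation_name) is not None


# /(agt|obj|aoj)/ accepts exactly three relation names.
alternatives = r"(agt|obj|aoj)"

# /[a-z]{2,3}/ accepts any name made of two or three lowercase letters.
short_name = r"[a-z]{2,3}"
```

Under the whole-name assumption, <code>short_name</code> accepts "agt" and "to" but rejects a hypothetical longer name such as "agent".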
+ | |||
+ | ;CONCISION | ||
+ | :In order for rules to be as small as possible, the source and the target nodes may be simple place-holders or [[#Indexes|indexes]]: | ||
+ | :::cob(;):=obj(;); | ||
+ | :::tim(%01;<nowiki>[[in]]</nowiki>),obj(<nowiki>[[in]]</nowiki>;%02):=tim(%01;%02); | ||
+ | :::tim(VER,%01;<nowiki>[[in]]</nowiki>),obj(<nowiki>[[in]]</nowiki>;NOU,%02):=tim(%01;%02); | ||
+ | :::tim(VER;[[in]]),obj([[in]];NOU):=tim(;%04); | ||
+ | :In the DELETE rules, the right side may be omitted in case of deletion of the entire left side: | ||
+ | :::obj(PRE;):=; | ||
+ | |||
+ | ;READABILITY | ||
+ | :There can be blank spaces between variables and symbols. Comments can be added after the “;”. | ||
+ | :::obj ( ; ) := ; this rule deletes every “obj” relation. | ||
+ | |||
+ | ;COMMUTATIVITY | ||
+ | :Inside the same side of NN, NT, TT and TN rules, the order of the factors does not affect the result<ref>It is important to consider that the resulting order of relations may affect the application of other rules in some implementations. For instance, the rules "SEM(A;B):=SEM(C;D)SEM(E;F);" and "SEM(A;B):= SEM(E;F)SEM(C;D);" will provide the same result, but the relation "SEM(C;D)" may be listed before "SEM(E;F)" in the first case, and after it in the second case. This means that a general rule like SEM(;):=SYN(;);, which would be applicable to both generated relations, will be applied first to "SEM(C;D)" in the first case, and to "SEM(E;F)" in the second case.</ref> | ||
+ | :::SEM(A;B):=SEM(C;D)SEM(E;F); IS EQUIVALENT TO SEM(A;B):= SEM(E;F)SEM(C;D); | ||
+ | :::SYN(A;B):=SYN(C;D)SYN(E;F); IS EQUIVALENT TO SYN(A;B):= SYN(E;F)SYN(C;D); | ||
+ | :The order of the factors affects the result in the case of list-processing rules (LL, LT and TL): | ||
+ | :::(A):=(B)(C); IS DIFFERENT FROM (A):=(C)(B); | ||
+ | :::SYN(A;B):=(C)(D); IS DIFFERENT FROM SYN(A;B):=(D)(C); | ||
+ | :::(C)(D):=SYN(A;B); IS DIFFERENT FROM (D)(C):=SYN(A;B); | ||
+ | :Additionally, the order of the features inside a relation does not affect the end result, but the order of the nodes is non-commutative. | ||
+ | :::SEM( VER,TRA ; NOU,MCL ) IS THE SAME AS SEM( TRA,VER ; MCL,NOU ) | ||
+ | :But: | ||
+ | :::SEM( VER,TRA ; NOU,MCL) IS DIFFERENT FROM SEM( NOU,MCL ; VER,TRA ) | ||
+ | |||
+ | ;DICTIONARY ATTRIBUTES | ||
+ | :Dictionary attributes can be used as variables (see [[#Indexes|indexes]]). | ||
+ | :::SYN(%x,^NUM;%y,NUM):=SYN(NUM=%y;%x); | ||
+ | |||
+ | ;DICTIONARY RULES (see also [[Dictionary Specs#Inflection_rules_inside_dictionary_entries.2A | Inflection rules inside dictionary entries]]) | ||
+ | :Dictionary rules are triggered by '''"!"<ATTRIBUTE>''': | ||
+ | ::Dictionary | ||
+ | :::[foot] "foot" (NOU, NUM(PLR:=”oo”:”ee”)) <eng,0,0>; | ||
+ | :::[city] "city" (NOU, NUM(PLR:=”y”>”ies”)) <eng,0,0>; | ||
+ | ::Grammar | ||
+ | :::(@pl, NUM):=(!NUM,-@pl); | ||
+ | ::Output | ||
+ | :::foot>feet | ||
+ | :::city>cities | ||
+ | |||
+ | ;NLW SPLITTING | ||
+ | :Sub-NLWs in complex entries are referred to by # (see [[#Indexes|indexes]]). | ||
+ | ::Dictionary | ||
+ | :::[[bring] [back]] "bring back" (VER,MTW,VA(01>02), #01(HEAD,VER), #02(ADJT,PP)) <eng,0,0>; | ||
+ | ::Grammar | ||
+ | :::VC(VER,MTW,VA(01>02),%head;NOU,%obj):=VB(VC(%head#01;%obj);%head#02); | ||
+ | |||
+ | == Formal Syntax of Transformation Rules == | ||
+ | |||
+ | <nowiki><TRANSFORMATION RULE> ::= <NN RULE> | <NT RULE> | <TT RULE> | <TL RULE> | <LL RULE> | <LT RULE> | <TN RULE></nowiki> | ||
+ | <nowiki><NN RULE> ::= (<SEM>)+ ":=" ( ("-"|"+")? <SEM> )* ";"</nowiki> | ||
+ | <nowiki><TT RULE> ::= (<SYN>)+ ":=" ( ("-"|"+")? <SYN> )* ";"</nowiki> | ||
+ | <nowiki><LL RULE> ::= ( "(" <NODE> ")" )+ ":=" ( ("-"|"+")? "(" <NODE> ")" )* ";"</nowiki> | ||
+ | <nowiki><NT RULE> ::= (<SEM>)+ ":=" ( <SYN> )+ ";"</nowiki> | ||
+ | <nowiki><TN RULE> ::= (<SYN>)+ ":=" ( <SEM> )+ ";"</nowiki> | ||
+ | <nowiki><TL RULE> ::= (<SYN>)+ ":=" ( "(" <NODE> ")" )+ ";"</nowiki> | ||
+ | <nowiki><LT RULE> ::= ( "(" <NODE> ")" )+ ":=" ( <SYN> )+ ";"</nowiki> | ||
+ | <nowiki><SEM> ::= <TEXT> "(" <NODE> ";" <NODE> ")"</nowiki> | ||
+ | <nowiki><SYN> ::= <TEXT> "(" <NODE> ";" <NODE> ")"</nowiki> | ||
+ | <nowiki><NODE> ::= ( (<DESCRIPTION>)( "," <DESCRIPTION> )* )?</nowiki> | ||
+ | <nowiki><DESCRIPTION> ::= <STRING> | <ENTRY> | <SUB-ENTRY> | <FEATURE> | <INDEX> | <RELATION></nowiki> | ||
+ | <nowiki><STRING> ::= """<text>"""</nowiki> | ||
+ | <nowiki><ENTRY> ::= "["<entry>"]"</nowiki> | ||
+ | <nowiki><SUB-ENTRY> ::= <INDEX>"#"[01-99]</nowiki> | ||
+ | <nowiki><FEATURE> ::= <VALUE> | <ATTRIBUTE> | <ATTRIBUTE>"="<VALUE></nowiki> | ||
+ | <nowiki><INDEX> ::= ( "%"([01-99]|[a-zA-Z_]+) )+</nowiki> | ||
+ | <nowiki><RELATION> ::= <SEM>|<SYN></nowiki> | ||
+ | <nowiki><VALUE> ::= <TEXT></nowiki> | ||
+ | <nowiki><ATTRIBUTE> ::= <TEXT></nowiki> | ||
+ | <nowiki><TEXT> ::= any sequence of characters except whitespace | <REGULAR EXPRESSION></nowiki> | ||
+ | <REGULAR EXPRESSION> ::= "/"<[http://www.pcre.org/ PERL COMPATIBLE REGULAR EXPRESSIONS]>"/" | ||
+ | |||
+ | Where: <br /> | ||
+ | <nowiki>""</nowiki> = constant<br /> | ||
+ | <nowiki>+</nowiki> = to be repeated one or more times<br /> | ||
+ | <nowiki>*</nowiki> = to be repeated zero or more times<br /> | ||
+ | <nowiki>?</nowiki> = to be repeated zero or one time<br /> | ||
+ | <nowiki>|</nowiki> = or<br /> | ||
+ | <nowiki>[x-y]</nowiki> = from x to y<br /> | ||
== Examples == | == Examples == |
Revision as of 15:49, 31 May 2013
T-rules, or transformation rules, are rules that alter the state of the nodes. They are used for normalization, for syntactic analysis and for semantic interpretation. The set of t-rules forms the Transformation grammar, or T-Grammar.
Types of Transformation Rules
Natural language sentences and UNL graphs are supposed to convey the same amount of information in different structures: whereas the former arranges data as an ordered list of words, the latter organizes it as a hypergraph. In that sense, translating from natural language into UNL and from UNL into natural language is ultimately a matter of transforming lists into networks and vice-versa.
The UNDLF generation and analysis tools assume that such transformation should be carried out progressively, i.e., through a transitional data structure: the tree, which could be used as an interface between lists and networks. Accordingly, the UNL Grammar states seven different types of rules (LL, TT, NN, LT, TL, TN, NT), as indicated below:
- ANALYSIS (NL-UNL)
- LL - List Processing (list-to-list)
- LT - Surface-Structure Formation (list-to-tree)
- TT - Syntactic Processing (tree-to-tree)
- TN - Deep-Structure Formation (tree-to-network)
- NN - Semantic Processing (network-to-network)
- GENERATION (UNL-NL)
- NN - Semantic Processing (network-to-network)
- NT - Deep-Structure Formation (network-to-tree)
- TT - Syntactic Processing (tree-to-tree)
- TL - Surface-Structure Formation (tree-to-list)
- LL - List Processing (list-to-list)
The NL original sentence is supposed to be preprocessed, by the LL rules, in order to become an ordered list. Next, the resulting list structure is parsed with the LT rules, so as to unveil its surface syntactic structure, which is already a tree. The tree structure is further processed by the TT rules in order to expose its inner organization, the deep syntactic structure, which is supposed to be more suitable to the semantic interpretation. Then, this deep syntactic structure is projected into a semantic network by the TN rules. The resultant semantic network is then post-edited by the NN rules in order to comply with UNL standards and generate the UNL Graph.
The reverse process is carried out during natural language generation. The UNL graph is preprocessed by the NN rules in order to become a more easily tractable semantic network. The resulting network structure is converted, by the NT rules, into a syntactic structure, which is still distant from the surface structure, as it is directly derived from the semantic arrangement. This deep syntactic structure is subsequently transformed into a surface syntactic structure by the TT rules. The surface syntactic structure undergoes many other changes according to the TL rules, which generate an NL-like list structure. This list structure is finally realized as a natural language sentence by the LL rules.
As sentences are complex structures that may contain nested or embedded phrases, both the analysis and the generation processes may be interleaved rather than pipelined. This means that the natural flow described above is only "normal" and not "necessary". During natural language generation, an LL rule may apply prior to a TT rule, or an NN rule may be applied after a TL rule. Rules are recursive and must be applied in the order defined in the grammar as long as their conditions are true, regardless of the state.
List-to-List Rules
The list-to-list (LL) rules are used for processing lists, both in analysis and in generation. In analysis, these rules are used for pre-editing the natural language sentence and preparing the input to the syntactic module; in generation, they are used for post-editing the output of the syntactic module and generating the natural language sentence.
There are 5 different subtypes of LL rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD | (%x):=(%x)(%y); | The node %y is added to the right of the node %x |
ADD | (%x):=(%y)(%x); | The node %y is added to the left of the node %x |
DELETE | (%x):=-(%x); | The node %x is deleted. |
DELETE | (%x):=; | The node %x is deleted. |
REPLACE | (%x):=(%y); | All the instances of the node %x will be replaced by the node %y |
MERGE | (%x)(%y):=(%x&%y); | The nodes %x and %y will be merged |
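The four actions in the table above can be pictured as operations on an ordered list. The following Python sketch is illustrative only: nodes are reduced to plain strings, all function names are hypothetical, and merging is assumed to concatenate the strings. ADD and MERGE model a single application of the rule; DELETE and REPLACE process every matching node.

```python
# Illustrative sketch only: nodes are plain strings and every function
# name is hypothetical, not part of any UNL tool.

def add_right(nodes, x, y):
    # (%x):=(%x)(%y);  insert y to the right of the first x
    i = nodes.index(x)
    return nodes[:i + 1] + [y] + nodes[i + 1:]

def add_left(nodes, x, y):
    # (%x):=(%y)(%x);  insert y to the left of the first x
    i = nodes.index(x)
    return nodes[:i] + [y] + nodes[i:]

def delete(nodes, x):
    # (%x):=;  remove every node equal to x
    return [n for n in nodes if n != x]

def replace(nodes, x, y):
    # (%x):=(%y);  replace every instance of x by y
    return [y if n == x else n for n in nodes]

def merge(nodes, x, y):
    # (%x)(%y):=(%x&%y);  merge the first adjacent pair x, y
    for i in range(len(nodes) - 1):
        if nodes[i] == x and nodes[i + 1] == y:
            return nodes[:i] + [x + y] + nodes[i + 2:]
    return nodes

words = ["a", "book"]
print(add_right(words, "a", "new"))
print(merge(["a", "b", "c"], "a", "b"))
```

Each function returns a new list and leaves its input unchanged, which makes it easy to compare the state before and after a rule.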
Tree-to-Tree Rules
The tree-to-tree rules (TT) are used for processing trees, both in analysis and in generation. During analysis, these rules are used for revealing the deep structure out of the surface structure; in generation, they are used for transforming the deep into the surface syntactic structure.
Syntactic relations are n-ary: they can have as many arguments (nodes) as necessary.
There are 3 different subtypes of TT rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD RELATION | SYN1(%x;%y):=+SYN2(%w;%z); | The relation SYN2 between the nodes %w and %z will be added to the graph containing the relation SYN1 between the nodes %x and %y |
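DELETE RELATION | SYN(%x;%y):=-SYN(%x;%y); | The relation SYN between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) |
DELETE RELATION | SYN(%x;%y):=; | The relation SYN between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) |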
REPLACE RELATION | SYN1(%x;%y):=SYN2(%w;%z); | The relation SYN1 between the nodes %x and %y will be replaced by the relation SYN2 between the nodes %w and %z |
As syntactic relations are n-ary, the REPLACE RELATION may also be used to ADD or DELETE nodes.
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD NODE | SYN(%x;%y):=SYN(%x;%y;%z); | The binary relation SYN between the nodes %x and %y is replaced by a ternary relation SYN between the nodes %x, %y and %z |
DELETE NODE | SYN(%x;%y):=SYN(%y); | The binary relation SYN between the nodes %x and %y is replaced by a unary relation SYN with the node %y |
Network-to-Network Rules
The network-to-network rules (NN) are used for processing networks, both in analysis and in generation. During analysis, these rules are used for post-editing the semantic network structure derived from the syntactic module in order to generate the UNL graph; in generation, they are used for pre-editing the UNL graph, transforming it into a semantic network that would be more appropriate for sentence generation.
There are 3 different subtypes of NN rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD RELATION | SEM1(%x;%y):=+SEM2(%w;%z); | The relation SEM2 between the nodes %w and %z will be added to the graph containing the relation SEM1 between the nodes %x and %y |
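DELETE RELATION | SEM(%x;%y):=-SEM(%x;%y); | The relation SEM between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) |
DELETE RELATION | SEM(%x;%y):=; | The relation SEM between the nodes %x and %y will be deleted (the nodes %x and %y will not be deleted) |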
REPLACE RELATION | SEM1(%x;%y):=SEM2(%w;%z); | The relation SEM1 between the nodes %x and %y will be replaced by the relation SEM2 between the nodes %w and %z |
List-to-Tree Rules
The list-to-tree (LT) rules are used to parse the list structure into a tree structure.
There are 2 different subtypes of LT rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD | (%x)(%y):=+SYN(%x;%y); | The relation SYN is created between the nodes %x and %y if there is a linear relation between them (the linear relation is not deleted) |
REPLACE | (%x)(%y):=SYN(%x;%y); | The linear relation between %x and %y is replaced by the relation SYN between the same nodes (i.e., the linear relation is deleted) |
Tree-to-List Rules
The tree-to-list (TL) rules are used to linearize the tree structure into a list structure. There is one single type of TL rule:
ACTION | RULE | DESCRIPTION |
---|---|---|
REPLACE | SYN(%x;%y):=(%x)(%y); | The relation SYN between %x and %y is replaced by a linear relation between %x and %y |
Tree-to-Network Rules
The tree-to-network (TN) rules derive a semantic network out of a syntactic tree.
There are 2 types of TN rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD | SYN(%x;%y):=+SEM(%w;%x); | The semantic relation SEM between the nodes %w and %x is created if there is a syntactic relation SYN between the nodes %x and %y |
REPLACE | SYN(%x;%y):=SEM(%x;%y); | The syntactic relation SYN between the nodes %x and %y is replaced by the semantic relation SEM between the same nodes. |
Network-to-Tree Rules
The network-to-tree (NT) rules reorganize the network structure as a deep tree structure.
There are two types of NT rules:
ACTION | RULE | DESCRIPTION |
---|---|---|
ADD | SEM(%x;%y):=+SYN(%w;%x); | The syntactic relation SYN between the nodes %w and %x is created if there is a semantic relation SEM between the nodes %x and %y |
REPLACE | SEM(%x;%y):=SYN(%x;%y); | The semantic relation SEM between the nodes %x and %y is replaced by the syntactic relation SYN between the same nodes. |
Transformations over nodes
Altering nodes
Nodes are altered by the use of the operators + (add) and - (delete). The operator + may be omitted.
- (%x,A):=(%x,+B); (add the feature B to %x)
- (%x,A):=(%x,B); (the same as above: add the feature B to %x)
- (%x,A):=(%x,-A); (delete the feature A from %x)
"strings", [headwords] and [[UWs]] are considered to be features (but a single node may have only one of each)
- (%x,A):=(%x,"a"); (replace the existing string in %x, if any, by "a")
- (%x,[A]):=(%x,[A]); (replace the existing headword in %x, if any, by [A])
- (%x,[[A]]):=(%x,[[A]]); (replace the existing UW in %x, if any, by [[A]])
Example:
- ("a",[a],[[a]],A,C,%x):=("b",[b],[[b]],-A,+B,%x); (the original node ("a",[a],[[a]],A,C) becomes ("b",[b],[[b]],B,C). Note that the feature C is preserved, because it was not affected by the rule);
Deleting nodes
In LL and LT rules, nodes are deleted if they are not repeated (co-indexed) in the right side:
- (%x)(%y):=(%x); (the node %y will be deleted)
In other rules, nodes are deleted if they are not repeated (co-indexed) in the right side and are not part of any other relation:
- rel(%x;%y):=rel(%x); (the node %y will be deleted if, and only if, it is not part of any other relation)
Creating nodes
Nodes are created through the use of new indexes in the right side:
- ("a",%x)("b",%y):=(%x)(%y)("c",%z); (the node %z will be created)
- ("a",%x)("b",%y):=(%x)("c",%z); (the node %z will be created, and %y will be deleted)
Duplicating (cloning) nodes
Nodes may be duplicated by repeating indexes on the right side along with the command #CLONE:
- ("a",%x)("b",%y):=(%x)(%y)(%x,#CLONE)(%y,#CLONE)(%y,#CLONE)(%x,#CLONE);
("a")("b") becomes ("a")("b")("a")("b")("b")("a")
In order to avoid infinite recursion, it is important to alter conditions on the left side.
In order to avoid impossible graphs (a node cannot be a neighbor of itself) and assign different features to the different instances of the repeated nodes, the command #CLONE must be used.
Merging nodes (&)
Two or more nodes may be merged by the command &:
- (%x)(%y)(%z):=(%x&%y&%z);
In the example above, ("a")("b")("c") becomes ("abc")
- Merge operations concatenate headwords and UWs, and join features
("hw1",[[uw1]],F1,%x)("hw2",[[uw2]],F2,%y)("hw3",[[uw3]],F3,%z):=(%x&%y&%z);
The resulting node is ("hw1hw2hw3",[[uw1uw2uw3]],F1,F2,F3)
- Compare the difference
- (%x)(%y):=(%z); (the nodes %x and %y are replaced by %z, and their features are lost unless explicitly included in %z)
- (%x)(%y):=(%x&%y); (the nodes %x and %y are merged into a single node, and their features are joined)
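The three-node merge shown above can be sketched in Python as follows. This is a minimal model under stated assumptions: the class `Node` and the function `merge` are hypothetical names, a node is reduced to a string, a UW and a list of features, and joining features is taken to mean a union that keeps the original order.

```python
from dataclasses import dataclass, field

# Hypothetical node model: a string, a UW and a list of features.
@dataclass
class Node:
    string: str = ""
    uw: str = ""
    features: list = field(default_factory=list)

def merge(*nodes):
    # (%x)(%y)(%z):=(%x&%y&%z);
    # Strings and UWs are concatenated from left to right;
    # features are joined without duplicates.
    merged = Node()
    for node in nodes:
        merged.string += node.string
        merged.uw += node.uw
        for feature in node.features:
            if feature not in merged.features:
                merged.features.append(feature)
    return merged

result = merge(
    Node("hw1", "uw1", ["F1"]),
    Node("hw2", "uw2", ["F2"]),
    Node("hw3", "uw3", ["F3"]),
)
print(result)
```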
Splitting nodes (retokenization)
Temporary nodes (i.e., nodes having the feature TEMP) may be split, but the feature TEMP may be assigned to any node.
See Retokenization
Transformations over hyper-nodes
Altering hyper-nodes
Hyper-nodes, as nodes, have features, which may be altered by the use of the operators + (add) and - (delete). Changes in the hyper-node do not affect the internal nodes and relations. The operator + may be omitted.
- (REL(%x;%y),%z):=(%z,+B); (add the feature B to the hyper-node %z; the internal nodes %x and %y are not affected)
- (REL(%x;%y),%z):=(%z,B); (the same as above: add the feature B to the hyper-node %z)
- (REL(%x;%y),%z,A):=(%z,-A);(delete the feature A from the hyper-node %z; the internal nodes %x and %y are not affected)
Deleting hyper-nodes
In LL and LT rules, hyper-nodes are deleted if they are not repeated (co-indexed) in the right side. In this case, all the inner nodes are deleted as well:
- (REL(%x;%y),%z):=; (the hyper-node %z will be deleted, and all its internal nodes and relations as well)
In order to preserve the internal nodes, see Extracting nodes out of hyper-nodes below
Creating hyper-nodes
Hyper-nodes are created through the encapsulation of existing nodes
- (%x):=((%x),%y); (the hyper-node %y is created, with the node %x there inside)
- REL(%x;%y):=(REL(%x;%y),%z); (the hyper-node %z is created, with the relation REL between the nodes %x and %y inside)
- (%x)(%y):=((%x)(%y),%z); (the hyper-node %z is created, with the linear relation between the nodes %x and %y there inside)
- Attention
- relations and nodes must be repeated in the right side or they will be deleted
- (%x):=(%y); (the node %x will be simply replaced by %y; no hyper-node will be created)
- REL(%x;%y):=(%z); (the relation REL between the nodes %x and %y will be replaced by the node %z; no hyper-node will be created)
Extracting nodes out of hyper-nodes
Nodes may be extracted from hyper-nodes by removing the hyper-node parentheses. In this case, the hyper-node is deleted (along with its features), but the internal nodes and relations are preserved, if repeated on the right side.
- ((%x),%y):=(%x); (the hyper-node %y is deleted, but its internal node %x is preserved; in case %y has nodes other than %x, these nodes will be deleted as well, because they are not repeated in the right side)
- (REL(%x;%y),%z):=REL(%x;%y); (the hyper-node %z is deleted, but its internal relation REL(%x;%y) is preserved; in case %z has relations other than REL(%x;%y), and nodes other than %x and %y, these will be deleted as well, because they are not repeated in the right side)
Transformations over relations and hyper-relations
Relations and hyper-relations do not have features, and are replaced, created and deleted by NN, TT, NT, TN, TL and LT rules:
- REL1(%x;%y):=REL2(%x;%y); (replacement)
- REL(%x;%y):=; (deletion)
- REL1(%x;%y):=+REL2(%w;%z); (creation)
Creating hyper-relations
Hyper-relations are created through encapsulating relations:
- REL1(%x;%y)REL2(%x;%z):=REL1(REL2(%x;%z);%y); (the relation REL1 between %x and %y becomes a hyper-relation between the relation REL2(%x;%z) and the node %y.)
Transforming hyper-relations into simple relations
Hyper-relations are transformed into simple relations by removing their internal relations:
- REL1(REL2(%x;%z);%y):=REL1(%x;%y)REL2(%x;%z); (the hyper-relation REL1 between the relation REL2(%x;%z) and the node %y is transformed into a simple relation between the nodes %x and %y; the relation REL2(%x;%z) is not affected.)
Special types of transformation rules
Retrieving entries in the dictionary (?)
Dictionary entries may be accessed from transformation rules by the command "?"
- (?[headword]) retrieves the first entry in the dictionary with the headword "headword"
- (?[[uw]]) retrieves the first entry in the dictionary with the UW "uw"
- (?[headword],?[[uw]],?feature) retrieves the first entry in the dictionary with the headword "headword", the UW "uw" and the feature "feature"
Regular expressions, variables and disjunction may also be used in dictionary search
- (?[/abcd./]) retrieves the first entry in the dictionary whose headword has 5 characters and begins with "abcd" (this works only in natural language generation)
- (?[[/abcd./]]) retrieves the first entry in the dictionary whose UW has 5 characters and begins with "abcd" (this works only in natural language analysis)
- Obligatory parameters
- Due to the indexation algorithm, the headword is obligatory in IAN and the UW is obligatory in EUGENE:
- (?[headword]) will work only in IAN
- (?[[uw]]) will work only in EUGENE
- (?feature) will not work in IAN or EUGENE
- Variables
- In order to avoid repetition, dictionary look-up may use the values of indexed nodes in the left side
- (?[%x]) retrieves the first entry in the dictionary with the same headword of the node %x
- (?[[%x]]) retrieves the first entry in the dictionary with the same UW of the node %x
- (?[%x],ATT=%x) retrieves the first entry in the dictionary with the same headword as the node %x and whose attribute ATT has the same value as the attribute ATT of the node %x
- Example
Dictionary search is used mainly in natural language generation
- (N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def);
In case of node %noun with the features noun (N), number (NUM) and gender (GEN), and with the attribute @def (definite), search the first entry in the dictionary associated with the UW "" (empty UW) with the features ART and DEF, and whose attributes NUM and GEN have the same values of the ones of the node %noun, and insert it in front of the noun. Remove @def from the noun in order to avoid an infinite loop.
Triggering rules (!)
Inflectional rules are triggered in the grammar by the command "!"<ATTRIBUTE>. Given the dictionary entry:
- [foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")) <eng,0,0>;
The rule NUM(PLR:="oo":"ee") is triggered by !NUM
For instance:
- (NUM=PLR,^inflected):=(!NUM,+inflected); or
- (PLR,^inflected):=(!NUM,+inflected); or
- (NUM,^inflected):=(!NUM,+inflected);
In the first case (NUM=PLR), the system verifies if the attribute "NUM" is set and if it has the value "PLR". In the second and in the third case, the system simply verifies if the word has any feature (attribute or value) equal to "PLR" or "NUM".
It's important to stress that, as the features of the dictionary are defined by the user, there is no way of pre-assigning attribute-value pairs. In that sense, it's not possible to infer that "PLR" will be a value of the attribute "NUM" except through an assignment of the form "NUM=PLR" (i.e., given only "PLR" or "NUM", it is not possible to state "NUM=PLR").
General properties of transformation rules
- PRIORITY
- Rules are applied serially, according to the order defined in the grammar. The first rule will be the first to be applied, the second will be the second, and so on. In case of the same rule being applicable more than once, rules are applied from left to right (in case of lists) and top-down (in case of graphs).
- For instance:
- List structure
- INPUT = [a][ ][beautiful][ ][book]
- GRAMMAR =
- RULE#1: ([ ]):=; (delete the blank space)
- RULE#2: ([beautiful])([ ])([book]):=([book])([beautiful]); (replace "beautiful+blank+book" by "book+beautiful")
- RESULT:
- INITIAL STATE: [a][ ][beautiful][ ][book]
- STATE#1: [a][beautiful][ ][book] (the RULE#1 is the first applicable rule to appear in the grammar and, therefore, is the first one to be applied, and it will apply to the leftmost blank)
- STATE#2: [a][beautiful][book] (the RULE#1 applies a second time, because there is a second blank space in the input)
- FINAL STATE: [a][beautiful][book] (the RULE#2 never applies, because its condition is no longer true after the second application of RULE#1)
- Graph structure
- INPUT = mod(book,beautiful)mod(book,new)
- GRAMMAR =
- RULE#1: mod(%x;%y):=NA(%x;%y); (replace the mod relation between the nodes %x and %y by a NA relation between the same nodes)
- RESULT:
- INITIAL STATE: mod(book,beautiful)mod(book,new)
- STATE#1: NA(book,beautiful)mod(book,new) (the RULE#1 is the first applicable rule to appear in the grammar and, therefore, is the first one to be applied, and it will apply to the topmost relation, i.e., the first one to appear in the graph)
- STATE#2: NA(book,beautiful)NA(book,new) (the RULE#1 applies a second time, because there is a second mod relation to be replaced)
- FINAL STATE: NA(book,beautiful)NA(book,new)
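The list-structure trace above can be reproduced with a small Python simulation. It is a sketch under stated assumptions: the control loop (`run`) always picks the first rule in the grammar that is applicable and applies it to the leftmost match, which is one reading of the priority principle; the function names are hypothetical, and nodes are reduced to their headwords.

```python
# Hypothetical simulation of rule priority over a list structure.

def delete_blank(nodes):
    # RULE#1: ([ ]):=;  delete the leftmost blank, if any
    if " " in nodes:
        i = nodes.index(" ")
        return nodes[:i] + nodes[i + 1:]
    return None  # rule not applicable

def swap_adjective(nodes):
    # RULE#2: ([beautiful])([ ])([book]):=([book])([beautiful]);
    pattern = ["beautiful", " ", "book"]
    for i in range(len(nodes) - 2):
        if nodes[i:i + 3] == pattern:
            return nodes[:i] + ["book", "beautiful"] + nodes[i + 3:]
    return None  # rule not applicable

def run(grammar, nodes):
    # Apply the first applicable rule, in grammar order, until none applies.
    states = [nodes]
    while True:
        for rule in grammar:
            result = rule(states[-1])
            if result is not None:
                states.append(result)
                break
        else:
            return states

states = run([delete_blank, swap_adjective], ["a", " ", "beautiful", " ", "book"])
for state in states:
    print(state)
```

With the rules in this order, the second rule never fires, because both blanks are gone before it is tried. Reversing the grammar to `[swap_adjective, delete_blank]` lets the swap apply first.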
- RECURSIVENESS
- Rules are applied recursively as long as their conditions are true. Because of that, special attention should be paid to ADD rules:
(%x,A):=(%x,+B);(creates an infinite loop, because the feature B will be added infinitely to the node %x)
In order to avoid the endless repetition, the condition side must be changed to (%x,A,^B):=(%x,+B); (the rule applies only once)
REL1(%x;%y):=+REL2(%x;%z);(creates an infinite loop, because the REL2 will be added infinitely to the graph)
In order to avoid the endless repetition, the condition side must be changed to REL1(%x;%y)^REL2(%x;%z):=+REL2(%x;%z); OR REL1(%x,^BREAK;%y):=+REL2(%x,+BREAK;%z); (the rule applies only once)
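The effect of the negated condition can be illustrated with a short Python sketch. It assumes a node is just a set of features and that a rule is reapplied for as long as its condition holds; the names `apply_until_stable`, `unguarded` and `guarded` are hypothetical, and the step cap exists only so that the looping rule terminates in the demonstration.

```python
# Hypothetical model: a node is a set of features, a rule returns the
# new feature set or None when its condition is not met.

def apply_until_stable(rule, features, max_steps=10):
    # Reapply the rule while it matches; stop at max_steps as a safety cap.
    steps = 0
    while steps < max_steps:
        result = rule(features)
        if result is None:
            return features, steps
        features, steps = result, steps + 1
    return features, steps  # the cap was reached: the rule never stops matching

def unguarded(features):
    # (%x,A):=(%x,+B);  matches whenever A is present
    if "A" in features:
        return features | {"B"}
    return None

def guarded(features):
    # (%x,A,^B):=(%x,+B);  matches only if A is present and B is absent
    if "A" in features and "B" not in features:
        return features | {"B"}
    return None

print(apply_until_stable(unguarded, {"A"}))  # runs until the cap
print(apply_until_stable(guarded, {"A"}))    # stops after one application
```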
- COMPREHENSIVENESS
- Grammars are applied comprehensively as long as there is at least one applicable rule.
- CONSERVATION
- Rules affect only the information clearly specified. No relation, node or feature is deleted unless explicitly informed.
- For instance, in the examples below, the source node of the “agt” relation preserves, in all cases, the value “a”. The only change concerns the feature “c”, which is added to the source node of the “agt” in the first two cases; and the feature “b”, which is deleted from the target node in the third case.
- agt(a;b):=agt(c;);
- agt(a;b):=agt(+c;);
- agt(a;b):=agt(;-b);
- In any case, the ADD and DELETE rules (i.e., when the right side starts with “+” or “-“) preserve the items in the left side, except for the explicitly deleted ones:
- INPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w)
- RULE: agt(%x;%y) ^mod(%x;%k):=+mod(%x;%k);
- OUTPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w) mod(%x;%k)
- or
- INPUT: agt(%x;%y) obj(%x;%z) tim(%x;%w)
- RULE: agt(%x;%y):=-agt(%x;%y);
- OUTPUT: obj(%x;%z) tim(%x;%w)
- SCOPE
- LL and LT rules apply over nodes, whereas NN, TT, NT, TN and TL rules apply over relations.
- Nodes may be deleted only through LL and LT rules (i.e., when appearing in the left side of rules).
- (A):=; (the node containing the feature "A" is deleted)
- (%A,A):=-(%A); (the node containing the feature "A" is deleted - the same as above)
- (%A,A)(%B,B):=(%A); (the node containing the feature "B" is deleted because not present in the right side - see indexes)
- (%A,A)(%B,B)(%C,C):=REL(%A;%B); (the node containing the feature "C" is deleted, because not present in the right side - see indexes)
- Nodes may not be deleted through NN, TT, NT, TN and TL rules (i.e., when not appearing in the left side of rules).
- REL(%A,A;%B,B;%C,C):=(%A)(%B); (the relation between the nodes containing the features "A", "B" and "C" is replaced by a linear relation between the nodes containing the features "A" and "B". The node "C", however, is not deleted, even though absent from the right side - see indexes)
- Relations may be deleted directly through NN, TT, NT, TN and TL rules, and indirectly through LL and LT rules (when their nodes are deleted).
- REL(A;B):=; (The relation between the nodes containing the features "A" and "B" is deleted, but the nodes are preserved)
- REL(A;B)REL(%C,C;%D,D):=REL(%C;%D); (The relation between the nodes containing the features "A" and "B" is deleted; its nodes and the relation between the nodes containing the features "C" and "D" are preserved)
- (A):=; (The node containing the feature "A" is deleted, as well as all relations in which it figures as an argument)
- INDEXATION
- All instances of the same node must be co-indexed (or they will be considered different nodes). See indexes.
- ACTION
- Rules may add or delete values to the source and the target nodes, but only in the right side items:
- agt(a;b):=agt(+c;);
- agt(a;b):=agt(;-b);
- CONJUNCTION
- Both the left and the right side of the rule may have as many items as necessary, as exemplified below:
- SEM(A;B)SEM(C;D)SEM(E;F):=SEM(G;H)SEM(I;J)SEM(K;L);
- DISJUNCTION
- The left side of the rules may bring disjuncts, but not the right side. Disjuncts must be represented between {braces} and must be separated by |.
- {SEM(A;B)|SEM(C;D)}^SEM(E;F):=+SEM(E;F);
- SEM(A;B){SEM(C;D)|SEM(E;F)}:=-SEM(A;B);
- agt(VER,{V01|V02};NOU,^SNG):=;
- REGULAR EXPRESSIONS
- The left side of the rules may bring [Perl Compatible Regular Expressions] between "/", as indicated below:
- /(agt|obj|aoj)/(A,%a;B,%b):=VS(%a;%b);
- The rule above applies in case of agt(A;B), obj(A;B) and aoj(A;B)
- /[a-z]{2,3}/(A,%a;B,%b):=VS(%a;%b);
- The rule above applies in case of any sequence of two or three alphabetic characters in the position of relation of A and B
- agt(/(VER|NOU)/,%a;%b):=VS(%a;%b);
- The rule above applies in case of VER and NOU as features of the first node of the relation "agt"
- agt(POS=/(VER|NOU)/,%a;%b):=VS(%a;%b);
- The rule above applies in case of VER and NOU as values of the attribute POS of the first node of the relation "agt"
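The four patterns above can be tried out with Python's `re` module. This is only an approximation: the rules use Perl Compatible Regular Expressions, whereas `re` is Python's own engine (the patterns used here behave the same in both), and the sketch assumes that a pattern must match the whole relation name or feature. The helper names are hypothetical.

```python
import re

def matches(pattern, text):
    # True if the whole text matches the pattern written between slashes.
    return re.fullmatch(pattern, text) is not None

def node_has_feature(pattern, features):
    # agt(/(VER|NOU)/,...): some feature of the node matches the pattern.
    return any(matches(pattern, feature) for feature in features)

def attribute_matches(pattern, attributes, name):
    # agt(POS=/(VER|NOU)/,...): the value of one attribute matches the pattern.
    value = attributes.get(name)
    return value is not None and matches(pattern, value)

print(matches(r"(agt|obj|aoj)", "obj"))
print(matches(r"[a-z]{2,3}", "agent"))
print(node_has_feature(r"(VER|NOU)", ["NOU", "MCL"]))
print(attribute_matches(r"(VER|NOU)", {"POS": "ADJ"}, "POS"))
```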
- CONCISION
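- In order for rules to be as small as possible, the source and the target nodes may be simple place-holders or indexes:
- cob(;):=obj(;);
- tim(%01;[[in]]),obj([[in]];%02):=tim(%01;%02);
- tim(VER,%01;[[in]]),obj([[in]];NOU,%02):=tim(%01;%02);
- tim(VER;[[in]]),obj([[in]];NOU):=tim(;%04);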
- In the DELETE rules, the right side may be omitted in case of deletion of the entire left side:
- obj(PRE;):=;
- READABILITY
- There can be blank spaces between variables and symbols. Comments can be added after the “;”.
- obj ( ; ) := ; this rule deletes every “obj” relation.
- COMMUTATIVITY
- Inside the same side of NN, NT, TT and TN rules, the order of the factors does not affect the result[1]
- SEM(A;B):=SEM(C;D)SEM(E;F); IS EQUIVALENT TO SEM(A;B):= SEM(E;F)SEM(C;D);
- SYN(A;B):=SYN(C;D)SYN(E;F); IS EQUIVALENT TO SYN(A;B):= SYN(E;F)SYN(C;D);
- The order of the factors affects the result in the case of list-processing rules (LL, LT and TL):
- (A):=(B)(C); IS DIFFERENT FROM (A):=(C)(B);
- SYN(A;B):=(C)(D); IS DIFFERENT FROM SYN(A;B):=(D)(C);
- (C)(D):=SYN(A;B); IS DIFFERENT FROM (D)(C):=SYN(A;B);
- Additionally, the order of the features inside a relation does not affect the end result, but the order of the nodes is non-commutative.
- SEM( VER,TRA ; NOU,MCL ) IS THE SAME AS SEM( TRA,VER ; MCL,NOU )
- But:
- SEM( VER,TRA ; NOU,MCL) IS DIFFERENT FROM SEM( NOU,MCL ; VER,TRA )
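These equivalences map naturally onto ordered and unordered collections. In the Python sketch below, which is an illustration and not a description of any actual implementation, the features of a node are stored as a `frozenset` (order irrelevant), the nodes of a relation as a `tuple` (order relevant), the relations of a graph side as a `set`, and the nodes of a list side as a `list`. The helper `relation` is a hypothetical name.

```python
def relation(name, *nodes):
    # Features inside a node are unordered; the nodes themselves are ordered.
    return (name, tuple(frozenset(features) for features in nodes))

a = relation("SEM", ["VER", "TRA"], ["NOU", "MCL"])
b = relation("SEM", ["TRA", "VER"], ["MCL", "NOU"])  # features reordered
c = relation("SEM", ["NOU", "MCL"], ["VER", "TRA"])  # nodes swapped

print(a == b)  # same relation
print(a == c)  # different relation

# Right side of an NN rule: a set of relations, so order does not matter.
cd = relation("SEM", ["C"], ["D"])
ef = relation("SEM", ["E"], ["F"])
print({cd, ef} == {ef, cd})

# Right side of an LL rule: a list of nodes, so order matters.
print([frozenset(["B"]), frozenset(["C"])] == [frozenset(["C"]), frozenset(["B"])])
```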
- DICTIONARY ATTRIBUTES
- Dictionary attributes can be used as variables (see indexes).
- SYN(%x,^NUM;%y,NUM):=SYN(NUM=%y;%x);
- DICTIONARY RULES (see also Inflection rules inside dictionary entries)
- Dictionary rules are triggered by "!"<ATTRIBUTE>:
- Dictionary
- [foot] "foot" (NOU, NUM(PLR:="oo":"ee")) <eng,0,0>;
- [city] "city" (NOU, NUM(PLR:="y">"ies")) <eng,0,0>;
- Grammar
- (@pl, NUM):=(!NUM,-@pl);
- Output
- foot>feet
- city>cities
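The behaviour of the two dictionary rules can be sketched in Python. The reading of the operators is an assumption inferred from the output shown above: "oo":"ee" is taken to replace the first "oo" by "ee", and "y">"ies" to replace a final "y" by "ies". The `LEXICON` structure and the function `inflect` are hypothetical names used only for this illustration.

```python
# Hypothetical encoding of the two dictionary entries above.
# "infix" stands for "oo":"ee" and "suffix" for "y">"ies".
LEXICON = {
    "foot": {"NUM": {"PLR": ("infix", "oo", "ee")}},
    "city": {"NUM": {"PLR": ("suffix", "y", "ies")}},
}

def inflect(headword, attribute, value):
    # Model of !<ATTRIBUTE>: run the rule stored under the attribute.
    kind, old, new = LEXICON[headword][attribute][value]
    if kind == "infix":
        return headword.replace(old, new, 1)
    if kind == "suffix" and headword.endswith(old):
        return headword[: len(headword) - len(old)] + new
    return headword

for word in ("foot", "city"):
    print(word + ">" + inflect(word, "NUM", "PLR"))
```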
- NLW SPLITTING
- Sub-NLWs in complex entries are referred to by # (see indexes).
- Dictionary
- [[bring] [back]] "bring back" (VER,MTW,VA(01>02), #01(HEAD,VER), #02(ADJT,PP)) <eng,0,0>;
- Grammar
- VC(VER,MTW,VA(01>02),%head;NOU,%obj):=VB(VC(%head#01;%obj);%head#02);
Formal Syntax of Transformation Rules
<TRANSFORMATION RULE> ::= <NN RULE> | <NT RULE> | <TT RULE> | <TL RULE> | <LL RULE> | <LT RULE> | <TN RULE>
<NN RULE> ::= (<SEM>)+ ":=" ( ("-"|"+")? <SEM> )* ";"
<TT RULE> ::= (<SYN>)+ ":=" ( ("-"|"+")? <SYN> )* ";"
<LL RULE> ::= ( "(" <NODE> ")" )+ ":=" ( ("-"|"+")? "(" <NODE> ")" )* ";"
<NT RULE> ::= (<SEM>)+ ":=" ( <SYN> )+ ";"
<TN RULE> ::= (<SYN>)+ ":=" ( <SEM> )+ ";"
<TL RULE> ::= (<SYN>)+ ":=" ( "(" <NODE> ")" )+ ";"
<LT RULE> ::= ( "(" <NODE> ")" )+ ":=" ( <SYN> )+ ";"
<SEM> ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<SYN> ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<NODE> ::= ( (<DESCRIPTION>)( "," <DESCRIPTION> )* )?
<DESCRIPTION> ::= <STRING> | <ENTRY> | <SUB-ENTRY> | <FEATURE> | <INDEX> | <RELATION>
<STRING> ::= """<text>"""
<ENTRY> ::= "["<entry>"]"
<SUB-ENTRY> ::= <INDEX>"#"[01-99]
<FEATURE> ::= <VALUE> | <ATTRIBUTE> | <ATTRIBUTE>"="<VALUE>
<INDEX> ::= ( "%"([01-99]|[a-zA-Z_]+) )+
<RELATION> ::= <SEM>|<SYN>
<VALUE> ::= <TEXT>
<ATTRIBUTE> ::= <TEXT>
<TEXT> ::= any sequence of characters except whitespace | <REGULAR EXPRESSION>
<REGULAR EXPRESSION> ::= "/"<PERL COMPATIBLE REGULAR EXPRESSIONS>"/"
Where:
"" = constant
+ = to be repeated one or more times
* = to be repeated zero or more times
? = to be repeated zero or one time
| = or
[x-y] = from x to y
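Every rule in the grammar above has the shape left side, ":=", right side, ";", optionally followed by a comment. A first parsing step can therefore be sketched as below. The function `split_rule` is a hypothetical helper, not part of any UNL tool; it assumes that parentheses are balanced and that no string or regular expression in the rule contains an unbalanced parenthesis or the sequence ":=".

```python
def split_rule(text):
    # Split a rule into (left side, right side, trailing comment).
    left, separator, rest = text.partition(":=")
    if not separator:
        raise ValueError("missing ':='")
    depth = 0
    for i, char in enumerate(rest):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == ";" and depth == 0:
            # The first ';' outside parentheses terminates the rule;
            # ';' inside parentheses separates the nodes of a relation.
            return left.strip(), rest[:i].strip(), rest[i + 1:].strip()
    raise ValueError("missing terminating ';'")

print(split_rule("agt(a;b):=agt(+c;);"))
print(split_rule("obj ( ; ) := ; this rule deletes every obj relation."))
```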
Examples
- LL rules
- (BLK):=; deletes a node that has the feature BLK
- (A):=(+B); adds the feature B to a node having the feature A
- (A):=(-A); removes the feature A from a node having the feature A
- (A):=(B)(C); replaces a node having the feature A by two nodes having the features B and C, respectively
- (A)(B):=(C); replaces two nodes having the features A and B, respectively, by a node having the feature C
- TT rules
- XB(A;B):=; deletes the relation XB between two nodes having the features A and B, respectively
- XB(A;B):=XA(C;D); replaces the relation XB between two nodes having the features A and B, respectively, by a relation XA between two nodes having the features C and D, respectively
- XB(A;B):=XB(+C;-B); adds the feature C to the source argument and removes the feature B from the target argument of a relation XB where the source node has the feature A and the target node has the feature B
- XB(A;B):=+XC(C;D); adds the relation XC between two nodes having the features C and D, respectively, to a graph in which there is a relation XB between two nodes having the features A and B, respectively
- NN rules
- agt(A;B):=; deletes the relation agt between two nodes having the features A and B, respectively
- agt(A;B):=obj(C;D); replaces the relation agt between two nodes having the features A and B, respectively, by a relation obj between two nodes having the features C and D, respectively
- agt(A;B):=agt(+C;-B); adds the feature C to the source argument and removes the feature B from the target argument of a relation agt where the source node has the feature A and the target node has the feature B
- LT rules
- (A)(B):=XB(C;D); replaces two nodes having the features A and B respectively by a relation XB between two nodes having the features C and D, respectively
- TL rules
- XB(C;D):=(A)(B); replaces the relation XB between two nodes having the features C and D, respectively, by two nodes having the features A and B, respectively
- TN rules
- XB(%a;%b):=agt(%a;%b); replaces the relation XB between two nodes %a and %b by a relation agt between the same nodes
- NT rules
- agt(%a;%b):=XB(%a;%b); replaces the relation agt between two nodes %a and %b by a relation XB between the same nodes
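The LL examples above can be simulated with a short Python sketch. It is only an illustration of the descriptions given on this page, not the behavior of any UNL engine: nodes are modeled as sets of features, the rule is applied once at the leftmost match, and the names parse_side and apply_ll are hypothetical. The sketch also assumes that a right-hand node containing a + or - item updates the left-hand node in the same position, while a right-hand node with plain features creates a new node.

```python
import re


def parse_side(text):
    """Split '(A,B)(C)' into [['A', 'B'], ['C']]."""
    return [[f.strip() for f in group.split(",") if f.strip()]
            for group in re.findall(r"\(([^()]*)\)", text)]


def apply_ll(rule, nodes):
    """Apply one LL rule to a list of feature sets and return a new list."""
    left_text, right_text = rule.strip().rstrip(";").split(":=", 1)
    left, right = parse_side(left_text), parse_side(right_text)
    width = len(left)
    if width == 0:
        raise ValueError("an LL rule needs at least one left-hand node")
    for start in range(len(nodes) - width + 1):
        window = nodes[start:start + width]
        # A node matches when it carries every feature listed on the left.
        if not all(set(cond) <= node for cond, node in zip(left, window)):
            continue
        new_nodes = []
        for pos, items in enumerate(right):
            if any(item[0] in "+-" for item in items):
                # ASSUMPTION: +/- items modify the matched node in place.
                node = set(window[pos]) if pos < width else set()
                for item in items:
                    if item.startswith("+"):
                        node.add(item[1:])
                    elif item.startswith("-"):
                        node.discard(item[1:])
                    else:
                        node.add(item)
            else:
                # Plain features describe a fresh node.
                node = set(items)
            new_nodes.append(node)
        return nodes[:start] + new_nodes + nodes[start + width:]
    return list(nodes)  # no match: the list is returned unchanged


print(apply_ll("(BLK):=;", [{"A"}, {"BLK"}, {"B"}]))  # [{'A'}, {'B'}]
print(apply_ll("(A)(B):=(C);", [{"A"}, {"B"}]))       # [{'C'}]
```

Strings, entries, indexes and regular expressions, which the formal syntax also allows inside a node, are left out to keep the sketch short.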
Special types of T-rules
According to their behavior, T-rules may also be classified into:
- A-rules (affixation rules), which apply over isolated word forms (e.g., to generate possible inflections);
- L-rules (linear rules), which apply over lists of word forms (e.g., to provide transformations in the surface structure);
- S-rules (syntactic rules), which apply over trees (e.g., to modify the syntactic configuration).
Further information
For further information on T-rules, see the UNL Grammar Specs.