Transformation over nodes

From UNL Wiki

Revision as of 09:57, 3 December 2013

Nodes are altered, replaced, created and deleted by T-rules:

(""):=(+"a"); (the string of the node is set from "" to "a")
(""):=("a"); (the same as above)
():=("a"); (the same as above)
("a"):=(+"b"); (the string of the node is set from "a" to "b")
("a"):=("b"); (the same as above)
("a"):=(-"a"); (the string of the node is reset, i.e., it changes from "a" to "")
("a"):=(""); (the same as above)
("a"):=("x"<0,0>"y"); (the string of the node is modified from "a" to "xay")

[headwords]

A node may have a single headword. This value is set with the operator "+" and reset with the operator "-"; the operator "+" may be omitted. Headwords may not be modified through A-rules. Changes to headwords do not affect any other element (strings, UWs and features).

([]):=(+[a]); (the headword of the node is set from [] to [a])
([]):=([a]); (the same as above)
():=([a]); (the same as above)
([a]):=(+[b]); (the headword of the node is set from [a] to [b])
([a]):=([b]); (the same as above)
([a]):=(-[a]); (the headword of the node is reset, i.e., it changes from [a] to [])
([a]):=([]); (the same as above)
~~([a]):=("x"<0,0>"y");~~ (it is not possible to modify headwords through A-rules)

[[UWs]]

A node may have a single UW. This value is set with the operator "+" and reset with the operator "-"; the operator "+" may be omitted. UWs may not be modified through A-rules. Changes to UWs do not affect any other element (strings, headwords and features).

([[]]):=(+[[a]]); (the UW of the node is set from [[]] to [[a]])
([[]]):=([[a]]); (the same as above)
():=([[a]]); (the same as above)
([[a]]):=(+[[b]]); (the UW of the node is set from [[a]] to [[b]])
([[a]]):=([[b]]); (the same as above)
([[a]]):=(-[[a]]); (the UW of the node is reset, i.e., it changes from [[a]] to [[]])
([[a]]):=([[]]); (the same as above)
~~([[a]]):=("x"<0,0>"y");~~ (it is not possible to modify UWs through A-rules)

Features: Nodes may have as many features as necessary. Features may appear as isolated items in a list (POS,NOU,GEN,MCL,NUM,SNG) or as attribute=value pairs (POS=NOU,GEN=MCL,NUM=SNG). Changes to features do not affect any other element (strings, headwords and UWs).

Adding features to nodes
Features are added through the operator + (add). The operator "+" may be omitted.
- ():=(+B); (add the feature B to the node)
- ():=(B); (the same as above)
Rules are recursive: the feature will be added to the node while the condition is true.
- ():=(+B); (the resulting node is (B,B,B,...), i.e., this is an infinite loop)
- (^B):=(+B); (the resulting node is (B), i.e., the feature B is added only if the node does not contain it yet)
The operator + (add) does not create attribute=value pairs automatically (it simply adds features to the nodes)
- (%x,^POS,^NOU):=(%x,+NOU); (the resulting node is (NOU) and not (POS=NOU) because POS has not been added)
- (%x,^POS,^NOU):=(%x,+POS,+NOU); (the resulting node is (POS,NOU) and not (POS=NOU) because there was no assignment POS=NOU)
- (%x,POS,^NOU):=(%x,+NOU); (the resulting node is (POS,NOU) because there was no assignment POS=NOU)
- (%x,^POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS=NOU))
- (%x,POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS, POS=NOU) because the feature POS has been duplicated)
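The recursive behavior of the operator + can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the actual rule engine: a node is modeled as a list of feature strings, and the helper names `has` and `apply_while`, as well as the cut-off `max_steps`, are hypothetical, introduced here so that the unguarded rule can be shown without looping forever.

```python
def has(node, name):
    """True if `name` occurs in the node as an isolated feature, an attribute or a value."""
    return any(name in feature.split("=") for feature in node)


def apply_while(node, condition, action, max_steps=5):
    """Reapply `action` while `condition` holds (max_steps is an artificial cap)."""
    steps = 0
    while condition(node) and steps < max_steps:
        node = action(node)
        steps += 1
    return node


# ():=(+B);  the condition is always true, so B is added on every pass
unguarded = apply_while([], lambda n: True, lambda n: n + ["B"])

# (^B):=(+B);  the condition fails as soon as B is present
guarded = apply_while([], lambda n: not has(n, "B"), lambda n: n + ["B"])

# (%x,POS,^NOU):=(%x,+POS=NOU);  POS is kept and POS=NOU is added next to it
duplicated = apply_while(
    ["POS"],
    lambda n: has(n, "POS") and not has(n, "NOU"),
    lambda n: n + ["POS=NOU"],
)

print(unguarded)   # ['B', 'B', 'B', 'B', 'B'] (stopped only by max_steps)
print(guarded)     # ['B']
print(duplicated)  # ['POS', 'POS=NOU']
```

In the sketch, only the negated condition (^B, ^NOU) makes the loop terminate on its own, which is why a guard of this kind appears in the rules above.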
Deleting features from nodes
Features are deleted through the operator - (delete).
- ():=(-B); (delete the feature B from the node)
Rules are recursive: the feature will be deleted from the node while the condition is true.
- ():=(-B); (the rule will delete all instances of the feature B from the node, i.e., the node (B,B,B,B,B) will become ())
The operator "-" may also be used to reset attributes:
- (%x,POS,NOU):=(%x,-NOU); (the resulting node is (POS) because the feature POS was not deleted)
- (%x,POS,NOU):=(%x,-POS,-NOU); (the resulting node is () because both features POS and NOU were deleted)
- (%x,POS=NOU):=(%x,-NOU); (the resulting node is (POS) because only the value of the attribute POS was deleted)
- (%x,POS=NOU):=(%x,-POS); (the resulting node is () because the attribute POS was deleted with all its values)
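The four cases above can be summarized in a short Python sketch. It is a toy model and not the engine's implementation: nodes are assumed to be lists of feature strings, and `delete_feature` is a hypothetical helper that removes every instance of the feature in one call instead of reapplying the rule.

```python
def delete_feature(node, name):
    """Remove `name` from a node modeled as a list of feature strings."""
    result = []
    for feature in node:
        attribute, separator, value = feature.partition("=")
        if attribute == name:
            # isolated feature, or attribute deleted together with its value
            continue
        if separator and value == name:
            # only the value is reset; the attribute itself stays
            result.append(attribute)
        else:
            result.append(feature)
    return result


print(delete_feature(["B", "B", "B"], "B"))   # []
print(delete_feature(["POS", "NOU"], "NOU"))  # ['POS']
print(delete_feature(["POS=NOU"], "NOU"))     # ['POS']
print(delete_feature(["POS=NOU"], "POS"))     # []
```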

Indexes

Indexes are used to make reference to the whole node instead of its elements. Any change in the index means a completely new node, and no element is preserved.

(%x,"a"):=(%x,"b"); (the string of the node %x is set from "a" to "b"; all the other elements of the node %x are preserved)
(%x,"a"):=(%y,"b"); (the whole node %x is replaced by a new node %y whose string is "b"; no element from %x is copied to %y)
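The difference between keeping and changing the index can be sketched in Python as follows. The dictionary layout of a node and the function name `rewrite` are assumptions made for this illustration only.

```python
def rewrite(node, index, **changes):
    """Return the node produced by a right side that uses `index`."""
    if index == node["index"]:
        # same index: start from a copy of the old node
        result = dict(node, features=list(node["features"]))
    else:
        # new index: start from an empty node, nothing is copied
        result = {"index": index, "string": "", "features": []}
    result.update(changes)
    return result


old = {"index": "%x", "string": "a", "features": ["POS=NOU"]}
print(rewrite(old, "%x", string="b"))  # the features of %x are preserved
print(rewrite(old, "%y", string="b"))  # a new node %y with no features
```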

Deleting nodes

In linear rules, nodes are deleted if they are not repeated (co-indexed) on the right side:

(%x)(%y):=(%x); (the node %y will be deleted)

In other rules, nodes are deleted if they are not repeated (co-indexed) on the right side and are not part of any other relation:

rel(%x;%y):=rel(%x); (the node %y will be deleted if, and only if, it is not part of any other relation)
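The condition for deleting a node in a relation rule can be written as a small predicate. This is a sketch under assumptions: relations are modeled as (name, source, target) tuples, the function name `is_deleted` is hypothetical, and the relation name "obj" is used only as an illustration.

```python
def is_deleted(node, right_side, other_relations):
    """Decide whether `node` disappears after a relation is rewritten.

    right_side: indexes repeated on the right side of the rule
    other_relations: (name, source, target) tuples not touched by the rule
    """
    if node in right_side:
        return False
    return all(node not in (source, target) for _, source, target in other_relations)


# rel(%x;%y):=rel(%x);  with no other relation involving %y
print(is_deleted("%y", {"%x"}, []))                     # True
# same rule, but %y is still the target of another relation
print(is_deleted("%y", {"%x"}, [("obj", "%z", "%y")]))  # False
```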

Creating nodes

Nodes are created through the use of new indexes on the right side:

("a",%x)("b",%y):=(%x)(%y)("c",%z); (the node %z will be created)
("a",%x)("b",%y):=(%x)("c",%z); (the node %z will be created, and %y will be deleted)

Duplicating (cloning) nodes

Nodes may be duplicated by repeating indexes on the right side along with the command #CLONE:

("a",^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
("a") becomes ("a")("a")

In order to avoid infinite recursion, the rule must invalidate its own condition (in the example above, the feature CLONED, assigned to all instances of the clone, prevents the rule from applying indefinitely).
Clones contain the same elements as the original nodes, unless they are explicitly altered during the cloning:

("a",[a],[[a]],A,^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
("a",[a],[[a]],A) becomes ("a",[a],[[a]],A,CLONED)("a",[a],[[a]],A,CLONED)
(A,^CLONED,%x):=(%x,-A,+B,+CLONED)(%x,-A,+C,+CLONED,#CLONE);
(A) becomes (B,CLONED)(C,CLONED)
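The role of the guard feature CLONED can be checked with a toy Python model of the first cloning rule, ("a",^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);. The node layout and the function names `clone_once` and `apply_rule` are assumptions made for this sketch; the real engine is not described here.

```python
import copy


def clone_once(nodes):
    """Apply the cloning rule to the first matching node; return None if none matches."""
    for position, node in enumerate(nodes):
        if node["string"] == "a" and "CLONED" not in node["features"]:
            original = copy.deepcopy(node)
            original["features"].append("CLONED")
            clone = copy.deepcopy(original)  # #CLONE: same elements as the original
            return nodes[:position] + [original, clone] + nodes[position + 1:]
    return None


def apply_rule(nodes):
    """Reapply the rule until its condition no longer holds."""
    while True:
        rewritten = clone_once(nodes)
        if rewritten is None:
            return nodes
        nodes = rewritten


result = apply_rule([{"string": "a", "features": ["A"]}])
print(result)
```

Because both the original and the clone carry CLONED after the first pass, the second pass finds no matching node and the loop in `apply_rule` ends with exactly two nodes.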

Splitting nodes

One node may be split into two or more nodes through the use of splitting rules. Consider, for instance, the cases below:

Splitting rules deal only with strings and apply only to nodes with the feature TEMP:

Original node: ("abc",TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",TEMP)("c",TEMP)

However, if the original node were ("abc"), without TEMP, the rule would not apply (i.e., it is necessary to assign the feature TEMP to the node before splitting it).

Splitting rules are conservative: the elements of the original node, except the string, are preserved unless explicitly altered:

Original node: ("abc",[abc],[[abc]],A,B,C,TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",[abc],[[abc]],A,B,C,TEMP)("c",[abc],[[abc]],A,B,C,TEMP) (i.e., the elements of the original node are copied to the new nodes)

However, if the rule were:
("abc"):=("ab",-A,-B,-C,-TEMP,+AB)("c",-TEMP);
the result would be: ("ab",[abc],[[abc]],AB)("c",[abc],[[abc]],C,TEMP)
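The two properties of splitting rules (the TEMP requirement and the conservative copy) can be sketched in Python. This is a toy model: the node layout and the function name `split` are assumptions, and explicit alterations on the right side (such as -A or +AB) are not modeled.

```python
def split(node, left, parts):
    """Model a split rule (left):=(parts[0])(parts[1])... on a single node."""
    if node["string"] != left or "TEMP" not in node["features"]:
        return [node]  # the rule does not apply
    # every new node copies the headword, UW and features of the original
    return [dict(node, string=part, features=list(node["features"])) for part in parts]


original = {"string": "abc", "headword": "abc", "uw": "abc", "features": ["A", "B", "C", "TEMP"]}
for piece in split(original, "abc", ["ab", "c"]):
    print(piece)

no_temp = {"string": "abc", "headword": "abc", "uw": "abc", "features": ["A"]}
print(split(no_temp, "abc", ["ab", "c"]))  # unchanged: TEMP is missing
```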

Merging nodes (&)

Two or more nodes may be merged by the command &:

(%x)(%y)(%z):=(%x&%y&%z);

In the example above, ("a")("b")("c") becomes ("abc").

Merge operations concatenate headwords and UWs, and join features:

("hw1",[[uw1]],F1,%x)("hw2",[[uw2]],F2,%y)("hw3",[[uw3]],F3,%z):=(%x&%y&%z);
The resulting node is ("hw1hw2hw3",[[uw1uw2uw3]],F1,F2,F3)

Compare the difference

(%x)(%y):=(%z); (the nodes %x and %y are replaced by %z, and their features are lost unless explicitly included in %z)
(%x)(%y):=(%x&%y); (the nodes %x and %y are merged)
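The contrast between merging and replacing can be illustrated with a short Python sketch. The node layout and the function name `merge` are assumptions made for this example; how the real engine treats duplicate features during a merge is not stated above, so the sketch simply keeps the features in order.

```python
def merge(*nodes):
    """Model (%x&%y&...): concatenate strings and UWs, join the feature lists."""
    merged = {"string": "", "uw": "", "features": []}
    for node in nodes:
        merged["string"] += node["string"]
        merged["uw"] += node["uw"]
        merged["features"].extend(node["features"])  # duplicates are not handled here
    return merged


x = {"string": "hw1", "uw": "uw1", "features": ["F1"]}
y = {"string": "hw2", "uw": "uw2", "features": ["F2"]}
z = {"string": "hw3", "uw": "uw3", "features": ["F3"]}

print(merge(x, y, z))
# (%x)(%y):=(%z); would instead yield a fresh node with none of these elements
```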

Retrieving entries in the dictionary after tokenization (?)

During transformation (i.e., after tokenization), dictionary entries may be accessed from transformation rules through the command "?":

(?[headword]) retrieves the first entry in the dictionary with the headword "headword"
(?[[uw]]) retrieves the first entry in the dictionary with the UW "uw"
(?[headword],?[[uw]],?feature) retrieves the first entry in the dictionary with the headword "headword", the UW "uw" and the feature "feature"

Regular expressions, variables and disjunction may also be used in dictionary search

(?[/abcd./]) retrieves the first entry in the dictionary whose headword has 5 characters and begins with "abcd" (this works only in natural language generation)
(?[[/abcd./]]) retrieves the first entry in the dictionary whose UW has 5 characters and begins with "abcd" (this works only in natural language analysis)

Obligatory parameters: Due to the indexation algorithm, the headword is obligatory in IAN and the UW is obligatory in EUGENE:

(?[headword]) will work only in IAN
(?[[uw]]) will work only in EUGENE
(?feature) will not work in IAN or EUGENE

Variables: In order to avoid repetition, dictionary look-up may use the values of indexed nodes on the left side:

(?[%x]) retrieves the first entry in the dictionary with the same headword as the node %x
(?[[%x]]) retrieves the first entry in the dictionary with the same UW as the node %x
(?[%x],ATT=%x) retrieves the first entry in the dictionary with the same headword as the node %x and whose attribute ATT has the same value as the attribute ATT of the node %x

Example

Dictionary search is used mainly in natural language generation

(N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def);

If the node %noun has the features noun (N), number (NUM) and gender (GEN) and the attribute @def (definite), the rule searches for the first dictionary entry that is associated with the empty UW (""), has the features ART and DEF, and whose attributes NUM and GEN have the same values as those of the node %noun. That entry is inserted in front of the noun, and @def is removed from the noun in order to avoid an infinite loop.
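The look-up in the example can be approximated by the Python sketch below. Everything in it is an assumption made for illustration: the dictionary entries are invented, entries are plain dictionaries, and `first_entry` and `article_for` are hypothetical helpers, not part of IAN or EUGENE.

```python
def first_entry(dictionary, uw, required):
    """Return the first entry with the given UW that carries every required feature."""
    for entry in dictionary:
        if entry["uw"] == uw and all(f in entry["features"] for f in required):
            return entry
    return None


def article_for(noun, dictionary):
    """Model (?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun) for a given noun node."""
    # ?NUM=%noun and ?GEN=%noun: reuse the noun's own NUM and GEN assignments
    agreement = [f for f in noun["features"] if f.startswith(("NUM=", "GEN="))]
    return first_entry(dictionary, "", ["ART", "DEF"] + agreement)


# Hypothetical entries, invented for this example
DICTIONARY = [
    {"headword": "la", "uw": "", "features": ["ART", "DEF", "NUM=SNG", "GEN=FEM"]},
    {"headword": "el", "uw": "", "features": ["ART", "DEF", "NUM=SNG", "GEN=MCL"]},
    {"headword": "los", "uw": "", "features": ["ART", "DEF", "NUM=PLR", "GEN=MCL"]},
]
noun = {"headword": "libro", "features": ["N", "NUM=SNG", "GEN=MCL", "@def"]}

article = article_for(noun, DICTIONARY)
noun["features"].remove("@def")  # (%noun,-@def): prevents the rule from firing again
print(article["headword"])  # el
```

The search returns the first match in dictionary order, so the order of the entries matters when several entries satisfy the query.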

Triggering rules (!)

Inflectional rules are triggered in the grammar by the command "!"<ATTRIBUTE>.
Given the dictionary entry:

[foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")) <eng,0,0>;

The rule NUM(PLR:="oo":"ee") is triggered by !NUM
For instance:

(NUM=PLR,^inflected):=(!NUM,+inflected); or
(PLR,^inflected):=(!NUM,+inflected); or
(NUM,^inflected):=(!NUM,+inflected);

In the first case (NUM=PLR), the system verifies if the attribute "NUM" is set and if it has the value "PLR". In the second and in the third case, the system simply verifies if the word has any feature (attribute or value) equal to "PLR" or "NUM".
It's important to stress that, as the features of the dictionary are defined by the user, there is no way of pre-assigning attribute-value pairs. In that sense, it's not possible to infer that "PLR" is a value of the attribute "NUM" except through an assignment of the form "NUM=PLR" (i.e., given only "PLR" or "NUM", it is not possible to state "NUM=PLR").
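The first case (NUM=PLR) can be sketched in Python as follows. This is a toy model under several assumptions: the inflectional rule "oo":"ee" is read as replacing the substring "oo" with "ee", the node layout and the function name `trigger` are hypothetical, and the cases where only "PLR" or only "NUM" is present are not modeled.

```python
# A node built from the entry [foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")),
# to which NUM=PLR is assumed to have been assigned already
NODE = {
    "string": "foot",
    "features": ["POS=NOU", "NUM=PLR"],
    "rules": {"NUM": {"PLR": ("oo", "ee")}},
}


def trigger(node, attribute):
    """Model (ATTRIBUTE=value,^inflected):=(!ATTRIBUTE,+inflected)."""
    if "inflected" in node["features"]:
        return node  # guard: the rule has already applied
    for value, (old, new) in node["rules"].get(attribute, {}).items():
        if f"{attribute}={value}" in node["features"]:
            return dict(
                node,
                string=node["string"].replace(old, new),
                features=node["features"] + ["inflected"],
            )
    return node


once = trigger(NODE, "NUM")
twice = trigger(once, "NUM")  # no further change: the node is already inflected
print(once["string"])  # feet
```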
