L-rule

'''Ph-rule''' (phonetic rule) is the formalism used for generating spelling changes in the UNL<sup>arium</sup> framework.  
+
'''L-rule''' (linear rule) is a specific type of [[transformation rule]] used for applying transformations over ordered sequences of isolated nodes.  
== When to use Ph-rules ==
+
 
Ph-rules are used for generating sound changes that produce spelling changes (such as "a">"an", "I am">"I'm", etc). They are also used to generate spelling conventions, such as the use of capital letters and punctuation marks.
+
== When to use L-rules ==
 +
L-rules are used for:
 +
*reordering nodes in a list: a b c > a c b
 +
*replacing nodes in a list: a b c > a x c
 +
*adding nodes in a list: a b c > a x b c
 +
*deleting nodes in a list: a b c > a c
 +
 
 +
== When not to use L-rules ==
 +
L-rules are not used in transformations over structures other than lists (i.e., trees and graphs). In these cases, we use [[S-rules]] (syntactic rules).
  
== When not to use Ph-rules ==
 
Ph-rules are not to be used for sound changes that do not affect spelling.
 
 
== Syntax ==
 
== Syntax ==
The general syntax for Ph-rules is the following:
+
L-rules comply with the syntax below:
<br>
+
(<NODE>)(<NODE>)...(<NODE>) := (<NODE>)(<NODE>)...(<NODE>);
(CONDITION) := (ACTION);
+
 
Where:
 
Where:
*CONDITION is a single form or a sequence of forms over which actions will take place; and
+
*<NODE> is a [[node]]  
*ACTION is the action to be performed over each form or sequence of forms of the CONDITION.
+
*the left side of the operator := states the condition
CONDITION and ACTION may be expressed as:
+
*the right side of the operator := states the action (replacement, change, creation, deletion, division, merge) to be performed over the condition.
*a character or string of characters, between quotes: ("a");
+
*a tag or list of tags, extracted from the [[tagset|UNDL Foundation tagset]]: (VOW);
+
*a combination of characters and tags: ("a",PRE);
+
Examples:
+
*("Mr."):=("Mister"); (replace "Mr." by "Mister")
+
*("doctor"):=("dr."); (replace "doctor" by "dr.")
+
  
;Conditions and actions must always come between parentheses
+
== Examples ==
*("Mr."):=("Mister");
+
{|border="1" align="center" cellpadding="2"
*<strike>"Mr.":="Mister";</strike>
+
 
+
;Context-sensitiveness
+
Ph-rules are normally sensitive to the context and apply over a set of conditions rather than over isolated word forms. In this case, each separate word form must be isolated between parentheses and described as a different condition.
+
*<strike>("I am"):=("I'm)</strike>;
+
*("I")(BLK)("am"):=("I'm");
+
 
+
== Types of Ph-rules ==
+
There are basically three types of Ph-rules:
+
*'''replacement''', when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field:
+
*'''addition''', when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
+
*'''deletion''', when the number of parentheses in the CONDITION field is greater than the number parentheses in the ACTION field.
+
Parentheses are automatically co-indexed between the CONDITION and the ACTION field, so that the first pair of parentheses of the CONDITION field corresponds to the first pair of parentheses of the ACTION field, and so on. This means that parentheses are to be repeated on the right side of a Ph-rule if they are not expected to be deleted. In order to control the process of adding, deleting and reordering parentheses, they must be referred by the index "%N" where is the order of appearance in the left side:
+
{|border=1; cellpadding=2
+
 
|+Examples
 
|+Examples
!RULE
+
!width=20%|RULE
!BEFORE > AFTER
+
!width=15%|BEFORE > AFTER
!DESCRIPTION
+
!width=65%|DESCRIPTION
 
|-
 
|-
 
|("a")("b")("c"):=("d")("e")("f");  
 
|("a")("b")("c"):=("d")("e")("f");  
 
|"a" will be replaced by "d"; "b" by "e"; and "c" by "f"
 
|"a" will be replaced by "d"; "b" by "e"; and "c" by "f"
 
|-
 
|-
|("a")("b")("c"):=("d")()();  
+
|("a")("b")("c"):=("d")( )( );  
 
|abc > dbc
 
|abc > dbc
 
|"a" will be replaced by "d"; "b" and "c" will be preserved
 
|"a" will be replaced by "d"; "b" and "c" will be preserved
 
|("a")("b")("c"):=("d")("")("");
 
|("a")("b")("c"):=("d")("")("");
 
|abc > d
 
|abc > d
|"a" will be replaced by "d"; "b" and "c" will be replaced by ""
+
|"a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank)
 
|-
 
|-
|("a")("b")("c"):=("d")();  
+
|("a")("b")("c"):=("d",%01)(%02);  
|abc >  ab
+
|abc >  db
 
|"a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
 
|"a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
 
|-
 
|-
|("a")("b")("c"):=("d");  
+
|("a")("b")("c"):=("d",%01);  
 
|abc > d
 
|abc > d
 
|"a" will be replaced by "d"; "b" and "c" will be deleted
 
|"a" will be replaced by "d"; "b" and "c" will be deleted
 
|("a")("b")("c"):=(%03)(%02)(%01);  
 
|("a")("b")("c"):=(%03)(%02)(%01);  
 
|abc > cba
 
|abc > cba
|"a", "b" and "c" will be preserved, but reordered: ("c")("b")("a")
+
|"a", "b" and "c" will be preserved, but reordered
 
|-
 
|-
|("a")("b")("c"):=("d")(%03);  
+
|("a")("b")("c"):=("d",%01)(%03);  
 
|abc > dc
 
|abc > dc
 
|"a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
 
|"a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
 
|-
 
|-
|("a")("b")("c"):=("d")("g")()();  
+
|("a")("b")("c"):=("d",%01)("g")(%02)(%03);  
|abc > dgc
+
|abc > dgbc
|"a" will be replaced by "d"; "b" will be replaced by "g"; "c" will be preserved; and new form will be generate after it
+
|"a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"
 
|-  
 
|-  
|("a")("b")("c"):=("d")("g")(%02)(%03);
+
|("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );
|abc > dgbc
+
|'''a''' adjective > '''an''' adjective
|"a" will be replaced by "d"; "g" will be generated after it; and then "b" and "c", which will be preserved
+
|replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
|}
+
=== Examples ===
+
{|border="1" align="center" cellpadding="2"
+
!CASE
+
!RULE
+
!BEHAVIOUR
+
!BEFORE
+
!AFTER
+
 
|-
 
|-
|width=50|Dissimilation
+
|("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC);
|width=100|("a",ART)(BLK)(VOW):=("an")()();
+
|'''a''' '''a''' > '''à'''
|width=300|replace the article "a" by "an" before a blank space and a vowel
+
|replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
|width=50|'''a''' adjective
+
|width=50|'''an''' adjective
+
 
|-
 
|-
|Crasis
+
|("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC);
|("a",PRE)(BLK)("a",ART):=("à",ART,CTC);
+
|'''de''' '''le''' > '''du'''
|replace the preposition "a" in front of blank and "a" by "à"; add the features ART (article) and CTC (contraction); and delete the blank and the second "a"
+
|replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
|'''a''' '''a'''
+
|'''à''' = (PRE,ART,CTC)
+
 
|-
 
|-
|Contraction
+
|("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );
|("de",PRE)(BLK)("le",ART):=("du",ART,CTC);
+
|'''a il''' > '''a-t-il'''
|replace the preposition "de" in front of blank and "le" by "du"; add the features ART and CTC; and delete the blank and "le"
+
|replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
|'''de''' '''le'''
+
|'''du''' = (PRE,ART,CTC)
+
 
|-
 
|-
|Epenthesis
+
|("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03);
|("a",VER)(BLK)("il",PPR):=()("-t-",-BLK)();
+
|'''de avoir''' > '''d'avoir'''
|replace the blank space between the verb "a" and the pronoun "il" by "-t-"
+
|replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change
|'''a il'''
+
|'''a-t-il'''
+
|-
+
|Elision
+
|("de",PRE)(BLK)(VOW):=("d'")(%3);
+
|replace the preposition "de" before a blank space and a vowel by "d'" and delete the blank space
+
|'''de avoir'''
+
|'''d'avoir'''
+
 
|-
 
|-
 
|}
 
|}
=== Observations ===
 
;Rules will only be applied if all conditions are true:
 
:X:=”y”<”z”; ( “zabc” changes to “yabc”, but “abc” remains “abc” since there is no "z" to be replaced)
 
;String fields are necessarily continuous:
 
:X:=”aaa”<”xyz”; ( “xyzbbb” changes to “aaabbb”, but “bxbybz” remains “bxbybz” since there is no continuous string "xyz" to be replaced)
 
;Each action is applied only once (i.e, rules are not exhaustive)
 
:PLR:=0>”s”; ("X" becomes "Xs", and not "Xssssss...")
 
;The replacement rule applies only once to the same string:
 
:X:=”a”:”b”; ( “aaa” becomes “baa” and not “bbb”)
 
;In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes):
 
{|align=center cellpadding=2
 
|-
 
|width=150|PLR := “X”<””;
 
|=
 
|width=150|PLR := “X”<0;
 
|(ABC becomes XABC)
 
|-
 
|PLR:= “X”<”A”;
 
|=
 
|PLR:= “X”<1;
 
|(ABC becomes XBC)
 
|-
 
|PLR:= “XY”<”AB”;
 
|=
 
|PLR:= “XY”<2;
 
|(ABC becomes XYC)
 
|-
 
|PLR:=””>”X”;
 
|=
 
|PLR:= 0>”X”;
 
|(ABC becomes ABCX)
 
|-
 
|PLR:=”C”>”X”;
 
|=
 
|PLR:= 1>”X”;
 
|(ABC becomes ABX)
 
|-
 
|PLR:=”BC”>”XY”;
 
|=
 
|PLR:= 2>”XY”;
 
|(ABC becomes AXY)
 
|}
 
;In infixation rules, the position of the addition may be made with reference to the end of string by using "-".
 
{|border="1" align="center" cellpadding="2"
 
! RULE
 
! BEHAVIOR
 
! BEFORE
 
! AFTER
 
|-
 
|width=70| X:=[1]>"y";
 
|width=300| if X add "y" to the right of the first character
 
|width=50| abc
 
|width=50| a'''y'''bc
 
|-
 
|X:=[-1]>"y";
 
|if X add "y" to the right of the last character
 
|abc
 
|ab'''y'''c
 
|-
 
|X:="y"<[2];
 
|if X add "y" to the left of the second character
 
|abcde
 
|a'''y'''bc
 
|-
 
|X:="y"<[-2];
 
|if X add "y" to the left of the second character
 
|abcde
 
|abc'''y'''de
 
|}
 
;In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced:
 
{|cellpadding=2
 
|-
 
|width=150|PLR:=”ABC”:”XYZ”;
 
|=
 
|width=150|PLR:=”XYZ”
 
|(ABC becomes XYZ)
 
|}
 
;In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]:
 
{|cellpadding=3
 
|-
 
|width=150|PLR:=”B”:”X”;
 
|=
 
|width=150|PLR:=[2-2]:”X”;
 
|(ABC becomes AXC)
 
|}
 
;The symbol “^” is used for negation (“^MCL” means “not MCL”):
 
:NOU&^MCL:=”x”:”y”; (If NOU and not MCL then replace “x” by “y”)
 
;“<<” and “>>” add blank spaces<ref>This feature is not supported by the UNL<sup>dev</sup> and it is automatically replaced, in the UNL<sup>arium</sup>, by a blank space.</ref>
 
:X:=”a”<<”b” (“bc” becomes “a bc” and not “abc”)
 
  
=== Common mistakes ===
+
== Transformations ==
*nou:= ”y”<”z”;  (WRONG: Tags are case sensitive)
+
{{:Transformation_over_nodes}}
*NNN:= ”y”<”z”;  (WRONG: NNN is not defined in the tagset)
+
*NOUFEM:=”y”<”z”; (WRONG: Tags must be separated by “&”)
+
*NOU,FEM:=”y”<”z”; (WRONG: Tags must be separated by “&”)
+
*NOU & FEM:=”y”<”z”;  (WRONG: There can be no blank spaces between tags)
+
*X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
+
*X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
+
*X:=1; (WRONG: Replacement rules do not allow for numbers)
+
*X:=1:1; (WRONG: Replacement rules do not allow for numbers)
+
  
== Complex a-rules ==
+
== Transformations over hyper-nodes ==
Complex a-rules are formed from the combination of simple a-rules:
+
{{:Transformation_over_hyper-nodes}}
*circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
+
*prefixation + infixation, to add a prefix and a suffix at the same time
+
*infixation + suffixation, to add an infix and a suffix at the same time
+
*prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time
+
=== Syntax ===
+
Complex a-rules are formed by concatenating simple a-rules with ",":
+
<br>
+
<br>
+
'''circumfixation'''
+
CONDITION := “ADDED” < DELETED , DELETED > "ADDED";
+
'''prefixation + infixation'''
+
CONDITION := “ADDED” < DELETED , DELETED > "ADDED";
+
'''infixation + suffixation'''
+
CONDITION := DELETED > "ADDED" , "DELETED" > "ADDED";
+
etc.
+
  
=== Examples ===
+
== Properties ==
{|border="1" align="center" cellpadding="2"
+
#L-rules are recursive<nowiki>:</nowiki> rules will apply while conditions are true:
|+Complex m-rules
+
#:The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e")
! RULE
+
#:The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...)
! BEHAVIOR
+
#The symbol '''^''' is used for negation and may be used to prevent infinite loops:
! BEFORE
+
#:*(X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
! AFTER
+
#:*(^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
|-
+
#Rules are conservative. No feature is changed or deleted unless explicitly indicated through "-".
|width=100| X:=”x”<0, 0>"y";
+
#:In the rule ("x",A):=("y"); the string "x" is replaced by the string "y", but the feature A is not altered (i.e., the final state will be ("y",A));
|width=300| if X add "x" to the beginning and "z" to the end of the string
+
#:In the rule ("x",A):=("y",A); the string "x" is replaced by the string "y" and the feature A is added to the node (i.e., the final state will be ("y",A,A))
|width=50| A
+
#:The rule "("a",ART)(BLK)("/a[bcd]e/"):=("an")( )( );" does not affect the status of the second and the third nodes. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second node by deleting the feature BLK.
|width=50| '''x'''A'''y'''
+
#Regular expressions may be used in order to make reference to the elements of a node, but only in the left (condition) side.<br />
|-
+
#:("/[A-Z]/",%x)(".",%y):=(%x);  
| X:=”x”<0, "A":"y";
+
#:<strike>("/[A-Z]/")("."):=("/[A-Z]/");</strike>  
| if X add "x" to the beginning and replace "A" by "y"
+
#In the ACTION field, changes to the elements of a node may be expressed by the right side of [[A-rule]]s
| ABC
+
#:The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
| '''xy'''BC
+
#Rules apply only if all conditions are true.
|-
+
#:The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in case of "a" before a blank and a node starting with "a", "e", "i", "o" or "u".
| X:="A":"y", 0>"x";
+
| if X replace "A" by "y" and add "x" to the end of the string
+
| ABC
+
| '''y'''BC'''x'''
+
|-
+
| X:=”x”<0, "A":"y", 0>"z";
+
| if X add "x" to the beginning, replace "A" by "y" and add "z" to the end of the string
+
| ABC
+
| '''xy'''BCz
+
|}
+
=== Observations ===
+
;Complex a-rules are also used to integrate different simple a-rules:
+
{|cellpadding=2 border=1 align=center
+
|-
+
|ORD:="1">"1st";<br>ORD:="2">"2nd";<br>ORD:="3">"3rd";
+
|ORD:="1">"1st", "2">"2nd", "3">"3rd";
+
|}
+
;Actions are applied from left to right (i.e., order is important)
+
:PLR := "s" > "ses", "y" > "ies"(kiss > kisses, city > cities)
+
:PLR := "y" > "ies", "s" > "ses";  (kiss > kisses, city>cities>citieses)
+
== Formal syntax ==
+
A-rules comply with the following syntax:
+
  
<A-RULE>          ::= <CONDITION> “:=” <ACTION> ("," <ACTION>)* “;”
+
== Indexes ==
<CONDITION>        ::= <ATAG>(“&”(“^”)?<ATAG>)*
+
See: [[Indexation]]
<ATAG>            ::= {one of the tags defined in the [[Tagset|UNDLF Tagset]]}
+
<ACTION>          ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT>
+
<PREFIXATION>      ::= <ADDED> {“<” | “<<”} (<DELETED>)?
+
<SUFFIXATION>      ::= (<DELETED>)? {“>” | “>>”} <ADDED>
+
<INFIXATION>      ::= "["<DELETED"]" ">" <ADDED> | <ADDED> "<" "["<DELETED"]"
+
<REPLACEMENT>      ::= ( <STRING> ":" )? <ADDED> | "[" <INTEGER> "-" <INTEGER> "]" ":"  <ADDED>
+
<ADDED>            ::= <STRING>
+
<DELETED>          ::= <STRING> | <INTEGER> 
+
<STRING>          ::= “ “ “ [a..Z]+ “ “ “
+
<INTEGER>          ::= [0..9]+
+
  
where
+
== Common mistakes ==
 +
*<strike>"Mr":="Mister";</strike>
 +
**Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
 +
*<strike>(Mr):=(Mister);</strike>
 +
**Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
 +
*<strike>("Mr"):=("Mister")</strike>
 +
**Rules must end in semicolon: ("Mr"):=("Mister");
 +
*<strike>("I am"):=("I'm");</strike>
 +
**Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
 +
*<strike><nowiki>("a",ART)(BLK)(VOW):=("an");</nowiki></strike>
 +
**"a adjective">"an": the blank and the following form are deleted because they are not present on the right side
 +
*<strike><nowiki>("de",PRE)(BLK)(VOW):=("d'")(VOW);</nowiki></strike>
 +
**"de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;
  
<a> = a is a non-terminal symbol<br />
 
“a“ = a is a constant<br />
 
a | b = a or b<br />
 
{ a | b } = either a or b<br />
 
(a)? = a can occur 0 or 1 time<br />
 
(a)* = a can be repeated 0 or more times<br />
 
(a)+ = a can be repeated 1 or more times<br />
 
  
== Notes ==
+
== N-rules and L-rules ==
<references/>
+
{{:Difference between N-rules and L-rules}}

Latest revision as of 11:41, 15 February 2014

L-rule (linear rule) is a specific type of transformation rule used for applying transformations over ordered sequences of isolated nodes.

When to use L-rules

L-rules are used for:

  • reordering nodes in a list: a b c > a c b
  • replacing nodes in a list: a b c > a x c
  • adding nodes in a list: a b c > a x b c
  • deleting nodes in a list: a b c > a c

When not to use L-rules

L-rules are not used in transformations over structures other than lists (i.e., trees and graphs). In these cases, we use S-rules (syntactic rules).

Syntax

L-rules comply with the syntax below:

(<NODE>)(<NODE>)...(<NODE>) := (<NODE>)(<NODE>)...(<NODE>);

Where:

  • <NODE> is a node
  • the left side of the operator := states the condition
  • the right side of the operator := states the action (replacement, change, creation, deletion, division, merge) to be performed over the condition.

Examples

RULE | BEFORE > AFTER | DESCRIPTION
("a")("b")("c"):=("d")("e")("f"); | abc > def | "a" will be replaced by "d"; "b" by "e"; and "c" by "f"
("a")("b")("c"):=("d")( )( ); | abc > dbc | "a" will be replaced by "d"; "b" and "c" will be preserved
("a")("b")("c"):=("d")("")(""); | abc > d | "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank)
("a")("b")("c"):=("d",%01)(%02); | abc > db | "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
("a")("b")("c"):=("d",%01); | abc > d | "a" will be replaced by "d"; "b" and "c" will be deleted
("a")("b")("c"):=(%03)(%02)(%01); | abc > cba | "a", "b" and "c" will be preserved, but reordered
("a")("b")("c"):=("d",%01)(%03); | abc > dc | "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
("a")("b")("c"):=("d",%01)("g")(%02)(%03); | abc > dgbc | "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); | a a > à | replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); | de le > du | replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change
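The indexed rows of the table (those using %01, %02, %03) can be mimicked with a small Python sketch. It is only an illustration of how right-side indexes select, drop, reorder and create nodes; it is not the UNL engine. Nodes are reduced to bare strings, and the function name `apply_once` and the tuple encoding of the right side ("ref", "set", "new") are assumptions made for this example.

```python
def apply_once(nodes, condition, actions):
    """Apply one simplified linear rule to the first matching window.

    nodes     -- list of node strings, e.g. ["a", "b", "c"]
    condition -- list of strings that must match a contiguous window
    actions   -- hypothetical encoding of the right side:
                 ("ref", i)     stands for (%0i): keep left-side node i
                 ("set", i, s)  stands for ("s",%0i): node i survives with string s
                 ("new", s)     stands for ("s"): a newly created node
    Left-side nodes that no action refers to are deleted.
    """
    width = len(condition)
    for start in range(len(nodes) - width + 1):
        window = nodes[start:start + width]
        if window != condition:
            continue
        produced = []
        for action in actions:
            kind = action[0]
            if kind == "ref":
                produced.append(window[action[1] - 1])
            elif kind == "set":
                produced.append(action[2])
            elif kind == "new":
                produced.append(action[1])
            else:
                raise ValueError(f"unknown action: {kind}")
        return nodes[:start] + produced + nodes[start + width:]
    return list(nodes)  # no match: the list is left unchanged


abc = ["a", "b", "c"]
# ("a")("b")("c"):=(%03)(%02)(%01);
print(apply_once(abc, abc, [("ref", 3), ("ref", 2), ("ref", 1)]))
# ("a")("b")("c"):=("d",%01)("g")(%02)(%03);
print(apply_once(abc, abc, [("set", 1, "d"), ("new", "g"), ("ref", 2), ("ref", 3)]))
```

The sketch applies the rule a single time; the recursive behaviour of real L-rules is deliberately left out.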

Transformations

Nodes are altered, replaced, created and deleted by T-rules:

Altering elements of nodes

Any changes to the elements of nodes must be stated in the right side of rules. Changes affect only the elements explicitly indicated in rules:

("a",[a],[[a]],A):=("b"); (only the string is affected by the rule; all other elements are preserved. The resulting node is ("b",[a],[[a]],A) )
"Strings"
Nodes may have one single string. This value is set through the operator "+", reset through the operator "-", or modified through A-rules. The operator "+" may be omitted. Changes to strings do not affect any other element (headwords, UWs and features)
  • (""):=(+"a"); (the string of the node is set from "" to "a")
  • (""):=("a"); (the same as above)
  • ():=("a"); (the same as above)
  • ("a"):=(+"b"); (the string of the node is set from "a" to "b")
  • ("a"):=("b"); (the same as above)
  • ("a"):=(-"a"); (the string of the node is reset, i.e., it changes from "a" to "")
  • ("a"):=(""); (the same as above)
  • ("a"):=("x"<0,0>"y"); (the string of the node is modified from "a" to "xay")
[headwords]
Nodes may have one single headword. This value is set through the operator "+" and reset through the operator "-". The operator "+" may be omitted. Headwords may not be modified through A-rules. Changes to headwords do not affect any other element (strings, UWs and features)
  • ([]):=(+[a]); (the headword of the node is set from [] to [a])
  • ([]):=([a]); (the same as above)
  • ():=([a]); (the same as above)
  • ([a]):=(+[b]); (the headword of the node is set from [a] to [b])
  • ([a]):=([b]); (the same as above)
  • ([a]):=(-[a]); (the headword of the node is reset, i.e., it changes from [a] to [])
  • ([a]):=([]); (the same as above)
  • ([a]):=("x"<0,0>"y"); (it is not possible to modify headwords through A-rules)
[[UWs]]
Nodes may have one single UW. This value is set through the operator "+" and reset through the operator "-". The operator "+" may be omitted. UWs may not be modified through A-rules. Changes to UWs do not affect any other element (strings, headwords and features)
  • ([[]]):=(+[[a]]); (the UW of the node is set from [[]] to [[a]])
  • ([[]]):=([[a]]); (the same as above)
  • ():=([[a]]); (the same as above)
  • ([[a]]):=(+[[b]]); (the UW of the node is set from [[a]] to [[b]])
  • ([[a]]):=([[b]]); (the same as above)
  • ([[a]]):=(-[[a]]); (the UW of the node is reset, i.e., it changes from [[a]] to [[]])
  • ([[a]]):=([[]]); (the same as above)
  • ([[a]]):=("x"<0,0>"y"); (it is not possible to modify UWs through A-rules)
Features
Nodes may have as many features as necessary. Features may come isolated in a list (POS,NOU,GEN,MCL,NUM,SNG) or as pairs of attribute=value (POS=NOU,GEN=MCL,NUM=SNG). Changes to features do not affect any other element (strings, headwords and UWs)
  • Adding features to nodes
    Features are added through the operator + (add). The operator "+" may be omitted.
    • ():=(+B); (add the feature B to the node)
    • ():=(B); (the same as above)
    Rules are recursive: the feature will be added to the node while the condition is true.
    • ():=(+B); (the resulting node is (B,B,B,...), i.e., this is an infinite loop)
    • (^B):=(+B); (the resulting node is (B), i.e., the feature B is added only if the node does not contain it yet)
    The operator + (add) does not create attribute=value pairs automatically (it simply adds features to the nodes)
    • (%x,^POS,^NOU):=(%x,+NOU); (the resulting node is (NOU) and not (POS=NOU) because POS has not been added)
    • (%x,^POS,^NOU):=(%x,+POS,+NOU); (the resulting node is (POS,NOU) and not (POS=NOU) because there was no assignment POS=NOU)
    • (%x,POS,^NOU):=(%x,+NOU); (the resulting node is (POS,NOU) because there was no assignment POS=NOU)
    • (%x,^POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS=NOU))
    • (%x,POS,^NOU):=(%x,+POS=NOU); (the resulting node is (POS, POS=NOU) because the feature POS has been duplicated)
  • Deleting features from nodes
    Features are deleted through the operator - (delete).
    ():=(-B); (delete the feature B from the node)
    Rules are recursive: the feature will be deleted from the node while the condition is true.
    ():=(-B); (the rule will delete all instances of the feature B from the node, i.e., the node (B,B,B,B,B) will become ())
    The operator "-" may also be used to reset attributes:
    • (%x,POS,NOU):=(%x,-NOU); (the resulting node is (POS) because the feature POS was not deleted)
    • (%x,POS,NOU):=(%x,-POS,-NOU); (the resulting node is () because both features POS and NOU were deleted)
    • (%x,POS=NOU):=(%x,-NOU); (the resulting node is (POS) because only the value of the attribute POS was deleted)
    • (%x,POS=NOU):=(%x,-POS); (the resulting node is () because the attribute POS was deleted with all its values)
  • Copying features
    Features can be copied from one to another node through indexes
    • (%x,GEN)(%y,^GEN):=(%x)(%y,GEN=%x); (the value of the attribute GEN is copied from the node %x to %y);
Indexes
Indexes are used to make reference to the whole node instead of its elements. Any change in the index means a completely new node, and no element is preserved.
  • (%x,"a"):=(%x,"b"); (the string of the node %x is set from "a" to "b"; all the other elements of the node %x are preserved)
  • (%x,"a"):=(%y,"b"); (the whole node %x is replaced by a new node %y whose string is "b"; no element from %x is copied to %y)
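The recursive behaviour described under "Adding features to nodes", and the way the negation ^ stops it, can be sketched in Python. This is a toy model under stated assumptions: a node is just a list of feature names, `add_feature_rule` and `run` are hypothetical helpers, and `max_steps` is an artificial cap added so that the unguarded rule can be observed without looping forever.

```python
def add_feature_rule(required, forbidden, to_add):
    """Build a toy rule: if the node has all `required` features and none of
    the `forbidden` ones (the ^ conditions), add the feature `to_add`."""
    def rule(features):
        if all(f in features for f in required) and not any(f in features for f in forbidden):
            return features + [to_add]
        return None  # condition is false: the rule does not apply
    return rule


def run(rule, features, max_steps=10):
    """Re-apply the rule while its condition is true.

    Returns (features, stopped), where stopped is False if the step cap
    was reached while the rule still applied."""
    current = list(features)
    for _ in range(max_steps):
        result = rule(current)
        if result is None:
            return current, True
        current = result
    return current, False


unguarded = add_feature_rule([], [], "B")    # models ():=(+B);
guarded = add_feature_rule([], ["B"], "B")   # models (^B):=(+B);

print(run(unguarded, [], max_steps=5))  # keeps adding B until the cap is hit
print(run(guarded, []))                 # adds B once, then the condition fails
```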

Deleting nodes

In linear rules, nodes are deleted if they are not repeated (co-indexed) in the right side:

  • (%x)(%y):=(%x); (the node %y will be deleted)

In other rules, nodes are deleted if they are not repeated (co-indexed) in the right side and are not part of any other relation:

  • rel(%x;%y):=rel(%x); (the node %y will be deleted if, and only if, it is not part of any other relation)

Creating nodes

Nodes are created through the use of new indexes in the right side:

  • ("a",%x)("b",%y):=(%x)(%y)("c",%z); (the node %z will be created)
  • ("a",%x)("b",%y):=(%x)("c",%z); (the node %z will be created, and %y will be deleted)

Duplicating (cloning) nodes

Nodes may be duplicated by repeating indexes on the right side along with the command #CLONE:

  • ("a",^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
    ("a") becomes ("a")("a")

In order to avoid infinite recursion, the right side must change the node so that the condition on the left side no longer holds (in the example above, the feature CLONED, assigned to all instances of the clone, prevents the rule from applying indefinitely).
Clones contain the same elements as the original nodes, unless they are explicitly altered during the cloning:

  • ("a",[a],[[a]],A,^CLONED,%x):=(%x,+CLONED)(%x,+CLONED,#CLONE);
    ("a",[a],[[a]],A) becomes ("a",[a],[[a]],A,CLONED)("a",[a],[[a]],A,CLONED)
  • (A,^CLONED,%x):=(%x,-A,+B,+CLONED)(%x,-A,+C,+CLONED,#CLONE);
    (A) becomes (B,CLONED)(C,CLONED)

Splitting nodes

One node may be split into two or more nodes through the use of splitting rules. Consider, for instance, the cases below:

Splitting rules deal only with strings and apply only to nodes with the feature TEMP.
Original node: ("abc",TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",TEMP)("c",TEMP);
However, if the original node was ("abc"), without TEMP, the rule would not have been applied (i.e., it is necessary to assign the feature TEMP to the node before splitting it)
Splitting rules are conservative: the elements of the original node, except the string, will be preserved unless explicitly altered.
Original node: ("abc",[abc],[[abc]],A,B,C,TEMP)
Split rule: ("abc"):=("ab")("c");
Resulting nodes: ("ab",[abc],[[abc]],A,B,C,TEMP)("c",[abc],[[abc]],A,B,C,TEMP) (i.e., the elements of the original node will be copied to the new nodes)
However, if the rule was: ("abc"):=("ab",-A,-B,-C,-TEMP,+AB)("c",-TEMP);
The result would be: ("ab",[abc],[[abc]],AB)("c",[abc],[[abc]],A,B,C)
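The two splitting cases above can be reproduced with a short Python sketch. It assumes a node is a dictionary with the keys "string", "headword", "uw" and "features"; the function `split_node` and the (string, removed features, added features) encoding of each right-side node are hypothetical and only illustrate the TEMP requirement and the conservative copying of elements.

```python
def split_node(node, pattern, parts):
    """Split `node` into several nodes, one per entry in `parts`.

    parts -- list of (string, removed_features, added_features) tuples,
             one per node on the right side of the splitting rule.
    The split happens only if the node carries TEMP and its string equals
    `pattern`; otherwise the node is returned unchanged.
    """
    if "TEMP" not in node["features"] or node["string"] != pattern:
        return [node]
    result = []
    for string, removed, added in parts:
        new_node = dict(node)            # headword and UW are copied as they are
        new_node["string"] = string      # only the string is divided
        new_node["features"] = (set(node["features"]) - set(removed)) | set(added)
        result.append(new_node)
    return result


original = {"string": "abc", "headword": "abc", "uw": "abc",
            "features": {"A", "B", "C", "TEMP"}}

# ("abc"):=("ab")("c");
print(split_node(original, "abc", [("ab", (), ()), ("c", (), ())]))

# ("abc"):=("ab",-A,-B,-C,-TEMP,+AB)("c",-TEMP);
print(split_node(original, "abc",
                 [("ab", ("A", "B", "C", "TEMP"), ("AB",)), ("c", ("TEMP",), ())]))
```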

Merging nodes (&)

Two or more nodes may be merged by the command &:

  • (%x)(%y)(%z):=(%x&%y&%z);

In the example above, ("a")("b")("c") becomes ("abc").

Merge operations concatenate headwords and UWs, and join features

("hw1",[[uw1]],F1,%x)("hw2",[[uw2]],F2,%y)("hw3",[[uw3]],F3,%z):=(%x&%y&%z);
The resulting node is ("hw1hw2hw3",[[uw1uw2uw3]],F1,F2,F3)

Compare the difference
  • (%x)(%y):=(%z); (the nodes %x and %y are replaced by %z, and their features are lost unless explicitly included in %z)
  • (%x)(%y):=(%x&%y); (the nodes %x and %y are merged)
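The merge example above can be pictured with a minimal Python sketch. It assumes nodes are dictionaries with "string", "uw" and "features" keys, and the function `merge` is hypothetical. Whether the real engine removes duplicate features when joining them is not stated in the text, so the de-duplication below is an assumption.

```python
def merge(nodes):
    """Merge nodes in order, as in (%x&%y&%z): concatenate the quoted values
    and the UWs, and join the feature lists."""
    features = []
    for node in nodes:
        for feature in node["features"]:
            if feature not in features:  # assumption: duplicates are kept once
                features.append(feature)
    return {
        "string": "".join(node["string"] for node in nodes),
        "uw": "".join(node["uw"] for node in nodes),
        "features": features,
    }


x = {"string": "hw1", "uw": "uw1", "features": ["F1"]}
y = {"string": "hw2", "uw": "uw2", "features": ["F2"]}
z = {"string": "hw3", "uw": "uw3", "features": ["F3"]}
print(merge([x, y, z]))
```

By contrast, a rule such as (%x)(%y):=(%z); would correspond to discarding both input dictionaries and building a fresh one.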

Retrieving entries in the dictionary after tokenization (?)

During transformation (i.e., after tokenization), dictionary entries may be accessed from transformation rules by the command "?"

  • (?[headword]) retrieves the first entry in the dictionary with the headword "headword"
  • (?[[uw]]) retrieves the first entry in the dictionary with the UW "uw"
  • (?[headword],?[[uw]],?feature) retrieves the first entry in the dictionary with the headword "headword", the UW "uw" and the feature "feature"
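The "first entry" behaviour of the three look-ups above can be sketched in Python as a first-match search over an ordered list. The dictionary entries below are made-up toy data, and the function `lookup` and its parameters are hypothetical. The sketch ignores the IAN/EUGENE restrictions on which parameters are obligatory.

```python
def lookup(dictionary, headword=None, uw=None, features=()):
    """Return the first entry matching every criterion that was given,
    or None if no entry matches."""
    for entry in dictionary:
        if headword is not None and entry["headword"] != headword:
            continue
        if uw is not None and entry["uw"] != uw:
            continue
        if not set(features) <= set(entry["features"]):
            continue
        return entry
    return None


# Toy entries, invented for this example only.
toy_dictionary = [
    {"headword": "the", "uw": "", "features": ["ART", "DEF"]},
    {"headword": "a", "uw": "", "features": ["ART", "IND"]},
    {"headword": "foot", "uw": "foot(icl>thing)", "features": ["NOU"]},
]

print(lookup(toy_dictionary, headword="foot"))            # like (?[headword])
print(lookup(toy_dictionary, uw="", features=["ART"]))    # first of several matches
```

Because only the first match is returned, the order of entries in the dictionary determines the result whenever several entries satisfy the search.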

Regular expressions, variables and disjunction may also be used in dictionary search

  • (?[/abcd./]) retrieves the first entry in the dictionary whose headword has 5 characters and begins with "abcd" (this works only in natural language generation)
  • (?[[/abcd./]]) retrieves the first entry in the dictionary whose UW has 5 characters and begins with "abcd" (this works only in natural language analysis)
Obligatory parameters
Due to the indexation algorithm, the headword is obligatory in IAN and the UW is obligatory in EUGENE:
  • (?[headword]) will work only in IAN
  • (?[[uw]]) will work only in EUGENE
  • (?feature) will not work in IAN or EUGENE
Variables
In order to avoid repetition, dictionary look-up may use the values of indexed nodes in the left side
  • (?[%x]) retrieves the first entry in the dictionary with the same headword as the node %x
  • (?[[%x]]) retrieves the first entry in the dictionary with the same UW as the node %x
  • (?[%x],ATT=%x) retrieves the first entry in the dictionary with the same headword as the node %x and whose attribute ATT has the same value as the attribute ATT of the node %x
Example

Dictionary search is used mainly in natural language generation

  • (N,NUM,GEN,@def,%noun):=(?[[]],?ART,?DEF,?NUM=%noun,?GEN=%noun)(%noun,-@def);

Given a node %noun with the features noun (N), number (NUM) and gender (GEN) and the attribute @def (definite), the rule searches for the first dictionary entry associated with the UW "" (empty UW) that has the features ART and DEF and whose attributes NUM and GEN have the same values as those of the node %noun, and inserts it in front of the noun. @def is removed from the noun in order to avoid an infinite loop.

Triggering rules (!)

Inflectional rules are triggered in the grammar by the command "!"<ATTRIBUTE>.
Given the dictionary entry:

  • [foot] "foot" (POS=NOU, NUM(PLR:="oo":"ee")) <eng,0,0>;

The rule NUM(PLR:="oo":"ee") is triggered by !NUM
For instance:

  • (NUM=PLR,^inflected):=(!NUM,+inflected); or
  • (PLR,^inflected):=(!NUM,+inflected); or
  • (NUM,^inflected):=(!NUM,+inflected);

In the first case (NUM=PLR), the system verifies if the attribute "NUM" is set and if it has the value "PLR". In the second and in the third case, the system simply verifies if the word has any feature (attribute or value) equal to "PLR" or "NUM".
It's important to stress that, as the features of the dictionary are defined by the user, there is no way of pre-assigning attribute-value pairs. In that sense, it's not possible to infer that "PLR" will be a value of the attribute "NUM" except through an assignment of the form "NUM=PLR" (i.e., given only "PLR" or "NUM", it is not possible to state "NUM=PLR").
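
As a rough illustration of what triggering !NUM does for the entry [foot], the Python sketch below stores the inflectional rule per attribute and value and applies it when the attribute is triggered. It assumes that "oo":"ee" means "replace the substring oo by ee"; the function `trigger` and the data layout are hypothetical and do not reproduce the actual engine.

```python
def trigger(form, rules, attributes, attribute):
    """Apply the inflectional rule stored for `attribute`, if any.

    rules:      {attribute: {value: (old, new)}}, e.g. NUM(PLR:="oo":"ee")
    attributes: attribute-value pairs currently set on the node
    """
    value = attributes.get(attribute)
    replacement = rules.get(attribute, {}).get(value)
    if replacement is None:
        return form  # no rule for this attribute-value pair: form unchanged
    old, new = replacement
    return form.replace(old, new, 1)


# Rule taken from the dictionary entry for "foot" shown above.
FOOT_RULES = {"NUM": {"PLR": ("oo", "ee")}}

plural = trigger("foot", FOOT_RULES, {"NUM": "PLR"}, "NUM")
singular = trigger("foot", FOOT_RULES, {"NUM": "SNG"}, "NUM")
```

The sketch only covers the case where the pair NUM=PLR is explicitly set on the node, in line with the remark that the system cannot infer attribute-value pairs by itself.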

Transformations over hyper-nodes

Changes

Hyper-nodes, as nodes, have elements, which may be altered by the use of the operators + (add) and - (delete). The operator + may be omitted. Changes affect only the scopes indicated.

Changes to the main scope
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,"c");(the string of the hyper-node is set to "c"; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,"");(the string of the hyper-node is set to ""; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,-"a");(the same as above)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,[c]);(the headword of the hyper-node is set to [c]; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,[[c]]);(the UW of the hyper-node is set to [[c]]; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,^B,(%b,"b")):=(%a,+B);(add the feature B to the hyper-node %a; the internal node %b is not affected)
  • (%a,"a",[a],[[a]],A,^B,(%b,"b")):=(%a,B); (the same as above: add the feature B to %a)
  • (%a,"a",[a],[[a]],A,(%b,"b")):=(%a,-A);(delete the feature A from the hyper-node %a; the internal node %b is not affected)
Changes to inner scopes
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,"c"));(the string of the inner node %b is set to "c"; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,""));(the string of the inner node %b is set to ""; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,-"b"));(the same as above)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,[c]));(the headword of the inner node %b is set to [c]; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,[[c]]));(the UW of the inner node %b is set to [[c]]; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B,^C)):=(%a,(%b,+C));(add the feature C to the inner node %b; the hyper-node %a is not affected)
  • (%a,(%b,"b",[b],[[b]],B,^C)):=(%a,(%b,C)); (the same as above: add the feature C to %b)
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,-B));(delete the feature B from the inner node %b; the hyper-node %a is not affected)
Rules must have as many parentheses as the depth of the inner scope to be altered
  • (%a,(%b,(%c,(%d,(%e,"e",[e],[[e]],E))))):=(%a,(%b,(%c,(%d,(%e,"f"))))); (the string of inner node %e is set to "f"; the enclosing nodes %d, %c, %b and %a are not affected)
Hyper-nodes do not need to be represented if the changes apply to all matching nodes, and not only to nodes inside hyper-nodes
  • (%a,(%b,"b",[b],[[b]],B)):=(%a,(%b,"c")); (the string of the inner node %b is set to "c"; the hyper-node %a is not affected)

could be represented simply as

  • (%b,"b",[b],[[b]],B):=(%b,"c");

if the changes apply to all nodes ("b",[b],[[b]],B) and not only to those inside scopes.

Deletion

Hyper-nodes, as any node, are deleted if they are not repeated (co-indexed) in the right side. In this case, all the inner nodes are deleted as well:

  • (REL(%x;%y),%z):=; (the hyper-node %z will be deleted, and all its internal nodes and relations as well)

As any feature, inner nodes are conservative, and are not deleted even if they are not repeated (co-indexed) in the right side:

  • (%a,A,^B):=(%a,+B); (the feature A is not deleted from the node %a)
  • (%a,^B,(%b,"b")):=(%a,+B); (the node %b is not deleted from the hyper-node %a)

In order to delete inner nodes, the operator "-" must be used

  • (%a,A,(%b,B)):=(%a,-(%b)); (the node %b is deleted from the hyper-node %a)
  • (%a,A,rel(%b;%c)):=(%a,-rel(%b;%c)); (the relation rel(%b;%c) is deleted from the hyper-node %a)
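
The conservative behaviour of hyper-nodes can be mimicked with a small Python sketch. It is a simplified model under assumptions: the `Node` class and the `change` function are hypothetical, features are reduced to a set, and relations are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a (hyper-)node: features plus inner nodes."""
    index: str
    features: set = field(default_factory=set)
    inner: dict = field(default_factory=dict)  # index -> Node


def change(node, add=(), remove=(), remove_inner=()):
    """Apply explicit changes only; everything not mentioned is preserved."""
    node.features |= set(add)            # (%a,+B)
    node.features -= set(remove)         # (%a,-A)
    for index in remove_inner:           # (%a,-(%b))
        node.inner.pop(index, None)
    return node


a = Node("%a", {"A"}, {"%b": Node("%b", {"B"})})

# (%a,^B,(%b,"b")):=(%a,+B); adds B, keeps feature A and inner node %b
change(a, add=["B"])
after_add = (set(a.features), sorted(a.inner))

# (%a,A,(%b,B)):=(%a,-(%b)); deletes the inner node %b only
change(a, remove_inner=["%b"])
after_delete = (set(a.features), sorted(a.inner))
```

Nothing is removed in this model unless it is named in `remove` or `remove_inner`, which mirrors the need for the operator "-" in the rules above.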

Extraction

Nodes may be extracted from hyper-nodes by removing the corresponding parentheses. In this case, the hyper-node is deleted (along with its features), but the internal nodes and relations are preserved, if repeated on the right side.

  • ((%x),%y):=(%x); (the hyper-node %y is deleted, but its internal node %x is preserved; if %y has nodes other than %x, these nodes will be deleted as well, because they are not repeated in the right side)
  • (REL(%x;%y),%z):=REL(%x;%y); (the hyper-node %z is deleted, but its internal relation REL(%x;%y) is preserved; if %z has relations other than REL(%x;%y), and nodes other than %x and %y, these will be deleted as well, because they are not repeated in the right side)

Create

Hyper-nodes are created through the encapsulation of existing nodes

  • (%x):=((%x),%y); (the hyper-node %y is created, with the node %x there inside)
  • REL(%x;%y):=(REL(%x;%y),%z); (the hyper-node %z is created, with the relation REL between the nodes %x and %y inside)
  • (%x)(%y):=((%x)(%y),%z); (the hyper-node %z is created, with the linear relation between the nodes %x and %y there inside)
Attention
relations and nodes must be repeated in the right side or they will be deleted
  • (%x):=(%y); (the node %x will be simply replaced by %y; no hyper-node will be created)
  • REL(%x;%y):=(%z); (the relation REL between the nodes %x and %y will be replaced by the node %z; no hyper-node will be created)

Properties

  1. L-rules are recursive: rules will apply while conditions are true:
    The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e")
    The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...)
  2. The symbol ^ is used for negation and may be used to prevent infinite loops:
    • (X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
    • (^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
  3. Rules are conservative. No feature is changed or deleted unless explicitly indicated through "-".
    In the rule ("x",A):=("y"); the string "x" is replaced by the string "y", but the feature A is not altered (i.e., the final state will be ("y",A));
    In the rule ("x",A):=("y",A); the string "x" is replaced by the string "y" and the feature A is added to the node (i.e., the final state will be ("y",A,A))
    The rule "("a",ART)(BLK)("/a[bcd]e/"):=("an")( )( );" does not affect the status of the second and the third nodes. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second node by deleting the feature BLK.
  4. Regular expressions may be used in order to make reference to the elements of a node, but only in the left (condition) side.
    ("/[A-Z]/",%x)(".",%y):=(%x);
    ("/[A-Z]/")("."):=("/[A-Z]/");
  5. In the ACTION field, changes to the elements of a node may be expressed by the right side of A-rules
    The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
  6. Rules apply only if all conditions are true.
    The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in case of "a" before a blank and a node starting with "a", "e", "i", "o" or "u".
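
Properties 1 and 2 (recursive application and the use of ^ to stop it) can be sketched in Python as follows. This is an illustrative model only: the helper names are hypothetical, and `max_steps` is an artificial safety limit added so that the unguarded case terminates in the sketch; the text above describes no such limit.

```python
def rewrite_blanks(nodes, blank=" ", replacement="-"):
    """Model of (BLK):=("-"); keep applying while any blank node remains."""
    nodes = list(nodes)
    while blank in nodes:
        nodes[nodes.index(blank)] = replacement
    return nodes


def apply_while_true(features, condition, action, max_steps=50):
    """Re-apply `action` while `condition` holds (capped for safety)."""
    steps = 0
    while condition(features) and steps < max_steps:
        features = action(features)
        steps += 1
    return features, steps


hyphenated = "".join(rewrite_blanks(list("a b c d e")))

# (X):=(+Y); the condition stays true, so only the cap stops the loop
unguarded = apply_while_true(["X"], lambda f: "X" in f, lambda f: f + ["Y"])

# (X,^Y):=(+Y); the condition becomes false after one application
guarded = apply_while_true(
    ["X"], lambda f: "X" in f and "Y" not in f, lambda f: f + ["Y"]
)
```

The guarded rule stops after a single application because its own action falsifies the negated condition, which is the role of ^ in property 2.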

Indexes

See: Indexation

Common mistakes

  • "Mr":="Mister";
    • Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
  • (Mr):=(Mister);
    • Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
  • ("Mr"):=("Mister")
    • Rules must end in semicolon: ("Mr"):=("Mister");
  • ("I am"):=("I'm");
    • Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
  • ("a",ART)(BLK)(VOW):=("an");
    • "a adjective">"an": the blank and the following form are deleted because they are not present at the right side
  • ("de",PRE)(BLK)(VOW):=("d'")(VOW);
    • "de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;
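
The last two mistakes both follow from positional co-indexation. The Python sketch below models it under simplifying assumptions: nodes are plain dictionaries, the function `apply_positional` is hypothetical, and creation of extra nodes on the right side is not handled.

```python
def apply_positional(left_nodes, right_specs):
    """Pair left and right nodes by position, not by features.

    left_nodes:  list of {"string": str, "features": set}
    right_specs: list of {"string": str (optional), "add": set (optional)}
    Left nodes without a counterpart on the right are deleted.
    """
    result = []
    for node, spec in zip(left_nodes, right_specs):
        result.append({
            "string": spec.get("string", node["string"]),
            "features": set(node["features"]) | set(spec.get("add", ())),
        })
    return result


matched = [
    {"string": "de", "features": {"PRE"}},
    {"string": " ", "features": {"BLK"}},
    {"string": "avoir", "features": {"VOW"}},
]

# ("de",PRE)(BLK)(VOW):=("d'")(VOW);
# the second right-side node pairs with the blank, not with "avoir"
output = apply_positional(matched, [{"string": "d'"}, {"add": {"VOW"}}])
surface = "".join(node["string"] for node in output)
```

In this model the blank ends up carrying the feature VOW and the third form disappears, which reproduces the unintended "d' " described above.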


N-rules and L-rules

N-rules and L-rules are basically the same. The only difference is that L-rules are part of the Transformation Grammar and, therefore, apply after tokenization, whereas N-rules constitute the N-grammar, and apply before tokenization. This means that N-rules may only deal with strings or regular expressions, whereas L-rules may also deal with other elements (such as features and UW's):

  • L-rule
    • ("I")(BLK)("am"):=("I'm"); (I am>I'm)
    • ("a",PRE)(BLK)("a",ART):=("à",+ART,+CTC); (a a>à)
    • ("de",PRE)(BLK)("le",ART):=("du",+ART,+CTC); (de le>du)
  • N-rule
    • ("I")(" ")("am"):=("I'm"); (replace "I am" by "I'm")

Note, in the above, that we may use dictionary features (such as BLK, PRE, ART) in L-rules, but we cannot use any dictionary feature in N-rules. The only features available in N-rules are the system-defined features, such as SHEAD (beginning of the sentence) and STAIL (end of the sentence).

Software