L-rule
L-rule (linear rule) is the formalism used for applying transformations over ordered sequences of isolated nodes.
When to use L-rules
L-rules are used for:
- reordering nodes in a list (a b c > a c b)
- replacing nodes in a list (a b c > a x c)
- adding nodes in a list (a b c > a x b c)
- deleting nodes in a list (a b c > a c)
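The four operations above can be pictured on an ordinary list. The following is only an illustration in Python, with plain strings standing in for nodes; it is not how a UNL engine represents them.

```python
# Illustrative only: a Python list of strings stands in for a node sequence.
nodes = ["a", "b", "c"]

reordered = [nodes[0], nodes[2], nodes[1]]    # a b c > a c b
replaced = [nodes[0], "x", nodes[2]]          # a b c > a x c
added = [nodes[0], "x", nodes[1], nodes[2]]   # a b c > a x b c
deleted = [nodes[0], nodes[2]]                # a b c > a c
```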
When not to use L-rules
L-rules are not used for transformations over structures other than lists (i.e., trees and graphs).
Syntax
The general syntax for L-rules is the following:
(CONDITION) := (ACTION);
Where:
- CONDITION is a single node or a sequence of nodes over which actions will take place; and
- ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.
Examples:
- ("Mr."):=("Mister"); (replace "Mr." by "Mister")
- ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
- ("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and a word beginning with "a", "e", "i", "o" or "u")
- ("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")
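The way a CONDITION is matched against a window of nodes and an ACTION is applied slot by slot can be sketched in Python. This is a minimal model under stated assumptions, not the UNL engine: nodes are plain strings, BLK is represented by a single space, and the names `matches` and `apply_rule` are hypothetical helpers introduced for this example.

```python
import re

def matches(node, cond):
    """A condition is a literal string or a compiled regex matched against the whole node."""
    if isinstance(cond, re.Pattern):
        return cond.fullmatch(node) is not None
    return node == cond

def apply_rule(nodes, condition, action):
    """Apply the rule at the first window that satisfies every condition.

    `action` has one slot per condition: a string replaces the node,
    None keeps it (the empty "()" slot of the rule).
    """
    width = len(condition)
    for start in range(len(nodes) - width + 1):
        window = nodes[start:start + width]
        if all(matches(n, c) for n, c in zip(window, condition)):
            new = [n if a is None else a for n, a in zip(window, action)]
            return nodes[:start] + new + nodes[start + width:]
    return nodes  # no window matched: the list is returned unchanged

BLK = " "  # assumption: the blank node is modelled as a one-space string

# Model of ("a")(BLK)("/[aeiou].*/"):=("an")()();
rule_condition = ["a", BLK, re.compile(r"[aeiou].*")]
rule_action = ["an", None, None]

print(apply_rule(["a", " ", "apple"], rule_condition, rule_action))
print(apply_rule(["a", " ", "pear"], rule_condition, rule_action))
```

The first call changes "a" to "an"; the second leaves the list untouched because the third condition fails.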
Types of L-rules
There are three types of L-rules:
- replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field;
- addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
- deletion, when the number of parentheses in the CONDITION field is greater than the number of parentheses in the ACTION field.
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a")("b")("c"):=("d")("e")("f"); | abc > def | "a" will be replaced by "d"; "b" by "e"; and "c" by "f" |
("a")("b")("c"):=("d")( )( ); | abc > dbc | "a" will be replaced by "d"; "b" and "c" will be preserved |
("a")("b")("c"):=("d")("")(""); | abc > d | "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank) |
("a")("b")("c"):=("d",%01)(%02); | abc > db | "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted |
("a")("b")("c"):=("d",%01); | abc > d | "a" will be replaced by "d"; "b" and "c" will be deleted |
("a")("b")("c"):=(%03)(%02)(%01); | abc > cba | "a", "b" and "c" will be preserved, but reordered |
("a")("b")("c"):=("d",%01)(%03); | abc > dc | "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved |
("a")("b")("c"):=("d",%01)("g")(%02)(%03); | abc > dgbc | "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b" |
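The indexed rows of the table can be modelled with a small Python sketch. It assumes that every action slot states its index explicitly, so the positional default used by rows such as ("d")( )( ) is not covered. The function `apply_action` and the tuple encoding of slots are hypothetical, chosen only for this illustration.

```python
def apply_action(matched, action):
    """Build the output nodes for one matched window.

    `matched` holds the condition nodes in order, so %01 is matched[0].
    Each action slot is a pair (index, string):
      index  - 1-based reference to a matched node, or None for a new node
      string - replacement text, or None to keep the referenced node's text
    Matched nodes that no slot references are dropped.
    """
    out = []
    for index, string in action:
        if string is not None:
            out.append(string)             # replacement, or a brand-new node
        else:
            out.append(matched[index - 1])  # preserve the referenced node
    return out

abc = ["a", "b", "c"]

# (%03)(%02)(%01): reorder
print(apply_action(abc, [(3, None), (2, None), (1, None)]))
# ("d",%01)(%02): replace "a", keep "b", drop "c"
print(apply_action(abc, [(1, "d"), (2, None)]))
# ("d",%01)("g")(%02)(%03): replace "a", insert "g", keep "b" and "c"
print(apply_action(abc, [(1, "d"), (None, "g"), (2, None), (3, None)]))
```

In this model the number of action slots, compared with the number of matched nodes, decides whether the rule behaves as a replacement, an addition or a deletion.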
Examples
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change |
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); | a a > à | replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à" |
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); | de le > du | replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du" |
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change |
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change |
Observations
- Strings in the right side always replace strings in the left side
- In the rule ("x"):=("y"); the string "x" is replaced by the string "y".
- L-rules are recursive: rules will apply while conditions are true
- The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e")
- The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...)
- The symbol ^ is used for negation and may be used to prevent infinite loops
- (X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
- (^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
- Rules are conservative. No feature is changed or deleted unless explicitly indicated through "-".
- In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e., the final state will be ("y",FEA));
- The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
- In the ACTION field, changes may be expressed by the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement) inside each form. The default is replacement.
- The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
- Rules apply only if all conditions are true.
- The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in case of "a" before a blank and a vowel.
- To make conditions more expressive, strings in conditions (but not in actions) may be written as regular expressions delimited by slashes, as in "/[aeiou].*/".
- ("/a[bcd]e/"):=(""); (Delete the words "abe", "ace" and "ade")
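The observations on recursion and negation can be illustrated with a short Python sketch. It is a simplified model rather than the engine's actual control loop: a node is a set of features, so repeated copies of Y are not represented, and a step limit (`max_steps`, an assumption of this example) stands in for "never stops". All function names are hypothetical.

```python
def add_feature_guarded(node):
    """Model of (X,^Y):=(+Y); fires only if X is present and Y is absent."""
    if "X" in node and "Y" not in node:
        return node | {"Y"}
    return None  # condition not met

def add_feature_unguarded(node):
    """Model of (X):=(+Y); the condition is still true after every application."""
    if "X" in node:
        return node | {"Y"}
    return None

def run(rule, node, max_steps=1000):
    """Reapply `rule` until it no longer fires; return (node, steps, halted)."""
    for step in range(max_steps):
        result = rule(node)
        if result is None:
            return node, step, True
        node = result
    return node, max_steps, False  # gave up: the rule kept firing

print(run(add_feature_guarded, frozenset({"X"})))
print(run(add_feature_unguarded, frozenset({"X"})))
```

The guarded rule halts after one application because its own action makes the condition false; the unguarded rule is only stopped by the step limit.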
Indexes
- Nodes are always indexed in L-rules
- Indexes (%) are used for indexing nodes, attributes and values between the left (condition) and the right side of rules.
- (%a)(%b):=(%b)(%a); (change the order of the constituents)
- If omitted, indexes are assigned by default, according to the position
- (A)(B):=(C)(D); is the same as (A,%01)(B,%02):=(C,%01)(D,%02);
- Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
- (A,%a)(B,%b):=(C,%a)(D,%b);
- Numeric characters cannot be used as user-defined indexes
- (A,%03)(B,%05):=(C,%03)(D,%05);
- In this rule, %01 = A and %02 = B (there is no %03 nor %05)
- Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
- (A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
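The indexing conventions above can be summarised in a Python sketch. It reflects one reading of the examples, namely that nodes with an omitted or numeric index fall back to positional indexes; the function `resolve_indexes` and its input format are hypothetical.

```python
def resolve_indexes(labels):
    """Map index names to node positions (0-based) for one side of a rule.

    `labels` holds one entry per node: a user-defined label such as "a",
    or None when the index is omitted. Labels made only of digits are not
    kept as user-defined labels, so those nodes fall back to positional
    indexes (%01, %02, ...), as in the (A,%03)(B,%05) example.
    """
    table = {}
    for position, label in enumerate(labels):
        if label is None or label.isdigit():
            name = f"%{position + 1:02d}"
        else:
            name = f"%{label}"
        table[name] = position
    return table

print(resolve_indexes([None, None]))   # (A)(B)
print(resolve_indexes(["a", "b"]))     # (A,%a)(B,%b)
print(resolve_indexes(["03", "05"]))   # (A,%03)(B,%05)
```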
Common mistakes
- "Mr":="Mister"; is wrong. Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
- (Mr):=(Mister); is wrong. Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
- ("Mr"):=("Mister") is wrong. Rules must end in a semicolon: ("Mr"):=("Mister");
- ("I am"):=("I'm"); is wrong. Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
- ("a",ART)(BLK)(VOW):=("an"); turns "a adjective" into "an": the blank and the following form are deleted because they are not present on the right side
- ("de",PRE)(BLK)(VOW):=("d'")(VOW); turns "de avoir" into "d' ": coindexation is based on ordering and not on features. The third form is deleted because it is not present on the right side; the second form, which is BLK, receives the feature VOW
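Some of these mistakes can be caught mechanically. The Python sketch below runs a few shallow checks on the text of a rule; it is not a parser for the formalism, and `check_rule` is a hypothetical helper. It cannot detect unquoted constants such as (Mr), since they look the same as feature tags, and it assumes no parentheses occur inside a node.

```python
import re

GROUPS = re.compile(r"(?:\([^()]*\))+")  # one or more "( ... )" groups
QUOTED = re.compile(r'"([^"]*)"')         # text between double quotes

def check_rule(text):
    """Return a list of problems found by shallow, purely textual checks."""
    problems = []
    body = text.strip()
    if body.endswith(";"):
        body = body[:-1]
    else:
        problems.append("missing final semicolon")
    left, sep, right = body.partition(":=")
    if not sep:
        problems.append("missing ':='")
    for side, name in ((left, "condition"), (right, "action")):
        if sep and not GROUPS.fullmatch(side.strip()):
            problems.append(f"{name} is not a sequence of parenthesised nodes")
    for string in QUOTED.findall(left):
        if " " in string.strip():
            problems.append(f'"{string}" spans several word forms')
    return problems

for rule in ['"Mr":="Mister";', '("Mr"):=("Mister")', '("I am"):=("I\'m");',
             '("I")(BLK)("am"):=("I\'m");']:
    print(rule, check_rule(rule))
```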
Formal syntax
L-rules comply with the following formal syntax:
<L-RULE> ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION> ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION> ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION> ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE> ::= { "+" | "-" } <TAG>
<TAGLIST> ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)*
<INDEX> ::= "%"[01..99]
<TAG> ::= {one of the tags defined in the UNDLF Tagset}
<STRING> ::= [a-Z]+
<INTEGER> ::= [0-9]+
where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times