L-rule
Revision as of 14:37, 16 August 2013
L-rule (linear rule) is the formalism used for applying transformations over ordered sequences of isolated nodes.
When to use L-rules
L-rules are used for:
- reordering nodes in a list (a b c > a c b)
- replacing nodes in a list (a b c > a x c)
- adding nodes in a list (a b c > a x b c)
- deleting nodes in a list (a b c > a c)
When not to use L-rules
L-rules are not used in transformations over structures other than lists (e.g., trees and graphs).
Syntax
The general syntax for L-rules is the following:
(CONDITION) := (ACTION);
Where:
- CONDITION is a single node or a sequence of nodes over which actions will take place; and
- ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.
Examples:
- ("Mr."):=("Mister"); (replace "Mr." by "Mister")
- ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
- ("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and a word beginning with "a", "e", "i", "o" or "u")
- ("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")
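The general syntax can be illustrated with a small Python sketch. It splits a rule into the contents of its CONDITION and ACTION nodes. It is not the actual rule compiler, and the function names `split_rule` and `node_contents` are hypothetical. The sketch also assumes that ":=" never appears inside a quoted constant and that nodes are never nested.

```python
from typing import Optional


def split_rule(rule: str) -> tuple[list[str], list[str]]:
    """Split an L-rule into the contents of its CONDITION and ACTION nodes."""
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("an L-rule must end in a semicolon")
    condition, sep, action = text[:-1].partition(":=")
    if not sep:
        raise ValueError("an L-rule must contain ':='")
    return node_contents(condition), node_contents(action)


def node_contents(side: str) -> list[str]:
    """Return the text inside each top-level pair of parentheses."""
    nodes: list[str] = []
    current: Optional[list[str]] = None  # None while outside any node
    in_quotes = False
    for ch in side:
        if current is None:
            if ch == "(":
                current = []
            elif not ch.isspace():
                raise ValueError(f"unexpected {ch!r} outside parentheses")
            continue
        if ch == '"':
            in_quotes = not in_quotes  # a ")" inside quotes does not close the node
        if ch == ")" and not in_quotes:
            nodes.append("".join(current).strip())
            current = None
        else:
            current.append(ch)
    if current is not None or in_quotes:
        raise ValueError("unbalanced parentheses or quotes")
    return nodes


print(split_rule('("he")(BLK)("is"):=(%03)(%02)(%01);'))
```

An empty node such as ( ) is returned as an empty string, which the later steps can read as "keep this node unchanged".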
Types of L-rules
There are three types of L-rules:
- replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field;
- addition, when the number of parentheses in the CONDITION field is smaller than the number of parentheses in the ACTION field;
- deletion, when the number of parentheses in the CONDITION field is greater than the number of parentheses in the ACTION field.
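The type of a rule depends only on how many nodes each side has, so it can be computed by counting parentheses. The sketch below uses the hypothetical names `count_nodes` and `rule_type`. It ignores parentheses inside quoted constants, but it does not otherwise check that the rule is well formed.

```python
def count_nodes(side: str) -> int:
    """Count the opening parentheses that are not inside a quoted constant."""
    in_quotes = False
    count = 0
    for ch in side:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "(" and not in_quotes:
            count += 1
    return count


def rule_type(rule: str) -> str:
    """Classify an L-rule as replacement, addition or deletion."""
    condition, _, action = rule.strip().rstrip(";").partition(":=")
    n_condition, n_action = count_nodes(condition), count_nodes(action)
    if n_condition == n_action:
        return "replacement"
    return "addition" if n_condition < n_action else "deletion"


print(rule_type('("a")("b")("c"):=("d",%01);'))  # deletion
```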
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a")("b")("c"):=("d")("e")("f"); | abc > def | "a" will be replaced by "d"; "b" by "e"; and "c" by "f" |
("a")("b")("c"):=("d")( )( ); | abc > dbc | "a" will be replaced by "d"; "b" and "c" will be preserved |
("a")("b")("c"):=("d")("")(""); | abc > d | "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., the empty string) |
("a")("b")("c"):=("d",%01)(%02); | abc > db | "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted |
("a")("b")("c"):=("d",%01); | abc > d | "a" will be replaced by "d"; "b" and "c" will be deleted |
("a")("b")("c"):=(%03)(%02)(%01); | abc > cba | "a", "b" and "c" will be preserved, but reordered |
("a")("b")("c"):=("d",%01)(%03); | abc > dc | "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved |
("a")("b")("c"):=("d",%01)("g")(%02)(%03); | abc > dgbc | "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "d" and "b" |
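The behaviour described in the table can be emulated with a toy interpreter. This is a minimal sketch under simplifying assumptions:

- each node is reduced to its string;
- the condition is assumed to have matched already;
- an action node is written as a hypothetical `(index, form)` pair, where index 1 stands for %01 and None means "not given".

```python
from typing import Optional

# (index, form): index 1 stands for %01; None means "not given"
ActionNode = tuple[Optional[int], Optional[str]]


def apply_action(matched: list[str], action: list[ActionNode]) -> list[str]:
    """Rewrite the nodes matched by the CONDITION according to the ACTION."""
    # Same number of nodes on both sides: unindexed nodes are co-indexed by position.
    coindexed = len(matched) == len(action)
    result: list[str] = []
    for position, (index, form) in enumerate(action, start=1):
        if index is None and coindexed:
            index = position
        if index is None:
            result.append(form or "")          # a new node is created
        elif form is None:
            result.append(matched[index - 1])  # the indexed node is preserved
        else:
            result.append(form)                # the indexed node is replaced
    # Condition nodes not referenced by any action node are not copied, i.e. deleted.
    return result


# ("a")("b")("c"):=("d",%01)("g")(%02)(%03);
print("".join(apply_action(["a", "b", "c"], [(1, "d"), (None, "g"), (2, None), (3, None)])))  # dgbc
```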
Examples
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change |
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); | a a > à | replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à" |
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); | de le > du | replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du" |
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third forms without any change |
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change |
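Matching a CONDITION against a sequence of nodes can be sketched as follows, assuming each node is a (form, features) pair. The regular-expression case uses Python's `re.fullmatch`, which assumes that a pattern between // must match the whole form; the real engine's regex dialect may differ. The helper names are hypothetical. The sketch also includes the ^ negation described under Observations. Because it splits conditions on commas, constants that contain commas are not supported.

```python
import re


def part_matches(part: str, form: str, features: set[str]) -> bool:
    """Check one comma-separated element of a condition node."""
    if part.startswith("^"):  # negation
        return not part_matches(part[1:], form, features)
    if len(part) > 4 and part.startswith('"/') and part.endswith('/"'):
        return re.fullmatch(part[2:-2], form) is not None  # "/regex/"
    if len(part) >= 2 and part.startswith('"') and part.endswith('"'):
        return form == part[1:-1]  # quoted constant
    return part in features  # tag, e.g. ART or BLK


def node_matches(condition: str, form: str, features: set[str]) -> bool:
    """A node matches only if every element of its condition is true."""
    return all(part_matches(p.strip(), form, features) for p in condition.split(","))


def sequence_matches(conditions: list[str], nodes: list[tuple[str, set[str]]]) -> bool:
    """Check a whole CONDITION (one string per parenthesis) against as many nodes."""
    return len(conditions) == len(nodes) and all(
        node_matches(c, form, feats) for c, (form, feats) in zip(conditions, nodes))


sentence = [("a", {"ART"}), (" ", {"BLK"}), ("adjective", {"ADJ"})]
print(sequence_matches(['"a",ART', "BLK", '"/[aeiou].*/"'], sentence))  # True
```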
Observations
- Strings on the right side always replace strings on the left side.
  - In the rule ("x"):=("y"); the string "x" is replaced by the string "y".
- L-rules are recursive: rules keep applying while their conditions are true.
  - The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e").
  - The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...).
- The symbol ^ is used for negation and may be used to prevent infinite loops:
  - (X,^Y):=(+Y); (add the feature Y to a node that contains the feature X but does not contain the feature Y yet)
  - (^".")(STAIL):=(%01)(".")(%02); (add a period before the end of the sentence if there is no period yet)
- Rules are conservative: no feature is changed or deleted unless this is explicitly indicated through "-".
  - In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e., the final state will be ("y",FEA)).
  - The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. By contrast, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
- In the ACTION field, changes inside each form may be expressed as in the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement). The default is replacement.
  - The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or by 0>"n".
- Rules apply only if all conditions are true.
  - The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only when "a" is followed by a blank and by a word beginning with a vowel.
- To make rules more powerful, conditions (but not actions) may be expressed as regular expressions between //.
  - ("/a[bcd]e/"):=(""); (delete the words "abe", "ace" and "ade")
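The recursive behaviour can be reproduced with a fixpoint loop. The sketch below is limited to one-node rules and relies on an assumption that is not stated above: an application that leaves the node unchanged does not count. That assumption is what lets (BLK):=("-"); stop once every blank has become "-". The feature BLK is kept on those nodes, since rules are conservative. `max_steps` and `tokenize` are hypothetical helpers; `max_steps` serves to surface infinite loops such as (X):=(+Y);.

```python
from typing import Callable

# A node is (form, features); features are kept as a tuple so that a feature
# can be added more than once, as in "X,Y,Y,Y".
Node = tuple[str, tuple[str, ...]]


def apply_until_stable(nodes: list[Node], condition: Callable[[Node], bool],
                       action: Callable[[Node], Node], max_steps: int = 1000) -> list[Node]:
    """Apply a one-node rule repeatedly while it still changes something."""
    nodes = list(nodes)
    for _ in range(max_steps):
        for i, node in enumerate(nodes):
            if condition(node) and action(node) != node:
                nodes[i] = action(node)
                break  # start again from the beginning of the list
        else:
            return nodes  # no application changes anything any more
    raise RuntimeError(f"rule still applying after {max_steps} steps (infinite loop?)")


def tokenize(text: str) -> list[Node]:
    """Hypothetical tokenizer: one node per character, spaces tagged BLK."""
    return [(" ", ("BLK",)) if ch == " " else (ch, ()) for ch in text]


# (BLK):=("-");  the feature BLK is kept, since rules are conservative
dashed = apply_until_stable(tokenize("a b c d e"),
                            lambda n: "BLK" in n[1], lambda n: ("-", n[1]))
print("".join(form for form, _ in dashed))  # a-b-c-d-e

# (X,^Y):=(+Y);  the negation stops the rule after one application
print(apply_until_stable([("w", ("X",))],
                         lambda n: "X" in n[1] and "Y" not in n[1],
                         lambda n: (n[0], n[1] + ("Y",))))  # [('w', ('X', 'Y'))]
```

Without the ^Y guard, the same call keeps appending Y and ends with the RuntimeError.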
Indexes
Indexes are used to control rules:
- (%a)(%b)(%c):=(%b); (delete the first and the third nodes, and keep the second)
- (%a)(%b)(%c):=(%c)(%b)(%a); (reverse the order)
Indexation is done automatically by the machine, as follows:
- if the number of nodes is the same on the left and the right sides, NODES ARE CO-INDEXED
  - ("a")("b")("c"):=("d")("e")("f"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%01)("e",%02)("f",%03); (i.e., "a" will be replaced by "d", "b" by "e", and "c" by "f")
- if the number of nodes differs between the two sides, NODES ARE NOT CO-INDEXED
  - ("a")("b")("c"):=("d")("e"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%04)("e",%05); (i.e., "a", "b" and "c" will be deleted, and "d" and "e" will be created)
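The automatic indexation described above can be sketched in Python. The hypothetical `auto_index` function below returns the indexes implicitly assigned to each side of a rule.

```python
def auto_index(n_condition: int, n_action: int) -> tuple[list[str], list[str]]:
    """Return the indexes implicitly assigned to the CONDITION and ACTION nodes."""
    condition = [f"%{i:02d}" for i in range(1, n_condition + 1)]
    if n_condition == n_action:
        action = list(condition)  # co-indexed: %01 -> %01, %02 -> %02, ...
    else:
        first = n_condition + 1  # not co-indexed: fresh indexes continue the count
        action = [f"%{i:02d}" for i in range(first, first + n_action)]
    return condition, action


print(auto_index(3, 2))  # (['%01', '%02', '%03'], ['%04', '%05'])
```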
To avoid ambiguities, it is highly recommended to replace numeric indexes with user-defined labels made of any sequence of alphabetic characters and underscores:
- (A,%a)(B,%b):=(C,%a)(D,%b);
Numeric characters cannot be used in user-defined indexes; the following rule is therefore invalid:
- (A,%03)(B,%05):=(C,%03)(D,%05);
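The restriction on user-defined labels can be checked with a regular expression. The pattern below is an assumption based on the description above ("%" followed by alphabetic characters and underscores), and `is_user_defined_index` is a hypothetical name.

```python
import re

# Assumed pattern: "%" followed by one or more letters or underscores
USER_LABEL = re.compile(r"%[A-Za-z_]+")


def is_user_defined_index(index: str) -> bool:
    """Return True if the index is a valid user-defined label such as %a."""
    return USER_LABEL.fullmatch(index) is not None


for index in ("%a", "%head_noun", "%03"):
    print(index, is_user_defined_index(index))
```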
Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:
- (A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
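Attribute transfer can be sketched as a substitution step that runs before the ACTION is applied. The sketch assumes that the attributes of each labelled CONDITION node are available in a dictionary keyed by label. `resolve_attribute_values` is a hypothetical name.

```python
import re

TRANSFER = re.compile(r"(\w+)=(%[A-Za-z_]+)")


def resolve_attribute_values(action_parts: list[str],
                             labelled: dict[str, dict[str, str]]) -> list[str]:
    """Replace ATTRIBUTE=%label by ATTRIBUTE=<value of ATTRIBUTE in node %label>."""
    resolved = []
    for part in action_parts:
        match = TRANSFER.fullmatch(part)
        if match:
            attribute, label = match.groups()
            resolved.append(f"{attribute}={labelled[label][attribute]}")
        else:
            resolved.append(part)  # tags, forms, etc. are left untouched
    return resolved


# Attributes of the nodes matched by (A,%a,ATT1=VAL1)(B,%b)
labelled = {"%a": {"ATT1": "VAL1"}, "%b": {}}
print(resolve_attribute_values(["B", "ATT1=%a"], labelled))  # ['B', 'ATT1=VAL1']
```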
Common mistakes
INCORRECT RULE | PROBLEM |
---|---|
"Mr":="Mister"; | Conditions and actions must always come between parentheses: ("Mr"):=("Mister"); |
(Mr):=(Mister); | Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister"); |
("Mr"):=("Mister") | Rules must end in a semicolon: ("Mr"):=("Mister"); |
("I am"):=("I'm"); | Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm"); |
("a",ART)(BLK)(VOW):=("an"); | "a adjective" > "an": the blank and the following form are deleted because they are not present on the right side |
("de",PRE)(BLK)(VOW):=("d'")(VOW); | "de avoir" > "d' ": co-indexation is based on ordering, not on features. The third form is deleted because it is not present on the right side; the second form, which is BLK, receives the feature VOW |
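The first four mistakes can be caught mechanically. The checker below is a rough sketch, not the actual validator, and `lint_rule` is a hypothetical name. It has two known limits:

- it assumes tags are written in uppercase, so any unquoted word containing lowercase letters is reported as a constant missing its quotes;
- it does not detect the indexing problems in the last two rows.

```python
import re


def lint_rule(rule: str) -> list[str]:
    """Report the common syntax mistakes listed in the table (rough sketch)."""
    problems: list[str] = []
    text = rule.strip()
    if text.endswith(";"):
        body = text[:-1]
    else:
        problems.append("rule must end in a semicolon")
        body = text
    condition, sep, action = body.partition(":=")
    if not sep:
        return problems + ["missing ':='"]
    sides = [condition.strip(), action.strip()]
    if not all(s.startswith("(") and s.endswith(")") for s in sides):
        return problems + ["conditions and actions must be between parentheses"]
    for node in re.findall(r"\(([^()]*)\)", " ".join(sides)):
        for part in (p.strip() for p in node.split(",")):
            if part.startswith('"') and " " in part.strip('"').strip():
                problems.append(f"{part}: isolate each word form in its own parentheses")
            elif re.fullmatch(r"[A-Za-z]+", part) and not part.isupper():
                problems.append(f"{part}: constants must be between quotes")
    return problems


for rule in ['"Mr":="Mister";', '(Mr):=(Mister);', '("Mr"):=("Mister")', '("I am"):=("I\'m");']:
    print(rule, lint_rule(rule))
```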
Formal syntax
L-rules comply with the following formal syntax:
<L-RULE> ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION> ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION> ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION> ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE> ::= { "+" | "-" } <TAG>
<TAGLIST> ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)*
<INDEX> ::= "%"[01..99]
<TAG> ::= {one of the tags defined in the UNDLF Tagset}
<STRING> ::= [a-Z]+
<INTEGER> ::= [0-9]+
where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times