L-rule

From UNL Wiki
Revision as of 14:37, 16 August 2013 by Martins (Talk | contribs)
Jump to: navigation, search

L-rule (linear rule) is the formalism used for applying transformations over ordered sequences of isolated nodes.

Contents

When to use L-rules

L-rules are used for:

  • reordering nodes in a list (a b c > a c b)
  • replacing nodes in a list (a b c > a x c)
  • adding nodes in a list (a b c > a x b c)
  • deleting nodes in a list (a b c > a c)

When not to use L-rules

L-rules are not used in transformations over structures other than lists (i.e., in trees and graphs)

Syntax

The general syntax for L-rules is the following:

(CONDITION) := (ACTION);

Where:

  • CONDITION is a single node or a sequence of nodes over which actions will take place; and
  • ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.

Examples:

  • ("Mr."):=("Mister"); (replace "Mr." by "Mister")
  • ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
  • ("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and word beginning with "a", "e", "i", "o" or "u")
  • ("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")

Types of L-rules

There are three types of L-rules:

  • replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field:
  • addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
  • deletion, when the number of parentheses in the CONDITION field is greater than the number parentheses in the ACTION field.
Examples
RULE BEFORE > AFTER DESCRIPTION
("a")("b")("c"):=("d")("e")("f"); abc > def "a" will be replaced by "d"; "b" by "e"; and "c" by "f"
("a")("b")("c"):=("d")( )( ); abc > dbc "a" will be replaced by "d"; "b" and "c" will be preserved
("a")("b")("c"):=("d")("")(""); abc > d "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., blank)
("a")("b")("c"):=("d",%01)(%02); abc > db "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
("a")("b")("c"):=("d",%01); abc > d "a" will be replaced by "d"; "b" and "c" will be deleted
("a")("b")("c"):=(%03)(%02)(%01); abc > cba "a", "b" and "c" will be preserved, but reordered
("a")("b")("c"):=("d",%01)(%03); abc > dc "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
("a")("b")("c"):=("d",%01)("g")(%02)(%03); abc > dgc "a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"

Examples

Examples
RULE BEFORE > AFTER DESCRIPTION
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); a adjective > an adjective replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); a a > à replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); de le > du replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); a il > a-t-il replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); de avoir > d'avoir replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change

Observations

Strings in the right side always replace strings in the left side
In the rule ("x"):=("y"); the string "x" is replaced by the string "y".
L-rules are recursive: rules will apply while conditions are true
The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only in "a-b c d e")
The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep been added eternally (X,Y,Y,Y,Y,Y,Y,Y,...)
The symbol ^ is used for negation and may be used to prevent infinite loops
  • (X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
  • (^".")(STAIL):=(%01)(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
Rules are conservative. No feature is changed or deleted unless explicitly indicate through "-".
In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e.,the final state will be ("y",FEA));
The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
In the ACTION field, changes may be expressed by the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement) inside each form. The default is replacement.
The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
Rules apply only if all conditions are true.
The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in case of "a" before a blank and a vowel.
In order to enhance its power, conditions (but not actions) may be replaced by regular expressions between //.
("/a[bcd]e/"):=(""); (Delete the words "abe", "ace" and "ade")

Indexes

Indexes are used to control rules:

  • (%a)(%b)(%c):=(%b); (delete the first and the third nodes, and keep the second)
  • (%a)(%b)(%c):=(%c)(%b)(%a); (reverse the order)

Indexation is done automatically by the machine, as follows:

  • if the number of nodes is the same in the left and in the right side, NODES ARE CO-INDEXED
    ("a")("b")("c"):=("d")("e")("f"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%01)("e",%02)("f",%03); (i.e., "a" will be replaced by "d", "b" by "e", and "c" by "f")
  • if the number of nodes is not the same in both sides, NODES ARE NOT CO-INDEXED
    ("a")("b")("c"):=("d")("e"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%04)("e",%05); (i.e., "a", "b" and "c" will be deleted, and "d" and "e" will be created

In order to avoid ambiguities, it is highly recommended that indexes are replaced by user-defined labels made of any sequence of alphabetic characters and underscore:

(A,%a)(B,%b):=(C,%a)(D,%b);

Numeric characters cannot be used as user-defined indexes:

(A,%03)(B,%05):=(C,%03)(D,%05);

Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:

(A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)

Common mistakes

  • "Mr":="Mister";
    • Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
  • (Mr):=(Mister);
    • Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
  • ("Mr"):=("Mister")
    • Rules must end in semicolon: ("Mr"):=("Mister");
  • ("I am"):=("I'm");
    • Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
  • ("a",ART)(BLK)(VOW):=("an");
    • "a adjective">"a": the blank and the following form are deleted because they are not present at the right side
  • ("de",PRE)(BLK)(VOW):=("d'")(VOW);
    • "de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;

Formal syntax

L-rules comply with the following formal syntax:

<L-RULE>          ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION>        ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION>           ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION>       ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE>       ::= { "+" | "-" } <TAG> 
<TAGLIST>          ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)* 
<INDEX>            ::= "%"[01..99]
<TAG>              ::= {one of the tags defined in the UNDLF Tagset}
<STRING>           ::= [a-Z]+
<INTEGER>          ::= [0-9]+

where

<a> = a is a non-terminal symbol
“a“ = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software