L-rule

From UNL Wiki
Revision as of 16:25, 20 August 2013

L-rule (linear rule) is a specific type of transformation rule used for applying transformations over ordered sequences of isolated nodes.

When to use L-rules

L-rules are used for:

  • reordering nodes in a list: a b c > a c b
  • replacing nodes in a list: a b c > a x c
  • adding nodes in a list: a b c > a x b c
  • deleting nodes in a list: a b c > a c
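The four operations above can be pictured on an ordinary list. The following is only an illustrative Python sketch, not L-rule syntax; the variable names are chosen for this example.

```python
# Illustrative only: the four list operations named above, on a plain list.
nodes = ["a", "b", "c"]

reordered = [nodes[0], nodes[2], nodes[1]]        # a b c > a c b
replaced = [nodes[0], "x", nodes[2]]              # a b c > a x c
added = [nodes[0], "x", nodes[1], nodes[2]]       # a b c > a x b c
deleted = [nodes[0], nodes[2]]                    # a b c > a c
```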

When not to use L-rules

L-rules are not used in transformations over structures other than lists (i.e., trees and graphs). In these cases, we use S-rules (syntactic rules).

Syntax

The general syntax for L-rules is the following:

(CONDITION) := (ACTION);

Where:

  • CONDITION is a single node or a sequence of nodes over which actions will take place; and
  • ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.
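To make the two-sided shape of a rule concrete, here is a minimal Python sketch that splits a rule string into its CONDITION nodes and its ACTION nodes. The helper `split_rule` and the pattern `NODE` are hypothetical names introduced for this illustration; they are not part of any UNL tool, and the sketch assumes that ":=" and parentheses never occur inside a quoted string.

```python
import re

# Hypothetical helper, not part of any UNL tool.
# One parenthesized node; quoted strings may contain any character but '"'.
NODE = re.compile(r'\(((?:"[^"]*"|[^()"])*)\)')

def split_rule(rule: str):
    """Split '(CONDITION)... := (ACTION)...;' into two lists of node bodies."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("rules must end in a semicolon")
    left, sep, right = rule[:-1].partition(":=")
    if not sep:
        raise ValueError("missing ':=' between CONDITION and ACTION")
    conditions = [m.group(1).strip() for m in NODE.finditer(left)]
    actions = [m.group(1).strip() for m in NODE.finditer(right)]
    if not conditions or not actions:
        raise ValueError("each side needs at least one parenthesized node")
    return conditions, actions

conditions, actions = split_rule('("I")(BLK)("am"):=("I\'m");')
```

Here `conditions` holds the three node bodies of the left side and `actions` holds the single node body of the right side.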

Examples

RULE | BEFORE > AFTER | DESCRIPTION
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC); | a a > à | replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC); | de le > du | replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change
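The first and last rules in the examples above can be imitated in Python to see their effect on a string. This is only a rough sketch under simplifying assumptions: nodes are plain strings, a single whitespace character stands for a BLK node, and features such as ART and PRE are ignored. The function names `tokenize`, `a_to_an` and `elide_de` are invented for the illustration.

```python
import re

VOWEL_INITIAL = re.compile(r"[aeiou].*")

def tokenize(text: str):
    """Split text into word forms and single-whitespace nodes (stand-ins for BLK)."""
    return re.findall(r"\s|[^\s]+", text)

def a_to_an(nodes):
    """Mimic ("a")(BLK)("/[aeiou].*/"):=("an")( )( ); on a list of strings."""
    out = list(nodes)
    for i in range(len(out) - 2):
        if out[i] == "a" and out[i + 1] == " " and VOWEL_INITIAL.fullmatch(out[i + 2]):
            out[i] = "an"  # second and third nodes are left untouched
    return out

def elide_de(nodes):
    """Mimic ("de")(BLK)("/[aeiou].*/"):=("d'",%01)(%03); on a list of strings."""
    out, i = [], 0
    while i < len(nodes):
        if (i + 2 < len(nodes) and nodes[i] == "de" and nodes[i + 1] == " "
                and VOWEL_INITIAL.fullmatch(nodes[i + 2])):
            out += ["d'", nodes[i + 2]]  # the blank node is dropped
            i += 3
        else:
            out.append(nodes[i])
            i += 1
    return out

english = "".join(a_to_an(tokenize("a adjective")))
french = "".join(elide_de(tokenize("de avoir")))
```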

Observations

  • L-rules are recursive: a rule keeps applying for as long as its conditions are true.
    • The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e").
    • The rule "(X):=(+Y);" will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...).
  • The symbol ^ is used for negation and may be used to prevent infinite loops.
    • (X,^Y):=(+Y); (add the feature Y to a node that contains the feature X and does not yet contain the feature Y)
    • (^".")(STAIL):=(%01)(".")(%02); (add a period before the end of the sentence if there is not a period yet)
  • Rules are conservative: no feature is changed or deleted unless explicitly indicated through "-".
    • In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e., the final state will be ("y",FEA)).
    • The rule "("a",ART)(BLK)("/a[bcd]e/"):=("an")( )( );" does not affect the status of the second and the third nodes. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second node by deleting the feature BLK.
  • In the ACTION field, changes may be expressed by the right side of A-rules inside each form. The default is replacement.
    • The rule "("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );" could also be expressed as "("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or by 0>"n".
  • Rules apply only if all conditions are true.
    • The rule "("a")(BLK)("/[aeiou].*/"):=("an")( )( );" will apply only in the case of "a" before a blank and a node starting with "a", "e", "i", "o" or "u".
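The recursion and the role of the negation ^ can be sketched in Python. This is a toy model, not the actual engine: nodes are tuples of features, a rule is a pair of functions, and `max_passes` is a safety limit invented for the sketch so that the unguarded rule can be shown without actually looping forever.

```python
def apply_until_stable(nodes, condition, action, max_passes=50):
    """Re-apply a one-node rule until a full pass changes nothing.

    nodes: list of feature tuples, e.g. [("X",)]
    Returns (final_nodes, number_of_passes_that_changed_something).
    """
    for passes in range(max_passes):
        changed = False
        next_nodes = []
        for node in nodes:
            updated = action(node) if condition(node) else node
            changed = changed or updated != node
            next_nodes.append(updated)
        nodes = next_nodes
        if not changed:
            return nodes, passes
    raise RuntimeError("rule did not stabilise; possible infinite loop")

# (X,^Y):=(+Y);  the negation ^Y makes the rule stop after one change
guarded = apply_until_stable(
    [("X",)],
    condition=lambda n: "X" in n and "Y" not in n,
    action=lambda n: n + ("Y",),
)

# (X):=(+Y);  no guard, so Y keeps being appended until the safety limit
try:
    apply_until_stable([("X",)], lambda n: "X" in n, lambda n: n + ("Y",))
    unguarded_stops = True
except RuntimeError:
    unguarded_stops = False
```

In this model the guarded rule settles after a single change, while the unguarded one only ends because of the artificial limit.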

Indexes

Indexes are used to associate nodes in the left side of the rule (CONDITION) to nodes in the right side of the rule (ACTION):

  • (%a)(%b)(%c):=(%b); (delete the first and the third nodes, and keep the second)
  • (%a)(%b)(%c):=(%c)(%b)(%a); (reverse the order)

Indexation is done automatically by the machine, as follows:

  • if the number of nodes is the same in the left and in the right side, NODES ARE CO-INDEXED
    ("a")("b")("c"):=("d")("e")("f"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%01)("e",%02)("f",%03); (i.e., "a" will be replaced by "d", "b" by "e", and "c" by "f")
  • if the number of nodes is not the same in both sides, NODES ARE NOT CO-INDEXED
    ("a")("b")("c"):=("d")("e"); is the same as ("a",%01)("b",%02)("c",%03):=("d",%04)("e",%05); (i.e., "a", "b" and "c" will be deleted, and "d" and "e" will be created)
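The automatic indexation described above can be reproduced with a short Python sketch. The function `auto_index` is a hypothetical name; it only models the numbering scheme shown in the two examples and makes no claim about how the machine implements it.

```python
def auto_index(n_left: int, n_right: int):
    """Return the implicit indexes of the left and right nodes of a rule."""
    left = [f"%{i:02d}" for i in range(1, n_left + 1)]
    if n_left == n_right:
        # Same number of nodes: co-indexed, position by position.
        right = list(left)
    else:
        # Different number of nodes: the right side gets fresh indexes.
        right = [f"%{i:02d}" for i in range(n_left + 1, n_left + n_right + 1)]
    return left, right

def deleted_and_created(left, right):
    """Left indexes missing on the right are deleted; new right indexes are created."""
    deleted = [i for i in left if i not in right]
    created = [i for i in right if i not in left]
    return deleted, created

same = auto_index(3, 3)       # ("a")("b")("c"):=("d")("e")("f");
different = auto_index(3, 2)  # ("a")("b")("c"):=("d")("e");
```

With three nodes on each side nothing is deleted or created, only replaced in place; with three nodes on the left and two on the right, all three left nodes are deleted and two new nodes are created.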

In order to avoid ambiguities, it is highly recommended that automatic indexes be replaced by user-defined labels made of any sequence of alphabetic characters and underscores:

(A,%a)(B,%b):=(C,%a)(D,%b);

Numeric characters cannot be used in user-defined indexes, so a rule such as the following is not valid:

(A,%03)(B,%05):=(C,%03)(D,%05);

Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE:

(A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)

Common mistakes

  • "Mr":="Mister";
    • Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
  • (Mr):=(Mister);
    • Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
  • ("Mr"):=("Mister")
    • Rules must end in semicolon: ("Mr"):=("Mister");
  • ("I am"):=("I'm");
    • Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
  • ("a",ART)(BLK)(VOW):=("an");
    • "a adjective">"a": the blank and the following form are deleted because they are not present at the right side
  • ("de",PRE)(BLK)(VOW):=("d'")(VOW);
    • "de avoir">"d' ": coindexation is based on ordering and not on features. The third form is deleted because it's not present at the right side; the second form, which is BLK, receives the feature VOW;
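Some of the mistakes listed above can be caught mechanically. The following Python sketch is a heuristic checker written for this page, not an official validator; `lint_rule` is a hypothetical name. It covers only the missing semicolon, missing parentheses, and a quoted string that spans several word forms. Telling an unquoted constant such as (Mr) from a tag would require the tagset, and the co-indexation mistakes depend on the intended result, so neither is checked.

```python
import re

NODE = re.compile(r'\((?:"[^"]*"|[^()"])*\)')   # one parenthesized node
QUOTED = re.compile(r'"([^"]*)"')               # one quoted string

def lint_rule(rule: str):
    """Return a list of warnings for a few common L-rule mistakes (heuristic)."""
    warnings = []
    body = rule.strip()
    if not body.endswith(";"):
        warnings.append("rule must end in a semicolon")
    body = body.rstrip(";")
    left, sep, right = body.partition(":=")
    if not sep:
        warnings.append("missing ':='")
        return warnings
    for side, text in (("CONDITION", left), ("ACTION", right)):
        # Anything left after removing the (...) groups was not parenthesized.
        if NODE.sub("", text).strip():
            warnings.append(f"{side} must be written between parentheses")
    for string in QUOTED.findall(left):
        # Strings written as /regex/ are skipped.
        if not string.startswith("/") and " " in string:
            warnings.append(f'"{string}" spans several forms; use one node per form')
    return warnings
```

For example, `lint_rule('("I am"):=("I\'m");')` reports the multi-form string, while `lint_rule('("I")(BLK)("am"):=("I\'m");')` returns an empty list.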

Formal syntax

L-rules comply with the following formal syntax:

<L-RULE>          ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION>        ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION>           ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION>       ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE>       ::= { "+" | "-" } <TAG> 
<TAGLIST>          ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)* 
<INDEX>            ::= "%"[01..99]
<TAG>              ::= {one of the tags defined in the UNDLF Tagset}
<STRING>           ::= [a-Z]+
<INTEGER>          ::= [0-9]+

where

<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times

Software