L-rule

From UNL Wiki

Revision as of 15:29, 16 August 2013 by Martins

An L-rule (linear rule) is the formalism used to apply transformations to ordered sequences of isolated nodes.

When to use L-rules

L-rules are used for:

reordering nodes in a list (a b c > a c b)
replacing nodes in a list (a b c > a x c)
adding nodes in a list (a b c > a x b c)
deleting nodes in a list (a b c > a c)

When not to use L-rules

L-rules are not used for transformations over structures other than lists (e.g., trees and graphs).

Syntax

The general syntax for L-rules is the following:

(CONDITION) := (ACTION);

Where:

CONDITION is a single node or a sequence of nodes over which actions will take place; and
ACTION is the action to be performed over each node or sequence of nodes of the CONDITION.

Examples:

("Mr."):=("Mister"); (replace "Mr." by "Mister")
("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
("a")(BLK)("/[aeiou].*/"):=("an")()(); (replace "a" by "an" before a blank space (BLK) and a word beginning with "a", "e", "i", "o" or "u")
("he")(BLK)("is"):=(%03)(%02)(%01); (reorder "he is" to "is he")
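The following is a minimal Python sketch of how a rule string might be split into its CONDITION and ACTION fields, one string per pair of parentheses. It is not the parser used by UNL tools, and `split_fields` and `parse_rule` are hypothetical names. The sketch tracks double quotes so that parentheses or ":=" inside a quoted constant are not treated as rule syntax.

```python
def split_fields(side: str) -> list[str]:
    """Split '("a")(BLK)' into ['"a"', 'BLK'], ignoring parentheses inside quotes."""
    fields: list[str] = []
    current: list[str] = []
    inside = in_quotes = False
    for ch in side:
        if ch == '"':
            in_quotes = not in_quotes
        if not in_quotes and ch == "(" and not inside:
            inside, current = True, []          # a new field starts
        elif not in_quotes and ch == ")" and inside:
            inside = False                      # the field ends
            fields.append("".join(current).strip())
        elif inside:
            current.append(ch)                  # keep field content, quotes included
    return fields


def parse_rule(rule: str) -> tuple[list[str], list[str]]:
    """Return (CONDITION fields, ACTION fields) for a rule such as '("Mr."):=("Mister");'."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("an L-rule must end in a semicolon")
    in_quotes = False
    for i, ch in enumerate(rule):
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and rule.startswith(":=", i):
            return split_fields(rule[:i]), split_fields(rule[i + 2:-1])
    raise ValueError("missing ':=' between CONDITION and ACTION")


print(parse_rule('("he")(BLK)("is"):=(%03)(%02)(%01);'))
# (['"he"', 'BLK', '"is"'], ['%03', '%02', '%01'])
```

An empty field such as `( )` comes back as an empty string. That is the form the tables below use for "preserve this node".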

Types of L-rules

There are three types of L-rules:

replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field;
addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
deletion, when the number of parentheses in the CONDITION field is greater than the number of parentheses in the ACTION field.
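Because the rule type depends only on the number of fields on each side, it can be computed mechanically. The Python sketch below (with the hypothetical names `count_fields` and `rule_type`) counts top-level parentheses outside quoted constants. It assumes that ":=" does not occur inside a quoted string.

```python
def count_fields(side: str) -> int:
    """Count top-level parenthesised fields, ignoring parentheses inside quotes."""
    count, depth, in_quotes = 0, 0, False
    for ch in side:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            if depth == 0:
                count += 1
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
    return count


def rule_type(rule: str) -> str:
    # Assumption: ":=" never appears inside a quoted constant.
    condition, action = rule.strip().rstrip(";").split(":=", 1)
    n_cond, n_act = count_fields(condition), count_fields(action)
    if n_cond == n_act:
        return "replacement"
    return "addition" if n_cond < n_act else "deletion"
```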

Examples
RULE	BEFORE > AFTER	DESCRIPTION
("a")("b")("c"):=("d")("e")("f");	abc > def	"a" will be replaced by "d"; "b" by "e"; and "c" by "f"
("a")("b")("c"):=("d")( )( );	abc > dbc	"a" will be replaced by "d"; "b" and "c" will be preserved
("a")("b")("c"):=("d")("")("");	abc > d	"a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., the empty string)
("a")("b")("c"):=("d",%01)(%02);	abc > db	"a" will be replaced by "d"; "b" will be preserved; "c" will be deleted
("a")("b")("c"):=("d",%01);	abc > d	"a" will be replaced by "d"; "b" and "c" will be deleted
("a")("b")("c"):=(%03)(%02)(%01);	abc > cba	"a", "b" and "c" will be preserved, but reordered
("a")("b")("c"):=("d",%01)(%03);	abc > dc	"a" will be replaced by "d"; "b" will be deleted; "c" will be preserved
("a")("b")("c"):=("d",%01)("g")(%02)(%03);	abc > dgbc	"a" will be replaced by "d"; "b" and "c" will be preserved; and a new node "g" will be created between "a" and "b"
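The semantics illustrated in the table can be approximated by a small interpreter for rules whose conditions are quoted strings only. This Python sketch is hypothetical and makes one assumption the page does not spell out. An ACTION field without an index takes its positional index unless another field already uses that index explicitly, or unless there is no condition node at that position. In either of those cases the field creates a new node. Features, tags such as BLK, and regular expressions are deliberately left out.

```python
import re
from typing import Optional

# One parenthesised field; quoted constants may contain anything except '"'.
FIELD = re.compile(r'\(((?:[^()"]|"[^"]*")*)\)')


def parse_field(text: str) -> tuple[Optional[str], Optional[int]]:
    """Return (constant string or None, numeric index or None) for one field."""
    string = re.search(r'"([^"]*)"', text)
    index = re.search(r"%(\d\d)", text)
    return (string.group(1) if string else None,
            int(index.group(1)) if index else None)


def apply_rule(rule: str, nodes: list[str]) -> list[str]:
    """Apply a string-only L-rule once, at the first position where it matches."""
    left, right = rule.strip().rstrip(";").split(":=", 1)
    conditions = [parse_field(f)[0] for f in FIELD.findall(left)]
    actions = [parse_field(f) for f in FIELD.findall(right)]
    n = len(conditions)
    claimed = {idx for _, idx in actions if idx is not None}
    for start in range(len(nodes) - n + 1):
        window = nodes[start:start + n]
        if window != conditions:
            continue
        out = []
        for pos, (string, idx) in enumerate(actions, start=1):
            # Assumed default: positional index, unless used explicitly elsewhere.
            if idx is None and pos <= n and pos not in claimed:
                idx = pos
            if idx is None:
                out.append(string or "")        # a new node
            elif string is None:
                out.append(window[idx - 1])     # preserve the indexed node
            else:
                out.append(string)              # replace the indexed node's string
        # Condition nodes no action refers to are deleted; "" nodes disappear.
        return nodes[:start] + [s for s in out if s] + nodes[start + n:]
    return nodes


print(apply_rule('("a")("b")("c"):=("d",%01)("g")(%02)(%03);', ["a", "b", "c"]))
```

With the last rule of the table, the sketch yields d, g, b, c. The new node "g" lands between "a" and "b", as the description says.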

Examples
RULE	BEFORE > AFTER	DESCRIPTION
("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( );	a adjective > an adjective	replace the article (ART) "a" by "an" before a blank space (BLK) and a node starting with "a", "e", "i", "o" or "u"; preserve the second node (BLK) and the third node without any change
("a",PRE)(BLK)("a",ART):=("à",PRE,ART,CTC);	a a > à	replace the preposition (PRE) "a" + blank (BLK) + article (ART) "a" by "à"; add the features PRE (preposition), ART (article) and CTC (contraction) to the node "à"
("de",PRE)(BLK)("le",ART):=("du",PRE,ART,CTC);	de le > du	replace the preposition (PRE) "de" + blank (BLK) + article (ART) "le" by "du"; add the features PRE, ART and CTC to the node "du"
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );	a il > a-t-il	replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and third forms without any change
("de",PRE)(BLK)("/[aeiou].*/"):=("d'",%01)(%03);	de avoir > d'avoir	replace the preposition (PRE) "de" + blank space (BLK) + a node starting with "a", "e", "i", "o" or "u" by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change
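The conditions in these examples combine a constant or a regular expression with features that the node must carry. The following Python sketch shows one way to model that matching. `Node`, `Condition`, `matches` and `find_match` are hypothetical names, and a node here is simply a form plus a set of tags. The sketch also assumes that a regular expression between slashes must match the whole form, which is consistent with ("/a[bcd]e/") deleting exactly "abe", "ace" and "ade".

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    form: str
    features: set[str] = field(default_factory=set)


@dataclass
class Condition:
    form: Optional[str] = None                        # "a", "/regex/", or None (any form)
    features: set[str] = field(default_factory=set)   # tags the node must carry


def matches(cond: Condition, node: Node) -> bool:
    if cond.form is not None:
        if len(cond.form) > 1 and cond.form.startswith("/") and cond.form.endswith("/"):
            # Assumption: the expression has to match the whole form.
            if re.fullmatch(cond.form[1:-1], node.form) is None:
                return False
        elif cond.form != node.form:
            return False
    return cond.features <= node.features


def find_match(conditions: list[Condition], nodes: list[Node]) -> Optional[int]:
    """Position of the first run of nodes matched by all conditions in order, else None."""
    n = len(conditions)
    for start in range(len(nodes) - n + 1):
        if all(matches(c, node) for c, node in zip(conditions, nodes[start:start + n])):
            return start
    return None


# ("a",ART)(BLK)("/[aeiou].*/")
article_rule = [Condition("a", {"ART"}), Condition(None, {"BLK"}), Condition("/[aeiou].*/")]
```

In this model, "a adjective" matches `article_rule` at position 0. "a noun" does not match, and neither does a preposition "a" tagged PRE instead of ART.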

Observations

Strings on the right side always replace strings on the left side

In the rule ("x"):=("y"); the string "x" is replaced by the string "y".

L-rules are recursive: a rule keeps applying as long as its conditions are true

The rule (BLK):=("-"); will transform "a b c d e" into "a-b-c-d-e" (and not merely into "a-b c d e")

The rule (X):=(+Y); will never stop (i.e., it contains an infinite loop): the feature Y will keep being added indefinitely (X,Y,Y,Y,Y,Y,Y,Y,...)

The symbol ^ is used for negation and may be used to prevent infinite loops

(X,^Y):=(+Y); (= add the feature Y to a node containing the feature X that does not contain the feature Y yet)
(^".")(STAIL):=(%01)(".")(%02); (add a period before the end of the sentence if there is no period there yet)
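A Python sketch can reproduce the three behaviours above: repeated application, the (X):=(+Y); loop, and the negation fix. It rests on one plausible reading of "a rule keeps applying as long as its conditions are true", which is an assumption and not a description of the actual UNL engine. Under that reading, the rule is applied to every matching node, pass after pass, until a pass changes nothing. A pass limit turns a runaway rule into an error. Features are kept in a list so that Y can repeat, and only single-node rules are modelled. `Node`, `Rule`, `run_to_fixed_point` and `tokenize` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    form: str
    features: list[str] = field(default_factory=list)  # a list, so a tag can repeat


@dataclass
class Rule:
    """A hypothetical single-node rule: condition(node) -> bool, action(node) -> Node."""
    condition: Callable[[Node], bool]
    action: Callable[[Node], Node]


def run_to_fixed_point(rule: Rule, nodes: list[Node], max_passes: int = 100) -> list[Node]:
    """Re-apply `rule` pass after pass until a pass changes nothing."""
    for _ in range(max_passes):
        changed = False
        result = []
        for node in nodes:
            new = rule.action(node) if rule.condition(node) else node
            changed = changed or new != node
            result.append(new)
        nodes = result
        if not changed:
            return nodes
    raise RuntimeError(f"still applying after {max_passes} passes: probable infinite loop")


def tokenize(text: str) -> list[Node]:
    """Words become plain nodes; each single space becomes a node carrying BLK."""
    nodes: list[Node] = []
    for i, word in enumerate(text.split(" ")):
        if i:
            nodes.append(Node(" ", ["BLK"]))
        nodes.append(Node(word))
    return nodes


# (BLK):=("-");  the form changes and BLK is kept (rules are conservative)
blank_to_dash = Rule(lambda n: "BLK" in n.features, lambda n: Node("-", n.features))
# (X):=(+Y);     never stops: Y is appended on every pass
add_y = Rule(lambda n: "X" in n.features, lambda n: Node(n.form, n.features + ["Y"]))
# (X,^Y):=(+Y);  the negated condition stops the loop after one pass
add_y_once = Rule(lambda n: "X" in n.features and "Y" not in n.features,
                  lambda n: Node(n.form, n.features + ["Y"]))
```

Under this reading, (BLK):=("-"); terminates even though BLK is kept. A second pass rewrites "-" as "-" and therefore changes nothing.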

Rules are conservative: no feature is changed or deleted unless this is explicitly indicated with "-"

In the rule ("x",FEA):=("y"); the string "x" is replaced by the string "y", but the feature FEA is not altered (i.e., the final state will be ("y",FEA)). The rule ("a",ART)(BLK)(VOW):=("an")( )( ); does not affect the status of the second and third word forms, which keep the features BLK and VOW. On the other hand, the rule ("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); alters the status of the second form by deleting the feature BLK.

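The conservative behaviour can be captured by a function that starts from the node's existing features and changes only what the ACTION field mentions. In this Python sketch, `apply_action` is a hypothetical name and the ACTION field is already split into its comma-separated items. A quoted constant replaces the form, "-TAG" removes a tag, and "+TAG" appends one. Treating a bare tag as "add if absent" is an assumption, based on how the ("à",PRE,ART,CTC) and ("du",PRE,ART,CTC) examples are described.

```python
def apply_action(form: str, features: list[str], items: list[str]) -> tuple[str, list[str]]:
    """Apply the comma-separated items of one ACTION field to a single node."""
    new_form, new_features = form, list(features)   # start from the node as it is
    for item in items:
        if len(item) >= 2 and item.startswith('"') and item.endswith('"'):
            new_form = item[1:-1]                    # a constant replaces the string
        elif item.startswith("%"):
            continue                                 # indexes are resolved elsewhere
        elif item.startswith("-"):
            new_features = [f for f in new_features if f != item[1:]]  # explicit deletion
        elif item.startswith("+"):
            new_features.append(item[1:])            # explicit addition (may repeat)
        elif item not in new_features:
            new_features.append(item)                # assumption: bare tag = add if absent
    return new_form, new_features


print(apply_action(" ", ["BLK"], ['"-t-"', "-BLK"]))  # ('-t-', [])
```

Any feature that the action does not mention passes through unchanged. For example, ("x",FEA):=("y"); leaves ("y",FEA).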
In the ACTION field, changes inside each form may be expressed in the same way as on the right side of A-rules (i.e., by prefixation, infixation, suffixation or replacement); the default is replacement

The rule ("a",ART)(BLK)("/[aeiou].*/"):=("an")( )( ); could also be written as ("a",ART)(BLK)("/[aeiou].*/"):=(0>"n")( )( );, i.e., the change from "a" to "an" can be expressed either by "an" or by 0>"n".

Rules apply only if all conditions are true

The rule ("a")(BLK)("/[aeiou].*/"):=("an")( )( ); will apply only to an "a" followed by a blank and a word beginning with a vowel.

To increase their expressive power, conditions (but not actions) may be written as regular expressions between slashes (/.../)

("/a[bcd]e/"):=(""); (delete the words "abe", "ace" and "ade")

Indexes

Nodes are always indexed in L-rules

Indexes (%) are used to coindex nodes, attributes and values between the left (condition) and right (action) sides of rules.

(%a)(%b):=(%b)(%a); (change the order of the constituents)

If omitted, indexes are assigned by default, according to the position

(A)(B):=(C)(D); is the same as (A,%01)(B,%02):=(C,%01)(D,%02);

Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscores

(A,%a)(B,%b):=(C,%a)(D,%b);

Numeric characters cannot be used as user-defined indexes

(A,%03)(B,%05):=(C,%03)(D,%05);

Numeric indexes always refer to positions: here %01 = A and %02 = B, and neither %03 nor %05 exists.

Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE: (A,%a,ATT1=VAL1)(B,%b):=()(B,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
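The two indexing conventions above are easy to check mechanically: omitted indexes are filled in by position, and numeric indexes must point at an existing condition node. The Python sketch below does both on fields that have already been split out of a rule. `default_indexes` and `check_numeric_indexes` are hypothetical helpers. The sketch assumes that a numeric index has exactly two digits, as in %01, and that a user-defined label consists of letters and underscores only.

```python
import re

USER_LABEL = re.compile(r"%([A-Za-z_]+)")   # user-defined: letters and underscores
POSITIONAL = re.compile(r"%(\d\d)")          # numeric: always a position


def default_indexes(fields: list[str]) -> list[str]:
    """Append the positional index (%01, %02, ...) to every field that has none."""
    out = []
    for pos, f in enumerate(fields, start=1):
        if USER_LABEL.search(f) or POSITIONAL.search(f):
            out.append(f)                    # explicit index or label: leave it alone
        else:
            out.append(f"{f},%{pos:02d}" if f else f"%{pos:02d}")
    return out


def check_numeric_indexes(fields: list[str], n_conditions: int) -> list[str]:
    """Return the numeric indexes that point past the last condition node."""
    bad = []
    for f in fields:
        for m in POSITIONAL.finditer(f):
            if not 1 <= int(m.group(1)) <= n_conditions:
                bad.append(m.group(0))
    return bad


print(default_indexes(["A", "B"]), default_indexes(["C", "D"]))
```

Applied to (A)(B):=(C)(D);, this gives (A,%01)(B,%02):=(C,%01)(D,%02);. Applied to (A,%03)(B,%05), the check reports both %03 and %05, because only two condition nodes exist.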

Common mistakes

~~"Mr":="Mister";~~
- Conditions and actions must always come between parentheses: ("Mr"):=("Mister");
~~(Mr):=(Mister);~~
- Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister");
~~("Mr"):=("Mister")~~
- Rules must end in semicolon: ("Mr"):=("Mister");
~~("I am"):=("I'm");~~
- Each separate word form must be isolated between parentheses and described as a different condition: ("I")(BLK)("am"):=("I'm");
~~("a",ART)(BLK)(VOW):=("an");~~
- "a adjective" > "a": the blank and the following form are deleted because they are not present on the right side
~~("de",PRE)(BLK)(VOW):=("d'")(VOW);~~
- "de avoir" > "d' ": coindexation is based on ordering, not on features. The third form is deleted because it is not present on the right side, and the second form, which is BLK, receives the feature VOW.

Formal syntax

L-rules comply with the following formal syntax:

<L-RULE>          ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION>        ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION>           ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION>       ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
<ATT_CHANGE>       ::= { "+" | "-" } <TAG> 
<TAGLIST>          ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)* 
<INDEX>            ::= "%"[01..99]
<TAG>              ::= {one of the tags defined in the UNDLF Tagset}
<STRING>           ::= [a-zA-Z]+
<INTEGER>          ::= [0-9]+

where

<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
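The outermost production, <L-RULE>, can be approximated by a regular expression: one or more parenthesised fields, then ":=", then one or more fields, then ";". The Python sketch below (with the hypothetical name `is_well_formed`) checks only that shape. It does not validate tags against the UNDLF Tagset or the inner structure of each field. It therefore rejects "Mr":="Mister"; and ("Mr"):=("Mister") from the common mistakes, but it accepts (Mr):=(Mister);, which is only wrong because Mr is not a tag.

```python
import re

# One field: "(" ... ")" where quoted constants may contain any character but '"'.
FIELD = r'\((?:[^()"]|"[^"]*")*\)'
L_RULE = re.compile(rf"(?:{FIELD})+:=(?:{FIELD})+;")


def is_well_formed(rule: str) -> bool:
    """True if the rule has the surface shape (CONDITION)+ := (ACTION)+ ;"""
    return L_RULE.fullmatch(rule.strip()) is not None


for candidate in ['("Mr"):=("Mister");', '"Mr":="Mister";', '("Mr"):=("Mister")']:
    print(candidate, is_well_formed(candidate))
```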
