A-rule

From UNL Wiki
Latest revision as of 15:00, 5 September 2014

A-rule (affixation rule) is a specific type of transformation rule used for generating affixes (prefixes, suffixes, infixes) in the UNLarium framework.


When to use A-rules

A-rules are used for prefixation, suffixation and infixation, i.e., for adding morphemes to a given base form. They are used for generating inflections (such as "book">"books", "love">"loved") or derivations (such as "dress">"undress", "write">"writer").

When not to use A-rules

A-rules are not used for composition (i.e., to form new words by combining or putting together existing words), as in "give">"give in", "go">"have gone" or "man">"fireman". Composition should be handled by C-rules.

Types of A-rules

There are two types of A-rules:

  • simple A-rules involve a single action (such as prefixation, suffixation, infixation and replacement); and
  • complex A-rules involve more than one action (such as circumfixation).

Simple A-rules

There are five types of simple A-rules:

  • prefixation, for adding morphemes at the beginning of a base form;
  • suffixation, for adding morphemes at the end of a base form;
  • infixation, for adding morphemes to the middle of a base form;
  • replacement, for changing (part of) a base form;
  • duplication, for repeating a character of a base form.

Syntax

The syntax for simple A-rules is the following:

prefixation

CONDITION := "ADDED" < DELETED;

suffixation

CONDITION := DELETED > "ADDED";

infixation

CONDITION := [REFERENCE] > "ADDED";
CONDITION := "ADDED" < [REFERENCE];

replacement

CONDITION := DELETED : "ADDED";

duplication

CONDITION := [REFERENCE]+;

Where:

  • CONDITION = a tag (such as "PLR", "FEM", etc.) or a list of tags joined by "&" (such as "FEM&PLR") that indicates when the rule should be applied;
  • ADDED (between quotes) = the string to be added;
  • REFERENCE (between square brackets) = the reference string (between quotes), the position (without quotes), or both, relative to which the string is added or, in duplication rules, the character to be duplicated;
  • DELETED = the string (between quotes) or the number of characters (without quotes) to be deleted.

Examples

Prefixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"<"z"; | if X, replace the string "z" by the string "y" at the beginning of the string | zabc | yabc
X:="y"<1; | if X, replace the first character of the string by "y" | zabc | yabc
X:="y"<0; | if X, add the string "y" to the beginning of the string | zabc | yzabc
X:="y"<; [1] | if X, add the string "y" to the beginning of the string (same as previous) | zabc | yzabc
X:="y"<<0; [1] | if X, add the string "y" and a blank space to the beginning of the string | zabc | y zabc
X:="y"<<; [1] | if X, add the string "y" and a blank space to the beginning of the string (same as previous) | zabc | y zabc
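The prefixation behaviour in the table above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not UNLarium or UNLdev code: the function `apply_prefixation` and its parameters are hypothetical, the CONDITION is assumed to have been checked already, and an omitted DELETED is treated as 0.

```python
from typing import Union


def apply_prefixation(base: str, added: str, deleted: Union[str, int] = 0,
                      blank: bool = False) -> str:
    """Hypothetical sketch of CONDITION := "ADDED" < DELETED; ("<<" if blank).

    The CONDITION is assumed to have been checked before calling this.
    """
    if isinstance(deleted, str):
        # A quoted DELETED must occur at the start of the base form;
        # if it does not, the rule is not applied.
        if not base.startswith(deleted):
            return base
        rest = base[len(deleted):]
    else:
        # An unquoted DELETED is a number of leading characters to drop.
        rest = base[deleted:]
    # "<<" inserts a blank space between the added string and the rest.
    return added + (" " if blank else "") + rest


print(apply_prefixation("zabc", "y", "z"))            # X:="y"<"z";
print(apply_prefixation("zabc", "y", 0, blank=True))  # X:="y"<<0;
```

Passing a string for `deleted` mirrors the quoted form (the rule is skipped when the string is absent); passing an integer mirrors the unquoted character count.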


Suffixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="z">"y"; | if X, replace the string "z" by the string "y" at the end of the string | abcz | abcy
X:=1>"y"; | if X, replace the last character of the string by "y" | abcz | abcy
X:=0>"y"; | if X, add the string "y" to the end of the string | abcz | abczy
X:=>"y"; [1] | if X, add the string "y" to the end of the string (same as previous) | abcz | abczy
X:=0>>"y"; [1] | if X, add a blank space and the string "y" to the end of the string | abcz | abcz y
X:=>>"y"; [1] | if X, add a blank space and the string "y" to the end of the string (same as previous) | abcz | abcz y
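Suffixation is the mirror image of prefixation. The sketch below, again hypothetical and assuming the CONDITION has already been checked, reproduces the rows of the suffixation table. Slicing by an explicit length avoids the Python pitfall where `base[:-0]` returns an empty string.

```python
from typing import Union


def apply_suffixation(base: str, deleted: Union[str, int], added: str,
                      blank: bool = False) -> str:
    """Hypothetical sketch of CONDITION := DELETED > "ADDED"; (">>" if blank)."""
    if isinstance(deleted, str):
        # A quoted DELETED must occur at the end of the base form;
        # if it does not, the rule is not applied.
        if not base.endswith(deleted):
            return base
        stem = base[:len(base) - len(deleted)]
    else:
        # An unquoted DELETED is a number of trailing characters to drop.
        stem = base[:max(len(base) - deleted, 0)]
    # ">>" inserts a blank space between the stem and the added string.
    return stem + (" " if blank else "") + added


print(apply_suffixation("abcz", "z", "y"))            # X:="z">"y";
print(apply_suffixation("abcz", 0, "y", blank=True))  # X:=0>>"y";
```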


Infixation
RULE | BEHAVIOR | BEFORE | AFTER
X:=[2]>"y"; | if X, add "y" to the right of the second character | abc | abyc
X:="y"<[3]; | if X, add "y" to the left of the third character | abc | abyc
X:=["b"]>"y"; | if X, add "y" to the right of "b" | abc | abyc
X:="y"<["c"]; | if X, add "y" to the left of "c" | abc | abyc
X:="y"<[3="c"]; | if X, add "y" to the left of "c", if "c" is the third character | abc | abyc
X:=[2="b"]>"y"; | if X, add "y" to the right of "b", if "b" is the second character | abc | abyc
X:=[-2]>"y"; | if X, add "y" to the right of the second character from the right | abc | abyc
X:="y"<[-2]; | if X, add "y" to the left of the second character from the right | abc | aybc
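The infixation rows can be sketched by first locating the REFERENCE and then inserting the string on its left or right. This is a hypothetical illustration, not UNLarium code. It assumes 1-based positions with negative values counted from the right, and it assumes that a string-only reference matches its first occurrence: the page says "any", but infixation also applies only once, so this is an interpretation.

```python
from typing import Optional, Tuple


def find_reference(base: str, position: Optional[int] = None,
                   string: Optional[str] = None) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the REFERENCE, or None if absent."""
    if position is not None:
        # Positions are 1-based; negative positions count from the right.
        start = position - 1 if position > 0 else len(base) + position
        length = len(string) if string is not None else 1
        if start < 0 or start + length > len(base):
            return None
        # [3="c"]: the position and the string must both match.
        if string is not None and base[start:start + length] != string:
            return None
        return start, start + length
    if string is not None:
        start = base.find(string)  # assumption: first occurrence only
        return None if start < 0 else (start, start + len(string))
    return None


def apply_infixation(base: str, added: str, operator: str,
                     position: Optional[int] = None,
                     string: Optional[str] = None) -> str:
    """operator ">" is [REFERENCE] > "ADDED"; "<" is "ADDED" < [REFERENCE]."""
    span = find_reference(base, position, string)
    if span is None:
        return base  # no matching reference: the rule is not applied
    start, end = span
    if operator == ">":
        return base[:end] + added + base[end:]    # insert to the right
    return base[:start] + added + base[start:]    # insert to the left


print(apply_infixation("abc", "y", ">", position=2, string="b"))  # X:=[2="b"]>"y";
print(apply_infixation("abc", "y", "<", position=-2))            # X:="y"<[-2];
```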


Replacement
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"; | if X, replace the whole string by "y" | abc | y
X:="z":"y"; | if X, replace the string "z" by "y" | azbc | aybc
X:=[2-3]:"y"; | if X, replace the second to the third character by "y" | abcz | ayz
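A hypothetical sketch of the three replacement forms follows. It assumes that "replacement rules apply as long as the conditions are true" (see the observations below the tables) can be modelled with Python's `str.replace`, which replaces every non-overlapping occurrence in a single pass, and that intervals are 1-based and inclusive.

```python
from typing import Optional, Tuple


def apply_replacement(base: str, added: str, deleted: Optional[str] = None,
                      interval: Optional[Tuple[int, int]] = None) -> str:
    """Hypothetical sketch of DELETED : "ADDED" and [i-j] : "ADDED"."""
    if interval is not None:
        first, last = interval  # 1-based, inclusive
        return base[:first - 1] + added + base[last:]
    if deleted is None:
        return added  # X:="y"; replaces the whole string
    # Every non-overlapping occurrence is replaced; if DELETED is
    # absent, the base form is returned unchanged.
    return base.replace(deleted, added)


print(apply_replacement("azbc", "y", "z"))            # X:="z":"y";
print(apply_replacement("abcz", "y", interval=(2, 3)))  # X:=[2-3]:"y";
```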


Duplication
RULE | BEHAVIOR | BEFORE | AFTER
X:=[2]+; | if X, duplicate the second character | abc | abbc
X:=[-2]+; | if X, duplicate the second last character | abc | abbc
X:=[2="b"]+; | if X, duplicate the second character, if it is "b" | abc | abbc
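Duplication can be sketched in the same style. This hypothetical function covers only the position-based references used in the table above, optionally with the expected character, as in `[2="b"]`. String-only references are not handled here.

```python
from typing import Optional


def apply_duplication(base: str, position: int,
                      expected: Optional[str] = None) -> str:
    """Hypothetical sketch of CONDITION := [REFERENCE]+; for positions."""
    # 1-based position; negative positions count from the right.
    index = position - 1 if position > 0 else len(base) + position
    if not 0 <= index < len(base):
        return base  # no such character: the rule is not applied
    char = base[index]
    if expected is not None and char != expected:
        return base  # [2="b"] requires the character to be "b"
    return base[:index + 1] + char + base[index + 1:]


print(apply_duplication("abc", -2))      # X:=[-2]+;
print(apply_duplication("abc", 2, "b"))  # X:=[2="b"]+;
```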

Observations

Rules will only be applied if all conditions are true
X:="y"<"z"; ("zabc" changes to "yabc", but "abc" remains "abc" since there is no "z" to be replaced)
String fields are necessarily continuous
X:="aaa"<"xyz"; ("xyzbbb" changes to "aaabbb", but "bxbybz" remains "bxbybz" since there is no continuous string "xyz" to be replaced)
Prefixation, infixation and suffixation rules apply only once (i.e., rules are not exhaustive)
PLR:=0>"s"; ("X" becomes "Xs", and not "Xssssss...")
Replacement rules apply to every occurrence of the string to be replaced
X:="a":"b"; ("aaa" becomes "bbb" and not "abb")
In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes)
PLR := "X"<""; = PLR := "X"<0; (ABC becomes XABC)
PLR:= "X"<"A"; = PLR:= "X"<1; (ABC becomes XBC)
PLR:= "XY"<"AB"; = PLR:= "XY"<2; (ABC becomes XYC)
PLR:="">"X"; = PLR:= 0>"X"; (ABC becomes ABCX)
PLR:="C">"X"; = PLR:= 1>"X"; (ABC becomes ABX)
PLR:="BC">"XY"; = PLR:= 2>"XY"; (ABC becomes AXY)
In infixation and duplication rules, positions may be counted from the end of the string by using a minus sign ("-")
RULE | BEHAVIOR | BEFORE | AFTER
X:=[2]>"y"; | if X, add "y" to the right of the second character | abc | abyc
X:=[-2]>"y"; | if X, add "y" to the right of the second last character | abc | abyc
X:="y"<[2]; | if X, add "y" to the left of the second character | abcde | aybcde
X:="y"<[-2]; | if X, add "y" to the left of the second last character | abcde | abcyde
In infixation and duplication rules, the reference may be either a string, a position or both
RULE | REFERENCE
X:=[1]>"y"; | The reference is the position only ("y" will be inserted to the right of the first character)
X:=["a"]>"y"; | The reference is the string only ("y" will be inserted to the right of any "a")
X:=[1="a"]>"y"; | The reference is the position and the string ("y" will be inserted to the right of the first character if the first character is "a")
In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced
PLR:="ABC":"XYZ"; = PLR:="XYZ"; (ABC becomes XYZ)
In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]
PLR:="B":"X"; = PLR:=[2-2]:"X"; (ABC becomes AXC)
The symbol "^" is used for negation ("^MCL" means "not MCL")
NOU&^MCL:="x":"y"; (If NOU and not MCL then replace "x" by "y")
"<<" and ">>" add blank spaces[1]
X:="a"<<0; ("bc" becomes "a bc" and not "abc")
A-rules do not generate new words but only modify existing ones.
The A-rule FUT:="will"<<0; (i.e., add "will" as a prefix to the base form in the future tense) transforms "love" into "will love", which is nevertheless treated as a single word and not as a compound. This is why compound tenses must never be generated through A-rules: otherwise, it would be impossible to insert other words (such as "not", "always", etc.) between "will" and "love".
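The CONDITION part of a rule, including negation with "^", can be evaluated against the set of features carried by an entry. The sketch below is a hypothetical illustration that assumes the features are available as a plain set of tag strings. A rule applies only if every tag in the condition is satisfied.

```python
from typing import AbstractSet


def condition_holds(condition: str, features: AbstractSet[str]) -> bool:
    """True if every tag in a CONDITION such as "NOU&^MCL" is satisfied."""
    for term in condition.split("&"):
        if term.startswith("^"):
            # "^TAG" means the entry must NOT carry TAG.
            if term[1:] in features:
                return False
        elif term not in features:
            # A plain tag must be present.
            return False
    return True


print(condition_holds("NOU&^MCL", {"NOU", "FEM"}))  # NOU and not MCL
print(condition_holds("NOU&^MCL", {"NOU", "MCL"}))
```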

Common mistakes

  • nou:= "y"<"z"; (WRONG: Tags are case-sensitive)
  • NNN:= "y"<"z"; (WRONG: NNN is not defined in the tagset)
  • NOUFEM:="y"<"z"; (WRONG: Tags must be separated by "&")
  • NOU,FEM:="y"<"z"; (WRONG: Tags must be separated by "&")
  • NOU & FEM:="y"<"z"; (WRONG: There can be no blank spaces between tags)
  • X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
  • X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
  • X:=1; (WRONG: Replacement rules do not allow for numbers)
  • X:=1:1; (WRONG: Replacement rules do not allow for numbers)
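The condition-related mistakes above can be detected mechanically. The following sketch is hypothetical: `TAGSET` is a small illustrative excerpt, not the actual UNDLF Tagset, and the messages echo the list above.

```python
from typing import Optional

# Hypothetical excerpt of the UNDLF Tagset, for illustration only.
TAGSET = {"NOU", "FEM", "MCL", "PLR", "FUT", "ORD"}


def is_concatenation(tag: str) -> bool:
    """True if tag is two known tags glued together, e.g. "NOUFEM"."""
    return any(tag[:i] in TAGSET and tag[i:] in TAGSET
               for i in range(1, len(tag)))


def check_condition(condition: str) -> Optional[str]:
    """Return the mistake found in a CONDITION, or None if it looks valid."""
    if " " in condition:
        return "There can be no blank spaces between tags"
    if "," in condition:
        return 'Tags must be separated by "&"'
    for term in condition.split("&"):
        tag = term[1:] if term.startswith("^") else term
        if tag in TAGSET:
            continue
        if tag.upper() in TAGSET:
            return "Tags are case-sensitive"
        if is_concatenation(tag):
            return 'Tags must be separated by "&"'
        return f"{tag} is not defined in the tagset"
    return None


for cond in ("nou", "NNN", "NOUFEM", "NOU,FEM", "NOU & FEM", "NOU&^MCL"):
    print(cond, "->", check_condition(cond))
```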

Complex A-rules

Complex A-rules are formed from the combination of simple A-rules:

  • circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
  • prefixation + infixation, to add a prefix and an infix at the same time
  • infixation + suffixation, to add an infix and a suffix at the same time
  • prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time

Syntax

Complex A-rules are formed by concatenating simple A-rules with ",":

circumfixation

CONDITION := "ADDED" < DELETED , DELETED > "ADDED";

prefixation + infixation

CONDITION := "ADDED" < DELETED , [REFERENCE] > "ADDED";

infixation + suffixation

CONDITION := [REFERENCE] > "ADDED" , DELETED > "ADDED";

etc.

Examples

Complex A-rules
RULE | BEHAVIOR | BEFORE | AFTER
X:="x"<0, 0>"y"; | if X, add "x" to the beginning and "y" to the end of the string | A | xAy
X:="x"<0, "A":"y"; | if X, add "x" to the beginning and replace "A" by "y" | ABC | xyBC
X:="A":"y", 0>"x"; | if X, replace "A" by "y" and add "x" to the end of the string | ABC | yBCx
X:="x"<0, "A":"y", 0>"z"; | if X, add "x" to the beginning, replace "A" by "y" and add "z" to the end of the string | ABC | xyBCz
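A complex A-rule can be modelled as a list of string-to-string actions applied from left to right. In this hypothetical sketch, the helper names `prefix`, `suffix`, `replacement` and `apply_complex` are illustrative only, and DELETED is given as a character count for prefixes and suffixes. The sketch reproduces the rows of the complex A-rules table.

```python
from typing import Callable, List

Action = Callable[[str], str]


def prefix(added: str, deleted: int = 0) -> Action:
    """Prefixation "ADDED" < DELETED, with DELETED as a character count."""
    return lambda s: added + s[deleted:]


def suffix(deleted: int, added: str) -> Action:
    """Suffixation DELETED > "ADDED", with DELETED as a character count."""
    return lambda s: s[:max(len(s) - deleted, 0)] + added


def replacement(deleted: str, added: str) -> Action:
    """Replacement "DELETED" : "ADDED", applied to every occurrence."""
    return lambda s: s.replace(deleted, added)


def apply_complex(base: str, actions: List[Action]) -> str:
    """Apply the comma-separated actions of a complex A-rule left to right."""
    for action in actions:
        base = action(base)
    return base


# X:="x"<0, "A":"y", 0>"z";
print(apply_complex("ABC", [prefix("x"), replacement("A", "y"), suffix(0, "z")]))
```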

Observations

Complex A-rules can also merge several simple A-rules that share the same condition
ORD:="1">"1st";
ORD:="2">"2nd";
ORD:="3">"3rd";
is equivalent to
ORD:="1">"1st", "2">"2nd", "3">"3rd";
Actions are applied from left to right (i.e., order is important)
PLR := "s" > "ses", "y" > "ies"; (kiss > kisses, city > cities)
PLR := "y" > "ies", "s" > "ses"; (kiss > kisses, city > cities > citieses)
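Why order matters can be checked by applying the two PLR rules above action by action. In this hypothetical sketch, each quoted suffixation fires only if the word currently ends with DELETED, and each action is applied once, from left to right.

```python
from typing import Callable, List


def suffix_rule(deleted: str, added: str) -> Callable[[str], str]:
    """DELETED > "ADDED" with a quoted DELETED: applies only if it matches."""
    def action(word: str) -> str:
        if word.endswith(deleted):
            return word[:len(word) - len(deleted)] + added
        return word
    return action


def apply_in_order(word: str, actions: List[Callable[[str], str]]) -> str:
    for action in actions:  # left to right, each action once
        word = action(word)
    return word


s_first = [suffix_rule("s", "ses"), suffix_rule("y", "ies")]  # "s" > "ses", "y" > "ies"
y_first = [suffix_rule("y", "ies"), suffix_rule("s", "ses")]  # "y" > "ies", "s" > "ses"

for word in ("kiss", "city"):
    print(word, apply_in_order(word, s_first), apply_in_order(word, y_first))
# kiss kisses kisses
# city cities citieses
```

With "y" > "ies" first, "city" becomes "cities", which now ends in "s", so the second action fires as well.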

Formal syntax

A-rules comply with the following syntax:

<A-RULE>           ::= <CONDITION> ":=" <ACTION> ("," <ACTION>)* ";"
<CONDITION>        ::= <ATAG> ("&" ("^")? <ATAG>)*
<ATAG>             ::= {one of the tags defined in the UNDLF Tagset}
<ACTION>           ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> | <DUPLICATION>
<PREFIXATION>      ::= <ADDED> {"<" | "<<"} (<DELETED>)?
<SUFFIXATION>      ::= (<DELETED>)? {">" | ">>"} <ADDED>
<INFIXATION>       ::= "[" <REFERENCE> "]" ">" <ADDED> | <ADDED> "<" "[" <REFERENCE> "]"
<REPLACEMENT>      ::= (<STRING> ":")? <ADDED> | "[" <INTEGER> "-" <INTEGER> "]" ":" <ADDED>
<DUPLICATION>      ::= "[" <REFERENCE> "]" "+"
<REFERENCE>        ::= <STRING> | <POSITION> | <POSITION> "=" <STRING>
<POSITION>         ::= ("-")? <INTEGER>
<ADDED>            ::= <STRING>
<DELETED>          ::= <STRING> | <INTEGER>
<STRING>           ::= <QUOTE> [a..Z]+ <QUOTE>
<QUOTE>            ::= {the double quotation mark character}
<INTEGER>          ::= [0..9]+

where

<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
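The grammar above can be approximated with a few regular expressions. The parser below is a hypothetical sketch, not the UNLarium implementation. It hides quoted strings behind a placeholder `Q`, removes blanks outside quotes, and classifies each action by its operator. Tags are only checked to be upper-case letters or digits, since the real UNDLF Tagset is not included. It rejects the action-related mistakes listed under "Common mistakes", such as `X:=1<1;` and `X:=1;`.

```python
import re
from typing import List, Tuple

# Assumption: tags are upper-case letters/digits (the real check is
# membership in the UNDLF Tagset, which this sketch does not include).
CONDITION_RE = re.compile(r"[A-Z0-9]+(&\^?[A-Z0-9]+)*")

# Patterns over a "skeleton" in which every quoted string became Q
# and blanks outside quotes were removed.
ACTION_PATTERNS = [
    ("duplication", re.compile(r"\[[^\]]*\]\+")),
    ("infixation", re.compile(r"\[[^\]]*\]>Q|Q<\[[^\]]*\]")),
    ("prefixation", re.compile(r"Q<<?(Q|\d+)?")),
    ("suffixation", re.compile(r"(Q|\d+)?>>?Q")),
    ("replacement", re.compile(r"((Q|\[\d+-\d+\]):)?Q")),
]


def split_outside_quotes(text: str, sep: str = ",") -> List[str]:
    """Split on sep, ignoring separators that appear inside quotes."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def classify_action(action: str) -> str:
    skeleton = re.sub(r'"[^"]*"', "Q", action)  # hide quoted strings
    skeleton = re.sub(r"\s+", "", skeleton)     # drop blanks outside quotes
    for kind, pattern in ACTION_PATTERNS:
        if pattern.fullmatch(skeleton):
            return kind
    raise ValueError(f"unrecognised action: {action.strip()}")


def parse_a_rule(rule: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split an A-rule into its CONDITION and its classified actions."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("an A-rule must end with ';'")
    condition, sep, body = rule[:-1].partition(":=")
    condition = condition.strip()
    if not sep or not CONDITION_RE.fullmatch(condition):
        raise ValueError(f"invalid condition: {condition!r}")
    actions = [part.strip() for part in split_outside_quotes(body)]
    return condition, [(classify_action(a), a) for a in actions]


print(parse_a_rule('X:="x"<0, "A":"y", 0>"z";'))
```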

Notes

  1. This feature is not supported by the UNLdev.