A-rule

From UNL Wiki

Revision as of 13:51, 19 January 2010

M-rule is the formalism used for describing morphological behaviour in the UNLarium framework. It is used in inflectional paradigms, in inflectional rules, in semantic rules and in morphological settings.


Generative and enumerative lexica

The repertoire of lexemes of a given language can be organized in two basic ways: 1) as a simple listing of all word forms, i.e., of all variants of the same lexeme ("die", "dies", "died", "dying", etc.); or 2) as a list of base forms accompanied by morphological rules for generating their inflections ("die", +s, +d, etc.). The first architecture, the "enumerative" one, assumes that a word form is more accurately retrieved as a single atomic entity than as a combination of several different morphemes. Its main advantages concern word matching (faster and more precise, as there is no possibility of over-generation) and construction (it is easier, and often less expensive, to list irregular forms than to define paradigms for them). Nevertheless, the second architecture, the "generative" one, which relies on the principle that “the smaller the better”, is far more common. Its main advantages concern access (the word retrieval process is supposed to be faster), storage (it requires less memory) and maintenance (changes are automatically propagated to all instances of a given entry).

The UNLarium is mainly a generative environment, in the sense that word forms are expected to be represented by their corresponding LRUs and base forms, along with rules for generating their possible inflections. These are the m-rules, to be provided either as LRU-specific (in case of irregular behaviour) or as inflectional paradigms (applying to several different LRUs).
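The following minimal Python sketch makes the contrast concrete using a toy lexicon. The enumerative store lists every form of "die". The generative store keeps only the base form plus suffixation rules, written in the spirit of the m-rules described below. The dictionary names, the helper functions and the tags TAG1, TAG2 and TAG3 are hypothetical placeholders and are not part of the UNLarium.

```python
# Hypothetical toy lexicon; the tags are placeholders, not UNDLF tags.

# Enumerative lexicon: every word form is stored as an atomic entry.
ENUMERATIVE = {"die": ["die", "dies", "died", "dying"]}

# Generative lexicon: one base form plus (tag, DELETED, ADDED) suffixation rules.
GENERATIVE = {
    "die": [("TAG1", "", "s"), ("TAG2", "", "d"), ("TAG3", "ie", "ying")],
}


def apply_suffix(base: str, deleted: str, added: str) -> str:
    """Replace a trailing `deleted` by `added`; leave `base` alone if it does not end with `deleted`."""
    if not base.endswith(deleted):
        return base
    return base[: len(base) - len(deleted)] + added


def generate(lemma: str) -> list:
    """Return the base form followed by every form derived from its rules."""
    forms = [lemma]
    for _tag, deleted, added in GENERATIVE[lemma]:
        forms.append(apply_suffix(lemma, deleted, added))
    return forms


print(generate("die"))  # ['die', 'dies', 'died', 'dying']
```

Both stores yield the same four forms in this case. The generative store trades a smaller entry for the risk of over-generation mentioned above.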

Types of m-rules

There are two types of m-rules:

  • simple m-rules involve a single action (such as prefixation, suffixation or infixation); and
  • complex m-rules involve more than one action (such as prefixation and suffixation, or two suffixations).

Simple m-rules

There are three types of simple m-rules:

  • prefixation, for adding/changing information at the beginning of a base form
  • suffixation, for adding/changing information at the end of a base form
  • infixation, for adding/changing information in the middle of a base form

Syntax

The syntax for simple m-rules is the following:

prefixation
 CONDITION := "ADDED" < "DELETED";
suffixation
 CONDITION := "DELETED" > "ADDED";
infixation
 CONDITION := "DELETED" : "ADDED";

Where:

  • CONDITION = tag (such as “PLR”, “FEM”, etc.) or list of tags (“FEM&PLR”) that indicates when the rule should be applied;
  • ADDED = the string to be added (between quotes);
  • DELETED = the string to be deleted (between quotes); in prefixation and suffixation rules it may also be given as a number of characters (without quotes).

Examples

Prefixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"<"z"; | if X, replace the string “z” by the string “y” at the beginning of the string | zabc | yabc
X:="y"<1; | if X, replace the first character of the string by “y” | zabc | yabc
X:="y"<0; | if X, add the string “y” to the beginning of the string | zabc | yzabc
X:="y"<; | if X, add the string “y” to the beginning of the string (same as the previous rule) | zabc | yzabc
X:="y"<<0; | if X, add the string “y” and a blank space to the beginning of the string | zabc | y zabc
X:="y"<<; | if X, add the string “y” and a blank space to the beginning of the string (same as the previous rule) | zabc | y zabc
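The prefixation rows above can be checked with a small Python sketch. The sketch rests on three assumptions:

  • a quoted DELETED part that does not match the beginning of the base form leaves the form unchanged (compare the observation that “abc” remains “abc”);
  • an omitted DELETED part behaves like 0;
  • "<<" inserts exactly one blank space after the added string.

The function `prefixation` and its parameters are hypothetical names used only for illustration.

```python
from typing import Optional, Union


def prefixation(base: str, added: str,
                deleted: Optional[Union[str, int]] = None,
                blank: bool = False) -> str:
    """Sketch of ADDED < DELETED (or ADDED << DELETED when blank is True)."""
    if deleted is None:             # "y"< and "y"<< behave like "y"<0 and "y"<<0
        deleted = 0
    if isinstance(deleted, int):    # number of characters to drop from the start
        stem = base[deleted:]
    elif base.startswith(deleted):  # quoted string that must match the start
        stem = base[len(deleted):]
    else:                           # no match: the rule leaves the form unchanged
        return base
    return added + (" " if blank else "") + stem


# (added, deleted, blank, before, after), one tuple per row of the table
ROWS = [
    ("y", "z", False, "zabc", "yabc"),     # X:="y"<"z";
    ("y", 1, False, "zabc", "yabc"),       # X:="y"<1;
    ("y", 0, False, "zabc", "yzabc"),      # X:="y"<0;
    ("y", None, False, "zabc", "yzabc"),   # X:="y"<;
    ("y", 0, True, "zabc", "y zabc"),      # X:="y"<<0;
    ("y", None, True, "zabc", "y zabc"),   # X:="y"<<;
]

for added, deleted, blank, before, after in ROWS:
    assert prefixation(before, added, deleted, blank) == after
```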


Suffixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="z">"y"; | if X, replace the string “z” by the string “y” at the end of the string | abcz | abcy
X:=1>"y"; | if X, replace the last character of the string by “y” | abcz | abcy
X:=0>"y"; | if X, add the string “y” to the end of the string | abcz | abczy
X:=>"y"; | if X, add the string “y” to the end of the string (same as the previous rule) | abcz | abczy
X:=0>>"y"; | if X, add a blank space and the string “y” to the end of the string | abcz | abcz y
X:=>>"y"; | if X, add a blank space and the string “y” to the end of the string (same as the previous rule) | abcz | abcz y
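The suffixation rows mirror the prefixation rows. Below is a Python sketch under the same assumptions: a quoted DELETED part that does not match leaves the form unchanged, an omitted DELETED part behaves like 0, and ">>" inserts one blank space. The name `suffixation` is hypothetical.

```python
from typing import Optional, Union


def suffixation(base: str, deleted: Optional[Union[str, int]],
                added: str, blank: bool = False) -> str:
    """Sketch of DELETED > ADDED (or DELETED >> ADDED when blank is True)."""
    if deleted is None:             # >"y" and >>"y" behave like 0>"y" and 0>>"y"
        deleted = 0
    if isinstance(deleted, int):    # number of characters to drop from the end
        stem = base[: max(len(base) - deleted, 0)]
    elif base.endswith(deleted):    # quoted string that must match the end
        stem = base[: len(base) - len(deleted)]
    else:                           # no match: the rule leaves the form unchanged
        return base
    return stem + (" " if blank else "") + added


# (deleted, added, blank, before, after), one tuple per row of the table
ROWS = [
    ("z", "y", False, "abcz", "abcy"),     # X:="z">"y";
    (1, "y", False, "abcz", "abcy"),       # X:=1>"y";
    (0, "y", False, "abcz", "abczy"),      # X:=0>"y";
    (None, "y", False, "abcz", "abczy"),   # X:=>"y";
    (0, "y", True, "abcz", "abcz y"),      # X:=0>>"y";
    (None, "y", True, "abcz", "abcz y"),   # X:=>>"y";
]

for deleted, added, blank, before, after in ROWS:
    assert suffixation(before, deleted, added, blank) == after
```

The slice uses `len(base) - deleted` rather than `base[:-deleted]` on purpose. In Python, `base[:-0]` returns an empty string instead of the whole form.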


Infixation
RULE | BEHAVIOR | BEFORE | AFTER
X:="y"; | if X, replace the whole string by “y” | X | y
X:="z":"y"; | if X, replace the string “z” by “y” | azbc | aybc
X:=[2-3]:"y"; | if X, replace the second to the third character by “y” | abcz | ayz
X:=Y; | replace the feature X by the feature Y | X | Y
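Here is a Python sketch of the string-based infixation rows. It assumes that a quoted DELETED part replaces only its first occurrence (see the observation that “aaa” becomes “baa”). It also assumes that an interval [beginning-end] uses 1-based, inclusive character positions, which is consistent with [2-3] turning “abcz” into “ayz”. The feature replacement X:=Y; is not a string operation and is not modelled. The names are hypothetical.

```python
from typing import Optional, Tuple, Union

# None = whole string, str = substring to replace, (i, j) = 1-based inclusive interval
Target = Optional[Union[str, Tuple[int, int]]]


def infixation(base: str, deleted: Target, added: str) -> str:
    """Sketch of "DELETED":"ADDED", of "ADDED" alone, and of [i-j]:"ADDED"."""
    if deleted is None:                  # X:="y";  replace the whole string
        return added
    if isinstance(deleted, tuple):       # X:=[i-j]:"y";  replace positions i..j
        start, end = deleted
        return base[: start - 1] + added + base[end:]
    return base.replace(deleted, added, 1)  # only the first occurrence


assert infixation("X", None, "y") == "y"          # X:="y";
assert infixation("azbc", "z", "y") == "aybc"     # X:="z":"y";
assert infixation("abcz", (2, 3), "y") == "ayz"   # X:=[2-3]:"y";
```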

Observations

Rules will only be applied if all conditions are true
X:="y"<"z"; (“zabc” changes to “yabc”, but “abc” remains “abc”)
Each action is applied only once (i.e., rules are not exhaustive)
PLR:=0>"s"; (“AAA” becomes “AAAs”, and not “AAAsssssssss...”)
The replacement rule applies only to the first occurrence of the string to be deleted
X:="a":"b"; (“aaa” becomes “baa” and not “bbb”)
In prefixation and suffixation rules, the part to be deleted may be represented by the number of characters (without quotes)
PLR:="X"<""; = PLR:="X"<0; (ABC becomes XABC)
PLR:="X"<"A"; = PLR:="X"<1; (ABC becomes XBC)
PLR:="XY"<"AB"; = PLR:="XY"<2; (ABC becomes XYC)
PLR:="">"X"; = PLR:=0>"X"; (ABC becomes ABCX)
PLR:="C">"X"; = PLR:=1>"X"; (ABC becomes ABX)
PLR:="BC">"XY"; = PLR:=2>"XY"; (ABC becomes AXY)
In replacement rules, the part to be deleted may be omitted if the whole string is to be replaced
PLR:="ABC":"XYZ"; = PLR:="XYZ"; (ABC becomes XYZ)
In replacement rules, the part to be deleted may be represented by an interval of characters in the format [beginning-end]
PLR:="B":"X"; = PLR:=[2-2]:"X"; (ABC becomes AXC)
The symbol “^” can be used for negation (“^MCL” means “not MCL”)
NOU&^MCL:="x":"y"; (If NOU and not MCL, then replace “x” by “y”)
“<<” and “>>” add blank spaces
X:="a"<<0; (“bc” becomes “a bc” and not “abc”)
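The condition part can be evaluated separately from the actions. The Python sketch below makes three assumptions:

  • a word form carries a set of tags;
  • "&" means conjunction and a leading "^" means negation;
  • a replacement is applied only when every conjunct holds.

`condition_holds` and `apply_replacement` are hypothetical helper names.

```python
def condition_holds(condition: str, tags: set) -> bool:
    """True if every "&"-separated tag is present; "^TAG" requires TAG to be absent."""
    for atom in condition.split("&"):
        if atom.startswith("^"):
            if atom[1:] in tags:
                return False
        elif atom not in tags:
            return False
    return True


def apply_replacement(condition: str, tags: set, base: str,
                      deleted: str, added: str) -> str:
    """Apply CONDITION:="deleted":"added"; only if the condition holds."""
    if not condition_holds(condition, tags):
        return base
    return base.replace(deleted, added, 1)  # first occurrence only


# NOU&^MCL:="x":"y";
print(apply_replacement("NOU&^MCL", {"NOU"}, "xax", "x", "y"))         # yax
print(apply_replacement("NOU&^MCL", {"NOU", "MCL"}, "xax", "x", "y"))  # xax
```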

Common mistakes

  • nou:="y"<"z"; (WRONG: Tags are case sensitive)
  • NNN:="y"<"z"; (WRONG: NNN is not defined in the tagset)
  • NOUFEM:="y"<"z"; (WRONG: Tags must be separated by “&”)
  • NOU,FEM:="y"<"z"; (WRONG: Tags must be separated by “&”)
  • NOU & FEM:="y"<"z"; (WRONG: There can be no blank spaces between tags)
  • X:=1<1; (WRONG: The left side must always be a string in a prefixation rule)
  • X:=1>1; (WRONG: The right side must always be a string in a suffixation rule)
  • X:=1; (WRONG: Replacement rules do not allow for numbers, except inside an interval such as [2-2])
  • X:=1:1; (WRONG: Replacement rules do not allow for numbers, except inside an interval such as [2-2])
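The first five mistakes all concern the condition part, so a linter can catch them before any rule is applied. Below is a Python sketch of such a check. It uses a small hypothetical subset of the UNDLF Tagset, and it reports a missing "&" when a token contains a comma or splits into two known tags. The function name and the messages are illustrative only.

```python
# Hypothetical subset of the UNDLF Tagset, for illustration only.
TAGSET = {"NOU", "FEM", "PLR", "MCL"}


def condition_errors(condition: str, tagset: set = TAGSET) -> list:
    """Return messages for the condition-related mistakes listed above."""
    errors = []
    if any(ch.isspace() for ch in condition):
        errors.append("blank spaces are not allowed between tags")
    for part in condition.split("&"):
        tag = part.strip()
        if tag.startswith("^"):          # negation is allowed
            tag = tag[1:]
        if tag in tagset:
            continue
        if tag.upper() in tagset:
            errors.append(f"{tag}: tags are case sensitive")
        elif "," in tag or any(tag[:i] in tagset and tag[i:] in tagset
                               for i in range(1, len(tag))):
            errors.append(f"{tag}: tags must be separated by '&'")
        else:
            errors.append(f"{tag}: not defined in the tagset")
    return errors


for cond in ["NOU&FEM", "nou", "NNN", "NOUFEM", "NOU,FEM", "NOU & FEM"]:
    print(repr(cond), condition_errors(cond))
```

The last four mistakes in the list concern the action part. They are syntax errors, and the formal syntax at the end of this page rules them out.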

Complex m-rules

Complex m-rules are formed from the combination of simple m-rules:

  • circumfixation (prefixation + suffixation), to add a prefix and a suffix at the same time
  • prefixation + infixation, to add a prefix and an infix at the same time
  • infixation + suffixation, to add an infix and a suffix at the same time
  • prefixation + infixation + suffixation, to add a prefix, an infix and a suffix at the same time

Syntax

Complex m-rules are formed by concatenating simple m-rules with ",":

circumfixation
 CONDITION := "ADDED" < "DELETED", "DELETED" > "ADDED";
prefixation + infixation
 CONDITION := "ADDED" < "DELETED", "DELETED" : "ADDED";
infixation + suffixation
 CONDITION := "DELETED" : "ADDED", "DELETED" > "ADDED";
prefixation + infixation + suffixation
 CONDITION := "ADDED" < "DELETED", "DELETED" : "ADDED", "DELETED" > "ADDED";

Examples

Complex m-rules
RULE | BEHAVIOR | BEFORE | AFTER
X:="x"<0, 0>"y"; | if X, add “x” to the beginning and “y” to the end of the string | A | xAy
X:="x"<0, "A":"y"; | if X, add “x” to the beginning and replace “A” by “y” | ABC | xyBC
X:="A":"y", 0>"x"; | if X, replace “A” by “y” and add “x” to the end of the string | ABC | yBCx
X:="x"<0, "A":"y", 0>"z"; | if X, add “x” to the beginning, replace “A” by “y” and add “z” to the end of the string | ABC | xyBCz
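The complex examples can be reproduced by running the simple actions one after another, feeding each result into the next. In this Python sketch every action is stored as a hypothetical (operator, DELETED, ADDED) tuple. This order differs from the surface order of prefixation, which is written ADDED < DELETED. Blank-space operators and intervals are left out to keep the sketch short.

```python
from typing import List, Tuple, Union

# (operator, DELETED, ADDED); operator is "<", ">" or ":".
Action = Tuple[str, Union[str, int], str]


def apply_action(base: str, action: Action) -> str:
    """Apply one simple action; a non-matching DELETED string leaves the form unchanged."""
    op, deleted, added = action
    if op not in ("<", ">", ":"):
        raise ValueError(f"unknown operator {op!r}")
    if op == ":":                                 # infixation, first occurrence
        return base.replace(deleted, added, 1)
    if isinstance(deleted, int):                  # number of characters
        n = deleted
    else:
        matches = base.startswith(deleted) if op == "<" else base.endswith(deleted)
        if not matches:
            return base
        n = len(deleted)
    if op == "<":                                 # prefixation
        return added + base[n:]
    return base[: max(len(base) - n, 0)] + added  # suffixation


def apply_complex(base: str, actions: List[Action]) -> str:
    """Apply the comma-separated actions one after the other, left to right."""
    for action in actions:
        base = apply_action(base, action)
    return base


# X:="x"<0, "A":"y", 0>"z";
print(apply_complex("ABC", [("<", 0, "x"), (":", "A", "y"), (">", 0, "z")]))  # xyBCz
```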

Observations

Complex m-rules may also be used to merge several simple m-rules into a single rule
ORD:="1">"1st"; ORD:="2">"2nd"; ORD:="3">"3rd"; = ORD:="1">"1st", "2">"2nd", "3">"3rd";
Actions are applied from left to right (i.e., rules are not commutative)
PLR := "s" > "ses", "y" > "ies"; (kiss > kisses, city > cities)
PLR := "y" > "ies", "s" > "ses"; (kiss > kisses, city > cities > citieses)
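Both observations can be checked with a Python sketch that applies each suffixation exactly once, from left to right. The sketch assumes that a suffixation whose DELETED string does not match leaves the form unchanged. It shows that the merged ORD rule gives the same results as the three separate rules. It also shows that swapping the two PLR actions turns "city" into "citieses", because the output of the first action feeds the second. The helper names are hypothetical.

```python
def suffix(base: str, deleted: str, added: str) -> str:
    """DELETED > ADDED: swap a matching ending, otherwise leave the form unchanged."""
    if not base.endswith(deleted):
        return base
    return base[: len(base) - len(deleted)] + added


def apply_in_order(base: str, actions) -> str:
    """Apply each (DELETED, ADDED) suffixation once, from left to right."""
    for deleted, added in actions:
        base = suffix(base, deleted, added)
    return base


ORD = [("1", "1st"), ("2", "2nd"), ("3", "3rd")]
PLR_GOOD = [("s", "ses"), ("y", "ies")]   # PLR := "s" > "ses", "y" > "ies";
PLR_BAD = [("y", "ies"), ("s", "ses")]    # PLR := "y" > "ies", "s" > "ses";

for word in ["kiss", "city"]:
    print(word, apply_in_order(word, PLR_GOOD), apply_in_order(word, PLR_BAD))
# kiss kisses kisses
# city cities citieses
```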

Formal syntax for m-rules

M-rules comply with the following syntax:

<M-RULE>           ::= <CONDITION> “:=” <ACTION> [“,” <ACTION>]* “;”
<CONDITION>        ::= <ATAG>[“&”[“^”]<ATAG>]*
<ATAG>             ::= {one of the tags defined in the UNDLF Tagset}
<ACTION>           ::= <LEFT APPENDING> | <RIGHT APPENDING> | <REPLACEMENT>
<LEFT APPENDING>   ::= <ADDED> {“<” | “<<”} [<DELETED>]
<RIGHT APPENDING>  ::= [<DELETED>] {“>” | “>>”} <ADDED>
<REPLACEMENT>      ::= [<STRING> “:”] <ADDED> | “[” <INTEGER> “-” <INTEGER> “]” “:” <ADDED>
<ADDED>            ::= <STRING>
<DELETED>          ::= <STRING> | <INTEGER>
<STRING>           ::= “"” [a..Z]+ “"”
<INTEGER>          ::= [0..9]+

where

<a> = a is a non-terminal symbol
“a” = a is a constant
[a] = a can be omitted
a | b = a or b
{ a | b } = either a or b
a* = a can be repeated 0 or more times
a+ = a can be repeated 1 or more times
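No production in the grammar is recursive, so it can be turned almost mechanically into a regular expression. The Python sketch below is one such recognizer, built under these assumptions:

  • tags are written in upper-case letters and digits (the grammar only says they come from the UNDLF Tagset);
  • a quoted string may contain any character except a double quote, which is looser than [a..Z]+ but accepts examples used on this page such as "1st" and "";
  • curly quotes are normalised to straight ones;
  • optional whitespace is allowed around “:=”, the operators and the commas, but not inside the condition.

Feature replacements such as X:=Y; are not covered by the grammar and are rejected. The name `is_m_rule` is hypothetical.

```python
import re

# One regex fragment per production of the formal syntax.
STRING = r'"[^"]*"'                 # looser than [a..Z]+, see the assumptions above
INTEGER = r"[0-9]+"
ATAG = r"[A-Z0-9]+"                 # assumption: tags are upper-case
DELETED = rf"(?:{STRING}|{INTEGER})"
ADDED = STRING
LEFT_APPENDING = rf"{ADDED}\s*(?:<<|<)\s*(?:{DELETED})?"
RIGHT_APPENDING = rf"(?:{DELETED})?\s*(?:>>|>)\s*{ADDED}"
REPLACEMENT = (rf"(?:(?:{STRING}\s*:\s*)?{ADDED}"
               rf"|\[{INTEGER}-{INTEGER}\]\s*:\s*{ADDED})")
ACTION = rf"(?:{LEFT_APPENDING}|{RIGHT_APPENDING}|{REPLACEMENT})"
CONDITION = rf"{ATAG}(?:&\^?{ATAG})*"
M_RULE = re.compile(rf"\s*{CONDITION}\s*:=\s*{ACTION}(?:\s*,\s*{ACTION})*\s*;\s*")


def is_m_rule(text: str) -> bool:
    """Return True if `text` matches the formal syntax for m-rules."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')  # curly -> straight quotes
    return M_RULE.fullmatch(text) is not None


for rule in ['X:="y"<"z";', 'X:=1<1;', 'NOU & FEM:="y"<"z";']:
    print(rule, is_m_rule(rule))
# X:="y"<"z"; True
# X:=1<1; False
# NOU & FEM:="y"<"z"; False
```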

Software