Revision as of 16:11, 22 March 2010
Ph-rule (phonetic rule) is the formalism used for generating spelling changes in the UNLarium framework.
When to use Ph-rules
Ph-rules are used for generating spelling changes (such as those caused by contraction, elision, assimilation, etc.). They are also used to generate other spelling conventions, such as the use of capital letters and punctuation marks.
When not to use Ph-rules
Ph-rules are not to be used for sound changes that do not affect spelling.
Syntax
The general syntax for Ph-rules is the following:
(CONDITION) := (ACTION);
Where:
- CONDITION is a single form or a sequence of forms over which actions will take place; and
- ACTION is the action to be performed over each form or sequence of forms of the CONDITION.
CONDITION and ACTION may be expressed as:
- a character or string of characters, between quotes: ("a");
- a tag or list of tags, extracted from the UNDL Foundation tagset: (VOW);
- a combination of characters and tags: ("a",PRE);
Examples:
- ("Mr."):=("Mister"); (replace "Mr." by "Mister")
- ("doctor"):=("dr."); (replace "doctor" by "dr.")
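As a rough illustration of this syntax, the following Python sketch splits a rule into the contents of its CONDITION and ACTION groups. It is a hypothetical helper written for this page, not part of UNLarium; it assumes that quoted strings never contain the sequence ":=", and it only looks at top-level parentheses.

```python
def split_groups(side: str) -> list[str]:
    """Return the contents of each top-level (...) group, in order.

    Parentheses inside double quotes are treated as ordinary characters.
    Surrounding whitespace is stripped, so "( )" yields "".
    """
    groups: list[str] = []
    current: list[str] = []
    depth = 0
    in_quotes = False
    for ch in side:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
            if depth == 1:  # opening a new top-level group
                current = []
                continue
        elif not in_quotes and ch == ")":
            depth -= 1
            if depth == 0:  # closing the current top-level group
                groups.append("".join(current).strip())
                continue
        if depth >= 1:
            current.append(ch)
    return groups


def parse_rule(rule: str) -> tuple[list[str], list[str]]:
    """Split '(CONDITION)...:=(ACTION)...;' into two lists of group contents."""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("a Ph-rule must end in a semicolon")
    condition, sep, action = rule[:-1].partition(":=")
    if not sep:
        raise ValueError("a Ph-rule must contain ':='")
    return split_groups(condition), split_groups(action)


print(parse_rule('("Mr."):=("Mister");'))  # (['"Mr."'], ['"Mister"'])
```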
Ph-rules are normally context-sensitive: they apply over a sequence of conditions rather than over isolated word forms. In that case, each word form must be enclosed in its own pair of parentheses and described as a separate condition.
- ("I")(BLK)("am"):=("I'm"); (replace "I am" by "I'm")
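The need for one condition per word form can be illustrated with a hypothetical tokenizer that turns a sentence into a list of forms, where each run of whitespace becomes its own blank form. Treating whitespace as BLK is an assumption made only for this sketch; in UNLarium, BLK is a tag from the UNDL Foundation tagset.

```python
import re


def to_forms(text: str) -> list[str]:
    """Hypothetical tokenizer: words and runs of whitespace become separate forms."""
    return re.findall(r"\s+|\S+", text)


def contract_i_am(forms: list[str]) -> list[str]:
    """Sketch of ("I")(BLK)("am"):=("I'm"); applied over a list of forms."""
    out: list[str] = []
    i = 0
    while i < len(forms):
        window = forms[i:i + 3]
        if len(window) == 3 and window[0] == "I" and window[1].isspace() and window[2] == "am":
            out.append("I'm")  # the three matched forms collapse into one
            i += 3
        else:
            out.append(forms[i])
            i += 1
    return out


forms = to_forms("I am here")
print(forms)                          # ['I', ' ', 'am', ' ', 'here']
print("".join(contract_i_am(forms)))  # I'm here
```

Because "I am" is split into three forms, a single condition ("I am") would never match any one of them.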
Types of Ph-rules
There are basically three types of Ph-rules:
- replacement, when the number of parentheses in the CONDITION field is equal to the number of parentheses in the ACTION field;
- addition, when the number of parentheses in the CONDITION field is lower than the number of parentheses in the ACTION field;
- deletion, when the number of parentheses in the CONDITION field is greater than the number of parentheses in the ACTION field.
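Since the type of a rule depends only on how many groups each side contains, it can be read off mechanically. The function below is a hypothetical helper that counts top-level parentheses outside quotes on each side of ":=".

```python
def count_groups(side: str) -> int:
    """Count top-level (...) groups, ignoring parentheses inside quotes."""
    count, depth, in_quotes = 0, 0, False
    for ch in side:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            if depth == 0:
                count += 1
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
    return count


def rule_type(rule: str) -> str:
    """Classify a Ph-rule as replacement, addition or deletion."""
    condition, _, action = rule.strip().rstrip(";").partition(":=")
    n_condition, n_action = count_groups(condition), count_groups(action)
    if n_condition == n_action:
        return "replacement"
    return "addition" if n_action > n_condition else "deletion"


print(rule_type('("a")("b")("c"):=("d");'))  # deletion
```

By this criterion a pure reordering such as (%03)(%02)(%01) counts as a replacement, since both sides have three groups.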
Parentheses are automatically co-indexed between the CONDITION and the ACTION field, so that the first pair of parentheses of the CONDITION field corresponds to the first pair of parentheses of the ACTION field, and so on. This means that parentheses must be repeated on the right side of a Ph-rule if their corresponding forms are not to be deleted. In order to control the process of adding, deleting and reordering, forms may be referred to by the index "%NN", where NN is the position of the form on the left side (01, 02, etc.):
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a")("b")("c"):=("d")("e")("f"); | abc > def | "a" will be replaced by "d"; "b" by "e"; and "c" by "f" |
("a")("b")("c"):=("d")( )( ); | abc > dbc | "a" will be replaced by "d"; "b" and "c" will be preserved |
("a")("b")("c"):=("d")("")(""); | abc > d | "a" will be replaced by "d"; "b" and "c" will be replaced by "" (i.e., the empty string) |
("a")("b")("c"):=("d")( ); | abc > db | "a" will be replaced by "d"; "b" will be preserved; "c" will be deleted |
("a")("b")("c"):=("d"); | abc > d | "a" will be replaced by "d"; "b" and "c" will be deleted |
("a")("b")("c"):=(%03)(%02)(%01); | abc > cba | "a", "b" and "c" will be preserved, but reordered |
("a")("b")("c"):=("d")(%03); | abc > dc | "a" will be replaced by "d"; "b" will be deleted; "c" will be preserved |
("a")("b")("c"):=("d")("g")( )( ); | abc > dgc | "a" will be replaced by "d"; "b" will be replaced by "g"; "c" will be preserved; and a new (empty) form will be generated after it |
("a")("b")("c"):=("d")("g")(%02)(%03); | abc > dgbc | "a" will be replaced by "d"; "g" will be generated after it; and then "b" and "c", which will be preserved |
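The co-indexing described above can be simulated in a few lines, which makes it easy to check the rows of the table. This Python sketch is a hypothetical model, not the UNLarium engine: the forms matched by the CONDITION are given as a list, and each ACTION group is written as it appears in the rule ('"d"' for a quoted string, '' for empty parentheses, '%03' for an index).

```python
def apply_action(forms: list[str], actions: list[str]) -> list[str]:
    """Apply ACTION groups to the forms matched by the CONDITION groups.

    forms: matched forms; forms[0] corresponds to the first CONDITION group.
    actions: ACTION group contents, e.g. '"d"' (string), '' (empty group), '%03' (index).
    """
    result: list[str] = []
    for position, action in enumerate(actions):
        action = action.strip()
        if action.startswith("%"):
            # %NN refers to the NN-th CONDITION form (1-based).
            result.append(forms[int(action[1:]) - 1])
        elif len(action) >= 2 and action[0] == action[-1] == '"':
            # A quoted string replaces the co-indexed form.
            result.append(action[1:-1])
        elif action == "":
            # Empty parentheses keep the co-indexed form; beyond the last
            # CONDITION group they generate a new, empty form.
            result.append(forms[position] if position < len(forms) else "")
        else:
            raise ValueError(f"unsupported action group: {action!r}")
    # Forms that are neither kept nor referenced by %NN are not copied: they are deleted.
    return result


abc = ["a", "b", "c"]
print("".join(apply_action(abc, ['"d"', ''])))            # db
print("".join(apply_action(abc, ['%03', '%02', '%01'])))  # cba
```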
Observations
- Conditions may be expressed by features in addition to strings of characters. Features are added through "+" and deleted through "-".
RULE | BEFORE > AFTER | DESCRIPTION |
---|---|---|
("a",ART)(BLK)(VOW):=("an")( )( ); | a adjective > an adjective | replace the article (ART) "a" by "an" before a blank space (BLK) and a vowel (VOW); preserve the second (BLK) and the third form (VOW) without any change |
("a",PRE)(BLK)("a",ART):=("à",+ART,+CTC); | a a > à | replace the preposition (PRE) "a" before a blank (BLK) and the article (ART) "a" by "à"; add the features ART (article) and CTC (contraction) to the first form; and delete the second (BLK) and the third form ("a",ART) |
("de",PRE)(BLK)("le",ART):=("du",+ART,+CTC); | de le > du | replace the preposition (PRE) "de" before a blank (BLK) and the article (ART) "le" by "du"; add the features ART and CTC to the first form; and delete the second (BLK) and the third form ("le",ART) |
("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( ); | a il > a-t-il | replace the blank space (BLK) between the verb (VER) "a" and the pronoun (PPR) "il" by "-t-"; remove the feature BLK from the second form; preserve the first and the third form without any change |
("de",PRE)(BLK)(VOW):=("d'")(%03); | de avoir > d'avoir | replace the preposition (PRE) "de" before a blank space (BLK) and a vowel (VOW) by "d'"; delete the second form (BLK); and preserve the third form (%03) without any change |
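Feature-based conditions can be sketched by giving each form a set of tags. In the hypothetical model below, a form satisfies a condition when its text equals the quoted string (if one is given) and it carries every listed tag, and a rule applies only where all of its conditions match consecutive forms. Which tags each form carries (for instance VOW on "adjective") is assumed here, not computed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Form:
    text: str
    tags: set[str] = field(default_factory=set)


def matches(form: Form, text: Optional[str], tags: set) -> bool:
    """True if the form has the required text (when given) and all required tags."""
    return (text is None or form.text == text) and tags <= form.tags


def find(forms: list, conditions: list) -> int:
    """Index where every condition matches consecutive forms, or -1."""
    n = len(conditions)
    for start in range(len(forms) - n + 1):
        if all(matches(forms[start + k], *conditions[k]) for k in range(n)):
            return start
    return -1


def a_to_an(forms: list) -> str:
    """Sketch of ("a",ART)(BLK)(VOW):=("an")( )( );"""
    start = find(forms, [("a", {"ART"}), (None, {"BLK"}), (None, {"VOW"})])
    if start >= 0:
        forms[start].text = "an"  # ( )( ) keep the blank and the next form as they are
    return "".join(f.text for f in forms)


print(a_to_an([Form("a", {"ART"}), Form(" ", {"BLK"}), Form("adjective", {"VOW"})]))  # an adjective
print(a_to_an([Form("a", {"PRE"}), Form(" ", {"BLK"}), Form("adjective", {"VOW"})]))  # a adjective
```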
- Rules are conservative. No feature is deleted unless explicitly indicated through "-".
- The rule "("a",ART)(BLK)(VOW):=("an")( )( );" does not affect the status of the second and the third word forms, which continue to be BLK and VOW. On the other hand, the rule "("a",VER)(BLK)("il",PPR):=( )("-t-",-BLK)( );" alters the status of the second form by deleting the feature BLK.
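The conservative treatment of features can be modelled as set operations: "+TAG" adds a tag, "-TAG" removes it, and every other tag survives. The helper below is hypothetical; its examples reuse rules from the table above.

```python
def change_features(features: set, changes: list) -> set:
    """Apply +TAG / -TAG changes; features that are not mentioned are preserved."""
    result = set(features)
    for change in changes:
        sign, tag = change[:1], change[1:]
        if sign == "+":
            result.add(tag)
        elif sign == "-":
            result.discard(tag)
        else:
            raise ValueError(f"feature change must start with + or -: {change!r}")
    return result


# ( )("-t-",-BLK)( ): the second form stops being BLK.
print(change_features({"BLK"}, ["-BLK"]))  # set()
# ("a",ART)(BLK)(VOW):=("an")( )( ): no change, so the form is still BLK.
print(change_features({"BLK"}, []))        # {'BLK'}
```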
- In the ACTION field, changes may be expressed by A-rules inside each form. The default is replacement.
- The rule "("a",ART)(BLK)(VOW):=("an")( )( );" could also be expressed as "("a",ART)(BLK)(VOW):=(0>"n")( )( );", i.e., the change from "a" to "an" could be expressed either by "an" or 0>"n".
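A hypothetical reading of an integer suffixation such as 0>"n" is "drop the last N characters of the form, then append the quoted string". The sketch below implements only that case of <SUFFIXATION> from the formal syntax at the end of this page; the ">>" variant and string-valued deletions are not covered, and the "de" > "d'" example is an extrapolation for illustration.

```python
import re

# Integer suffixation only: N>"string" (N may be omitted, meaning 0).
SUFFIXATION = re.compile(r'(\d*)>"([^"]*)"')


def apply_suffixation(form: str, a_rule: str) -> str:
    """Delete the last N characters of the form, then append the quoted string."""
    m = SUFFIXATION.fullmatch(a_rule)
    if m is None:
        raise ValueError(f"unsupported A-rule: {a_rule!r}")
    n = int(m.group(1) or 0)
    return form[:max(len(form) - n, 0)] + m.group(2)


print(apply_suffixation("a", '0>"n"'))     # an (same result as the replacement "an")
print(apply_suffixation("de", "1>\"'\""))  # d'
```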
- Rules apply only if all conditions are true.
- The rule "("a")(BLK)(VOW):=("an")( )( );" will apply only in case of "a" before a blank and a vowel.
- Rules are recursive: a rule applies repeatedly as long as its conditions are true.
- The rule "(BLK):=("-");" will transform "a b c d e" into "a-b-c-d-e" (and not only into "a-b c d e").
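Recursive application can be sketched as a loop that keeps rewriting while the condition still matches somewhere. The sketch assumes that a form counts as BLK when it consists of whitespace (an illustration only), and it caps the number of steps so that a rule whose output still satisfies its own condition fails loudly instead of looping forever.

```python
def apply_until_stable(forms, condition, replacement, max_steps=1000):
    """Rewrite the first form satisfying `condition`, repeatedly, until none is left."""
    forms = list(forms)
    for _ in range(max_steps):
        hits = [i for i, form in enumerate(forms) if condition(form)]
        if not hits:
            return forms
        forms[hits[0]] = replacement
    raise RuntimeError("rule still applies after max_steps rewrites")


sentence = ["a", " ", "b", " ", "c", " ", "d", " ", "e"]
print("".join(apply_until_stable(sentence, str.isspace, "-")))  # a-b-c-d-e
```

A rule such as (BLK):=(" ") would keep producing a blank, so the loop never runs out of matches; this is the kind of case the negation symbol helps to avoid.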
- The symbol ^ is used for negation and may be used to prevent infinite loops:
- (^".")(STAIL):=(".")(%02); (Add a period before the end of the sentence if there is not a period yet)
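The role of the negated condition can be sketched as follows. The STAIL marker is represented by a hypothetical sentinel string, and the function models the effect stated in the rule's comment (a period is added before the end of the sentence) rather than any particular engine. Because the guard (^".") fails once a period is present, applying the rule until nothing changes stops after a single insertion.

```python
STAIL = "<STAIL>"  # hypothetical sentinel for the end of the sentence


def add_final_period(forms: list) -> list:
    """Insert "." before STAIL unless the form before STAIL is already "."."""
    forms = list(forms)
    tail = forms.index(STAIL)
    if tail > 0 and forms[tail - 1] != ".":  # the negated condition (^".")
        forms.insert(tail, ".")
    return forms


def run_to_fixpoint(rule, forms: list) -> list:
    """Apply a rule again and again while it keeps changing the forms."""
    while True:
        new_forms = rule(forms)
        if new_forms == forms:
            return forms
        forms = new_forms


print(run_to_fixpoint(add_final_period, ["Hello", STAIL]))  # ['Hello', '.', '<STAIL>']
```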
Common mistakes
MISTAKE | EXPLANATION |
---|---|
"Mr":="Mister"; | Conditions and actions must always come between parentheses: ("Mr"):=("Mister"); |
(Mr):=(Mister); | Constants must come between quotes (inside the parentheses): ("Mr"):=("Mister"); |
("Mr"):=("Mister") | Rules must end in a semicolon: ("Mr"):=("Mister"); |
("I am"):=("I'm"); | Each word form must be isolated between parentheses and described as a separate condition: ("I")(BLK)("am"):=("I'm"); |
("a",ART)(BLK)(VOW):=("an"); | "a adjective" > "an": the blank and the following form are deleted because they are not present on the right side |
("de",PRE)(BLK)(VOW):=("d'")(VOW); | "de avoir" > "d' ": co-indexation is based on ordering, not on features. The third form is deleted because it is not present on the right side; the second form, which is BLK, receives the feature VOW |
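The first three mistakes are purely syntactic and could be caught mechanically. The checker below is a hypothetical sketch: it treats any unquoted item that is not an all-capital tag, a +/- feature change or a %NN index as an unquoted constant, which is an assumption about how tags are spelled. The remaining mistakes concern word segmentation and co-indexing; they produce well-formed rules that a purely syntactic check cannot flag.

```python
import re

GROUP = re.compile(r'\(((?:"[^"]*"|[^()"])*)\)')  # one (...) group; quotes may hold ( )
ITEM = re.compile(r'(?:"[^"]*"|[^,])+')           # comma-separated items inside a group
BARE = re.compile(r'[+-]?[A-Z]+|%\d\d')           # tag, feature change or %NN index


def lint(rule: str) -> list:
    """Return a list of syntax problems found in a Ph-rule (empty if none)."""
    problems = []
    text = rule.strip()
    if not text.endswith(";"):
        problems.append("rule must end in a semicolon")
    condition, sep, action = text.rstrip(";").partition(":=")
    if not sep:
        return problems + ["rule must contain ':='"]
    for name, side in (("CONDITION", condition), ("ACTION", action)):
        if not side.strip() or GROUP.sub("", side).strip():
            problems.append(f"{name} must consist of forms between parentheses")
        for content in GROUP.findall(side):
            for item in ITEM.findall(content):
                item = item.strip()
                if item and '"' not in item and not BARE.fullmatch(item):
                    problems.append(f"{item!r} looks like an unquoted constant")
    return problems


for rule in ['"Mr":="Mister";', '(Mr):=(Mister);', '("Mr"):=("Mister")', '("Mr"):=("Mister");']:
    print(rule, lint(rule))
```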
Formal syntax
Ph-rules comply with the following syntax:
<Ph-RULE> ::= ( "("<CONDITION>")" )+ ":=" ( "("<ACTION>")" )+ ";"
<CONDITION> ::= """<STRING>""" ("," <TAGLIST> )* | "["<STRING>"]" ("," <TAGLIST> )* | <TAGLIST>
<ACTION> ::= (<INDEX>)? ( <AFFIXATION> ("," <AFFIXATION>)* )* ( <ATT_CHANGE> ("," <ATT_CHANGE>)* )*
<AFFIXATION> ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT>
<PREFIXATION> ::= """<STRING>""" { "<" | "<<" } (<DELETED>)?
<SUFFIXATION> ::= (<DELETED>)? { ">" | ">>" } """<STRING>"""
<INFIXATION> ::= "["<DELETED>"]" ">" """<STRING>""" | """<STRING>""" "<" "["<DELETED>"]"
<REPLACEMENT> ::= ( """<STRING>""" ":" )? """<STRING>""" | "["<STRING>"]" | "[" <INTEGER> "-" <INTEGER> "]" ":" """<STRING>"""
<DELETED> ::= """<STRING>""" | <INTEGER>
<STRING> ::= [a..Z]+
<INTEGER> ::= [0..9]+
<ATT_CHANGE> ::= { "+" | "-" } <TAG>
<TAGLIST> ::= <INDEX> | (<INDEX> ",")? <TAG> ("," <TAG>)*
<INDEX> ::= "%"[01..99]
<TAG> ::= {one of the tags defined in the UNDLF Tagset}
where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
{ a | b } = either a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
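The <CONDITION> production can be turned into a regular expression, which is a convenient way to test the grammar against the examples on this page. In this sketch <TAG> is assumed to be a run of capital letters (the actual inventory is the UNDLF Tagset), and the quoted <STRING> is loosened to any characters other than quotes, because examples such as "Mr." and "à" fall outside [a..Z]+. The bracketed form "["<STRING>"]" is likewise loosened to any characters other than "]".

```python
import re

INDEX = r"%(?:0[1-9]|[1-9][0-9])"                       # <INDEX> ::= "%"[01..99]
TAG = r"[A-Z]+"                                          # assumption: tags are capitalized
TAGLIST = rf"(?:{INDEX}|(?:{INDEX},)?{TAG}(?:,{TAG})*)"  # <TAGLIST>
QUOTED = r'"[^"]+"'                                      # """<STRING>""", loosened
BRACKETED = r"\[[^\]]+\]"                                # "["<STRING>"]", loosened
CONDITION = re.compile(rf"(?:{QUOTED}|{BRACKETED})(?:,{TAGLIST})*|{TAGLIST}")


def is_condition(text: str) -> bool:
    """True if the contents of one condition group match <CONDITION>."""
    return CONDITION.fullmatch(text) is not None


for group in ['"a",ART', "BLK", "%03", '"à",PRE', "Mr", "%00"]:
    print(group, is_condition(group))
```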