Grammar
From UNL Wiki
Grammar is the set of logical and structural rules that govern the composition of sentences, phrases and words in any given natural language. In the UNLarium framework, we distinguish three different types of rules according to the scope of their action:
- Ph-rules (phonetic rules) apply over a form to provide alternations and sound changes (such as assimilation and elision);
- A-rules (affixation rules) apply over a form to generate its possible inflections by prefixation, infixation or suffixation; and
- S-rules (syntactic rules) apply over a form to project or modify its syntactic structure by specification, complementation or adjunction.
Examples:
Type | Rule | Description | Example |
---|---|---|---|
Ph-rule | "a",>>VOW:="a">"an"; | When the form is "a" and it is followed by a blank space and a vowel, replace "a" with "an" | a adjective > an adjective |
A-rule | PLR:=0>"s"; | In case of plural (PLR), add "s" to the end of the word | table > tables, boy > boys |
S-rule | MTW:=VA("into account"); | In order to form the multiword expression, add "into account" as an adjunct to the verb (VA). | take > take into account |
Syntax
Rules are always composed of two fields, a condition and an action, which are separated by ":=". Rules must always end with ";".
CONDITION := ACTION;
The CONDITION and the ACTION fields may be expressed as follows:
- by a constant, between "quotes" (such as "a", "s", "into account" above);
- by a lemma, between [brackets] (such as [be], [have]);
- by a feature, extracted from the UNDL Foundation tagset (such as "VOW", "PLR", "MTW", "VA" above).
The CONDITION and the ACTION fields may be either simple or complex. The elements of a complex condition or a complex action must be separated by ",".
SINGLE CONDITION := SINGLE ACTION;
CONDITION#1,CONDITION#2, ..., CONDITION#n := SINGLE ACTION;
CONDITION := ACTION#1, ACTION#2, ..., ACTION#n;
CONDITION#1,CONDITION#2, ..., CONDITION#n := ACTION#1, ACTION#2, ..., ACTION#n;
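The rule format above can be illustrated with a small Python sketch that splits a rule into its condition and action fields. This is not part of the UNLarium framework: the function names `split_outside_quotes` and `parse_rule` are hypothetical, and the sketch assumes that ":=" never occurs inside a quoted constant and that commas inside "quotes" or (parentheses) do not separate elements.

```python
def split_outside_quotes(text, sep=","):
    """Split text on sep, ignoring separators inside "quotes" or (parentheses)."""
    parts, current, in_quotes, depth = [], [], False, 0
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")":
            depth -= 1
        if ch == sep and not in_quotes and depth == 0:
            # Top-level separator: close the current element.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_rule(rule):
    """Return (conditions, actions) for a rule of the form CONDITION := ACTION;"""
    rule = rule.strip()
    if not rule.endswith(";"):
        raise ValueError("a rule must end with ';'")
    body = rule[:-1]
    if ":=" not in body:
        raise ValueError("a rule must contain ':='")
    # Assumption: the first ':=' is the field separator.
    condition, action = body.split(":=", 1)
    return split_outside_quotes(condition), split_outside_quotes(action)


if __name__ == "__main__":
    for example in ('PLR:=0>"s";', '"a",>>VOW:="a">"an";', 'MTW:=VA("into account");'):
        print(parse_rule(example))
```

Applied to the three rules in the Examples table, the sketch yields one condition for the A-rule and the S-rule, and two conditions ("a" and >>VOW) for the Ph-rule.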
Special symbols are used in the CONDITION and the ACTION fields:
- > indicates suffixation
- >> indicates suffixation after a blank space
- < indicates prefixation
- << indicates prefixation before a blank space
- : indicates replacement
For further information on other symbols, see Ph-rules, A-rules or S-rules.
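As a rough illustration of the four positional symbols, the following Python sketch attaches a string to a base form. The function `apply_affix` is a hypothetical helper and not the UNLarium implementation; replacement (":") is left out because its exact syntax is described in the pages on the individual rule types.

```python
def apply_affix(base, symbol, affix):
    """Attach affix to base according to one of the positional symbols."""
    if symbol == ">":    # suffixation
        return base + affix
    if symbol == ">>":   # suffixation after a blank space
        return base + " " + affix
    if symbol == "<":    # prefixation
        return affix + base
    if symbol == "<<":   # prefixation before a blank space
        return affix + " " + base
    raise ValueError(f"unknown symbol: {symbol!r}")


if __name__ == "__main__":
    print(apply_affix("table", ">", "s"))
    print(apply_affix("take", ">>", "into account"))
    print(apply_affix("love", "<<", "will"))
```

With the examples used on this page, ">" turns "table" into "tables", ">>" turns "take" into "take into account", and "<<" turns "love" into "will love".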
When to use Ph-rules, A-rules or S-rules
- Ph-rules must be used when the transformations affect isolated characters or isolated strings of characters. These transformations operate at the surface level and do not affect the structure of the word or of the phrase.
- A-rules must be used when the transformations generate inflections of the base form. They should be used only when the transformations can be expressed by prefixation, infixation or suffixation. In any case, the transformation must affect only the structure of the word; the structure of the phrase must be preserved. In that sense, A-rules must never be used when a new word is introduced in the syntactic structure (as in the formation of compounds).
- S-rules must be used when the transformations affect the structure of the phrase, as in the generation of compounds (including compound tenses and periphrastic constructions). They are also used to describe syntactic behaviour such as word order, agreement and government.
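The difference in scope between A-rules and S-rules can be sketched in Python by modelling a phrase as a list of words. This is only an illustration under that assumption: the functions `a_rule_plural` and `s_rule_adjunct` are hypothetical and stand in for the plural and multiword examples given in the table above.

```python
def a_rule_plural(phrase, index):
    """Word-level change: suffix "s" to one word; the phrase keeps its length."""
    words = list(phrase)
    words[index] = words[index] + "s"
    return words


def s_rule_adjunct(phrase, index, adjunct):
    """Phrase-level change: add a new element right after the word at index."""
    words = list(phrase)
    words.insert(index + 1, adjunct)
    return words


if __name__ == "__main__":
    print(a_rule_plural(["table"], 0))
    print(s_rule_adjunct(["take"], 0, "into account"))
```

In this model the A-rule changes the content of an existing element, whereas the S-rule changes the number of elements in the phrase.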
Observations
- Ph-rules and A-rules do not generate new words but only modify the existing ones.
- The A-rule "FUT:="will"<<0;" (i.e., generate "will" as a prefix to the base form in the future tense) transforms "love" into "will love", which is nevertheless treated as a single word and not as a compound. This is why compound tenses must never be generated through A-rules: otherwise the negative form "will not love", in which "not" must be generated between the auxiliary verb and the main verb, could never be produced.
- S-rules do not necessarily specify word order.
- The S-rule "MTW:=VA("into account");" only indicates that the multiword expression requires the string "into account" to be generated as an adjunct to the base form. It does not indicate whether the adjunct should be generated to the left or to the right of the base form. To indicate this explicitly, the symbols ">", ">>", "<" and "<<" may be used: "MTW:=VA(>>,"into account");". When the information on word order is absent, the system follows the default rule, which is defined in the grammar. Since the default for verb adjuncts (VA) in English is ">>", there is no need to indicate the order for adjuncts that are generated to the right after a blank space.
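The fallback to a default word order can be sketched as follows. The table `DEFAULT_ORDER` and the function `place_adjunct` are hypothetical names used only for illustration; the single entry reflects the English default for verb adjuncts mentioned above, and a real grammar would define its own defaults.

```python
# Assumed default table: only the English VA default stated in the text.
DEFAULT_ORDER = {"VA": ">>"}


def place_adjunct(base, adjunct, role="VA", order=None):
    """Place adjunct relative to base, using the default order when none is given."""
    symbol = order if order is not None else DEFAULT_ORDER[role]
    if symbol == ">>":   # to the right, after a blank space
        return base + " " + adjunct
    if symbol == ">":    # to the right, no blank space
        return base + adjunct
    if symbol == "<<":   # to the left, before a blank space
        return adjunct + " " + base
    if symbol == "<":    # to the left, no blank space
        return adjunct + base
    raise ValueError(f"unknown symbol: {symbol!r}")


if __name__ == "__main__":
    print(place_adjunct("take", "into account"))
    print(place_adjunct("take", "into account", order=">>"))
```

Both calls give the same result, which mirrors the point that the explicit ">>" is redundant when it coincides with the default.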