C-rule
From UNL Wiki
Revision as of 10:24, 26 March 2010
Compounding or composition is the word-formation process of creating compounds by combining or putting together lexemes.
Syntax
In the UNLarium framework, compounds are treated as ordinary simple words, except in the case of discontinuous multi-word expressions or expressions with infixation (such as "give in" or "take into account"). In these cases, the lemma is different from the base form, and the compound-formation process is expected to be defined through S-rules such as the following:
+<SYNTACTIC ROLE>(<ADDED>);
Where:
<SYNTACTIC ROLE> is the syntactic role (VA, VC, VS, VH, etc.) of the term to be added to the base form; and
<ADDED> is the term to be added to the base form to form the compound. It can be a string between "quotes" or a lemma between [brackets].
Examples
Lemma | Base Form | Compound | Description |
---|---|---|---|
give in | give | +VH([in]) | the lemma "in" is to be added to the base form as part of the head of the verb (VH) |
take into account | take | +VA("into account") | the string "into account" is to be added to the base form as an adjunct to the verb (VA) |
throw <person> to the lions | throw | +VA("to the lions") | the string "to the lions" is to be added to the base form as an adjunct to the verb (VA) |
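The rule format described above can be illustrated with a short parsing sketch. This is not part of the UNLarium tooling: the regular expression, the `CompoundTerm` type and the `parse_rule` function are hypothetical names introduced here, and the sketch only handles single-term rules like those in the table (the trailing ";" is treated as optional because the table omits it).

```python
import re
from typing import NamedTuple


class CompoundTerm(NamedTuple):
    role: str  # syntactic role of the added term, e.g. "VH" or "VA"
    kind: str  # "lemma" for [brackets], "string" for "quotes"
    text: str  # the added term itself


# Hypothetical pattern for a single-term rule: +ROLE([lemma]) or +ROLE("string").
RULE = re.compile(r'^\+([A-Z]+)\((?:\[([^\]]+)\]|"([^"]+)")\);?$')


def parse_rule(rule: str) -> CompoundTerm:
    """Split a single-term compound rule into role, kind and text."""
    match = RULE.match(rule.strip())
    if match is None:
        raise ValueError(f"not a single-term compound rule: {rule!r}")
    role, lemma, string = match.groups()
    if lemma is not None:
        return CompoundTerm(role, "lemma", lemma)
    return CompoundTerm(role, "string", string)


# The three rules from the examples table.
for rule in ('+VH([in])', '+VA("into account")', '+VA("to the lions")'):
    print(parse_rule(rule))
```

Keeping the lemma/string distinction in the parsed result reflects the fact that bracketed terms are expected to be dictionary entries while quoted terms are not.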
Observations
- Phrasal verbs
- Particles of phrasal verbs must be represented as part of the head, if non-separable, or as adjuncts, if separable:
- give in = +VH("in"); ("give in something" but not "give something in")
- give back = +VA("back"); ("give back something" or "give something back")
- General syntactic roles (NP, PP, XP) must not be defined in composition rules but inside the subcategorization frame:
- throw <person> to the lions = +VA("to the lions"); (and not "+VA("to the lions")VC(NP);". The lemma should be associated with the transitive frame instead.)
- "Quotes" or [brackets]?
- In the compound-formation process, the UNLarium distinguishes between strings (to be represented between "") and lemmas (to be represented between [ ]). The difference between strings and lemmas has to do with the dictionary status. Lemmas, but not strings, are expected to be defined as dictionary entries:
- +VA("into account"); (add the string "into account" as a verbal adjunct: take > take into account)
- +VC([love]); (add the lemma "love" as a verbal complement: make > make love)
- In the examples above, "into account" is unlikely to exist as a dictionary entry of its own, whereas "love" is probably already there. The distinction matters because the terms of a compound may be modified ("make much love", for instance).
- Complex compounds
- Compounds must include as many terms as different syntactic roles. One single "+" must be provided at the beginning of the rule:
- give up the ghost = +VH([up])VC("the ghost"); (and not +VH("up the ghost") or +VC("up the ghost"))
- Order is to be represented by the distribution features (">", ">>", "<", "<<", ...), if not default:
- +VC([love]); (order must not be specified, because in English complements come on the right side by default: make > make love)
- +NS([the]); (order must not be specified, because in English specifiers come on the left side by default: Netherlands > the Netherlands)
- +NA(>>,"available"); (order must be specified, because in English nominal adjuncts come on the left side by default: table > new table)
- Adjacency is to be represented by the adjacency features (AJ0, AJ1, AJ2, ...), if not default:
- +VC([love]); (adjacency must not be specified, because in English complements come after the head by default: make > make love)
- +VH([up])VC("the ghost"); (adjacency must not be specified, because in English head particles come before complements by default: give > give up the ghost)
- +VA([home],AJ1)VC("the bacon",AJ2); (adjacency must be specified, because in English the complement is normally generated before the adjunct: bring the bacon home)
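The observations on complex compounds, order and adjacency can be summarized in a small validation sketch. It is only an illustration under stated assumptions, not the UNLarium implementation: `parse_compound` and both regular expressions are hypothetical, every rule is assumed to start with a single "+", and features (">>", "AJ1", ...) are accepted either before or after the added term, since both positions appear in the examples.

```python
import re

# Hypothetical grammar: one leading "+", then ROLE(arg, ...) repeated.
TERM = re.compile(r'([A-Z]+)\(([^()]*)\)')
# An argument is a "string", a [lemma], or a bare feature such as >> or AJ1.
ARG = re.compile(r'"[^"]*"|\[[^\]]*\]|[^,\s]+')


def parse_compound(rule):
    """Return one dict per term: role, kind, text and features."""
    rule = rule.strip()
    if not rule.startswith("+"):
        raise ValueError("rule must begin with a single '+'")
    body = rule[1:].rstrip(";")
    terms, pos = [], 0
    for match in TERM.finditer(body):
        # Terms must follow each other directly; a second "+" is rejected here.
        if match.start() != pos:
            raise ValueError(f"unexpected text before {match.group(0)!r}")
        pos = match.end()
        role, args = match.groups()
        term = {"role": role, "kind": None, "text": None, "features": []}
        for arg in ARG.findall(args):
            if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
                kind = "string"  # not expected to be a dictionary entry
            elif len(arg) >= 2 and arg[0] == "[" and arg[-1] == "]":
                kind = "lemma"   # expected to be a dictionary entry
            else:
                term["features"].append(arg)
                continue
            if term["kind"] is not None:
                raise ValueError(f"{role} has more than one added term")
            term["kind"], term["text"] = kind, arg[1:-1]
        if term["kind"] is None:
            raise ValueError(f"{role} has no added term")
        terms.append(term)
    if pos != len(body) or not terms:
        raise ValueError(f"could not parse rule: {rule!r}")
    return terms


for rule in ('+VH([up])VC("the ghost");',
             '+NA(>>,"available");',
             '+VA([home],AJ1)VC("the bacon",AJ2);'):
    print(rule, parse_compound(rule))
```

Each syntactic role yields its own entry, so a rule such as +VH([up])VC("the ghost"); produces two terms, while a rule that repeats the "+" before the second role is rejected.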