Multiword expression
Revision as of 16:55, 21 June 2011
Multiword Expressions (MTW) are lexical structures made up of a sequence of two or more lexemes. Their components can be concatenated ("darkroom", "skinhead") or separated by hyphens ("blue-green", "African-American") or blank spaces ("round table", "part of speech"). Multiword expressions can be continuous ("get over") or discontinuous ("get <something> together"). They correspond to compounds ("fireman", "hardware"), phrases ("in spite of", "take into account"), idioms ("kick the bucket", "play cat and mouse"), fragments of sentences ("and so on", "whatever the case") or sentences ("Every evil is followed by some good", "No flies enter a mouth that is shut"). Multiword expressions may also include acronyms (such as "UNESCO"), multiple-word contractions (such as "don't") and blends (such as "sitcom") that are still analysable (unlike "radar" and "motel", which are represented as simple words). Classical compounds ("agriculture", "photograph") and their derivations ("agricultural", "photographically") are treated as simple words if they do not include more than one free morpheme. Phrasal verbs ("give in", "come across") are treated as multiword expressions.
How to treat multiword expressions in the UNLarium
- Lemma
- The lemma of a continuous multiword expression is the multiword expression itself ("part of speech");
- The lemma of a discontinuous multiword expression must include the obligatory variables ("behind <person>'s back");
- The lemma of a multiword expression that can occur both continuously and discontinuously is the multiword expression itself, in its continuous form ("take into account", "bring back").
- Base form
- The base form is the same as the lemma, except in the case of multiword expressions that involve discontinuity or infixation, i.e., where variations cannot be generated by simple prefixation and/or suffixation rules. In these cases, the base form corresponds to the lemma of the longest common denominator shared by all the possible variations of the word. The base form must belong to the same category as the lemma.
- For instance:
- coffee house (continuous multiword expression without infixation: "coffee house">"coffee houses"): BF=lemma="coffee house"
- give in (continuous multiword expression with infixation: "give in">"gave in"): BF="give" AND lemma="give in"
- behind one's back (discontinuous multiword expression without infixation: "behind my back", "behind his back", etc.): BF="behind" AND lemma="behind <person>'s back"
- take into account (discontinuous multiword expression with infixation: "take it into account", "took that into account"): BF="take" AND lemma="take into account"
- Composition rules
- Composition rules are rules that are applied over the base form to generate the lemma. They are used only when the lemma is different from the base form.
- For instance:
- coffee house: lemma = base form, composition rule = NULL
- give in: lemma (= give in) ≠ base form (= give), composition rule = VH([in]); (i.e., lemma = base form + "in")
- take into account: lemma ≠ base form, composition rule = VA("into account"); (i.e., lemma = base form + "into account")
- behind one's back: lemma ≠ base form, composition rules = PA([back]); (i.e., lemma = base form + "back")
- The composition rules are further described in composition.
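The "lemma = base form + ..." glosses above can be pictured with a small Python sketch. It is only a toy model of those glosses, not the UNLarium's rule engine: the function `compose_lemma` and its `appended` parameter are hypothetical names, and the rule notation itself (VH, VA, PA) is not parsed.

```python
from typing import Optional


def compose_lemma(base_form: str, appended: Optional[str]) -> str:
    """Build a lemma from a base form (illustrative helper, not a UNLarium API).

    `appended` is the material that the composition rule adds after the
    base form; None models "composition rule = NULL".
    """
    if appended is None:
        # No composition rule: the lemma is the base form itself.
        return base_form
    return f"{base_form} {appended}"


print(compose_lemma("coffee house", None))
print(compose_lemma("give", "in"))
print(compose_lemma("take", "into account"))
```

The variable slot in "behind <person>'s back" is deliberately left out, since the gloss given for that entry only mentions "back".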
- Inflectional paradigm
- The inflectional paradigm and the inflectional rules apply over the base form (and not over the lemma).
- For instance:
- coffee house: base form = "coffee house", paradigm = M2 (regular nouns that make the plural in -s);
- give in: base form = "give", paradigm = M1 (irregular), inflectional rules = (PAS:="gave";PTP:="given";);
- take into account: base form = "take", paradigm = M1 (irregular), inflectional rules = (PAS:="took";PTP:="taken";);
- behind <person>'s back: base form = "behind", paradigm = M0 (invariant)
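To see why inflection applies over the base form rather than the lemma, the following Python sketch inflects the base form first and only then appends the rest of the expression. It is a simplified illustration under stated assumptions: the table `IRREGULAR_FORMS` and the function `inflect_expression` are hypothetical, and only the PAS and PTP forms quoted above are covered.

```python
from typing import Optional

# Irregular forms taken from the inflectional rules listed above.
IRREGULAR_FORMS = {
    "give": {"PAS": "gave", "PTP": "given"},
    "take": {"PAS": "took", "PTP": "taken"},
}


def inflect_expression(base_form: str, appended: Optional[str], tag: str) -> str:
    """Inflect the base form, then attach the remainder of the expression.

    Hypothetical helper: base forms or tags missing from the table are
    returned unchanged.
    """
    inflected = IRREGULAR_FORMS.get(base_form, {}).get(tag, base_form)
    if appended is None:
        return inflected
    return f"{inflected} {appended}"


print(inflect_expression("give", "in", "PAS"))
print(inflect_expression("take", "into account", "PTP"))
```

Because the change happens inside the first word, applying a suffixation rule to the whole lemma "give in" could not produce "gave in"; inflecting the base form "give" and composing afterwards can.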
- Subcategorization frame
- The subcategorization frame refers to the base form (and not to the lemma).
- For instance:
- coffee house: base form = "coffee house", frame = Y0 (avalent);
- give in: base form = "give", frame = Y38 (Somebody ----s something);
- take into account: base form = "take", frame = Y51 (Somebody ----s something PP);
- behind <person>'s back: base form = "behind", frame = Y1 (irregular), subcategorization rules: PC(NA([back];ANM,GNT));
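Putting the four fields together, the running examples could be recorded as in the Python sketch below. The `Entry` class and the `is_consistent` check are assumptions made for illustration and do not reflect the UNLarium's actual data model; inflectional and subcategorization rules are omitted for brevity. The check encodes the statement that composition rules are used only when the lemma differs from the base form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record of one dictionary entry."""
    lemma: str
    base_form: str
    composition_rule: Optional[str]  # None models "composition rule = NULL"
    paradigm: str
    frame: str


# Values copied from the examples in this article.
ENTRIES = [
    Entry("coffee house", "coffee house", None, "M2", "Y0"),
    Entry("give in", "give", "VH([in]);", "M1", "Y38"),
    Entry("take into account", "take", 'VA("into account");', "M1", "Y51"),
    Entry("behind <person>'s back", "behind", "PA([back]);", "M0", "Y1"),
]


def is_consistent(entry: Entry) -> bool:
    """A composition rule is present exactly when lemma and base form differ."""
    return (entry.composition_rule is None) == (entry.lemma == entry.base_form)


for entry in ENTRIES:
    print(entry.lemma, "->", "ok" if is_consistent(entry) else "inconsistent")
```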