Base Form
Latest revision as of 22:30, 11 March 2014
Base Form, or simply BF, is the form used to generate all variants of a given lexeme.
The lemma is not always the most suitable form for generating the inflections of a given lexeme. Consider, for instance, “take into account”, which is actually a discontinuous item, since any noun phrase can occur between “take” and “into account”: “take that into account”, “take it into account”, “take the decision of proliferating dictionary fields into account”, etc. To handle all those possibilities, we have to create a different lexical entity: the base form. In the case of “take into account”, the base form is “take”. From the base form, we can not only associate the lexeme with an existing inflectional paradigm (“take”) but also treat discontinuity and ordering issues through simple, deterministic generation rules.
How to create a BF
The BF is the same as the lemma, except in the case of multi-word expressions that involve discontinuity or infixation, i.e., where the variants cannot be generated by simple prefixation and/or suffixation rules. In these cases, the BF corresponds to the lemma of the longest common denominator of all the possible variants of the LRU.
Examples
- house (simple word): BF=lemma="house"
- mouse (simple word with infixation: "mouse">"mice"): BF=lemma="mouse"
- coffee house (multi-word expression without infixation: "coffee house">"coffee houses"): BF=lemma="coffee house"
- give in (multi-word expression with infixation: "give in">"gave in"): BF="give" ≠ lemma="give in"
- behind one's back (discontinuous multi-word expression without infixation: "behind my back", "behind his back", etc): BF="behind" ≠ lemma="behind <person>'s back"
- take into account (discontinuous multi-word LRU with infixation: "take it into account", "took that into account"): BF="take" ≠ lemma="take into account"
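The rule illustrated by the examples above can be sketched in Python. This is only an illustration: the `Entry` record, its field names and the `head` field (the word that the lexicographer marks as carrying the inflection) are assumptions made for the example, not part of any actual dictionary format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary entry; all field names are illustrative."""
    lemma: str
    multiword: bool = False      # more than one word
    infixation: bool = False     # inflection happens inside the string
    discontinuous: bool = False  # other material may be inserted
    head: str = ""               # assumed to be supplied by the lexicographer


def base_form(entry: Entry) -> str:
    """Return the BF: the lemma, unless a multi-word entry needs more
    than simple prefixation/suffixation, in which case use the head."""
    if entry.multiword and (entry.infixation or entry.discontinuous):
        return entry.head
    return entry.lemma


ENTRIES = [
    Entry("house"),
    Entry("mouse", infixation=True),
    Entry("coffee house", multiword=True),
    Entry("give in", multiword=True, infixation=True, head="give"),
    Entry("behind <person>'s back", multiword=True,
          discontinuous=True, head="behind"),
    Entry("take into account", multiword=True, infixation=True,
          discontinuous=True, head="take"),
]

for e in ENTRIES:
    print(f'lemma="{e.lemma}"  BF="{base_form(e)}"')
```

Note that a simple word such as "mouse" keeps its lemma as BF even though it inflects internally; only multi-word entries are reduced.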
The use of BF
The use of BFs derives from a practical limitation rather than from a logical necessity. To be efficient and to avoid overloading the system, generation rules have to be as general and as few as possible, which considerably limits the possibility of creating infixation rules. The alternative is to reduce infixable compounds and multi-word expressions to their longest common denominator (i.e., to “hyper-regularise” them) so that infixation can be treated as a special case of prefixation or suffixation.
In English, the use of BFs is limited to phrasal verbs (such as "give in" and "bring <thing> back"), verbal phrases ("play with fire") and other discontinuous expressions (such as "behind <person>'s back"). The need for BFs is more evident in highly inflected languages, where compounds and complex multi-word expressions may be reordered or infixed. Consider, for instance, the case of “lingua” (= “language”) in Latin. As a case-inflecting language, Latin normally has 12 different forms for each noun (six cases, each in singular and plural):
case | singular | plural |
---|---|---|
nominative | lingua | linguae |
vocative | lingua | linguae |
accusative | linguam | linguas |
genitive | linguae | linguarum |
dative | linguae | linguis |
ablative | lingua | linguis |
For single words, such as “lingua”, the process of case inflection is relatively simple, because it is extremely regular and always corresponds to a suffix. In multi-word expressions, however, the process can be considerably more complicated, because of infixation and agreement. For “lingua franca”, for instance, we again have 12 different forms, but generating them is no longer as simple as adding suffixes to the right of the string.
case | singular | plural |
---|---|---|
nominative | lingua franca | linguae francae |
vocative | lingua franca | linguae francae |
accusative | linguam francam | linguas francas |
genitive | linguae francae | linguarum francarum |
dative | linguae francae | linguis francis |
ablative | lingua franca | linguis francis |
In order to avoid listing all variations of “lingua franca” or creating a very specific rule that would apply only in this case, we reduce “lingua franca” to “lingua” and create a special rule for generating “franca” later on. The lemma will then be “lingua franca”, but the BF will be only “lingua”.
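A minimal Python sketch of this two-step generation follows. It assumes that both words inflect with the endings shown in the “lingua” table above; the `ENDINGS` table layout and the function names `inflect` and `generate` are hypothetical and do not reflect the actual rule format of the system.

```python
# Endings taken from the "lingua" table above, keyed by (case, number).
ENDINGS = {
    ("nominative", "singular"): "a",  ("nominative", "plural"): "ae",
    ("vocative", "singular"): "a",    ("vocative", "plural"): "ae",
    ("accusative", "singular"): "am", ("accusative", "plural"): "as",
    ("genitive", "singular"): "ae",   ("genitive", "plural"): "arum",
    ("dative", "singular"): "ae",     ("dative", "plural"): "is",
    ("ablative", "singular"): "a",    ("ablative", "plural"): "is",
}


def inflect(word: str, case: str, number: str) -> str:
    """Plain suffixation: drop the final -a and attach the ending."""
    if not word.endswith("a"):
        raise ValueError(f"{word!r} does not fit this paradigm")
    return word[:-1] + ENDINGS[(case, number)]


def generate(bf: str, agreeing: tuple = ()) -> dict:
    """Inflect the BF first, then append each agreeing word
    (the 'special rule' applied afterwards)."""
    forms = {}
    for case, number in ENDINGS:
        head = inflect(bf, case, number)
        rest = [inflect(w, case, number) for w in agreeing]
        forms[(case, number)] = " ".join([head, *rest])
    return forms


forms = generate("lingua", agreeing=("franca",))
print(forms[("accusative", "singular")])
print(forms[("genitive", "plural")])
```

The point of the sketch is that the BF “lingua” is handled by the same suffixation rule as any single word, and the agreement of “franca” is added as a separate step instead of being written as an infixation rule over the whole string.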
Examples
Word forms | Lemma | Base Form (BF) |
---|---|---|
apple, apples | apple | apple |
city, cities | city | city |
glasses | glasses | glasses |
rosa, rosae, rosam, rosas, rosarum, rosis | rosa | rosa |
beautiful | beautiful | beautiful |
hermoso, hermosa, hermosos, hermosas | hermoso | hermoso |
sum, es, est, sumus, estis, sunt, eram, fui… | esse | esse |
part of speech, parts of speech | part of speech | part |
skinhead, skinheads | skinhead | skinhead |
give in, gives in, gave in, given in, … | give in | give |
pars orationis, partes orationis, partem orationis, partis orationis, … | pars orationis | pars |
bring [sth] back, brings [sth] back, bringing [sth] back, brought [sth] back, ... | bring back | bring |
play with fire, plays with fire, playing with fire, ... | play with fire | play |
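The last rows of the table can be read as follows: the BF selects an inflectional paradigm, and the rest of the lemma is attached afterwards, optionally with material inserted in between. The Python sketch below illustrates this under stated assumptions: the hand-listed `PARADIGM` table and the `variants` function are hypothetical, and the paradigms are deliberately incomplete.

```python
# Hand-listed, partial paradigms for the BFs; purely illustrative.
PARADIGM = {
    "bring": ["bring", "brings", "bringing", "brought"],
    "play": ["play", "plays", "playing", "played"],
}


def variants(bf: str, remainder: str, filler: str = "") -> list:
    """Combine each inflected form of the BF with the rest of the lemma,
    placing an optional filler (e.g. an object) between the two."""
    results = []
    for form in PARADIGM[bf]:
        parts = [form]
        if filler:
            parts.append(filler)
        parts.append(remainder)
        results.append(" ".join(parts))
    return results


print(variants("bring", "back", filler="it"))
print(variants("play", "with fire"))
```

Because inflection is applied to the BF alone, the same suffixation paradigm serves both the continuous expression ("play with fire") and the discontinuous one ("bring [sth] back").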