Base Form

From UNL Wiki

Revision as of 17:54, 7 January 2010

In the UNLarium framework, the base form (BF) is the form used to generate all variants (inflections) of a given LRU.

LRUs and BFs

A single concept may have several different realisations in a given language. These variations are of two types:

  • Internal variations, i.e., related to the same lexical realisation unit, such as in “to die”, “die”, “dies”, “dying”, “died”, etc, which are represented by the same lemma “die” as a single LRU; and
  • External variations, such as “die”, “decease”, “pass away”, “perish”, etc, which are represented by different lemmas and different LRUs in the UNLarium.

As the LRU is the basic unit of the UNLarium, each external variation will correspond to a different entry, but internal variations will be represented inside the same entry and will be generated automatically through inflectional and/or subcategorization rules. In many cases, however, the LRU, which is actually a lemma, is not the most adequate form to guide the process of generating the internal variations. In such cases, we will need a “base form”, i.e., a lexical realisation that is more suitable for automatic processing.

Consider, for instance, the case of the LRU “take into account”, which is actually a discontinuous item, since we can have any noun phrase between “take” and “into account”: “take that into account”, “take it into account”, “take the decision of proliferating dictionary fields into account”, etc. In order to be prepared to process all those possibilities, we have to create a different lexical entity, which will be exactly the base form. In the case of “take into account”, the base form will be “take”. From the base form, we will be able not only to associate the LRU to an existing inflectional paradigm (“take”) but also to treat discontinuity and order issues through simple and deterministic generation rules.
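The treatment of “take into account” can be illustrated with a minimal sketch. It is not UNLarium code: the paradigm table, the form labels and the function `realise` are hypothetical names introduced only for this example. The base form “take” is inflected first, and the noun phrase and the fixed remainder “into account” are then attached to its right, so that the discontinuity is handled as plain suffixation.

```python
# Hypothetical toy paradigm for the base form "take"; not UNLarium data.
TAKE_PARADIGM = {
    "infinitive": "take",
    "third_singular": "takes",
    "gerund": "taking",
    "past": "took",
    "past_participle": "taken",
}


def realise(paradigm, form, remainder, infix=""):
    """Inflect the base form, then attach the optional infix and the fixed remainder."""
    parts = [paradigm[form], infix, remainder]
    # Drop empty slots so that a missing infix does not leave a double space.
    return " ".join(part for part in parts if part)


print(realise(TAKE_PARADIGM, "past", "into account", infix="it"))
print(realise(TAKE_PARADIGM, "gerund", "into account"))
```

The first call yields “took it into account” and the second “taking into account”. Only the base form carries inflection; the remainder of the LRU is never modified.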

How to create a BF

There are four main rules for creating BFs.

  • A BF should be created only in the case of compound or complex LRUs, i.e., LRUs that contain more than one word, whether concatenated or separated by a hyphen or spaces.
  • A BF should be created if and only if it is indispensable to the generation of the internal variations of a given LRU, i.e., if the variations cannot be generated by simple prefixation and/or suffixation rules.
  • A BF should preserve the lexical category of the corresponding LRU. If the LRU is a verb, the BF must be a verb as well.
  • A BF should correspond to the lemma of the longest common denominator between all the possible variations of the corresponding LRU.

In all other cases, the base form is identical to the LRU and does not need to be specified.
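The default described above (no BF specified means BF = LRU) can be pictured with a small sketch. The class `Entry` and its field names are assumptions made for illustration and do not reflect the actual UNLarium data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical dictionary entry; the field names are illustrative only."""

    lru: str
    bf: Optional[str] = None  # None means that no base form was specified

    def base_form(self) -> str:
        # When no BF is given, the base form is the LRU itself.
        return self.bf if self.bf is not None else self.lru


print(Entry("apple").base_form())
print(Entry("bring back", bf="bring").base_form())
```

For “apple” the base form falls back to the LRU, whereas “bring back” uses the explicitly provided “bring”.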

The use of BF

The use of BFs derives from a practical limitation rather than from a logical necessity. To be efficient and to avoid overloading the system, generation rules have to be as few and as general as possible, which considerably limits the possibility of creating infixation rules. The alternative is to reduce infixable compounds and complex LRUs to the highest common denominator (i.e., to “hyper-regularise” them) in order to treat infixation as a special case of prefixation or suffixation.

In English, the use of BFs is limited to separable phrasal verbs (such as “bring (sth) back” or “look (sth) up”). The need for BFs is more noticeable in highly inflected languages, where compounds and complex LRUs may be reordered or infixed. Consider, for instance, the case of the simple LRU “lingua” (= “language”) in Latin. As a case-inflecting language, Latin normally has 12 different forms for each noun (six cases in two numbers):

case | singular | plural
nominative | lingua | linguae
vocative | lingua | linguae
accusative | linguam | linguas
genitive | linguae | linguarum
dative | linguae | linguis
ablative | lingua | linguis

For single-word LRUs, such as “lingua”, the process of case inflection is relatively simple, because it is extremely regular and always corresponds to a suffix. In complex LRUs, however, the process can be considerably more complicated, because of infixation and agreement. For “lingua franca”, for instance, we again have 12 different forms, but generating them is no longer as simple as adding suffixes to the right of the LRU.

case | singular | plural
nominative | lingua franca | linguae francae
vocative | lingua franca | linguae francae
accusative | linguam francam | linguas francas
genitive | linguae francae | linguarum francarum
dative | linguae francae | linguis francis
ablative | lingua franca | linguis francis

In order to avoid listing all variations of “lingua franca” inside the UNLarium or creating a very specific rule that would apply only in this case, we reduce “lingua franca” to “lingua” and create a special (subcategorization) rule for generating “franca” later on. The LRU will then be “lingua franca”, but the BF will be only “lingua”.
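The reduction of “lingua franca” to “lingua” can be sketched as follows. This is a simplified illustration, not the UNLarium rule formalism: the functions `inflect` and `realise` are hypothetical, vowel length is ignored, and the endings are the regular first-declension suffixes shown in the tables above. The base form is inflected by suffixation, and a second step (standing in for the subcategorization rule) generates “franca” in agreement with it.

```python
# First-declension endings, keyed by (case, number); vowel length is ignored.
ENDINGS = {
    ("nominative", "singular"): "a",
    ("nominative", "plural"): "ae",
    ("vocative", "singular"): "a",
    ("vocative", "plural"): "ae",
    ("accusative", "singular"): "am",
    ("accusative", "plural"): "as",
    ("genitive", "singular"): "ae",
    ("genitive", "plural"): "arum",
    ("dative", "singular"): "ae",
    ("dative", "plural"): "is",
    ("ablative", "singular"): "a",
    ("ablative", "plural"): "is",
}


def inflect(lemma, case, number):
    """Replace the final "a" of a first-declension lemma with the case ending."""
    stem = lemma[:-1]
    return stem + ENDINGS[(case, number)]


def realise(bf, dependent, case, number):
    """Inflect the base form, then generate the dependent word in agreement with it."""
    return inflect(bf, case, number) + " " + inflect(dependent, case, number)


print(realise("lingua", "franca", "genitive", "plural"))
print(realise("lingua", "franca", "accusative", "singular"))
```

The two calls produce “linguarum francarum” and “linguam francam”, matching the table for “lingua franca”. Because the same suffix table serves both words, no rule specific to “lingua franca” is needed.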

Examples

Lexical Realisations | Lexical Realisation Unit (LRU) | Base Form (BF)
apple, apples | apple | NONE (=LRU)
city, cities | city | NONE (=LRU)
rosa, rosae, rosam, rosas, rosarum, rosis | rosa | NONE (=LRU)
beautiful | beautiful | NONE (=LRU)
hermoso, hermosa, hermosos, hermosas | hermoso | NONE (=LRU)
sum, es, est, sumus, estis, sunt, eram, fui… | esse | NONE (=LRU)
part of speech, parts of speech | part of speech | NONE (=LRU)
skinhead, skinheads | skinhead | NONE (=LRU)
give in, gives in, gave in, given in, … | give in | NONE (=LRU)
pars orationis, partes orationis, partem orationis, partis orationis, … | pars orationis | pars
bring [sth] back, brings [sth] back, bringing [sth] back, brought [sth] back | bring back | bring