FRIDA

From UNL Wiki
Revision as of 16:27, 5 February 2014

The FRIDA project (Français, Rumantsch, Italiano and Deutsch for Analysis) is devoted to creating NL-UNL (analysis) dictionaries for the official languages of Switzerland.

Goal

The project FRIDA has two main goals:

  1. To provide several word-to-concept monolingual databases (i.e., encoding or reader's dictionaries). These dictionaries are expected to be used in UNLization, i.e., in generating UNL graphs from natural language documents, especially through IAN.
  2. To find concepts that are not included in WordNet 3.0 and should be incorporated into the UNL Dictionary.

Repository

For each language, FRIDA as a whole contains 30,000 lemmas, divided into 6 repositories according to the lemmas' frequency of use.

  • FRIDA-A1 contains the 5,000 most frequent lemmas of the language (including articles, prepositions, conjunctions, auxiliary verbs, etc.);
  • FRIDA-A2 contains the next 5,000 most frequent lemmas of the language (including articles, prepositions, conjunctions, auxiliary verbs, etc.);

And so on, up to FRIDA-C2, according to the table below.


Repository   # of lemmas
FRIDA-A1     5,000
FRIDA-A2     5,000
FRIDA-B1     5,000
FRIDA-B2     5,000
FRIDA-C1     5,000
FRIDA-C2     5,000
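
The division above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming lemmas are ranked from 1 (most frequent) to 30,000; the names `REPOSITORIES`, `REPOSITORY_SIZE` and `repository_for_rank` are hypothetical and are not part of any FRIDA tool.

```python
# Hypothetical helper: maps a 1-based frequency rank to a FRIDA repository.
# The repository names and the size of 5,000 come from the table above;
# everything else is an assumption made for illustration.
REPOSITORIES = ["FRIDA-A1", "FRIDA-A2", "FRIDA-B1",
                "FRIDA-B2", "FRIDA-C1", "FRIDA-C2"]
REPOSITORY_SIZE = 5_000


def repository_for_rank(rank: int) -> str:
    """Return the repository that holds the lemma with the given rank."""
    total = REPOSITORY_SIZE * len(REPOSITORIES)  # 30,000 lemmas per language
    if not 1 <= rank <= total:
        raise ValueError(f"rank must be between 1 and {total}, got {rank}")
    # Ranks 1-5,000 fall in the first repository, 5,001-10,000 in the second, etc.
    return REPOSITORIES[(rank - 1) // REPOSITORY_SIZE]


if __name__ == "__main__":
    for rank in (1, 5_000, 5_001, 30_000):
        print(rank, repository_for_rank(rank))
```

Under these assumptions, rank 5,000 is the last lemma of FRIDA-A1 and rank 5,001 is the first lemma of FRIDA-A2.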

Instructions

Lexical Category
Whenever the lexical category for a given lemma is provided, check whether it is correct. If it is not, decline the entry and report the problem by clicking the yellow triangle to the right of the main entry. If the lexical category is not provided, select the most likely category. Do not worry about homonyms: provide a single category for each main entry.
Lemma
Do not change the lemma. If it is not correct (i.e., if it is misspelled or cannot be considered a lexical unit), decline the entry and report the problem by clicking the yellow triangle to the right of the main entry.
UWs
Provide as many UWs as necessary for each lemma, but do not include very rare or unusual cases. Also check the order: the most likely senses must appear first.
Base Form
The base form needs attention only for multiword expressions 1) whose inflections cannot be formed by simple affixation or 2) which are discontinuous. In these cases, provide the corresponding composition rules.
Inflection
Select AND TEST the inflectional paradigm that generates the inflections of the base form. Any error here will be propagated to the dictionary, so be careful. Pay particular attention to the cases below:
  • LOCALIZED IRREGULARITY: if the word is mostly regular and its irregularity is confined to a few specific rules (for instance, a noun with more than one possible plural, or a defective verb that is not used in a given person but follows the general rules in all the others), assign the word to the corresponding paradigm and list its irregularities in the box "inflectional rules";
  • NON-EXISTING PARADIGM: if the word is regular or semi-regular (in the sense that several other words behave in the same way) but cannot be associated with any existing paradigm, press the button REQUEST A NEW PARADIGM and provide the corresponding details;
  • IRREGULAR WORDS: if the word is irregular (i.e., its morphological behavior is highly unusual and specific to it), choose the option IRREGULAR and provide the corresponding inflectional rules.
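
The three cases above amount to a small decision procedure, sketched below in Python. This is a rough illustration only: `InflectionDecision`, `triage_inflection` and the action label "ASSIGN PARADIGM" are hypothetical names, and the rule strings are free-text placeholders rather than real inflectional rule syntax. Only the labels IRREGULAR and REQUEST A NEW PARADIGM come from the instructions themselves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class InflectionDecision:
    """What the editor should do for one base form (hypothetical structure)."""
    action: str                      # which option or button to use
    paradigm: Optional[str] = None   # existing paradigm to assign, if any
    extra_rules: List[str] = field(default_factory=list)  # rules to type in


def triage_inflection(is_irregular: bool,
                      matching_paradigm: Optional[str],
                      local_exceptions: Sequence[str] = ()) -> InflectionDecision:
    """Choose among the three cases described in the instructions."""
    if is_irregular:
        # IRREGULAR WORDS: no paradigm, all rules are provided by hand.
        return InflectionDecision("IRREGULAR", extra_rules=list(local_exceptions))
    if matching_paradigm is None:
        # NON-EXISTING PARADIGM: regular or semi-regular, but nothing fits.
        return InflectionDecision("REQUEST A NEW PARADIGM")
    # Regular word, or LOCALIZED IRREGULARITY: assign the paradigm and list
    # any exceptions in the "inflectional rules" box.
    return InflectionDecision("ASSIGN PARADIGM",
                              paradigm=matching_paradigm,
                              extra_rules=list(local_exceptions))


if __name__ == "__main__":
    # Placeholder paradigm name and exception text, for illustration only.
    print(triage_inflection(False, "paradigm-1", ["second plural form"]))
    print(triage_inflection(False, None))
    print(triage_inflection(True, None, ["full list of forms"]))
```

Whichever branch applies, the selected paradigm or rules still have to be tested before the entry is accepted, since the sketch does not check the generated forms.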
Subcategorization
Subcategorization is required only when the word REQUIRES a complement or a specifier (for instance, indirect transitive verbs that select a specific preposition). In this case, provide the corresponding subcategorization frame. If the subcategorization frame is not available, press the button REQUEST A NEW SUBCATEGORIZATION FRAME and provide the corresponding details.
Software