Projects

From UNL Wiki

Revision as of 19:46, 13 December 2010

MIR

The MIR (Multilingual Infra-stRucture) project aims to create a general-purpose multilingual lexicon for natural language processing. MIR is a centralized repository of lexical data based on the UNL Core Dictionary 1.0, which was extracted from WordNet 3.0. It contains 117,659 entries, each representing a set of synonyms (a synset) of English, and these entries have been associated with lexical items of several different languages, as in many wordnet-based initiatives. Unlike other wordnets, however, MIR is intended as a concept-to-word database (i.e., an onomasiological, encoding or writer's dictionary) rather than a word-to-concept lexicon (a semasiological, decoding or reader's dictionary).
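The difference between the two lookup directions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept ids (`c1`, `c2`, `c3`), the sample entries, and the function names are hypothetical and do not reflect the actual MIR schema or data.

```python
from collections import defaultdict

# Toy concept-to-word lexicon: concept id -> language -> lexical items.
# The ids and entries are invented for illustration; they are not MIR data.
CONCEPT_TO_WORD = {
    "c1": {"en": ["dog"], "fr": ["chien"], "pt": ["cachorro", "cão"]},
    "c2": {"en": ["bank"], "fr": ["banque"], "pt": ["banco"]},
    "c3": {"en": ["bank"], "fr": ["rive"], "pt": ["margem"]},
}


def words_for(concept_id, lang):
    """Concept-to-word lookup: start from a concept, get words in a language."""
    return CONCEPT_TO_WORD.get(concept_id, {}).get(lang, [])


def invert(lexicon):
    """Derive a word-to-concept index from a concept-to-word lexicon."""
    index = defaultdict(set)
    for concept_id, by_lang in lexicon.items():
        for lang, words in by_lang.items():
            for word in words:
                index[(lang, word)].add(concept_id)
    return index


# Word-to-concept lookup: start from a word, get the candidate concepts.
WORD_TO_CONCEPT = invert(CONCEPT_TO_WORD)
```

In this sketch, `words_for("c3", "fr")` answers the question a writer asks (which word expresses this concept?), while `WORD_TO_CONCEPT[("en", "bank")]` answers the one a reader asks (which concepts can this word express?) and returns more than one candidate for an ambiguous word.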

IGLU

The IGLU project intends to map WordNet glosses from English into UNL. The project is divided into two main phases: the first (iGLU#1) addresses a subset of 27,255 synsets and is to be carried out predominantly by hand; the second (iGLU#2) covers the remaining 90,404 synsets and is expected to be mainly automatic. In iGLU#1, linguists UNL-ize WordNet definitions through the UNL Editor, a graph-based UNL authoring tool available at the UNLdev. Their decisions are stored in a UNL-ization memory, which comprises mappings between lexical items of English and Universal Words. Information on attributes and relations is also encoded. These data will be used in the second phase, when the UNL-ization process is expected to be performed by IAN (the UNDL Foundation Interactive ANalyzer), which is under development. IAN requires much less human intervention than the UNL Editor and is a first step towards a fully automatic natural language analysis system. The results of the project are expected to be used not only in compiling the UNL-ization memory, but also in populating the UNL Knowledge Base, which is an essential part of the architecture of the UNL system. This is expected to improve the quality of word sense disambiguation and to enhance information retrieval and extraction through UNL.
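As a rough illustration of how a UNL-ization memory built in the first phase might be reused in the second, the following Python sketch checks that the two phase sizes add up to the full synset count and pre-annotates a gloss from stored mappings. The memory contents, the placeholder identifiers (`uw_small`, `uw_house`) and the function `pre_annotate` are hypothetical; they do not represent real Universal Words or the behaviour of the UNL Editor or IAN.

```python
# Phase sizes quoted in the text; together they should cover every synset.
PHASE_1_SYNSETS = 27_255
PHASE_2_SYNSETS = 90_404
TOTAL_SYNSETS = PHASE_1_SYNSETS + PHASE_2_SYNSETS

# Toy UNL-ization memory: English lexical item -> placeholder identifier.
# The identifiers are invented stand-ins, not real Universal Words.
MEMORY = {"small": "uw_small", "house": "uw_house"}


def pre_annotate(gloss, memory):
    """Split a gloss into tokens resolved from memory and tokens left pending.

    Tokenisation is a plain lowercase whitespace split, which is a
    simplification made for this sketch.
    """
    resolved, pending = [], []
    for token in gloss.lower().split():
        if token in memory:
            resolved.append((token, memory[token]))
        else:
            pending.append(token)
    return resolved, pending
```

Here `TOTAL_SYNSETS` comes to 117,659, matching the number of entries cited for the MIR, and tokens that `pre_annotate` cannot resolve are the ones that would still need a human decision.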

LE PETIT PRINCE

The project Le Petit Prince (or LPP) aims at UNLizing the full text of Le Petit Prince, a French novel by Antoine de Saint-Exupéry published in 1943. The main goal is to set standards and guidelines for human UNLization, and to test several tools that have been developed at the UNDL Foundation. The resulting UNL document is also planned to be used in the evaluation of UNL-based translations, and as training material for VALERIE, the Virtual Learning Environment for UNL.

LACE

The main goal of the project LACE is to build language modules out of data automatically extracted from comparable corpora. The results are expected to be incorporated in the architecture of UNL-based systems as supplementary resources for natural language disambiguation, both in analysis and generation, and will be used for improving the performance of applications in machine translation, summarization, information retrieval and semantic reasoning.

CRATYLUS

The project Cratylus aims at UNLizing the full text of Cratylus (360 BC), written by the Greek philosopher Plato (427? BC-347? BC). Cratylus is one of the best-known Platonic dialogues and a cornerstone in the history of language studies. The text was used mainly to provide some standards for UNLization.

EOLSS

The project EOLSS aims to make the content of 30 articles of the Encyclopedia of Water available in multiple languages via UNL. The Encyclopedia of Water is one of the many encyclopedias of the Encyclopedia of Life Support Systems (EOLSS), an integrated compendium that attempts to forge pathways between disciplines and to foster transdisciplinary relations between subjects, especially those related to life support systems.

LIS

The Library Information System (LIS) is an information retrieval system that aims at performing multilingual search over bibliographical metadata. The main goal of the project is to UNLize a small set of MARC21 records and to provide the resources necessary to generate them in at least five languages other than Arabic.

Software