Projects

From UNL Wiki

Revision as of 14:04, 17 September 2012

BRUNO

The project BRUNO aims at providing basic resources for UNL-oriented processing.

CRATYLUS

The project Cratylus aims at UNLizing the integral text of Cratylus (360 BC), written by the Greek philosopher Plato (427? BC-347? BC). Cratylus is one of the best-known Platonic dialogues and a cornerstone in the history of language studies. The text was used mainly to provide standards for UNLization.

EOLSS

The project EOLSS aims at multilingualizing, via UNL, the content of 30 articles of the Encyclopedia of Water, one of the many encyclopedias of the Encyclopedia of Life Support Systems (EOLSS), an integrated compendium of several encyclopedias that attempts to forge pathways between disciplines and to foster transdisciplinary relations between subjects related to life-support systems.

IGLU

The project IGLU intends to map WordNet glosses from English into UNL. The project is divided into two main phases: the first (iGLU#1) addresses a subset of 27,255 synsets and is supposed to be carried out on a predominantly human basis; the second (iGLU#2) focuses on the remaining 90,404 synsets and is expected to be mainly automatic. In iGLU#1, linguists UNLize WordNet definitions through the UNL Editor, a graph-based UNL authoring tool available at the UNLdev. Decisions are stored in a UNLization memory, which comprises mappings between lexical items of English and Universal Words. Information on attributes and relations is also encoded. These data will be used in the second phase, when the UNLization process is expected to be performed by IAN, the UNDL Foundation's Interactive ANalyzer, which is still under development. IAN requires much less human intervention than the UNL Editor, and it is a first step towards a fully automatic natural language analysis system. Results of the project iGLU are expected to be used not only to compile the UNLization memory, but also to populate the UNL Knowledge Base, an essential part of the architecture of the UNL system. This will improve the quality of word sense disambiguation and enhance the capability of information retrieval and extraction through UNL.
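The idea of a UNLization memory can be sketched as a simple lookup table that records, during the manual phase, which Universal Word a linguist chose for a given English lexical item, so that the automatic phase can pre-annotate the remaining glosses. This is a minimal illustration only: the entries, the Universal Word choices, and the `pre_annotate` helper are invented and do not reflect the project's actual data format.

```python
# Hypothetical sketch of a UNLization memory: mappings from English lexical
# items to Universal Words, recorded during iGLU#1 and reused to pre-annotate
# the remaining synsets in iGLU#2. All entries are illustrative.

unlization_memory = {
    "dog": "dog(icl>canine)",
    "canine": "canine(icl>carnivore)",
    "bark": "bark(icl>utter)",
}

def pre_annotate(gloss_tokens):
    """Replace tokens that already have a stored Universal Word mapping;
    leave unknown tokens untouched for human review."""
    return [unlization_memory.get(tok, tok) for tok in gloss_tokens]

print(pre_annotate(["a", "dog", "can", "bark"]))
# ['a', 'dog(icl>canine)', 'can', 'bark(icl>utter)']
```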

LACE

The main goal of the project LACE is to build language modules out of data automatically extracted from comparable corpora. The results are expected to be incorporated into the architecture of UNL-based systems as supplementary resources for natural language disambiguation, both in analysis and generation, and will be used to improve the performance of applications in machine translation, summarization, information retrieval and semantic reasoning. The project has been developed under the CADMOS consortium (University of Geneva, University of Lausanne and École Polytechnique Fédérale de Lausanne), and is supported by the Wilsdorf Foundation.

LE PETIT PRINCE

The project Le Petit Prince (or LPP) aims at UNLizing the integral text of Le Petit Prince, a French novel published by Antoine de Saint-Exupéry in 1943. The main goal is to set standards and guidelines for human UNLization, and to test several tools that have been developed at the UNDL Foundation. The resulting UNL document is also planned to be used in the evaluation of UNL-based translations, and as training material for VALERIE, the Virtual Learning Environment for UNL.

LEWIS & SHORT

The project Lewis & Short aims at mapping lemmas extracted from the Lewis & Short Latin Dictionary (1879) into UNL. The project is coordinated by the UNL Center at the University of Patras, in Greece, under the supervision of Dr. Olga Vartzioti.

LIS

The Library Information System (LIS) is an information retrieval system that aims at performing multilingual search over bibliographic metadata. The main goal of the project is to UNLize a small set of MARC21 records and to provide the resources necessary to generate them in at least five languages other than Arabic. The project has been developed by the UNL Center at the Library of Alexandria.
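The core of such multilingual search can be sketched as matching records and queries on a shared language-neutral key rather than on surface text, so a query phrased in one language retrieves records catalogued in another. The record structure, the interlingual keys, and the `search` function below are all invented for illustration and are not the LIS data model.

```python
# Hypothetical sketch: multilingual retrieval over bibliographic metadata by
# matching on a shared language-neutral (UNL-like) key instead of surface
# titles. Records and keys are invented for illustration.

records = [
    {"id": "rec001", "interlingua": "history(mod>egypt)",
     "title": {"ar": "تاريخ مصر", "en": "History of Egypt"}},
    {"id": "rec002", "interlingua": "library(mod>alexandria)",
     "title": {"ar": "مكتبة الإسكندرية", "en": "Library of Alexandria"}},
]

def search(interlingual_query):
    """Match on the language-neutral key, independent of query language."""
    return [r["id"] for r in records if r["interlingua"] == interlingual_query]

print(search("history(mod>egypt)"))  # ['rec001']
```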

MIR

The project MIR (Multilingual Infra-stRucture) aims at creating a general-purpose multilingual lexicon to be used in natural language processing. MIR is a centralized repository of lexical data based on the UNL Dictionary, which has been extracted from WordNet 3.0. It contains 117,659 entries representing different sets of synonyms (or synsets) of the English language, which have been associated with lexical items of several different languages, as in many wordnet-based initiatives. Unlike other wordnets, however, MIR intends to provide a concept-to-word database (i.e., a semasiological, decoding or writer's dictionary) instead of a word-to-concept lexicon (an onomasiological, encoding, reader's dictionary).

Software