IV UNL Olympiad


Revision as of 13:01, 7 October 2014

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The fourth edition of the Olympiad is devoted to the development of grammars and dictionaries for UNLizing (with IAN) and NLizing (with EUGENE) the corpus derived from the project AESOP-A1.

Important dates

Preparatory phase

  • AESOP-A1: until 22 Sep 2014
  • Open discussion: 23-30 Sep 2014
  • Official release of the corpus: 7 Oct 2014

Competition

  • Deadline for uploading the grammars and dictionaries: 7 Nov 2014
  • First results: 23 Nov 2014
  • Open discussion of the results: 24-30 Nov 2014
  • Final results: 7 Dec 2014

Preparatory Phase (Concluded)

Participation in the preparatory phase is open to all candidates and is not compulsory, provided that at least one user of the working language completes the project AESOP-A1.

AESOP-A1

The first phase of the IV UNL Olympiad is the project AESOP-A1, which is open and funded for all languages. This project sets the corpus and the reference for UNLization and NLization. It consists of 13 UNL graphs that must be NLized (i.e., manually generated into natural language). This process must be done only once for each language, i.e., it is not necessary (nor possible) for all users to address the 13 UNL graphs. The progress report of the project AESOP-A1, with the number of available entries, can be found at UNLWEB>PROJECTS>AESOP-A1>PROGRESS REPORT.

Open Discussion

The second phase of the IV UNL Olympiad consists of an open discussion of the results of the project AESOP-A1. All users are able to propose the inclusion, suppression or modification of the NLizations proposed for the UNL graphs, in order to avoid any bias or privilege for specific users.

Official Release

The official corpus and set of languages, resulting from the preparatory phases, were released on 7 Oct 2014.

Competition

Goals

The IV UNL Olympiad has two main goals:

  • To prepare the dictionaries and grammars for UNLizing, with IAN, the corpus AESOP-A1; and/or
  • To prepare the dictionaries and grammars for NLizing, with EUGENE, the corpus AESOP-A1.

UNLization is the process of representing, into UNL, the information conveyed by a natural language document. NLization, conversely, is the process of representing, in natural language, the information conveyed by a UNL document. These are done, respectively, with IAN and EUGENE, which are engines available at the UNLdev.

Modalities

The competition is organised in two modalities:

  • Best UNLization Grammar for <LANGUAGE>
  • Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the official list.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language[1]:

  • 1st place: Gold Medal
  • 2nd place: Silver Medal
  • 3rd place: Bronze Medal

Additionally, the 10 best UNLization grammars among all languages and the 10 best NLization grammars among all languages will be awarded USD500.00 each.[2]

Corpus

The official corpus will be available for download at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT on 7 Oct 2014.

Languages

The list of languages participating in the IV UNL Olympiad is the following:

  • Armenian
  • Baatonum
  • Bengali
  • Bosnian
  • Bulgarian
  • Czech
  • French
  • Georgian
  • German
  • Greek (Ancient)
  • Greek (Modern)
  • Hindi
  • Khmer
  • Latin
  • Oriya
  • Panjabi
  • Persian
  • Portuguese
  • Russian
  • Sinhala
  • Tamil
  • Telugu
  • Ukrainian
  • Vietnamese

Instructions

  1. The competition is free and open to any participant, but it is limited to the set of languages listed above.
  2. In order to apply, candidates must upload the grammar and dictionary files to www.unlweb.net/unlversity by the deadline (7 Nov 2014).
  3. The corpus must be extracted from the UNLarium (at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT) and may not undergo any change. The goal of the UNLization modality is to UNLize ALL sentences from the NL corpus; the goal of the NLization modality is to NLize ALL graphs from the UNL corpus.[3]
  4. The dictionary files must comply with the Dictionary Specs and may only bring features present in the Tagset. They should not contain temporary words.
  5. The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
  6. The F-measure of the grammars must be equal to or greater than 0.9[4].
  7. The files must be original. Grammars that are similar beyond any reasonable doubt will be discarded, unless they are provided by the same author (for different languages).
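
As an illustration of instruction 6, the sketch below assumes the usual definition of the F-measure as the harmonic mean of precision and recall. The official tool at UNLWEB>UNLARIUM>TOOLS>F-MEASURE may decide what counts as a correct result differently; the function names and the counts used here are hypothetical.

```python
MIN_F_MEASURE = 0.9  # threshold stated in instruction 6


def f_measure(correct: int, returned: int, expected: int) -> float:
    """Harmonic mean of precision and recall (assumed standard definition).

    correct  -- results that match the reference corpus
    returned -- results actually produced by the grammar
    expected -- results that should have been produced
    """
    if correct == 0 or returned == 0 or expected == 0:
        return 0.0
    precision = correct / returned
    recall = correct / expected
    return 2 * precision * recall / (precision + recall)


def meets_threshold(correct: int, returned: int, expected: int) -> bool:
    """True if the grammar reaches the minimum F-measure."""
    return f_measure(correct, returned, expected) >= MIN_F_MEASURE
```

Under these assumptions, a grammar that handles 12 of the 13 items of the corpus correctly (and returns a result for all 13) would pass, while one that handles only 11 would not.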

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

  • Best F-measure
  • Scalability (i.e., extendibility, or the capacity of being reused for other corpora), in case of grammars with the same F-measure
  • Date of submission, in case of grammars with the same F-measure and equally scalable
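
The three criteria amount to a sort with two tie-breakers, which can be sketched as follows. This is only an illustration: the `Submission` record and its numeric `scalability` score are hypothetical, and the sketch assumes that an earlier submission ranks higher, which the criteria above do not state explicitly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Submission:
    author: str
    f_measure: float
    scalability: float  # hypothetical score; higher means more reusable
    submitted: date


def rank(submissions):
    """Order by F-measure, then scalability, then submission date.

    Assumption: among otherwise equal grammars, the earlier one wins.
    """
    return sorted(
        submissions,
        key=lambda s: (-s.f_measure, -s.scalability, s.submitted),
    )
```

Scalability and date only come into play when the preceding criteria are tied, which the tuple key reproduces.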

Notes

  1. This means that, for each language, up to 6 prizes will be awarded: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
  2. The value of USD500.00 will be paid only to the 10 best UNLization or NLization grammars in general, and not to the 10 best UNLization/NLization grammars of each language.
  3. The goal of the Olympiad is to provide ONE POSSIBLE MAPPING for each structure, i.e., to map each natural language sentence to at least one valid UNL graph, and to map each UNL graph into at least one valid NL sentence. This means that, if the natural language sentence is ambiguous and may be mapped into several different UNL graphs, the UNLization will be considered valid if the resulting UNL graph is one of the possible candidates according to the corpus. Conversely, whenever the same UNL graph may be mapped into several different NL sentences, the NLization is considered valid if the resulting NL sentence is one of the possible mappings according to the corpus.
  4. The F-measure may be calculated at UNLWEB>UNLARIUM>TOOLS>F-MEASURE.
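
The "one possible mapping" rule of note 3 can be pictured as a membership test: an output is accepted if it matches any of the candidates that the corpus lists for the same input. The sketch below is an assumption-laden illustration, not the official evaluation procedure; the function names are hypothetical, and comparing strings after whitespace normalisation is a simplification chosen here.

```python
def normalise(text: str) -> str:
    """Collapse runs of whitespace so layout differences are ignored (assumption)."""
    return " ".join(text.split())


def is_valid(output: str, candidates) -> bool:
    """True if the output matches at least one candidate from the corpus."""
    return normalise(output) in {normalise(c) for c in candidates}
```

The same check applies in both directions: for UNLization the candidates are UNL graphs, and for NLization they are NL sentences.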
Software