IV UNL Olympiad

From UNL Wiki

Revision as of 13:21, 5 December 2014

The UNL Olympiad is a series of competitions organised by the UNDL Foundation in order to foster the development of UNL-driven resources (dictionaries, grammars and corpora). The fourth edition of the Olympiad is devoted to the development of grammars and dictionaries for UNLizing (with IAN) and NLizing (with EUGENE) the corpus derived from the project AESOP-A1.

Important dates

Preparatory phase

~~AESOP-A1: until 22 Sep 2014~~
~~Open discussion: 23-30 Sep 2014~~
~~Official release of the corpus: 7 Oct 2014~~

Competition

~~Deadline for uploading the grammars and dictionaries: 15 Nov 2014~~
~~First results: 23 Nov 2014~~
~~Open discussion of the results: 24-30 Nov 2014~~
~~Final results: 7 Dec 2014~~

Preparatory Phase (Concluded)

Participation in the preparatory phase is open to all candidates and is not compulsory, provided that at least one user of the working language completes the project AESOP-A1.

AESOP-A1

The first phase of the IV UNL Olympiad is the project AESOP-A1, which is open and funded for all languages. This project will set the corpus and the reference for UNLization and NLization. It consists of 13 UNL graphs that must be NLized (i.e., generated into natural language, manually). This process must be done only once for each language, i.e., it is not necessary (nor possible) that all users address the 13 UNL graphs. The progress report for the project AESOP-A1, with the number of available entries, can be found at UNLWEB>PROJECTS>AESOP-A1>PROGRESS REPORT.

Open Discussion

The second phase of the IV UNL Olympiad will consist of an open discussion of the results of the project AESOP-A1. All users will be able to propose the inclusion, suppression or modification of the NLizations proposed for the UNL graphs, in order to avoid any biases and privileges for specific users.

Official Release

The official corpus and set of languages, resulting from the preparatory phases, were released on 7 Oct 2014.

Competition

Goals

The IV UNL Olympiad has two main goals:

To prepare the dictionaries and grammars for UNLizing, with IAN, the corpus AESOP-A1; and/or
To prepare the dictionaries and grammars for NLizing, with EUGENE, the corpus AESOP-A1.

UNLization is the process of representing, into UNL, the information conveyed by a natural language document. NLization, conversely, is the process of representing, in natural language, the information conveyed by a UNL document. These are done, respectively, with IAN and EUGENE, which are engines available at the UNLdev.

Modalities

The competition is organised in two modalities:

Best UNLization Grammar for <LANGUAGE>
Best NLization Grammar for <LANGUAGE>

Where <LANGUAGE> is one of the languages participating in this Olympiad (see the complete list below).
Candidates may participate in one or two modalities, i.e., they may work with the UNLization grammar, with the NLization grammar, or with both.
Candidates may also participate in one or more languages, provided that they belong to the official list.

Prizes

Prizes are awarded to the best grammars of each modality (UNLization and NLization) for each language^[1]:

1st place: Gold Medal
2nd place: Silver Medal
3rd place: Bronze Medal

Additionally, the 10 best UNLization grammars among all languages and the 10 best NLization grammars among all languages will be awarded USD500.00 each.^[2]

Corpus

The official corpus has been available for download at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT since 7 Oct 2014.

Languages

The list of languages participating in the IV UNL Olympiad is the following:

Armenian
Baatonum
Bengali
Bosnian
Bulgarian
Czech
Estonian
French
Georgian
German
Greek (Ancient)
Greek (Modern)
Hindi
Khmer
Latin
Oriya
Panjabi
Persian
Portuguese
Russian
Sinhala
Tamil
Telugu
Ukrainian
Vietnamese

Instructions

The competition is free and open to any participant, but it is limited to the set of languages listed above.
The OFFICIAL CORPUS must be extracted from the UNLarium (at UNLWEB>UNLARIUM>CORPUS>AESOP-A1>EXPORT>WORKING LANGUAGE). The NL corpus is used in UNLization (with IAN); the UNL corpus is used in NLization (with EUGENE).
Candidates must build their WORKING CORPUS out of the official corpus by selecting one NL sentence for each UNL graph. Note that, in the official corpus, the same UNL graph may have several candidate NL sentences in the same language. Candidates must select only one (per graph) to work with. This means that the UNLization will involve 13 NL sentences (i.e., candidates are expected to provide the dictionaries and grammars to map 13 sentences from their working language to UNL), and the NLization will involve 13 UNL graphs (i.e., candidates are expected to provide the dictionaries and grammars to map 13 UNL graphs to their working language).
The goal of the UNLization modality is to UNLize ALL sentences from the NL WORKING CORPUS (not from the official corpus); the goal of the NLization modality is to NLize ALL graphs from the UNL WORKING CORPUS.^[3]
Absolutely no change can be made to any sentence from the official corpus (either in natural language or in UNL), i.e., candidates can only choose among the official sentences but cannot alter them, and must use them as they are.
In order to apply, candidates must upload the grammar and dictionary files to www.unlweb.net/unlversity by the deadline.
The dictionary files must comply with the Dictionary Specs and may only use features present in the Tagset. They should not contain temporary words.
The grammar files must comply with the Grammar Specs and must be as generic as possible. They should not target the specific sentences of the corpus, but the general structures presented there.
The F-measure of the grammars must be equal to or greater than 0.9^[4].
The files must be original. Grammars that prove to be similar beyond any reasonable doubt will be discarded, unless provided by the same author (for different languages).
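
The F-measure requirement above can be illustrated with a short sketch. It is not the UNLarium tool: the source does not state how the tool at UNLWEB>UNLARIUM>TOOLS>F-MEASURE computes the score, so the code assumes the standard F-measure (harmonic mean of precision and recall) and assumes, following note 3, that an output counts as correct when it matches one of the official candidates for that item. The function names and the data layout are hypothetical.

```python
def f_measure(outputs, official):
    """Assumed standard F-measure over a working corpus.

    outputs:  dict item_id -> produced string, or None if nothing was produced
    official: dict item_id -> set of acceptable strings from the official corpus
    """
    # Only items for which the grammar produced something count toward precision.
    produced = {k: v for k, v in outputs.items() if v is not None}
    # An output is correct if it is ONE of the official candidates (note 3).
    correct = sum(1 for k, v in produced.items() if v in official.get(k, set()))
    precision = correct / len(produced) if produced else 0.0
    recall = correct / len(official) if official else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def qualifies(score, threshold=0.9):
    """Qualification rule from the instructions: F-measure >= 0.9."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical 13-item corpus with one acceptable string per item.
    official = {i: {f"sentence-{i}"} for i in range(13)}
    outputs = {i: f"sentence-{i}" for i in range(13)}
    outputs[0] = "a wrong output"  # one miss out of 13
    score = f_measure(outputs, official)
    print(round(score, 3), qualifies(score))
```

Under these assumptions, a 13-item corpus with a single wrong output has precision and recall of 12/13 each, so the score rounds to 0.923 and still meets the 0.9 threshold.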

Evaluation

Grammars will be evaluated and ranked according to the following criteria:

Best F-measure
Scalability (i.e., extensibility, or the capacity of being reused for other corpora), in case of grammars with the same F-Measure
Date of submission, in case of grammars with the same F-Measure and equally scalable
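
The three criteria above amount to a lexicographic sort. The following is a minimal sketch of that ordering, using the F-measures and submission dates from the UNLization results table on this page. The source gives no scalability values and does not say how scalability is measured, so the numeric `scalability` field is a hypothetical placeholder, set to the same value for every entry.

```python
from datetime import date

# Rows taken from the UNLization results table; "scalability" is a
# hypothetical placeholder because the source does not quantify it.
SUBMISSIONS = [
    {"pair": "ukr>unl", "f_measure": 1.000, "scalability": 0, "submitted": date(2014, 10, 22)},
    {"pair": "rus>unl", "f_measure": 1.000, "scalability": 0, "submitted": date(2014, 10, 25)},
    {"pair": "bul>unl", "f_measure": 0.923, "scalability": 0, "submitted": date(2014, 11, 10)},
    {"pair": "pan>unl", "f_measure": 1.000, "scalability": 0, "submitted": date(2014, 11, 11)},
    {"pair": "ger>unl", "f_measure": 1.000, "scalability": 0, "submitted": date(2014, 11, 13)},
    {"pair": "hin>unl", "f_measure": 0.923, "scalability": 0, "submitted": date(2014, 11, 14)},
]


def rank(submissions):
    """Order by F-measure (higher first), then scalability (higher first),
    then date of submission (earlier first)."""
    return sorted(
        submissions,
        key=lambda s: (-s["f_measure"], -s["scalability"], s["submitted"]),
    )


if __name__ == "__main__":
    for position, entry in enumerate(rank(SUBMISSIONS), start=1):
        print(position, entry["pair"], entry["f_measure"], entry["submitted"])
```

With scalability held equal, the ties on F-measure are broken by submission date alone.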

Final Results

UNLization (Qualified Grammars Only)

Submission	Author	L Pair	F-Measure	Medal	Position	Corpus	Dic	T-grammar	D-grammar	Output
22-10-2014	Sergiy Prots	ukr>unl	1.000	GOLDEN	1	[1]	[2]	[3]	[4]	[5]
25-10-2014	Sergiy Prots	rus>unl	1.000	GOLDEN	2	[6]	[7]	[8]	[9]	[10]
10-11-2014	Yordanka Stancheva	bul>unl	0.923	GOLDEN	5	[11]	[12]	[13]	[14]	[15]
11-11-2014	Parteek Kumar	pan>unl	1.000	GOLDEN	3	[16]	[17]	[18]	[19]	[20]
13-11-2014	Sergiy Prots	ger>unl	1.000	GOLDEN	4	[21]	[22]	[23]	[24]	[25]
14-11-2014	Parteek Kumar	hin>unl	0.923	GOLDEN	6	[26]	[27]	[28]	[29]	[30]

NLization (Qualified Grammars Only)

Submission	Author	L Pair	F-Measure	Medal	Position	Corpus	Dic	T-grammar	D-grammar	Output
11-11-2014	Parteek Kumar	unl>pan	1.000	GOLDEN	1	[31]	[32]	[33]	[34]	[35]
15-11-2014	Parteek Kumar	unl>hin	1.000	GOLDEN	2	[36]	[37]	[38]	[39]	[40]

Notes

[1] This means that up to 6 prizes will be awarded for each language: Best UNLization Grammar, Second Best UNLization Grammar, Third Best UNLization Grammar, Best NLization Grammar, Second Best NLization Grammar and Third Best NLization Grammar.
[2] The value of USD500.00 will be paid only to the 10 best UNLization or NLization grammars in general, and not to the 10 best UNLization/NLization grammars of each language.
[3] The goal of the Olympiad is to provide ONE POSSIBLE MAPPING for each structure, i.e., to map natural language sentences to at least one valid UNL graph, and to map the UNL graph into at least one valid NL sentence. This means that, if the natural language is ambiguous, and may be mapped into several different UNL graphs, the UNLization will be considered valid if the resulting UNL graph is one of the possible candidates according to the OFFICIAL CORPUS. Conversely, whenever the same UNL graph may be mapped into several different NL sentences, the NLization is considered valid if the resulting NL sentence is one of the possible mappings according to the OFFICIAL CORPUS.
[4] The F-measure may be calculated at UNLWEB>UNLARIUM>TOOLS>F-MEASURE.

