UC-B1
UC-B1 is an experimental corpus used to refine the initial versions of the grammar for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 5 very short texts in English and their corresponding graphs in UNL.
The Corpus
UC-B1 consists of 5 texts, which are English translations of Aesop's fables. Most of them are derived from the standard version by George Fyler Townsend (available at Project Gutenberg), but they have been slightly modified to make them more suitable for natural language processing.
Text | Title | English* | UNL** | Number of sentences |
---|---|---|---|---|
Text 1 | The Hare and the Tortoise | UCB1_t1_eng.txt | UCB1_t1_unl.txt | 13 |
Text 2 | The Bat and The Weasels | UCB1_t2_eng.txt | to be provided soon | 10 |
Text 3 | The Father and his Sons | UCB1_t3_eng.txt | to be provided soon | 11 |
Text 4 | The Ants and the Grasshopper | UCB1_t4_eng.txt | to be provided soon | 10 |
Text 5 | The Man and the Lion | UCB1_t5_eng.txt | to be provided soon | 11 |
*To be manually translated into your target language in order to be used as the input for UNLization (IAN)
**To be used as the input for NLization (EUGENE)
***To be used for the natural language generation dictionary
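The corpus layout in the table above can be captured in a small manifest, which may be handy for checking which texts are ready for each task. The following is only a minimal sketch: the `CorpusText` class and its field names are hypothetical, and the file names for Texts 2 to 5 in UNL are not generated because those files are listed as "to be provided soon".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorpusText:
    """One row of the UC-B1 table (hypothetical helper, not part of IAN or EUGENE)."""
    number: int
    title: str
    sentences: int
    unl_available: bool

    @property
    def english_file(self) -> str:
        # Follows the naming pattern shown in the table: UCB1_t<N>_eng.txt
        return f"UCB1_t{self.number}_eng.txt"

    @property
    def unl_file(self) -> Optional[str]:
        # Only Text 1 currently has a UNL file; the others are pending.
        return f"UCB1_t{self.number}_unl.txt" if self.unl_available else None


# Values copied from the table above.
UC_B1 = [
    CorpusText(1, "The Hare and the Tortoise", 13, True),
    CorpusText(2, "The Bat and The Weasels", 10, False),
    CorpusText(3, "The Father and his Sons", 11, False),
    CorpusText(4, "The Ants and the Grasshopper", 10, False),
    CorpusText(5, "The Man and the Lion", 11, False),
]

# Total number of sentences to be UNLized from the translated texts.
total_sentences = sum(text.sentences for text in UC_B1)

# UNL files that can already be used as input for NLization.
ready_for_nlization = [text.unl_file for text in UC_B1 if text.unl_available]

if __name__ == "__main__":
    print(f"Sentences in UC-B1: {total_sentences}")
    print(f"UNL inputs available: {ready_for_nlization}")
```

As the UNL versions of the remaining texts are published, flipping `unl_available` to `True` for the corresponding entry is enough to include them.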
Goals
- To provide the dictionary and grammars necessary to UNLize your translated version of UC-B1 (with IAN)
- To provide the dictionary and grammars necessary to NLize the UNL version of UC-B1 into your target language (with EUGENE)
Samples and Examples
The following resources were used to process UC-B1 in English and may serve as a sample of what you are expected to provide:
- UNLization
- ENG-UNL Dictionary (English dictionary used for the UNLization of UC-B1) (Documentation available at English Dictionary)
- Default Dictionary (Dictionary used to handle blank spaces, punctuation signs and other language-independent information) (Documentation available at Default Dictionary)
- ENG-UNL T-Grammar (Transformation grammar used for the UNLization of UC-B1) (Documentation available at English Grammar)
- ENG-UNL D-Grammar (Disambiguation grammar used for the UNLization of UC-B1) (Documentation available at English Grammar)
- Default T-Grammar (Default transformation grammar for UNLization) (Documentation available at Default Grammar)
- NLization
- UNL-ENG Dictionary (English dictionary used for the NLization of UC-B1) (Documentation available at English Dictionary)
- Default Dictionary (Dictionary used to handle blank spaces, punctuation signs and other language-independent information) (Documentation available at Default Dictionary)
- UNL-ENG T-Grammar (Transformation grammar used for the NLization of UC-B1) (Documentation available at English Grammar)
- Default T-Grammar (Default transformation grammar for NLization) (Documentation available at Default Grammar)
Recommended Readings
Before starting the activity, and in order to fully understand what is expected, you should be familiar with the following documentation:
- Tagset, because you are expected to use only the tags included in the tagset
- UNL Dictionary Specs, which is essential to understand the dictionary structure
- UNL Grammar Specs, which is essential to understand the grammar structure
It is also worth taking IAN and EUGENE for a test drive.