UC-B1
Latest revision as of 17:51, 17 April 2013
UC-B1 is an experimental corpus used to refine the initial versions of the grammar for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 5 very short texts in English and their corresponding graphs in UNL.
The Corpus
UC-B1 consists of 5 texts, which are English translations of Aesop's fables. Most of them have been derived from the standard version by George Fyler Townsend (available at Project Gutenberg), but they have been slightly modified to make them more suitable for natural language processing.
Text | Title | English* | UNL** | Number of sentences |
---|---|---|---|---|
Text 1 | The Hare and the Tortoise | UCB1_t1_eng.txt | UCB1_t1_unl.txt | 13 |
Text 2 | The Bat and The Weasels | UCB1_t2_eng.txt | to be provided soon | 10 |
Text 3 | The Father and his Sons | UCB1_t3_eng.txt | to be provided soon | 11 |
Text 4 | The Ants and the Grasshopper | UCB1_t4_eng.txt | to be provided soon | 10 |
Text 5 | The Man and the Lion | UCB1_t5_eng.txt | to be provided soon | 11 |
*To be manually translated into your target language in order to be used as the input for UNLization (IAN)
**To be used as the input for NLization (EUGENE)
***To be used for the natural language generation dictionary
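The file names in the table follow a regular pattern, which can be sketched in a few lines of Python. This is only an illustration: the helper names (`corpus_file`, `corpus_url`, `total_sentences`) are hypothetical and not part of any UNL tool, and the base URL is assumed to be the one the table entries link to. Note that, according to the table, only Text 1 currently has a UNL file; the others are still to be provided.

```python
# Sentence counts per text, copied from the table above.
SENTENCES = {1: 13, 2: 10, 3: 11, 4: 10, 5: 11}

# Assumption: base URL taken from the links behind the table entries.
BASE_URL = "http://www.unlweb.net/resources/corpus/UCB1/"


def corpus_file(text_no: int, kind: str) -> str:
    """Return the file name for a text; kind is 'eng' or 'unl'."""
    if text_no not in SENTENCES:
        raise ValueError(f"unknown text number: {text_no}")
    if kind not in ("eng", "unl"):
        raise ValueError(f"unknown kind: {kind}")
    return f"UCB1_t{text_no}_{kind}.txt"


def corpus_url(text_no: int, kind: str) -> str:
    """Return the assumed download URL for a corpus file."""
    return BASE_URL + corpus_file(text_no, kind)


def total_sentences() -> int:
    """Return the number of sentences in the whole corpus."""
    return sum(SENTENCES.values())


if __name__ == "__main__":
    for n in sorted(SENTENCES):
        print(corpus_file(n, "eng"), SENTENCES[n])
    print("total sentences:", total_sentences())
```

Summing the last column of the table this way gives 55 sentences for the whole corpus.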
Goals
- To provide the dictionary and grammars necessary to UNLize your translated version of UC-B1 (with IAN)
- To provide the dictionary and grammars necessary to NLize, to your target language, the UNL version of UC-B1 (with EUGENE)
Samples and Examples
The following resources have been used to process UC-B1 in English and may serve as a sample of what is expected to be provided:
- UNLization
  - ENG-UNL Dictionary (English dictionary used for the UNLization of UC-B1) (Documentation available at English Dictionary)
  - Default Dictionary (Dictionary used to handle blank spaces, punctuation signs and other language-independent information) (Documentation available at Default Dictionary)
  - ENG-UNL T-Grammar (Transformation grammar used for the UNLization of UC-B1) (Documentation available at English Grammar)
  - ENG-UNL D-Grammar (Disambiguation grammar used for the UNLization of UC-B1) (Documentation available at English Grammar)
  - Default T-Grammar (Default transformation grammar for UNLization) (Documentation available at Default Grammar)
- NLization
  - UNL-ENG Dictionary (English dictionary used for the NLization of UC-B1) (Documentation available at English Dictionary)
  - Default Dictionary (Dictionary used to handle blank spaces, punctuation signs and other language-independent information) (Documentation available at Default Dictionary)
  - UNL-ENG T-Grammar (Transformation grammar used for the NLization of UC-B1) (Documentation available at English Grammar)
  - Default T-Grammar (Default transformation grammar for NLization) (Documentation available at Default Grammar)
Recommended Readings
Before starting the activity, and in order to fully understand what is expected to be done, it is important for you to be acquainted with the following documentation:
- Tagset, because you are expected to use only the tags included in the tagset
- UNL Dictionary Specs, which is essential to understand the dictionary structure
- UNL Grammar Specs, which is essential to understand the grammar structure
It is also worth taking a test drive with IAN and EUGENE.