RC-A1
From UNL Wiki
Revision as of 20:38, 23 July 2012
The Corpus500 is an experimental corpus used to prepare the initial versions of the grammars for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 500 sentences in English and their corresponding graphs in UNL, and is intended to cover only very basic linguistic phenomena.
The corpus500
- Corpus 500 according to the complexity of the graphs
Order | Description | Analysis (English original) | Generation (UNL) |
---|---|---|---|
1 | Temporary entries | temp_org.txt | temp_unl.txt |
2 | Entries with no attribute or relation | attribute0_org.txt | attribute0_unl.txt |
3 | one-attribute entries | attribute1_org.txt | attribute1_unl.txt |
4 | two-attribute entries | attribute2_org.txt | attribute2_unl.txt |
5 | three-attribute entries | attribute3_org.txt | attribute3_unl.txt |
6 | one-relation entries | relation1_org.txt | relation1_unl.txt |
7 | two-relation entries | relation2_org.txt | relation2_unl.txt |
8 | three-relation entries | relation3_org.txt | relation3_unl.txt |
9 | four-relation entries | relation4_org.txt | relation4_unl.txt |
10 | five-relation entries | relation5_org.txt | relation5_unl.txt |
11 | six-relation entries | relation6_org.txt | relation6_unl.txt |
12 | numbers and numerals | numbers_org.txt | numbers_unl.txt |
13 | pronouns | pronouns_org.txt | pronouns_unl.txt |
14 | expressions of time | time_org.txt | time_unl.txt |
15 | relative clauses | relatives_org.txt | relatives_unl.txt |
16 | special issues | problems_org.txt | problems_unl.txt |
- The whole corpus in one single file
- Corpus500 in English, experimental corpus in English (500 sentences), to be manually translated to the target languages, in order to be used as the input for IAN
- Corpus500 in UNL, experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
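The file names in the table follow a regular pattern (a stem plus `_org.txt` for the English original and `_unl.txt` for the UNL graphs), so the pairs can be enumerated programmatically. The following is only a minimal sketch: the base URL is the one used by the links on this page, the assumption that every row is served from that same location is not verified here, and the names `STEMS` and `corpus_files` are hypothetical helpers introduced for illustration.

```python
# Base location of the Corpus500 files, as used by the links on this page.
BASE_URL = "http://www.unlweb.net/resources/corpus500/"

# File-name stems in table order (1 to 16).
STEMS = [
    "temp",
    "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3",
    "relation4", "relation5", "relation6",
    "numbers", "pronouns", "time", "relatives", "problems",
]


def corpus_files(base_url=BASE_URL):
    """Return (order, analysis_url, generation_url) for each table row.

    Assumes every row follows the <stem>_org.txt / <stem>_unl.txt pattern.
    """
    return [
        (order, f"{base_url}{stem}_org.txt", f"{base_url}{stem}_unl.txt")
        for order, stem in enumerate(STEMS, start=1)
    ]


if __name__ == "__main__":
    for order, org_url, unl_url in corpus_files():
        print(order, org_url, unl_url)
```

The `_org.txt` files would serve as input for IAN (analysis) and the `_unl.txt` files as input for EUGENE (generation), matching the two columns of the table.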
Resources
The following resources were used to process the Corpus500 in English and may serve as a sample of what is expected to be provided:
- Analysis
- EN-UNL Dictionary (English dictionary used for the UNLization of the Corpus500)
- EN-UNL T-Grammar (Transformation grammar used for the UNLization of the Corpus500)
- EN-UNL D-Grammar (Disambiguation grammar used for the UNLization of the Corpus500)
- Generation
- UNL-EN Dictionary (English dictionary used for the NLization of the Corpus500)
- UNL-EN T-Grammar (Transformation grammar used for the NLization of the Corpus500)
- UNL-EN D-Grammar (Disambiguation grammar used for the NLization of the Corpus500)