RC-A1

From UNL Wiki

Revision as of 20:21, 23 July 2012

The Corpus500 is an experimental corpus used to prepare the initial versions of the grammar for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 500 sentences in English and their corresponding graphs in UNL, and is intended to cover very basic linguistic phenomena.

The corpus500

  • Corpus500 according to the complexity of the graphs

Corpus
Order  Description                            Analysis (English original)  Generation (UNL)
0      Training Corpus (Corpus 50)            Corpus 50                    Corpus 50
1      Temporary entries                      temp_org.txt                 temp_unl.txt
2      Entries with no attribute or relation  attribute0_org.txt           attribute0_unl.txt
3      One-attribute entries                  attribute1_org.txt           attribute1_unl.txt
4      Two-attribute entries                  attribute2_org.txt           attribute2_unl.txt
5      Three-attribute entries                attribute3_org.txt           attribute3_unl.txt
6      One-relation entries                   relation1_org.txt            relation1_unl.txt
7      Two-relation entries                   relation2_org.txt            relation2_unl.txt
8      Three-relation entries                 relation3_org.txt            relation3_unl.txt
9      Four-relation entries                  relation4_org.txt            relation4_unl.txt
10     Five-relation entries                  relation5_org.txt            relation5_unl.txt
11     Six-relation entries                   relation6_org.txt            relation6_unl.txt
12     Numbers and numerals                   numbers_org.txt              numbers_unl.txt
13     Expressions of time                    time_org.txt                 time_unl.txt
14     Relative clauses                       relatives_org.txt            relatives_unl.txt
15     Special issues                         problems_org.txt             problems_unl.txt

  • The whole corpus in a single file
    • Corpus500 in English (http://www.unlweb.net/resources/geneva2012/corpus_eng.txt): experimental corpus in English (500 sentences), to be manually translated into the target languages and then used as the input for IAN
    • Corpus500 in UNL (http://www.unlweb.net/resources/geneva2012/corpus_unl.txt): experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
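
The split files in the table follow a regular naming pattern: one `_org.txt` file (English original, for analysis) and one `_unl.txt` file (UNL graphs, for generation) per category. The following Python sketch lists those pairs in table order. It is illustrative only: the names `CATEGORIES`, `file_pair` and `corpus_pairs` are hypothetical, the file names are the labels shown in the table, and the actual download URLs may differ from these labels.

```python
# Category stems taken from the file names in the table (orders 1 to 15).
# Order 0 (Corpus 50) is omitted because the table gives no file names for it.
CATEGORIES = [
    "temp",
    "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3",
    "relation4", "relation5", "relation6",
    "numbers", "time", "relatives", "problems",
]


def file_pair(stem):
    """Return (analysis_file, generation_file) for one category stem."""
    return (f"{stem}_org.txt", f"{stem}_unl.txt")


def corpus_pairs():
    """Map each table order (1 to 15) to its pair of file names."""
    return {order: file_pair(stem)
            for order, stem in enumerate(CATEGORIES, start=1)}


if __name__ == "__main__":
    for order, (org, unl) in corpus_pairs().items():
        print(order, org, unl)
```

Keeping the two file names paired by category makes it easy to process a category's English sentences (input for IAN) alongside its UNL graphs (input for EUGENE).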
Software