RC-A1


Revision as of 20:22, 23 July 2012

The Corpus500 is an experimental corpus used to prepare the initial versions of the grammars for sentence-based UNLization (with IAN) and NLization (with EUGENE). It comprises 500 sentences in English and their corresponding graphs in UNL, and is intended to cover basic linguistic phenomena.

The Corpus500

  • Corpus500, ordered by the complexity of the graphs
Corpus

Order  Description                            Analysis (English original)  Generation (UNL)
0      Training Corpus (Corpus 50)            Corpus 50                    Corpus 50
1      Temporary entries                      temp_org.txt                 temp_unl.txt
2      Entries with no attribute or relation  attribute0_org.txt           attribute0_unl.txt
3      One-attribute entries                  attribute1_org.txt           attribute1_unl.txt
4      Two-attribute entries                  attribute2_org.txt           attribute2_unl.txt
5      Three-attribute entries                attribute3_org.txt           attribute3_unl.txt
6      One-relation entries                   relation1_org.txt            relation1_unl.txt
7      Two-relation entries                   relation2_org.txt            relation2_unl.txt
8      Three-relation entries                 relation3_org.txt            relation3_unl.txt
9      Four-relation entries                  relation4_org.txt            relation4_unl.txt
10     Five-relation entries                  relation5_org.txt            relation5_unl.txt
11     Six-relation entries                   relation6_org.txt            relation6_unl.txt
12     Numbers and numerals                   numbers_org.txt              numbers_unl.txt
13     Expressions of time                    time_org.txt                 time_unl.txt
14     Relative clauses                       relatives_org.txt            relatives_unl.txt
15     Special issues                         problems_org.txt             problems_unl.txt

  • The whole corpus in one single file
    • Corpus500 in English (http://www.unlweb.net/resources/geneva2012/corpus_eng.txt): experimental corpus in English (500 sentences), to be manually translated into the target languages and then used as the input for IAN
    • Corpus500 in UNL (http://www.unlweb.net/resources/geneva2012/corpus_unl.txt): experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
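
Each sub-corpus comes as a pair of files, an English original (`_org.txt`) and its UNL counterpart (`_unl.txt`). The following is a minimal Python sketch of how the two links for a sub-corpus could be derived from its file stem. The base URL and the four stems are taken from the links this page gives for rows 12 to 15; the function name `subcorpus_urls` is hypothetical, and it is only an assumption that the remaining rows follow the same naming pattern.

```python
# Base URL as it appears in the page's links for rows 12-15.
BASE_URL = "http://www.unlweb.net/resources/geneva2012/"

# Table order -> file stem, for the rows whose links are given on the page.
SUBCORPORA = {
    12: "numbers",    # numbers and numerals
    13: "time",       # expressions of time
    14: "relatives",  # relative clauses
    15: "problems",   # special issues
}


def subcorpus_urls(stem, base_url=BASE_URL):
    """Return the analysis (English) and generation (UNL) URLs for one stem.

    Hypothetical helper: it only concatenates strings and does not check
    that the files exist on the server.
    """
    return {
        "org": f"{base_url}{stem}_org.txt",  # English original, input side for IAN
        "unl": f"{base_url}{stem}_unl.txt",  # UNL graphs, input side for EUGENE
    }


for order, stem in sorted(SUBCORPORA.items()):
    urls = subcorpus_urls(stem)
    print(order, urls["org"], urls["unl"])
```

The helper performs no network access, so it can be used to build a download list before any file is fetched.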
Software