RC-A1

From UNL Wiki
Revision as of 20:21, 23 July 2012 by Martins

The Corpus500 is an experimental corpus used to prepare the initial versions of the grammar for sentence-based UNLization (with IAN) and NLization (with EUGENE). It comprises 500 English sentences and their corresponding UNL graphs, and is intended to cover only very basic linguistic phenomena.

The Corpus500

  • Corpus500 organized by the complexity of the graphs
    Order | Corpus description                    | Analysis (English original) | Generation (UNL)
    ------+---------------------------------------+-----------------------------+-----------------
    0     | Training corpus (Corpus 50)           | Corpus 50                   | Corpus 50
    1     | Temporary entries                     | temp_org.txt                | temp_unl.txt
    2     | Entries with no attribute or relation | attribute0_org.txt          | attribute0_unl.txt
    3     | One-attribute entries                 | attribute1_org.txt          | attribute1_unl.txt
    4     | Two-attribute entries                 | attribute2_org.txt          | attribute2_unl.txt
    5     | Three-attribute entries               | attribute3_org.txt          | attribute3_unl.txt
    6     | One-relation entries                  | relation1_org.txt           | relation1_unl.txt
    7     | Two-relation entries                  | relation2_org.txt           | relation2_unl.txt
    8     | Three-relation entries                | relation3_org.txt           | relation3_unl.txt
    9     | Four-relation entries                 | relation4_org.txt           | relation4_unl.txt
    10    | Five-relation entries                 | relation5_org.txt           | relation5_unl.txt
    11    | Six-relation entries                  | relation6_org.txt           | relation6_unl.txt
    12    | Numbers and numerals                  | numbers_org.txt             | numbers_unl.txt
    13    | Expressions of time                   | time_org.txt                | time_unl.txt
    14    | Relative clauses                      | relatives_org.txt           | relatives_unl.txt
    15    | Special issues                        | problems_org.txt            | problems_unl.txt
  • The whole corpus in a single file
    • Corpus500 in English: the experimental corpus in English (500 sentences), to be manually translated into the target languages and then used as the input for IAN
    • Corpus500 in UNL: the experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
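Each subset in the table is distributed as a pair of files sharing a stem: stem_org.txt holds the English original used for analysis, and stem_unl.txt holds the UNL graphs used for generation. A script that processes the corpus subset by subset therefore only needs that naming convention. The following is a minimal Python sketch, assuming the files have been downloaded into one local directory under the names shown in the table. The helpers SUBSETS, file_pair and find_pairs are hypothetical illustrations, not part of IAN, EUGENE or the UNL Wiki, and the sketch makes no assumption about the internal format of the files.

```python
from pathlib import Path

# Subset stems taken from rows 1-15 of the table above (hypothetical helper list).
# Row 0 (Corpus 50) is left out because the table gives no file names for it.
SUBSETS = [
    "temp", "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3", "relation4", "relation5",
    "relation6", "numbers", "time", "relatives", "problems",
]


def file_pair(stem: str) -> tuple[str, str]:
    """Return the (English original, UNL) file names for one subset stem."""
    return f"{stem}_org.txt", f"{stem}_unl.txt"


def find_pairs(directory: Path) -> dict[str, tuple[Path, Path]]:
    """Map each subset stem to its two file paths.

    Assumes all files sit directly in `directory` with their original names.
    Subsets where either file is missing are skipped.
    """
    pairs: dict[str, tuple[Path, Path]] = {}
    for stem in SUBSETS:
        org_name, unl_name = file_pair(stem)
        org_path, unl_path = directory / org_name, directory / unl_name
        if org_path.is_file() and unl_path.is_file():
            pairs[stem] = (org_path, unl_path)
    return pairs
```

For instance, find_pairs(Path("corpus500")), with "corpus500" standing in for wherever the files were saved, would return only the subsets whose English and UNL files are both present, in the order of the table.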
Software