RC-A1

Revision as of 20:34, 23 July 2012

The Corpus500 is an experimental corpus used to prepare the initial versions of the grammars for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 500 sentences in English and their corresponding graphs in UNL, and is intended to cover basic linguistic phenomena.

The Corpus500

  • Corpus500 ordered by the complexity of the graphs
Corpus

Order  Description                            Analysis (English original)  Generation (UNL)
1      Temporary entries                      temp_org.txt                 temp_unl.txt
2      Entries with no attribute or relation  attribute0_org.txt           attribute0_unl.txt
3      one-attribute entries                  attribute1_org.txt           attribute1_unl.txt
4      two-attribute entries                  attribute2_org.txt           attribute2_unl.txt
5      three-attribute entries                attribute3_org.txt           attribute3_unl.txt
6      one-relation entries                   relation1_org.txt            relation1_unl.txt
7      two-relation entries                   relation2_org.txt            relation2_unl.txt
8      three-relation entries                 relation3_org.txt            relation3_unl.txt
9      four-relation entries                  relation4_org.txt            relation4_unl.txt
10     five-relation entries                  relation5_org.txt            relation5_unl.txt
11     six-relation entries                   relation6_org.txt            relation6_unl.txt
12     numbers and numerals                   numbers_org.txt              numbers_unl.txt
13     expressions of time                    time_org.txt                 time_unl.txt
14     relative clauses                       relatives_org.txt            relatives_unl.txt
15     special issues                         problems_org.txt             problems_unl.txt

  • The whole corpus in one single file
    • Corpus500 in English (http://www.unlweb.net/resources/corpus500/corpus500_eng.txt): experimental corpus in English (500 sentences), to be manually translated into the target languages and then used as the input for IAN
    • Corpus500 in UNL (http://www.unlweb.net/resources/corpus500/corpus500_unl.txt): experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
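
The per-category files listed above follow a regular naming pattern: name_org.txt for the English original and name_unl.txt for the UNL graphs. The following is a minimal sketch in Python of how the file addresses could be derived from that pattern. It assumes the files are served under http://www.unlweb.net/resources/corpus500/, the location linked from this revision of the page; the names BASE_URL, CATEGORIES, corpus_urls and all_corpus_urls are hypothetical and not part of any UNL tool. The sketch only builds the addresses and does not download anything.

```python
# Assumed base location, taken from the links in this revision of the page.
BASE_URL = "http://www.unlweb.net/resources/corpus500/"

# File-name stems in the order used by the corpus table (Order 1 to 15).
CATEGORIES = [
    "temp",
    "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3",
    "relation4", "relation5", "relation6",
    "numbers", "time", "relatives", "problems",
]


def corpus_urls(category):
    """Return (english_url, unl_url) for one category stem."""
    return (
        f"{BASE_URL}{category}_org.txt",  # English original (input side for IAN)
        f"{BASE_URL}{category}_unl.txt",  # UNL graphs (input side for EUGENE)
    )


def all_corpus_urls():
    """Map each table order number (1-based) to its pair of file addresses."""
    return {
        order: corpus_urls(name)
        for order, name in enumerate(CATEGORIES, start=1)
    }


if __name__ == "__main__":
    for order, (eng, unl) in all_corpus_urls().items():
        print(order, eng, unl)
```

Keeping the stems in a single ordered list means the order numbers of the table are reproduced by position, so adding or removing a category only requires changing that list.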

Resources

The following resources were used to process the Corpus500 in English and may serve as a sample of what is to be provided:

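  • Analysis
    • EN-UNL Dictionary (http://www.unlweb.net/resources/corpus500/eng_ana_dic.txt): English dictionary used for the UNLization of the Corpus500
    • EN-UNL T-Grammar (http://www.unlweb.net/resources/corpus500/eng_ana_tgrammar.txt): transformation grammar used for the UNLization of the Corpus500
    • EN-UNL D-Grammar (http://www.unlweb.net/resources/corpus500/eng_ana_tgrammar.txt): disambiguation grammar used for the UNLization of the Corpus500
  • Generation
    • UNL-EN Dictionary (http://www.unlweb.net/resources/corpus500/eng_gen_dic.txt): English dictionary used for the NLization of the Corpus500
    • UNL-EN T-Grammar (http://www.unlweb.net/resources/corpus500/eng_gen_tgrammar.txt): transformation grammar used for the NLization of the Corpus500
    • UNL-EN D-Grammar (http://www.unlweb.net/resources/corpus500/eng_gen_dgrammar.txt): disambiguation grammar used for the NLization of the Corpus500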
Software