(113 intermediate revisions by one user not shown) |
Line 1: |
Line 1: |
− | The Corpus<sup>500</sup> is an experimental corpus used to prepare the initial versions of the grammar for sentence-based [[UNLization]] and [[NLization]], using [[IAN]] and [[EUGENE]], respectively. It comprises a list of 500 sentences in English and their corresponding graphs in UNL, and is supposed to cover very basic linguistic phenomena. | + | The UC-A1 is an experimental corpus used to prepare the initial versions of the grammar for sentence-based [[UNLization]] and [[NLization]], using [[IAN]] and [[EUGENE]], respectively. It comprises a list of 50 structures in UNL, and is supposed to cover very basic linguistic phenomena. |
| | | |
− | == The corpus<sup>500</sup> == | + | == The corpus == |
− | | + | The corpus UCA1 was extracted from a simplified and translated version of "The Hare and the Tortoise", by Aesop. |
− | *The whole corpus in one single file
| + | *[http://www.unlweb.net/resources/UCA1/uca1_eng.txt UC-A1 in English], to be translated (manually) in order to be used as the input for the UNLization process (with [[IAN]]) |
− | **[http://www.unlweb.net/resources/geneva2012/corpus_eng.txt Corpus500 in English], experimental corpus in English (500 sentences), to be manually translated to the target languages, in order to be used as the input for [[IAN]]
| + | *[http://www.unlweb.net/resources/UCA1/uca1_unl.txt UC-A1 in UNL], to be used "as is", as the input for the NLization process (with [[EUGENE]]) |
− | **[http://www.unlweb.net/resources/geneva2012/corpus_unl.txt Corpus500 in UNL], experimental corpus in UNL (500 graphs), to be used as the input for [[EUGENE]]
| + | |
− | *Corpus 500 according to the complexity of the graphs (the same as above, but split in different files)
| + | |
− | {| border="1" cellpadding="2" align=center
| + | |
− | |+Corpus
| + | |
− | !Order
| + | |
− | !Description
| + | |
− | !Analysis (English original)
| + | |
− | !Generation (UNL)
| + | |
− | |-
| + | |
− | |0
| + | |
− | |Training Corpus (Corpus 50)
| + | |
− | |[http://www.unlweb.net/resources/mumbai2012/corpus50_eng.txt Corpus 50]
| + | |
− | |[http://www.unlweb.net/resources/mumbai2012/corpus50_unl.txt Corpus 50]
| + | |
− | |-
| + | |
− | |1
| + | |
− | |Temporary entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/temp_org.txt temp_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/temp_unl.txt temp_unl.txt]
| + | |
− | |-
| + | |
− | |2
| + | |
− | |Entries with no attribute or relation
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute0_org.txt attribute0_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute0_unl.txt attribute0_unl.txt]
| + | |
− | |-
| + | |
− | |3
| + | |
− | |one-attribute entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute1_org.txt attribute1_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute1_unl.txt attribute1_unl.txt]
| + | |
− | |-
| + | |
− | |4
| + | |
− | |two-attribute entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute2_org.txt attribute2_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute2_unl.txt attribute2_unl.txt]
| + | |
− | |-
| + | |
− | |5
| + | |
− | |three-attribute entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute3_org.txt attribute3_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/attribute3_unl.txt attribute3_unl.txt]
| + | |
− | |-
| + | |
− | |6
| + | |
− | |one-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation1_org.txt relation1_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation1_unl.txt relation1_unl.txt]
| + | |
− | |-
| + | |
− | |7
| + | |
− | |two-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation2_org.txt relation2_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation2_unl.txt relation2_unl.txt]
| + | |
− | |-
| + | |
− | |8
| + | |
− | |three-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation3_org.txt relation3_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation3_unl.txt relation3_unl.txt]
| + | |
− | |-
| + | |
− | |9
| + | |
− | |four-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation4_org.txt relation4_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation4_unl.txt relation4_unl.txt]
| + | |
− | |-
| + | |
− | |10
| + | |
− | |five-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation5_org.txt relation5_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation5_unl.txt relation5_unl.txt]
| + | |
− | |-
| + | |
− | |11
| + | |
− | |six-relation entries
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation6_org.txt relation6_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relation6_unl.txt relation6_unl.txt]
| + | |
− | |-
| + | |
− | |12
| + | |
− | |numbers and numerals
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/numbers.txt numbers_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/numbers.txt numbers_unl.txt]
| + | |
− | |-
| + | |
− | |13
| + | |
− | |expressions of time
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/time.txt time_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/time.txt time_unl.txt]
| + | |
− | |-
| + | |
− | |14
| + | |
− | |relative clauses
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relatives.txt relatives_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/relatives.txt relatives_unl.txt]
| + | |
− | |-
| + | |
− | |15
| + | |
− | |special issues
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/problems.txt problems_org.txt]
| + | |
− | |[http://www.unlweb.net/resources/geneva2012/problems.txt problems_unl.txt]
| + | |
− | |}
| + | |
UC-A1 is an experimental corpus used to prepare the initial versions of the grammar for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 50 structures in UNL and is intended to cover very basic linguistic phenomena.
The UC-A1 corpus was extracted from a simplified and translated version of Aesop's "The Hare and the Tortoise".