RC-A1

From UNL Wiki

(Difference between revisions)

Revision as of 00:12, 23 July 2012

Corpus⁵⁰⁰ is an experimental corpus used to prepare the initial versions of the grammar for sentence-based UNLization and NLization, using IAN and EUGENE, respectively. It comprises 500 sentences in English and their corresponding graphs in UNL, and is intended to cover very basic linguistic phenomena.

The corpus⁵⁰⁰

- The whole corpus in a single file
  - Corpus500 in English, experimental corpus in English (500 sentences), to be manually translated into the target languages and used as the input for IAN
  - Corpus500 in UNL, experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
- Corpus500 according to the complexity of the graphs (the same as above, but split into separate files)

Corpus
Order	Description	Analysis (English original)	Generation (UNL)
0	Training Corpus (Corpus 50)	Corpus 50	Corpus 50
1	Temporary entries	temp_org.txt	temp_unl.txt
2	Entries with no attribute or relation	attribute0_org.txt	attribute0_unl.txt
3	one-attribute entries	attribute1_org.txt	attribute1_unl.txt
4	two-attribute entries	attribute2_org.txt	attribute2_unl.txt
5	three-attribute entries	attribute3_org.txt	attribute3_unl.txt
6	one-relation entries	relation1_org.txt	relation1_unl.txt
7	two-relation entries	relation2_org.txt	relation2_unl.txt
8	three-relation entries	relation3_org.txt	relation3_unl.txt
9	four-relation entries	relation4_org.txt	relation4_unl.txt
10	five-relation entries	relation5_org.txt	relation5_unl.txt
11	six-relation entries	relation6_org.txt	relation6_unl.txt
12	numbers and numerals	numbers_org.txt	numbers_unl.txt
13	expressions of time	time_org.txt	time_unl.txt
14	relative clauses	relatives_org.txt	relatives_unl.txt
15	special issues	problems_org.txt	problems_unl.txt
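
The file names in rows 1 to 15 of the table above follow a regular pattern: a prefix plus `_org.txt` for the English original and `_unl.txt` for the UNL graphs. As a minimal sketch, assuming only that pattern, the pairs can be enumerated as follows. The helper name `corpus_pairs` is hypothetical, and the sketch deliberately does not guess where the files are hosted.

```python
# Prefixes copied from rows 1-15 of the table, in table order.
# Row 0 (the Corpus 50 training set) has no file name in the table and is left out.
PREFIXES = [
    "temp", "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3", "relation4", "relation5", "relation6",
    "numbers", "time", "relatives", "problems",
]


def corpus_pairs(prefixes=PREFIXES):
    """Return (order, english_file, unl_file) tuples (hypothetical helper).

    The order value matches the "Order" column of the table, starting at 1.
    """
    return [
        (order, f"{prefix}_org.txt", f"{prefix}_unl.txt")
        for order, prefix in enumerate(prefixes, start=1)
    ]


if __name__ == "__main__":
    for order, org_file, unl_file in corpus_pairs():
        print(order, org_file, unl_file)
```

Working through the pairs in table order mirrors the way the corpus is split by graph complexity, from entries with no attribute or relation up to six-relation entries and the special cases.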

@@ Line 5: / Line 5: @@
 *The whole corpus in one single file
 **[http://www.unlweb.net/resources/geneva2012/corpus_eng.txt Corpus500 in English], experimental corpus in English (500 sentences), to be manually translated to the target languages, in order to be used as the input for IAN
-**[http://www.unlweb.net/resources/geneva2012/corpus_unl.txt Corpus500 in UNL], experimental corpus in UNL (500 sentences), to be used as the input for EUGENE
+**[http://www.unlweb.net/resources/geneva2012/corpus_unl.txt Corpus500 in UNL], experimental corpus in UNL (500 graphs), to be used as the input for EUGENE
 *Corpus 500 according to the complexity of the graphs (the same as above, but split in different files)
 {| border="1" cellpadding="2" align=center
