VIII UNL School

From UNL Wiki

Revision as of 14:09, 13 February 2012

Reference corpus in English (500 sentences), to be manually translated into the target languages and used as the input for IAN
Reference corpus in UNL (500 graphs), to be used as the input for EUGENE
Reference corpus by graph complexity (the same as above, but split into separate files)

Corpus
Order	Description	Analysis (English original)	Generation (UNL)	Word list (English original)
1	Temporary entries	temp_org.txt	temp_unl.txt	temp_dic.txt
2	Entries with no attribute or relation	attribute0_org.txt	attribute0_unl.txt	attribute0_dic.txt
3	One-attribute entries	attribute1_org.txt	attribute1_unl.txt	attribute1_dic.txt
4	Two-attribute entries	attribute2_org.txt	attribute2_unl.txt	attribute2_dic.txt
5	Three-attribute entries	attribute3_org.txt	attribute3_unl.txt	attribute3_dic.txt
6	One-relation entries	relation1_org.txt	relation1_unl.txt	relation1_dic.txt
7	Two-relation entries	relation2_org.txt	relation2_unl.txt	relation2_dic.txt
8	Three-relation entries	relation3_org.txt	relation3_unl.txt	relation3_dic.txt
9	Four-relation entries	relation4_org.txt	relation4_unl.txt	relation4_dic.txt
10	Five-relation entries	relation5_org.txt	relation5_unl.txt	relation5_dic.txt
11	Six-relation entries	relation6_org.txt	relation6_unl.txt	relation6_dic.txt
12	Numbers and numerals	numbers_org.txt	numbers_unl.txt	numbers_dic.txt
13	Expressions of time	time_org.txt	time_unl.txt	time_dic.txt
14	Relative clauses	relatives_org.txt	relatives_unl.txt	relatives_dic.txt
15	Special issues	problems_org.txt	problems_unl.txt	problems_dic.txt
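The files in the table follow a regular naming scheme: each subset has a prefix (e.g. `attribute1`, `relation3`), and the suffixes `_org.txt`, `_unl.txt` and `_dic.txt` mark the English original, the UNL graphs and the word list. The following is a minimal Python sketch, assuming all files sit in one local directory; the directory layout and the names `SUBSETS`, `corpus_files` and `missing_files` are hypothetical and not part of the school materials. It lists the expected files in the table's order and reports any that are missing.

```python
from pathlib import Path

# Subset prefixes, in the order given in the corpus table.
SUBSETS = [
    "temp", "attribute0", "attribute1", "attribute2", "attribute3",
    "relation1", "relation2", "relation3", "relation4", "relation5",
    "relation6", "numbers", "time", "relatives", "problems",
]

# Column suffixes: English original (analysis), UNL (generation), word list.
SUFFIXES = {"analysis": "_org.txt", "generation": "_unl.txt", "word_list": "_dic.txt"}


def corpus_files(prefix: str) -> dict[str, str]:
    """Return the three file names for one subset, keyed by table column."""
    return {column: prefix + suffix for column, suffix in SUFFIXES.items()}


def missing_files(directory: Path) -> list[str]:
    """List expected corpus files that are absent from `directory` (hypothetical layout)."""
    missing = []
    for prefix in SUBSETS:
        for name in corpus_files(prefix).values():
            if not (directory / name).is_file():
                missing.append(name)
    return missing


if __name__ == "__main__":
    print(corpus_files("relation3"))
    # {'analysis': 'relation3_org.txt', 'generation': 'relation3_unl.txt', 'word_list': 'relation3_dic.txt'}
```

Checking for absence rather than reading the files keeps the sketch independent of the file contents, which the table does not describe.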

The manually translated version of the 500 sentences of the reference corpus (corpus_LID.txt)
The analysis dictionary used to analyze those 500 sentences (ana_dic_LID.txt)
The analysis grammar used to analyze those 500 sentences (ana_gra_LID.txt)
The analysis disambiguation grammar, if any, used to analyze those 500 sentences (ana_dis_LID.txt)
The UNL output for those 500 sentences generated from the dictionary and grammars above (ana_out_LID.txt)

The generation dictionary used to generate the reference corpus into natural language (gen_dic_LID.txt)
The generation grammar, including inflectional paradigms, used to generate the reference corpus into natural language (gen_gra_LID.txt)
The generation disambiguation grammar used to generate the reference corpus into natural language (gen_dis_LID.txt)
The natural language output generated from the dictionary and grammars above (gen_out_LID.txt)

LID is to be replaced by the ISO 639-1 two-letter code of the language (en = English, el = Greek, etc.)
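Since every deliverable name above is a template with LID as a placeholder, the full set of file names for one language can be derived mechanically. The following is a minimal Python sketch, assuming two-letter lowercase codes such as en and el, as in the examples above. The helper name `deliverables` and its validation rule are illustrative assumptions, not part of the school's tooling.

```python
import re

# Deliverable templates listed above; "LID" stands for the language code.
TEMPLATES = [
    "corpus_LID.txt",   # manually translated reference corpus
    "ana_dic_LID.txt",  # analysis dictionary
    "ana_gra_LID.txt",  # analysis grammar
    "ana_dis_LID.txt",  # analysis disambiguation grammar (if any)
    "ana_out_LID.txt",  # UNL output of the analysis
    "gen_dic_LID.txt",  # generation dictionary
    "gen_gra_LID.txt",  # generation grammar, with inflectional paradigms
    "gen_dis_LID.txt",  # generation disambiguation grammar
    "gen_out_LID.txt",  # natural language output of the generation
]


def deliverables(lid: str) -> list[str]:
    """Expand every template for one language code, e.g. "el" for Greek."""
    # Assumption: codes are exactly two lowercase ASCII letters.
    if not re.fullmatch(r"[a-z]{2}", lid):
        raise ValueError(f"expected a two-letter lowercase code, got {lid!r}")
    return [template.replace("LID", lid) for template in TEMPLATES]


print(deliverables("el")[0])  # corpus_el.txt
```

Rejecting three-letter codes such as "eng" early avoids producing file names that do not match the two-letter convention described above.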

Feb 6th, 2012 - Monday: 09:00-10:00 Introduction; 10:00-12:00 I – Corpus; 14:00-17:00 II – UNL-NL dictionary
Feb 7th, 2012 - Tuesday: 09:00-12:00 III – Morphology (inflectional paradigms); 14:00-17:00 IV – NL dictionary
Feb 8th, 2012 - Wednesday: 09:00-12:00 V – UNL-NL grammar (I); 14:00-17:00 V – UNL-NL grammar (II)
Feb 9th, 2012 - Thursday: 09:00-12:00 VI – NL-UNL grammar (I); 14:00-17:00 VI – NL-UNL grammar (II)
Feb 10th, 2012 - Friday: 09:00-12:00 Evaluation; 14:00-17:00 Discussion
