XII UNL School

From UNL Wiki
(Difference between revisions)
(IMPORTANT DATES)
(IMPORTANT DATES)
 
(65 intermediate revisions by one user not shown)
Line 1: Line 1:
The UNDL Foundation invites applications for the XII UNL School, to take place in Geneva, Switzerland, from June 17th to 21th, 2013. This is an intermediate-level workshop dedicated to the improvement of grammatical resources already existing in the UNL framework. The UNDL Foundation will pay the travel and accommodation expenses for the selected candidates not living in Geneva.
+
The UNDL Foundation invites applications for the XII UNL School, to take place in Geneva, Switzerland, from July 1st to 5th, 2013. This is an intermediate-level workshop dedicated to the improvement of grammatical resources already existing in the UNL framework. The UNDL Foundation will pay the travel and accommodation expenses for the selected candidates not living in Geneva.
  
 
== IMPORTANT DATES ==
 
== IMPORTANT DATES ==
  
*12/05/2013: Deadline for the applications
+
*<strike>12/05/2013: Deadline for the applications</strike>
*20/05/2013: Notification of accepted candidates
+
*<strike>20/05/2013: Notification of accepted candidates</strike>
*1-5/07/2013: XII UNL School
+
*<strike>1-5/07/2013: XII UNL School</strike>
 +
*30/09/2013: Deadline for the post-workshop tasks
  
== REQUISITES ==
+
== GOALS ==
 +
#To compile the corpus NC-A1
 +
#To prepare the basic modules for the UNLization of the corpus NC-A1
 +
#To prepare the basic modules for the NLization of the corpus NC-A1
  
The UNDL Foundation will only consider applications complying strictly with the three requisites below:
+
== PROGRAM ==
*Candidates must have successfully completed the grammars to UNL-ize and NL-ize the 400 sentences of corpora [[UC-A]] (= UCA1+UCA2).
+
*1/07/2013: Normalization Grammar
*Candidates must have completed CLEA250, CLEA500 and CLEA750, available at [[VALERIE]]; and
+
*2/07/2013: Closed-Class Dictionary
*Candidates must have an university degree in Linguistics, Computer Science or related field.
+
*3/07/2013: Open-Class Word List
 +
*4/07/2013: NC-A1
 +
*5/07/2013: Evaluation and discussion
  
== APPLICATION ==
+
== MATERIAL ==
 +
*1/07/2013
 +
**[http://www.unlweb.net/school/geneva2013/day1.pdf Presentation]
 +
**[http://www.unlweb.net/school/geneva2013/exercise1.txt Exercise #1] (text to be normalized)
 +
**[http://www.unlweb.net/school/geneva2013/normalization_eng.txt Exercise #2] (normalization grammar for English)
 +
*2/07/2013
 +
**[http://www.unlweb.net/school/geneva2013/day2.pdf Presentation]
 +
**[http://www.unlweb.net/school/geneva2013/eng_dic.txt Exercise #3] (English Closed-Class Dictionary)
 +
*3/07/2013
 +
**[http://www.unlweb.net/school/geneva2013/day3.pdf Presentation]
 +
**[[#WORD FORMS| Exercise #4]] (Open-Class Word List)
 +
*4/07/2013
 +
**[http://www.unlweb.net/school/geneva2013/day4.pdf Presentation]
 +
*5/07/2013
 +
**[http://www.unlweb.net/school/geneva2013/day5.pdf Presentation]
  
In order to apply, candidates must fill in the form available at [http://www.unlweb.net/school/registration ] before 23:59:59 (UTC) of May 12th, 2013.
+
== PARTICIPANTS ==
 
+
*Kim Sokphyrum (Khmer)
== SELECTION ==
+
*Marwa Saber (Arabic)
 
+
*Muhammad Zulhelmy Bin Mohd Rosman (Malay)
The UNDL Foundation will select 10 candidates, one per language, according to the best F-measure (weighted average of the precision and recall) of the analysis and generation modules. In case two or more candidates provide modules equally good in terms of analysis and generation, the selection process will consider, in this order:
+
*Ofelia Hovhannisyan (Armenian)
#Previous participation in any UNL School;
+
*Parameswarappa S (Kannada)
#Strongest experience (in terms of UNLdots) in the UNLweb;
+
*Parteek Kumar (Panjabi)
#Strongest experience in natural language processing; and
+
*Ronaldo Martins (UNL)
#Highest academic degree.
+
*Sameh Alansary (Arabic)
 +
*Serhii Prots (Ukrainian)
 +
*Suos Samak (Khmer)
 +
*Teng Wei Min (Chinese)
 +
*Yordanka Stancheva (Bulgarian)
  
 
== VENUE ==
 
== VENUE ==
 +
UNDL Foundation Office, Geneva
  
UNDL Foundation, Geneva, Switzerland
+
== POST-WORKSHOP TASKS ==
 
+
Deadline = 30/09/2013
== SUPPORT ==
+
*Open-Class Word List (3,000 word forms)
 
+
*Corpus NC-A1
The UNDL Foundation will pay the following expenses for the selected candidates not living in Geneva.:
+
**Original corpus: 5-10 original articles from the Wikipedia about culture-specific subjects (minimum of 5,000 words), in separate files, in plain text format with UTF-8 encoding
*a round-trip plane, bus or train ticket from/to Geneva in economic class
+
**List of at least 1,000 noun phrases appearing in the corpus with the following characteristics:
*6 (six) nights at Ibis Budget Genève Petit-Lancy (June 16-21)
+
***the length of the NP must be equal or greater than 2 words (one-word NP's must be excluded): <strike>Geneva</strike>
*CHF300.00 (three hundred Swiss francs) for other general expenses
+
***NP's must not contain foreign words: <strike>the city of Genève</strike> (note that "the city of Geneva" is OK)
 
+
***NP's must be continuous (there cannot be any extra-content, e.g., parentheses, inside the NP): <strike>the second most populous city in Switzerland (after Zurich)</strike> (note that the NP will be "the second most populous city in Switzerland")
== CERTIFICATION ==
+
***NP's must not contain verbs, even when used as nouns, adjectives or adverbs: <strike>French-'''speaking''' part of Switzerland</strike>, <strike>numerous international organizations, '''including''' the headquarters of many of the agencies of the United Nations and the Red Cross</strike> (in the latter case, there will be 2 NP's: "numerous international organizations" and "the headquarters... Red Cross")
 +
***NP's must be original (no change should be made to the original text from the Wikipedia)
 +
***NP's must ignore nesting (only the longest NP must be considered): "the headquarters of many of the agencies of the United Nations and the Red Cross" must be treated as a single NP (the inner NP's, such as "the agencies of the United Nations and the Red Cross" must not be extracted from the longer NP)
 +
***NP's must be unique (repetitions must be ignored)
 +
***NP's must be provided one per line in a plain text file, with UTF-8 encoding.
 +
The completion of the post-workshop tasks is not mandatory but any intermediate-level workshop will only accept candidates having finished all A1 activities described in [[FoR-UNL]].
  
The UNDL Foundation will issue a Certificate of Participation, upon evaluation, for all the participants.
+
== FOLLOW-UP ==
 +
The following projects will be open upon the accomplishment of the post-workshop tasks
 +
*BRUNO-A1 (open only for languages where number of subcategorization frames (all languages) > 15 and number of paradigms (inflectional languages) > 15): 2,000 entries (around 4,000 UNLdots)
 +
*NC-A1: 1,000 entries (3,000 UNLdots)
  
== THE UNL AND THE UNDL FOUNDATION ==
+
== ADDITIONAL MATERIAL ==
 +
=== Open Class Word List ===
 +
Extracted from the most frequent words in Wikipedia
  
The UNDL Foundation is a non-profit organization based in Geneva, Switzerland, which has received, from the United Nations, the mandate for implementing the Universal Networking Language (UNL). The UNL is an artificial language that has been used for several different tasks in natural language engineering, such as machine translation, multilingual document generation, summarization, information retrieval and semantic reasoning. It has been, since 1996, a unique initiative to reduce language barriers and strengthen cross-cultural communication in the framework of the UN.
+
{|table border=1 cellpadding=5
 +
!Language
 +
!File
 +
|-
 +
|Arabic
 +
|[http://www.unlweb.net/school/geneva2013/ar_words.xls ar_words.xls]
 +
|-
 +
|Armenian
 +
|[http://www.unlweb.net/school/geneva2013/hy_words.xls hy_words.xls]
 +
|-
 +
|Bulgarian
 +
|[http://www.unlweb.net/school/geneva2013/bg_words.xls bg_words.xls]
 +
|-
 +
|Chinese
 +
|[http://www.unlweb.net/school/geneva2013/zh_words.xls zh_words.xls]
 +
|-
 +
|Kannada
 +
|[http://www.unlweb.net/school/geneva2013/kn_words.xls kn_words.xls]
 +
|-
 +
|Khmer
 +
|[http://www.unlweb.net/school/geneva2013/km_words.xls km_words.xls]
 +
|-
 +
|Malay
 +
|[http://www.unlweb.net/school/geneva2013/ms_words.xls ms_words.xls]
 +
|-
 +
|Punjabi
 +
|[http://www.unlweb.net/school/geneva2013/pa_words.xls pa_words.xls]
 +
|-
 +
|Ukrainian
 +
|[http://www.unlweb.net/school/geneva2013/uk_words.xls uk_words.xls]
 +
|}
  
== FURTHER INFORMATION ==
+
=== NP Examples ===
 +
{|table border=1 cellpadding=5
 +
!width="50%"|original text
 +
!width="50%"|NP
 +
|-
 +
|Geneva is the second most populous city in Switzerland (after Zurich) and is the most populous city of Romandy, the French-speaking part of Switzerland. Situated where the Rhone exits Lake Geneva, it is the capital of the Republic and Canton of Geneva. The municipality (ville de Genève) has a population (as of March 2013) of 194,245, and the canton (République et Canton de Genève, which includes the city) has 472,530 residents. In 2007, the urban area, or agglomération franco-valdo-genevoise (Great Geneva or Grand Genève in French) had 1,240,000 inhabitants in 189 municipalities in both Switzerland and France.
 +
|<strike>Geneva</strike> (length = 1)<br />
 +
the second most populous city in Switzerland<br />
 +
<strike>Zurich</strike> (length = 1)<br />
 +
the most populous city of Romandy<br />
 +
<strike>the French-speaking part of Switzerland</strike> (verb)<br />
 +
<strike>Switzerland</strike> (length = 1)<br />
 +
the Rhone<br />
 +
Lake Geneva<br />
 +
the capital of the Republic and Canton of Geneva<br />
 +
The municipality<br />
 +
<strike>ville de Genève</strike> (foreign language)<br />
 +
a population<br />
 +
the canton<br />
 +
<strike>République et Canton de Genève</strike> (foreign language)<br />
 +
the city<br />
 +
472,530 residents<br />
 +
the urban area<br />
 +
<strike>agglomération franco-valdo-genevoise</strike> (foreign language)<br />
 +
<strike>Great Geneva or Grand Genève in French</strike> (foreign language)<br />
 +
1,240,000 inhabitants in 189 municipalities in both Switzerland and France<br />
 +
|}
  
For further information, please contact:
+
=== SSS Examples ===
 +
{|table border=1 cellpadding=5
 +
!sentence
 +
!SSS
 +
|-
 +
|book
 +
|NH(book)
 +
|-
 +
|the book
 +
|NS(book;the)
 +
|-
 +
|beautiful book
 +
|NA(book;beautiful)
 +
|-
 +
|book of John
 +
|NA(book;:01)<br/>PC:01(of;John)
 +
|-
 +
|the book of John
 +
|NS(book;the)<br/>NA(book;:01)<br/>PC:01(of;John)
 +
|-
 +
|the beautiful book of John
 +
|NS(book;the)<br/>NA(book;beautiful)<br/>NA(book;:01)<br/>PC:01(of;John)
 +
|-
 +
|the book of Math of John
 +
|NS(book;the)<br/>NA(book;:01)<br/>PC:01(of;Math)<br/>NA(book;:02)<br />PC:02(of;John)
 +
|-
 +
|the book about the construction of Babel
 +
|NS(book;the)<br/>NA(book;:01)<br/>PC:01(about;:02)<br/>NS:02(construction;the)<br/>NA:02(construction;:03)<br/>PC:03(of;Babel)
 +
|}
  
Ronaldo Martins, PhD<br />
+
=== UNL Simplified Examples ===
Language Resources Manager<br />
+
{|table border=1 cellpadding=5
UNDL Foundation<br />
+
!sentence
r.martins@undlfoundation.org
+
!UNL
 +
|-
 +
|book
 +
|book
 +
|-
 +
|the book
 +
|book.@def
 +
|-
 +
|beautiful book
 +
|mod(book;beautiful)
 +
|-
 +
|book of John
 +
|pos(book;John)
 +
|-
 +
|the book of John
 +
|pos(book.@def;John)
 +
|-
 +
|the beautiful book of John
 +
|mod(book.@def;beautiful)<br />pos(book.@def;John)
 +
|-
 +
|the book of Math of John
 +
|cnt(book.@def;Math)<br />pos(book.@def;John)
 +
|-
 +
|the book about the construction of Babel
 +
|cnt(book.@def;:01)<br />obj(construction.@def;Babel)
 +
|}

Latest revision as of 19:01, 15 July 2013

The UNDL Foundation invites applications for the XII UNL School, to take place in Geneva, Switzerland, from July 1st to 5th, 2013. This is an intermediate-level workshop dedicated to the improvement of grammatical resources already existing in the UNL framework. The UNDL Foundation will pay the travel and accommodation expenses for the selected candidates not living in Geneva.

IMPORTANT DATES

  • 12/05/2013: Deadline for the applications
  • 20/05/2013: Notification of accepted candidates
  • 1-5/07/2013: XII UNL School
  • 30/09/2013: Deadline for the post-workshop tasks

GOALS

  1. To compile the corpus NC-A1
  2. To prepare the basic modules for the UNLization of the corpus NC-A1
  3. To prepare the basic modules for the NLization of the corpus NC-A1

PROGRAM

  • 1/07/2013: Normalization Grammar
  • 2/07/2013: Closed-Class Dictionary
  • 3/07/2013: Open-Class Word List
  • 4/07/2013: NC-A1
  • 5/07/2013: Evaluation and discussion

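MATERIAL

  • 1/07/2013
    • Presentation: http://www.unlweb.net/school/geneva2013/day1.pdf
    • Exercise #1 (text to be normalized): http://www.unlweb.net/school/geneva2013/exercise1.txt
    • Exercise #2 (normalization grammar for English): http://www.unlweb.net/school/geneva2013/normalization_eng.txt
  • 2/07/2013
    • Presentation: http://www.unlweb.net/school/geneva2013/day2.pdf
    • Exercise #3 (English Closed-Class Dictionary): http://www.unlweb.net/school/geneva2013/eng_dic.txt
  • 3/07/2013
    • Presentation: http://www.unlweb.net/school/geneva2013/day3.pdf
    • Exercise #4 (Open-Class Word List)
  • 4/07/2013
    • Presentation: http://www.unlweb.net/school/geneva2013/day4.pdf
  • 5/07/2013
    • Presentation: http://www.unlweb.net/school/geneva2013/day5.pdf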

PARTICIPANTS

  • Kim Sokphyrum (Khmer)
  • Marwa Saber (Arabic)
  • Muhammad Zulhelmy Bin Mohd Rosman (Malay)
  • Ofelia Hovhannisyan (Armenian)
  • Parameswarappa S (Kannada)
  • Parteek Kumar (Panjabi)
  • Ronaldo Martins (UNL)
  • Sameh Alansary (Arabic)
  • Serhii Prots (Ukrainian)
  • Suos Samak (Khmer)
  • Teng Wei Min (Chinese)
  • Yordanka Stancheva (Bulgarian)

VENUE

UNDL Foundation Office, Geneva

POST-WORKSHOP TASKS

Deadline = 30/09/2013

  • Open-Class Word List (3,000 word forms)
  • Corpus NC-A1
    • Original corpus: 5-10 original articles from Wikipedia about culture-specific subjects (minimum of 5,000 words), in separate files, in plain text format with UTF-8 encoding
    • List of at least 1,000 noun phrases appearing in the corpus with the following characteristics:
      • the length of the NP must be equal to or greater than 2 words (one-word NP's, such as "Geneva", must be excluded)
      • NP's must not contain foreign words: "the city of Genève" is excluded (note that "the city of Geneva" is OK)
      • NP's must be continuous (there cannot be any extra content, e.g., parentheses, inside the NP): "the second most populous city in Switzerland (after Zurich)" is excluded (note that the NP will be "the second most populous city in Switzerland")
      • NP's must not contain verbs, even when used as nouns, adjectives or adverbs: "French-speaking part of Switzerland" and "numerous international organizations, including the headquarters of many of the agencies of the United Nations and the Red Cross" are excluded (in the latter case, there will be 2 NP's: "numerous international organizations" and "the headquarters... Red Cross")
      • NP's must be original (no change should be made to the original text from Wikipedia)
      • NP's must ignore nesting (only the longest NP must be considered): "the headquarters of many of the agencies of the United Nations and the Red Cross" must be treated as a single NP (the inner NP's, such as "the agencies of the United Nations and the Red Cross", must not be extracted from the longer NP)
      • NP's must be unique (repetitions must be ignored)
      • NP's must be provided one per line in a plain text file, with UTF-8 encoding.
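
Some of the constraints above can be checked mechanically before a list is submitted. The following is only a minimal sketch in Python, not an official tool of the UNDL Foundation: the function name `filter_noun_phrases` and the rejection labels are hypothetical, and it covers only length, continuity (approximated by the absence of parentheses), originality (the NP must appear verbatim in the corpus) and uniqueness. The rules on foreign words, verbs and nesting need linguistic judgment and are not covered.

```python
from typing import List, Tuple


def filter_noun_phrases(
    candidates: List[str], corpus: str
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split candidate NPs into accepted ones and (NP, reason) rejections.

    Only the mechanically checkable rules are applied (an assumption of
    this sketch): length >= 2 words, no parentheses, verbatim occurrence
    in the corpus, and no repetitions.
    """
    accepted: List[str] = []
    rejected: List[Tuple[str, str]] = []
    seen = set()
    for raw in candidates:
        phrase = raw.strip()
        if not phrase:
            continue  # skip empty lines
        if phrase in seen:
            rejected.append((phrase, "repetition"))
        elif len(phrase.split()) < 2:
            rejected.append((phrase, "length = 1"))
        elif "(" in phrase or ")" in phrase:
            rejected.append((phrase, "not continuous"))
        elif phrase not in corpus:
            rejected.append((phrase, "not in corpus"))
        else:
            accepted.append(phrase)
        seen.add(phrase)
    return accepted, rejected


corpus = (
    "Geneva is the second most populous city in Switzerland (after Zurich) "
    "and is the most populous city of Romandy."
)
candidates = [
    "Geneva",
    "the second most populous city in Switzerland",
    "the second most populous city in Switzerland (after Zurich)",
    "the most populous city of Romandy",
    "the most populous city of Romandy",
    "the city of Geneva",
]
accepted, rejected = filter_noun_phrases(candidates, corpus)

# One NP per line, ready to be written to a UTF-8 plain text file.
output = "\n".join(accepted)
```

The accepted list can then be saved with `open(path, "w", encoding="utf-8")` to satisfy the encoding requirement.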

The completion of the post-workshop tasks is not mandatory, but any intermediate-level workshop will only accept candidates who have finished all A1 activities described in FoR-UNL.

FOLLOW-UP

The following projects will open upon completion of the post-workshop tasks:

  • BRUNO-A1 (open only for languages with more than 15 subcategorization frames (all languages) and more than 15 paradigms (inflectional languages)): 2,000 entries (around 4,000 UNLdots)
  • NC-A1: 1,000 entries (3,000 UNLdots)

ADDITIONAL MATERIAL

Open Class Word List

Extracted from the most frequent words in Wikipedia

Language     File
Arabic       ar_words.xls
Armenian     hy_words.xls
Bulgarian    bg_words.xls
Chinese      zh_words.xls
Kannada      kn_words.xls
Khmer        km_words.xls
Malay        ms_words.xls
Punjabi      pa_words.xls
Ukrainian    uk_words.xls

NP Examples

Original text:

Geneva is the second most populous city in Switzerland (after Zurich) and is the most populous city of Romandy, the French-speaking part of Switzerland. Situated where the Rhone exits Lake Geneva, it is the capital of the Republic and Canton of Geneva. The municipality (ville de Genève) has a population (as of March 2013) of 194,245, and the canton (République et Canton de Genève, which includes the city) has 472,530 residents. In 2007, the urban area, or agglomération franco-valdo-genevoise (Great Geneva or Grand Genève in French) had 1,240,000 inhabitants in 189 municipalities in both Switzerland and France.

NP (entries followed by a reason in parentheses are excluded):

  • Geneva (length = 1)
  • the second most populous city in Switzerland
  • Zurich (length = 1)
  • the most populous city of Romandy
  • the French-speaking part of Switzerland (verb)
  • Switzerland (length = 1)
  • the Rhone
  • Lake Geneva
  • the capital of the Republic and Canton of Geneva
  • The municipality
  • ville de Genève (foreign language)
  • a population
  • the canton
  • République et Canton de Genève (foreign language)
  • the city
  • 472,530 residents
  • the urban area
  • agglomération franco-valdo-genevoise (foreign language)
  • Great Geneva or Grand Genève in French (foreign language)
  • 1,240,000 inhabitants in 189 municipalities in both Switzerland and France

SSS Examples

sentence                                  SSS
book                                      NH(book)
the book                                  NS(book;the)
beautiful book                            NA(book;beautiful)
book of John                              NA(book;:01)
                                          PC:01(of;John)
the book of John                          NS(book;the)
                                          NA(book;:01)
                                          PC:01(of;John)
the beautiful book of John                NS(book;the)
                                          NA(book;beautiful)
                                          NA(book;:01)
                                          PC:01(of;John)
the book of Math of John                  NS(book;the)
                                          NA(book;:01)
                                          PC:01(of;Math)
                                          NA(book;:02)
                                          PC:02(of;John)
the book about the construction of Babel  NS(book;the)
                                          NA(book;:01)
                                          PC:01(about;:02)
                                          NS:02(construction;the)
                                          NA:02(construction;:03)
                                          PC:03(of;Babel)
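
The pattern in the SSS examples can be sketched as a small recursive procedure. The Python code below is only one reading of the examples, not the official UNL tooling: the `NP` class, its field names and the `sss` function are hypothetical, and the numbering scheme (a new scope for each prepositional phrase, plus another for a complement that is itself a complex NP) is inferred from the rows shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NP:
    """Hypothetical NP structure: head, optional specifier, adjectives, PPs."""
    head: str
    specifier: Optional[str] = None
    adjuncts: List[str] = field(default_factory=list)
    pps: List[Tuple[str, "NP"]] = field(default_factory=list)  # (preposition, complement)

    def is_bare(self) -> bool:
        return not (self.specifier or self.adjuncts or self.pps)


def sss(np: NP) -> List[str]:
    """Render an NP as a list of SSS relations, following the examples."""
    lines: List[str] = []
    counter = 0

    def new_scope() -> str:
        nonlocal counter
        counter += 1
        return f"{counter:02d}"

    def emit(node: NP, scope: str) -> None:
        tag = f":{scope}" if scope else ""
        if node.is_bare() and not scope:
            lines.append(f"NH({node.head})")  # a lone head, as in "book"
            return
        if node.specifier:
            lines.append(f"NS{tag}({node.head};{node.specifier})")
        for adjective in node.adjuncts:
            lines.append(f"NA{tag}({node.head};{adjective})")
        for preposition, complement in node.pps:
            pp_scope = new_scope()
            lines.append(f"NA{tag}({node.head};:{pp_scope})")
            if complement.is_bare():
                lines.append(f"PC:{pp_scope}({preposition};{complement.head})")
            else:
                inner = new_scope()  # complex complement gets its own scope
                lines.append(f"PC:{pp_scope}({preposition};:{inner})")
                emit(complement, inner)

    emit(np, "")
    return lines


babel = NP(
    "book",
    specifier="the",
    pps=[("about", NP("construction", specifier="the", pps=[("of", NP("Babel"))]))],
)
print("\n".join(sss(babel)))
```

For "the book about the construction of Babel", this reproduces the six relations listed in the last row of the examples.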

UNL Simplified Examples

sentence                                  UNL
book                                      book
the book                                  book.@def
beautiful book                            mod(book;beautiful)
book of John                              pos(book;John)
the book of John                          pos(book.@def;John)
the beautiful book of John                mod(book.@def;beautiful)
                                          pos(book.@def;John)
the book of Math of John                  cnt(book.@def;Math)
                                          pos(book.@def;John)
the book about the construction of Babel  cnt(book.@def;:01)
                                          obj(construction.@def;Babel)