EUGENE
From UNL Wiki
Revision as of 23:42, 22 July 2012
EUGENE is a natural language generation system. It generates natural language sentences from semantic networks represented in the UNL format. In its current release, it is a Java-based web application available at the UNLdev.
Requirements
As a universal engine, EUGENE must be parameterized for each target language with the following files, to be provided through EUGENE's interface:
- The input document in the UNL document structure, i.e., the universal semantic network to be generated in natural language
- The UNL-NL (generation) dictionary, i.e., a lexical database where UWs are mapped into natural language entries, along with the corresponding features, to be provided according to the UNL Dictionary Specs
- The UNL-NL (generation) transformation grammar, i.e., a set of transformation rules used to convert the UNL graphs into natural language sentences, to be provided according to the UNL Grammar Specs
- The UNL-NL (generation) disambiguation grammar, i.e., a set of disambiguation rules used to improve the results of the tokenization and of the transformation, to be provided according to the UNL Grammar Specs
Functioning
EUGENE performs the following three steps over the input file:
- Segmentation, i.e., the division of the input document into a series of isolated graphs, which are processed one at a time
- Tokenization, i.e., the identification of the tokens (UWs, relations and attributes) of each graph of the input document
- Transformation, i.e., the application of the transformation rules of the grammar over each tokenized graph in order to generate a natural language sentence
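
The first two steps can be illustrated with a minimal Python sketch. It is not EUGENE's actual implementation (which is Java-based and not shown here). It assumes a simplified input in which each graph is enclosed between {unl} and {/unl} and holds one binary relation per line, written as rel(source, target), with attributes appended to a UW as .@name. The sample document and the names segment, tokenize, Node and Relation are hypothetical. Transformation is left out because it depends entirely on the rules of the grammar supplied by the user.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified input: one graph per {unl}...{/unl} block,
# one binary relation per line, in the form rel(source, target).
SAMPLE = """\
[S:1]
{unl}
agt(eat(icl>do).@entry.@past, John(iof>person))
obj(eat(icl>do).@entry.@past, apple(icl>fruit).@def)
{/unl}
[/S]
[S:2]
{unl}
aoj(red(icl>color).@entry, apple(icl>fruit).@def)
{/unl}
[/S]
"""


@dataclass
class Node:
    uw: str            # the UW, e.g. "eat(icl>do)"
    attributes: list   # e.g. ["@entry", "@past"]


@dataclass
class Relation:
    name: str          # e.g. "agt"
    source: Node
    target: Node


def segment(document):
    """Segmentation: return the body of each {unl} block as an isolated graph."""
    blocks = re.findall(r"\{unl\}(.*?)\{/unl\}", document, re.DOTALL)
    return [block.strip() for block in blocks]


def split_top_level(text):
    """Split on the first comma that is not nested inside parentheses."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return text[:i].strip(), text[i + 1:].strip()
    raise ValueError(f"no top-level comma in: {text!r}")


def parse_node(text):
    """Separate a UW from its trailing attributes (.@entry, .@past, ...)."""
    parts = text.split(".@")
    return Node(uw=parts[0], attributes=["@" + a for a in parts[1:]])


def tokenize(graph):
    """Tokenization: identify the relation, UWs and attributes on each line."""
    relations = []
    for line in graph.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition("(")
        if not rest.endswith(")"):
            raise ValueError(f"malformed relation: {line!r}")
        source, target = split_top_level(rest[:-1])
        relations.append(Relation(name, parse_node(source), parse_node(target)))
    return relations


if __name__ == "__main__":
    # Graphs are processed one at a time, as in the segmentation step.
    for graph in segment(SAMPLE):
        for relation in tokenize(graph):
            print(relation)
```

Splitting on the first top-level comma, rather than on every comma, keeps UWs intact when their constraint lists themselves contain parentheses or commas.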