UNL System
Revision as of 14:06, 24 September 2012
The UNL System is the set of basic modules of UNL and the rules to combine them.
UNLization
Main article: UNLization
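UNLization is the process of representing the content of a natural language document in UNL. It may involve the following modules:
- The UNL Ontology
- The UNL Knowledge Base
- The UNL Memory
- The NL Memory
- The NL-UNL Dictionary
- The NL-UNL Memory
- The NL-UNL Transformation Grammar
- The NL-UNL Disambiguation Grammar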
The input to UNLization is a natural language document or a set of documents. The input document is first segmented according to the boundaries defined in the NL-UNL T-Grammar. Next, each processing unit is tokenized according to the NL-UNL Dictionary and the NL-UNL Memory; the tokenization process may be controlled by the NL-UNL D-Grammar. The resulting tokenized string is then analyzed syntactically and semantically with the NL-UNL T-Grammar, whose analysis may be improved by the NL-UNL D-Grammar, the NL Memory, the UNL Memory, the UNL Knowledge Base and the UNL Ontology. The output is a UNL document which, depending on how the grammar is structured, may reflect either the deep or the surface structure of the input document, at the level of the whole text, of each sentence, or of each word.
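The three UNLization steps (segmentation, tokenization, analysis) can be sketched as a toy Python pipeline. Everything here is a hypothetical illustration: the dictionary contents, the simplified UW strings, the function names (`segment`, `tokenize`, `analyze`, `unlize`) and the single hard-coded "N V N" rule stand in for the real NL-UNL Dictionary and NL-UNL T-Grammar, and the memories, knowledge base and ontology are not modelled at all. Tokenization uses greedy longest match, so a multiword entry such as "ice cream" wins over "ice".

```python
import re

# Hypothetical NL-UNL dictionary: entry -> (Universal Word, part of speech).
# The UW strings are simplified illustrations, not entries from a real UNL dictionary.
DICTIONARY = {
    "the": (None, "DET"),  # function word: no UW of its own in this toy
    "cat": ("cat(icl>feline)", "N"),
    "eats": ("eat(icl>consume)", "V"),
    "ice": ("ice(icl>frozen water)", "N"),
    "ice cream": ("ice cream(icl>dessert)", "N"),
}


def segment(document: str) -> list[str]:
    """Split the document into processing units (here: sentences ending in '.')."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", document) if s.strip()]


def tokenize(unit: str) -> list[str]:
    """Greedy longest-match tokenization against the dictionary entries."""
    words = re.findall(r"[a-z]+", unit.lower())  # letters a-z only in this toy
    tokens, i = [], 0
    while i < len(words):
        # Try the longest run of words starting at i that is a dictionary entry.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in DICTIONARY:
                tokens.append(candidate)
                i = j
                break
        else:
            tokens.append(words[i])  # unknown word: kept as a single token
            i += 1
    return tokens


def analyze(tokens: list[str]) -> list[str]:
    """A single toy transformation rule: N V N -> agt/obj relations."""
    lexemes = [DICTIONARY.get(t, (None, "UNK")) for t in tokens]
    content = [(uw, pos) for uw, pos in lexemes if uw is not None]
    if [pos for _, pos in content] == ["N", "V", "N"]:
        (subj, _), (verb, _), (obj, _) = content
        return [f"agt({verb}, {subj})", f"obj({verb}, {obj})"]
    return []  # no rule applies


def unlize(document: str) -> list[list[str]]:
    """Segment, tokenize and analyze each processing unit in turn."""
    return [analyze(tokenize(unit)) for unit in segment(document)]


print(unlize("The cat eats ice cream."))
```

Here the output is one list of relations per sentence, i.e. the sentence-level option mentioned above; a different grammar could just as well emit one graph for the whole text.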
NLization
Main article: NLization
NLization is the process of generating a natural language document out of UNL. It may involve the following modules:
- The UNL Ontology
- The UNL Knowledge Base
- The UNL Memory
- The NL Memory
- The UNL-NL Dictionary
- The UNL-NL Memory
- The UNL-NL Transformation Grammar
- The UNL-NL Disambiguation Grammar
- An NLizer
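In the reverse direction, a toy sketch shows how a UNL-NL dictionary lookup and a single transformation rule could turn UNL relations back into a sentence. It assumes the UNL graph is given as a list of `(relation, head UW, dependent UW)` triples; the dictionary, the UWs and the functions `realize` and `nlize` are hypothetical, and the disambiguation grammar, the memories, the knowledge base and the ontology are left out.

```python
# Hypothetical UNL-NL dictionary: UW -> English lexical realisation.
UNL_NL_DICTIONARY = {
    "cat(icl>feline)": "cat",
    "eat(icl>consume)": "eats",  # inflection hard-coded for this toy
    "ice cream(icl>dessert)": "ice cream",
}


def realize(uw: str) -> str:
    """Look the UW up in the dictionary; fall back to its headword."""
    return UNL_NL_DICTIONARY.get(uw, uw.split("(")[0])


def nlize(graph: list[tuple[str, str, str]]) -> str:
    """Toy transformation: linearize agt/obj relations as 'the AGENT VERB OBJECT.'"""
    agents = {head: dep for rel, head, dep in graph if rel == "agt"}
    objects = {head: dep for rel, head, dep in graph if rel == "obj"}
    sentences = []
    for verb, agent in agents.items():
        # The article "the" is always inserted here; a real grammar would
        # decide this from UNL attributes, which this sketch ignores.
        words = ["the", realize(agent), realize(verb)]
        if verb in objects:
            words.append(realize(objects[verb]))
        text = " ".join(words)
        sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)


graph = [
    ("agt", "eat(icl>consume)", "cat(icl>feline)"),
    ("obj", "eat(icl>consume)", "ice cream(icl>dessert)"),
]
print(nlize(graph))  # The cat eats ice cream.
```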
Modules
- Lexical Databases
- The UNL Ontology, the UNL Knowledge Base and the UNL Memory contain semantic relations between UWs (Universal Words), each with a degree of necessity, i.e., its possibility of occurrence. The UNL Ontology includes ontological relations ("a kind of" and "an instance of"); the UNL Knowledge Base includes necessary (ontological and thematic) relations (such as "is the agent of", "is the place where", "is the moment when"); and the UNL Memory includes any typical relations.
- The UNL-NL Dictionary and the NL-UNL Dictionary contain mappings between UWs and natural language entries. They are bilingual lexica in which UWs are translated into lexical realisation units of a given natural language.
- The UNL-NL Memory and the NL-UNL Memory contain mappings between UNL and natural language entries.
- The NL Memory contains syntactic relations between natural language entries.
- Grammars
- The UNL-NL D-Grammar and the UNL-NL T-Grammar are the sets of rules used for natural language generation (out of UNL). The NL-UNL D-Grammar and the NL-UNL T-Grammar are the sets of rules used for natural language analysis (into UNL). The T-Grammar is the repository of transformation rules, which are used to manipulate data structures; the D-Grammar is the repository of disambiguation rules, which are used to control the use of transformation rules.
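The division of labour between the two grammars can be illustrated with a small sketch: transformation rules rewrite a tagged word sequence into relations, while disambiguation rules weight candidate analyses so that only the preferred one is transformed. This is a hypothetical Python model, not the actual rule syntax defined in the Grammar Specs; the weights, the `DEFAULT_WEIGHT` value, the tag set and the functions `score` and `analyze` are all assumptions made for illustration.

```python
from itertools import product
from math import prod

# Hypothetical D-grammar: weight of each adjacent tag pair (0.0 = forbidden).
D_GRAMMAR = {("N", "V"): 0.8, ("V", "N"): 0.9, ("V", "V"): 0.0}
DEFAULT_WEIGHT = 0.5  # assumed weight for pairs the D-grammar does not mention

# Hypothetical T-grammar: each rule maps a tag pattern to a transformation
# that turns the matching word forms into (relation, head, dependent) triples.
T_GRAMMAR = {
    ("N", "V", "N"): lambda w: [("agt", w[1], w[0]), ("obj", w[1], w[2])],
    ("N", "V"): lambda w: [("agt", w[1], w[0])],
}


def score(tags: tuple[str, ...]) -> float:
    """Product of the D-grammar weights over adjacent tag pairs."""
    return prod(D_GRAMMAR.get(pair, DEFAULT_WEIGHT) for pair in zip(tags, tags[1:]))


def analyze(words: list[tuple[str, tuple[str, ...]]]) -> list[tuple[str, str, str]]:
    """words: (form, candidate tags) pairs. Returns relation triples."""
    forms = [form for form, _ in words]
    # Every combination of candidate tags is a possible analysis.
    candidates = list(product(*(tags for _, tags in words)))
    # The D-grammar chooses which analysis the T-grammar gets to transform.
    best = max(candidates, key=score)
    rule = T_GRAMMAR.get(best)
    if rule is None or score(best) == 0.0:
        return []  # no applicable (and permitted) transformation
    return rule(forms)


sentence = [("cat", ("N",)), ("eats", ("V",)), ("fish", ("N", "V"))]
print(analyze(sentence))  # [('agt', 'eats', 'cat'), ('obj', 'eats', 'fish')]
```

Because "fish" is ambiguous between noun and verb, two analyses compete; the D-grammar forbids a verb directly following a verb, so only the N V N reading reaches the transformation rule.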
Specs
- Tagset
- The Tagset is a harmonized system for representing linguistic constants.
- Dictionary Specs
- The Dictionary Specs state the syntax of dictionaries.
- Grammar Specs
- The Grammar Specs state the syntax of grammar rules.
- KB Specs
- The UNL Knowledge Base Specs state the syntax of knowledge bases.
- Memory Specs
- The Memory Specs state the syntax of example bases.