Grammar

From UNL Wiki
 
In the UNL framework, a '''grammar''' is a set of rules that are used to generate UNL out of natural language, and natural language out of UNL. Along with the [[UNL-NL Dictionary|UNL<->NL dictionaries]], they constitute the basic resource for [[UNLization]] and [[NLization]].
 
== Types of grammar ==
In the UNL framework there are three types of grammar:
*[[N-Grammar]], or Normalization Grammar, is used to prepare the natural language input for processing.
*[[T-Grammar]], or Transformation Grammar, is used to transform natural language into UNL or UNL into natural language.
*[[D-Grammar]], or Disambiguation Grammar, is used to improve the performance of transformation rules by constraining or forcing their applicability (see the rule formalism under [[#Types of rules|Types of rules]] below).

== Networks, Trees and Lists ==
Natural language sentences and UNL graphs are supposed to convey the same amount of information in different structures: whereas the former arranges data as an ordered list of words, the latter organizes it as a network. In that sense, going from natural language into UNL and from UNL into natural language is ultimately a matter of transforming lists into networks and vice versa.

The UNL framework assumes that such a transformation can be carried out progressively, i.e., through a transitional data structure: the tree, which can be used as an interface between lists and networks. Accordingly, there are seven different types of rules (LL, TT, NN, LT, TL, TN, NT), as indicated below:
*'''ANALYSIS''' (NL-UNL)
**LL - List Processing (list-to-list)
**LT - Surface-Structure Formation (list-to-tree)
**TT - Syntactic Processing (tree-to-tree)
**TN - Deep-Structure Formation (tree-to-network)
**NN - Semantic Processing (network-to-network)
*'''GENERATION''' (UNL-NL)
**NN - Semantic Processing (network-to-network)
**NT - Deep-Structure Formation (network-to-tree)
**TT - Syntactic Processing (tree-to-tree)
**TL - Surface-Structure Formation (tree-to-list)
**LL - List Processing (list-to-list)

The '''NL original sentence''' is first preprocessed by the LL rules in order to become an ordered list. Next, the resulting '''list structure''' is parsed with the LT rules, so as to unveil its '''surface syntactic structure''', which is already a tree. The tree structure is further processed by the TT rules in order to expose its inner organization, the '''deep syntactic structure''', which is supposed to be more suitable for semantic interpretation. Then, this deep syntactic structure is projected into a semantic network by the TN rules. The resulting '''semantic network''' is then post-edited by the NN rules in order to comply with UNL standards and generate the '''UNL Graph'''.

The reverse process is carried out during natural language generation. The '''UNL graph''' is preprocessed by the NN rules in order to become a more easily tractable semantic network. The resulting '''network structure''' is converted by the NT rules into a syntactic structure, which is still distant from the surface structure, as it is directly derived from the semantic arrangement. This '''deep syntactic structure''' is subsequently transformed into a '''surface syntactic structure''' by the TT rules. The surface syntactic structure then undergoes further changes according to the TL rules, which generate an NL-like '''list structure'''. This list structure is finally realized as a '''natural language sentence''' by the LL rules.

As sentences are complex structures that may contain nested or embedded phrases, both the analysis and the generation processes may be '''interleaved''' rather than pipelined. This means that the flow described above is only "normal", not "necessary": during natural language generation, an LL rule may apply prior to a TT rule, or an NN rule may be applied after a TL rule. Rules are recursive and must be applied in the order defined in the grammar as long as their conditions are true, regardless of the state.
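
The default ordering of the two pipelines can be summarized in the sketch below. It is purely illustrative: the function names and data types are hypothetical (the UNL tools do not expose a Python API), and, as just noted, real grammars may interleave rule types rather than apply them in strict sequence.

<pre>
# Illustrative sketch only: names and types are hypothetical, not part of
# the official UNL tools. "rules" maps each rule type to a function that
# applies every rule of that type to the current data structure.

def unlize(sentence, rules):
    """NL -> UNL: list -> tree -> network, following the ANALYSIS order."""
    tokens  = rules["LL"](sentence)   # list processing: sentence -> ordered list
    surface = rules["LT"](tokens)     # surface-structure formation: list -> tree
    deep    = rules["TT"](surface)    # syntactic processing: surface -> deep tree
    network = rules["TN"](deep)       # deep-structure formation: tree -> network
    return rules["NN"](network)       # semantic processing: network -> UNL graph

def nlize(graph, rules):
    """UNL -> NL: network -> tree -> list, following the GENERATION order."""
    network = rules["NN"](graph)      # semantic processing: graph -> network
    deep    = rules["NT"](network)    # deep-structure formation: network -> tree
    surface = rules["TT"](deep)       # syntactic processing: deep -> surface tree
    tokens  = rules["TL"](surface)    # surface-structure formation: tree -> list
    return rules["LL"](tokens)        # list processing: list -> sentence
</pre>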
== Types of rules ==
''Main article: [[Grammar Specs]]''

In the UNL framework there are two basic types of rules:
*'''Transformation rules''', or [[T-rule]]s, are used to manipulate data structures, i.e., to transform lists into trees, trees into lists, trees into networks, networks into trees, etc. They follow the very general formalism
α:=β;
where the left side α is a condition statement, and the right side β is an action to be performed over α.
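As a rough illustration of this α:=β pattern (not the actual rule syntax, which is defined in the [[Grammar Specs]]), a transformation rule can be pictured as a condition paired with an action over the current data structure:
<pre>
# Hypothetical encoding of a T-rule as a (condition, action) pair mirroring
# the general form "alpha := beta": when alpha holds, perform beta.
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class TRule:
    condition: Callable[[Any], bool]  # alpha: does the rule apply here?
    action: Callable[[Any], Any]      # beta: how the structure is rewritten

def apply_in_order(rules: Sequence[TRule], structure: Any) -> Any:
    """One pass over the rules in grammar order; each rule whose condition
    (alpha) is true rewrites the structure through its action (beta)."""
    for rule in rules:
        if rule.condition(structure):
            structure = rule.action(structure)
    return structure
</pre>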
*'''Disambiguation rules''', or [[D-rule]]s, are used to improve the performance of transformation rules by constraining or forcing their applicability. They follow the formalism
α=P;
where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
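In the same spirit, a D-rule can be pictured as a statement scored with an integer between 0 and 255; following the description above, a score of 0 constrains (the configuration never occurs) and 255 forces (it always occurs). This is again only a sketch, with a made-up way of combining scores, not the official behaviour:
<pre>
# Hypothetical encoding of a D-rule as (statement, probability), mirroring
# the general form "alpha = P" with P in 0..255.
from typing import Any, Callable, List, Tuple

DRule = Tuple[Callable[[Any], bool], int]   # (alpha, P)

def score(drules: List[DRule], candidate: Any) -> int:
    """Score a candidate analysis: take the minimum P among matching rules
    (an assumption made for this sketch), so that any P = 0 rule forbids the
    candidate and unmatched candidates keep the default 255."""
    matched = [p for alpha, p in drules if alpha(candidate)]
    return min(matched) if matched else 255
</pre>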
  
 
== Direction ==
 
In the UNL framework, grammars are not bidirectional, although they share the same syntax:
 
*The '''N-Grammar''' contains the normalization rules for natural language analysis
*The '''UNL-NL T-Grammar''' contains the transformation rules used for natural language generation
*The '''UNL-NL D-Grammar''' contains the disambiguation rules used for improving the results of the UNL-NL T-Grammar
*The '''NL-UNL T-Grammar''' contains the transformation rules used for natural language analysis
*The '''NL-UNL D-Grammar''' contains the disambiguation rules used for tokenization and for improving the results of the NL-UNL T-Grammar


== Units ==

In the UNL framework, grammars may target different processing units:

*'''Text-driven grammars''' process the source document as a single unit (i.e., without any internal subdivision)
*'''Sentence-driven grammars''' process each sentence or graph separately
*'''Word-driven grammars''' process words in isolation

Text-driven grammars are normally used in summarization and simplification, when the rhetorical structure of the source document is important. Sentence-driven grammars are used mostly in translation, when the source document can be treated as a list of non-semantically related units, to be processed one at a time. Word-driven grammars are used in information retrieval and opinion mining, when each word or node can be treated in isolation.
All these grammars share the same type of rule.

== Recall ==

Grammars may target the whole source document or only parts of it (e.g. main clauses):

*'''Chunk grammars''' target only a part of the source document
*'''Full grammars''' target the whole source document

== Precision ==

Grammars may target the deep or the surface structure of the source document:

*'''Deep grammars''' focus on the deep dependency relations of the source document and normally have three levels (network, tree and list)
*'''Shallow grammars''' focus only on the surface dependency relations of the source document and normally have only two levels (network and list)

== Assessment ==

''Main article: [[F-measure]]''

Grammars are evaluated through the F-measure, a weighted harmonic mean of precision and recall.
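
Assuming the standard definition of the F-measure, with precision P, recall R and a weight β that balances the two, this corresponds to:

<math>F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R}</math>

which reduces to the balanced F<sub>1</sub> = 2PR / (P + R) when β = 1.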

== Software ==