N-rule

From UNL Wiki

Revision as of 16:29, 31 May 2013

Normalization rules are used to prepare the natural language input for automatic processing. They constitute the preprocessing module that applies over the input as a string and runs prior to the tokenization. The set of n-rules forms the Normalization Grammar, or N-Grammar.

Syntax

Normalization Rules follow the very general formalism

α:=β;

where the left side α is a condition statement, and the right side β is an action to be performed over α.
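As a rough illustration of this shape, the following Python sketch splits a rule written as α:=β; into its condition and its action. The function name `split_rule` is hypothetical and is not part of any UNL tool; the sketch only assumes that a rule ends in ";" and that the first ":=" separates the two sides.

```python
from typing import Tuple


def split_rule(rule: str) -> Tuple[str, str]:
    """Split a rule of the form 'condition:=action;' into its two sides.

    Hypothetical helper for illustration only; it assumes the first ':='
    is the separator and that the rule is terminated by ';'.
    """
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("rule must end with ';'")
    # Drop the trailing ';' and split on the first ':='.
    condition, separator, action = text[:-1].partition(":=")
    if not separator:
        raise ValueError("rule must contain ':='")
    return condition.strip(), action.strip()


condition, action = split_rule('("don\'t"):=("do not");')
print(condition)  # left side: what the rule matches
print(action)     # right side: what the rule does with the match
```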

Type of Normalization Rules

Roles of Normalization Rules

Normalization rules have two roles:

  • to normalize the input text (to replace abbreviations by their extended forms, to expand contractions, etc.)
  • to segment the natural language text into sentences (i.e., to create the tags <SHEAD> (beginning of a sentence), <STAIL> (end of a sentence), <CHEAD> (beginning of a scope) and <CTAIL> (end of a scope) inside the input text). These tags are used as sentence and clause boundaries, and define the units of processing of the Transformation and Disambiguation grammars.

Examples of Normalization rules

  • Segmentation
    • ("/.*\./",%x):=(%x)(+STAIL,%y); (creates an STAIL node after any sequence of characters followed by "." (/.*\./))
    • ("/\(/",%x):=(+CHEAD,%y)(%x); (creates a CHEAD node before an opening parenthesis (/\(/))
  • Normalization
    • ("an "):=("a "); ("an apple" > "a apple")
    • ("don't"):=("do not"); ("I don't see" > "I do not see")
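
The effect of the example rules above can be approximated in Python as follows. This is a minimal sketch, not the UNL engine: the names `NORMALIZATION`, `normalize` and `segment` are hypothetical, the tags are written as literal strings rather than created as nodes, and the matching strategy (plain substring replacement, one tag per "." or "(") is an assumption, since the page does not specify how rules are matched or ordered.

```python
import re

# Hypothetical table mirroring the two normalization rules above.
NORMALIZATION = [
    ("don't", "do not"),
    ("an ", "a "),
]


def normalize(text: str) -> str:
    """Apply each replacement in order using plain substring matching."""
    for old, new in NORMALIZATION:
        text = text.replace(old, new)
    return text


def segment(text: str) -> str:
    """Insert boundary tags as literal strings (an assumption of this sketch)."""
    # <CHEAD> before every opening parenthesis.
    text = re.sub(r"\(", "<CHEAD>(", text)
    # <STAIL> after every full stop.
    text = re.sub(r"\.", ".<STAIL>", text)
    return text


print(normalize("I don't see an apple."))
print(segment("I see (it)."))
```

Note that plain substring replacement also rewrites "an " at the end of longer words (for example "can " becomes "ca "); whether the actual rule formalism behaves this way is not stated here.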