N-rule
From UNL Wiki
Revision as of 16:29, 31 May 2013
Normalization rules (n-rules) prepare the natural language input for automatic processing. They constitute the preprocessing module, which operates on the input as a string and runs prior to tokenization. The set of n-rules forms the Normalization Grammar, or N-Grammar.
Syntax
Normalization Rules follow the very general formalism
α:=β;
where the left side α is a condition statement, and the right side β is an action to be performed over α.
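To make the two sides of the formalism concrete, here is a minimal Python sketch that splits a rule written as α:=β; into its condition and action parts. The function name split_rule is hypothetical and only illustrates the surface syntax; it is not part of any UNL tool and does not interpret the rule.

```python
def split_rule(rule: str) -> tuple[str, str]:
    """Split a rule of the form 'condition:=action;' into its two sides.

    Hypothetical helper for illustration only; it checks the surface
    syntax and does not evaluate the condition or the action.
    """
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    # partition splits on the first ':=' only
    condition, separator, action = body[:-1].partition(":=")
    if not separator:
        raise ValueError("a rule must contain ':='")
    return condition.strip(), action.strip()


condition, action = split_rule('("don\'t"):=("do not");')
print(condition)  # the left side (condition)
print(action)     # the right side (action)
```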
Type of Normalization Rules
Roles of Normalization Rules
Normalization rules have two roles:
- to normalize the input text (to replace abbreviations with their extended forms, to expand contractions, etc.)
- to segment the natural language text into sentences (i.e., to create the tags <SHEAD> (beginning of a sentence), <STAIL> (end of a sentence), <CHEAD> (beginning of a scope) and <CTAIL> (end of a scope) inside the input text). These tags are used as sentence and clause boundaries, and define the units of processing of the Transformation and Disambiguation grammars.
Examples of Normalization Rules
- Segmentation
  - ("/.*\./",%x):=(%x)(+STAIL,%y); creates an STAIL node after any sequence of characters followed by "." (/.*\./).
  - ("/\(/",%x):=(+CHEAD,%y)(%x); creates a CHEAD node before an opening parenthesis (/\(/).
- Normalization
  - ("an "):=("a "); ("an apple" > "a apple")
  - ("don't"):=("do not"); ("I don't see" > "I do not see")
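The effect of rules like these on an input string can be approximated in Python. This is a rough sketch under stated assumptions, not the actual N-Grammar engine: it assumes that string conditions match as plain substrings, that rules apply in the listed order, and that STAIL and CHEAD nodes appear as literal <STAIL> and <CHEAD> tags in the text. The names REPLACEMENTS, normalize and segment are hypothetical.

```python
import re

# Hypothetical stand-ins for the two normalization rules listed above.
REPLACEMENTS = [
    ("an ", "a "),
    ("don't", "do not"),
]


def normalize(text: str) -> str:
    """Apply each replacement in order (assumed: plain substring matching)."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


def segment(text: str) -> str:
    """Insert boundary tags (assumed: tags are written inline in the text)."""
    text = re.sub(r"\.", ".<STAIL>", text)  # STAIL after every "."
    text = re.sub(r"\(", "<CHEAD>(", text)  # CHEAD before every "("
    return text


print(normalize("I don't see an apple"))
print(segment("It rains. See (here)."))
```

Because the sketch matches plain substrings, the "an " replacement would also fire inside words such as "man ". Whether the real rules behave this way is not stated here, so treat the matching behaviour as an assumption of the sketch.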