N-rule
From UNL Wiki
Revision as of 20:06, 31 May 2013
Normalization rules are used to prepare the natural language input for automatic processing. They constitute the preprocessing module, which operates on the input as a string and runs before tokenization. The set of n-rules forms the Normalization Grammar, or N-Grammar.
Syntax
Normalization Rules follow the very general formalism
α:=β;
where the left side α is a condition statement, and the right side β is an action to be performed over α.
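As an illustration of the α:=β; shape only, the following minimal sketch splits a rule string into its condition and action. The function name `split_rule` is hypothetical and this is not how the UNL engine parses rules; in particular, it splits on the first `:=` and would mishandle a condition that itself contains `:=`.

```python
def split_rule(rule: str) -> tuple[str, str]:
    """Split a rule of the form 'α:=β;' into (condition, action).

    Hypothetical helper for illustration; not part of any UNL tool.
    """
    body = rule.strip()
    if not body.endswith(";"):
        raise ValueError("rule must end with ';'")
    # Drop the trailing ';' and split on the first ':='.
    condition, separator, action = body[:-1].partition(":=")
    if not separator:
        raise ValueError("rule must contain ':='")
    return condition.strip(), action.strip()


condition, action = split_rule('("don\'t"):=("do not");')
print(condition)  # ("don't")
print(action)     # ("do not")
```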
Roles of Normalization Rules
Normalization rules have two roles:
- to normalize the input text (to replace abbreviations by their extended forms, to expand contractions, etc.)
- to segment the natural language text into sentences (i.e., to create the tags <SHEAD> (beginning of a sentence), <STAIL> (end of a sentence), <CHEAD> (beginning of a scope) and <CTAIL> (end of a scope) inside the input text). These tags are used as sentence and clause boundaries, and define the units of processing of the Transformation and Disambiguation grammars.
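The segmentation role can be pictured with a rough Python sketch. It assumes that the tags are written inline as literal text, that every "." ends a sentence, and that a scope is a non-nested pair of parentheses; the function name `segment` and the exact tag placement are assumptions made here, not the behaviour of the actual N-Grammar.

```python
import re


def segment(text: str) -> str:
    """Insert <SHEAD>/<STAIL> and <CHEAD>/<CTAIL> tags (illustrative only)."""
    # Scopes: tag each non-nested parenthesized span.
    text = re.sub(r"\(([^()]*)\)", r"<CHEAD>(\1)<CTAIL>", text)
    # Sentences: the shortest run of characters ending in ".".
    # Leading whitespace between sentences is dropped; text after the
    # last "." is left untagged.
    return re.sub(r"\s*([^.]+\.)", r"<SHEAD>\1<STAIL>", text)


print(segment("I left (it was late). Bye."))
# <SHEAD>I left <CHEAD>(it was late)<CTAIL>.<STAIL><SHEAD>Bye.<STAIL>
```

A real grammar would need more careful conditions, since a plain period test also splits on abbreviations such as "e.g.".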
Type of Normalization Rules
Normalization rules are string replacement rules. They are used to replace existing strings by new strings. The string to be replaced may be represented by a constant (between "double quotes") or by a regular expression (between /forward slashes/).
ACTION | RULE | DESCRIPTION |
---|---|---|
REPLACE | (%x):=(%y); | All the instances of the node %x will be replaced by the node %y |
Where %x and %y are nodes.
Examples of Normalization rules
- Segmentation
- ("/.*\./",%x):=(%x)(+STAIL,%y); (creates an STAIL node after any sequence of characters followed by ".", i.e., /.*\./)
- ("/\(/",%x):=(+CHEAD,%y)(%x); (creates a CHEAD node before an opening parenthesis, i.e., /\(/)
- Normalization
- ("an "):=("a "); ("an apple" > "a apple")
- ("don't"):=("do not"); ("I don't see" > "I do not see")
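The normalization examples above behave like ordinary string replacement, which can be sketched in Python as follows. The helper `apply_rule` is hypothetical: it treats a pattern written between /forward slashes/ as a regular expression and anything else as a constant, and it uses Python's `re` dialect, which is an assumption rather than the regular expression syntax of the UNL tools.

```python
import re


def apply_rule(text: str, pattern: str, replacement: str) -> str:
    """Replace every occurrence of `pattern` in `text` (illustrative only)."""
    is_regex = len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/")
    if is_regex:
        # A function replacement keeps `replacement` literal (no backslash escapes).
        return re.sub(pattern[1:-1], lambda match: replacement, text)
    # Constant pattern: plain substring replacement.
    return text.replace(pattern, replacement)


# The two normalization rules from the examples, applied in order.
rules = [("don't", "do not"), ("an ", "a ")]
text = "I don't see an apple"
for pattern, replacement in rules:
    text = apply_rule(text, pattern, replacement)
print(text)  # I do not see a apple

# A regular-expression pattern: collapse runs of spaces.
print(apply_rule("a   b", "/ +/", " "))  # a b
```

Because the constant case in this sketch is a plain substring match, a rule such as ("an "):=("a "); would also rewrite "can " to "ca "; rules of this kind need conditions that are specific enough to avoid such matches.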