N-rule

From UNL Wiki

Revision as of 16:58, 31 May 2013 by Martins (Talk | contribs)


Normalization rules (N-rules) prepare natural language input for automatic processing. They make up the preprocessing module, which operates on the input as a raw string and runs before tokenization. They have two roles:

- to normalize the input text (e.g., to replace abbreviations with their extended forms or to expand contractions);
- to segment the natural language text into sentences, i.e., to insert the tags <SHEAD> (beginning of a sentence), <STAIL> (end of a sentence), <CHEAD> (beginning of a scope) and <CTAIL> (end of a scope) into the input text. These tags serve as sentence and clause boundaries and define the units of processing for the Transformation and Disambiguation grammars.
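The two roles can be pictured as a small two-stage pipeline. The following Python sketch is only an illustration and is not the UNL engine: the `EXPANSIONS` table, the function names, and the choice to wrap each sentence in inline `<SHEAD>`/`<STAIL>` tags are assumptions made for this example.

```python
import re

# Hypothetical expansion table (an assumption, not taken from any UNL resource).
EXPANSIONS = {"don't": "do not", "Dr.": "Doctor"}

def normalize(text: str) -> str:
    """Role 1: replace abbreviations and contractions with their full forms."""
    for short, full in EXPANSIONS.items():
        text = text.replace(short, full)
    return text

def segment(text: str) -> str:
    """Role 2: mark sentence boundaries with <SHEAD> and <STAIL> tags."""
    # A sentence is approximated here as a run of characters ending in . ! or ?
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    return "".join(f"<SHEAD>{s.strip()}<STAIL>" for s in sentences)

def preprocess(text: str) -> str:
    """Run both stages on the raw string, before any tokenization."""
    return segment(normalize(text))

print(preprocess("I don't see Dr. Smith. He left."))
```

In this sketch normalization runs first, so the period in "Dr." is gone before the text is split into sentences; whether the real module orders its rules this way is not stated here.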

Examples of Normalization rules

Segmentation
- ("/.*\./",%x):=(%x)(+STAIL,%y); (creates an STAIL node after any sequence of characters followed by ".", i.e., /.*\./)
- ("/\(/",%x):=(+CHEAD,%y)(%x); (creates a CHEAD node before an opening parenthesis, i.e., /\(/)
Normalization
- ("an "):=("a "); ("an apple" > "a apple")
- ("don't"):=("do not"); ("I don't see" > "I do not see")
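As a rough analogy, the four rules above behave like plain string and regular-expression substitutions. The Python sketch below mimics them under stated assumptions: nodes are rendered as inline tags, the segmentation pattern is made non-greedy so that it fires once per period, and the function names are hypothetical. It does not reproduce the actual rule engine.

```python
import re

def apply_normalization(text: str) -> str:
    # ("an "):=("a ");  literal replacement, so it also fires inside words such as "than "
    text = text.replace("an ", "a ")
    # ("don't"):=("do not");
    text = text.replace("don't", "do not")
    return text

def apply_segmentation(text: str) -> str:
    # ("/.*\./",%x):=(%x)(+STAIL,%y);  add an STAIL marker after each "...." span
    # Assumption: non-greedy matching, one marker per period.
    text = re.sub(r"(.*?\.)", r"\1<STAIL>", text)
    # ("/\(/",%x):=(+CHEAD,%y)(%x);  add a CHEAD marker before each "("
    text = re.sub(r"\(", "<CHEAD>(", text)
    return text

print(apply_normalization("I don't see an apple"))
print(apply_segmentation("I see (now). You go."))
```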
