Normalization
From UNL Wiki
Martins (Talk | contribs)
Revision as of 17:00, 16 July 2014
Normalization is the process of preparing the input document so that it can be processed more reliably. It is carried out by N-rules and includes:
- replacing abbreviations by their corresponding extended forms
- replacing short forms by their corresponding long forms
- replacing periphrases by their corresponding direct forms
- replacing contractions by their components
- defining processing units
Replacement
Replacement is carried out by N-rules written as follows:
({SHEAD|" "})("don’t")({STAIL|" "}):=()("do not")();
({SHEAD|" "})("art. ")({STAIL|" "}):=()("article")();
({SHEAD|" "})("aux")({STAIL|" "}):=()("à les")();
Where:
- SHEAD = beginning of the sentence
- STAIL = end of the sentence
- ({SHEAD|" "}) indicates left context (i.e., either SHEAD or blank space)
- ({STAIL|" "}) indicates right context (i.e., either STAIL or blank space)
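The effect of the three rules above can be approximated outside the UNL tools with a short Python sketch. This is not the N-rule engine, only an illustration under stated assumptions: the names `REPLACEMENTS` and `normalize` are hypothetical, the empty parentheses on the right-hand side of each rule are assumed to leave the context unchanged, the trailing blank in "art. " is dropped for simplicity, and the context checks accept any whitespace rather than only a blank space.

```python
import re

# Hypothetical table mirroring the three N-rules above:
# (form found in the input, form that replaces it).
REPLACEMENTS = [
    ("don’t", "do not"),
    ("art.", "article"),
    ("aux", "à les"),
]


def normalize(sentence: str) -> str:
    """Apply each replacement only when it stands between the allowed contexts."""
    for source, target in REPLACEMENTS:
        # (?<!\S) approximates ({SHEAD|" "}): sentence start or preceding whitespace.
        # (?!\S)  approximates ({STAIL|" "}): sentence end or following whitespace.
        pattern = r"(?<!\S)" + re.escape(source) + r"(?!\S)"
        # A function is used as the replacement so the target is inserted literally.
        sentence = re.sub(pattern, lambda _match, t=target: t, sentence)
    return sentence


if __name__ == "__main__":
    print(normalize("I don’t know"))
    print(normalize("je parle aux enfants"))
```

Because both contexts are required, a form that occurs inside a longer word (for example "aux" inside "faux") is left untouched in this sketch.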