D-rule
Revision as of 21:08, 19 August 2013
D-rules, or disambiguation rules, are used to prevent wrong lexical choices, to promote the best matches and to check the consistency of graphs, trees and lists. The set of D-rules forms the Disambiguation grammar, or D-Grammar.
Syntax
D-rules follow the general syntax:
STATEMENT=P;
Where
STATEMENT is the left side (condition) of an L-rule or an S-rule; and
P, which can range from 0 (impossible) to 255 (necessary), is the probability of occurrence of the STATEMENT
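The general form above can be illustrated with a minimal sketch that splits a rule into STATEMENT and P and checks that P lies between 0 and 255. The function name `parse_d_rule` and the regular expression are assumptions made for this illustration; they are not part of any UNL tool.

```python
import re

# Hypothetical helper: STATEMENT is everything before the last "=<digits>;".
RULE_RE = re.compile(r"^\s*(?P<statement>.+?)\s*=\s*(?P<p>\d+)\s*;\s*$")


def parse_d_rule(text):
    """Split a rule of the form STATEMENT=P; into (statement, p)."""
    match = RULE_RE.match(text)
    if match is None:
        raise ValueError(f"not a D-rule: {text!r}")
    p = int(match.group("p"))
    if not 0 <= p <= 255:
        raise ValueError(f"P must be between 0 and 255, got {p}")
    return match.group("statement"), p


print(parse_d_rule("(ART)(VER)=0;"))
print(parse_d_rule("agt(VER;NOU)=255;"))
```

The sketch only separates the two sides of the rule; it does not check that STATEMENT itself is well formed.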
Scope of Disambiguation Rules
Disambiguation rules may apply:
- Only during tokenization, in order to control the dictionary retrieval
- Only during transformation, in order to control the application of T-rules
- During tokenization and transformation
Types of Disambiguation Rules
There are two types of disambiguation rules:
- Linear disambiguation rules, when the rule applies over lists of nodes
- Non-linear disambiguation rules, when the rule applies over non-linear relations between nodes
Linear Disambiguation Rules
Linear disambiguation rules apply over the natural language list structure to constrain word selection (dictionary retrieval) or the application of both Tree-to-List (TL) and List-to-List (LL) Transformation Rules. They have the following format:
(node 1)(node 2)(...)(node n)=P;
Where (node 1), (node 2) and (node n) are nodes, and P is an integer (from 0 to 255).
Examples
- (ART)(VER)=0;
- An article (ART) may not precede a verb (VER).
- (ART)(NOU)=255;
- Articles (ART) always precede nouns (NOU).
Use
INPUT | TRANSFORMATION RULES | DISAMBIGUATION RULES | OUTPUT |
---|---|---|---|
X(A,B,C;D,E,F) | X(A;D)=(A)(D); (higher priority) X(A;F)=(F)(A); (lower priority) | (B)(E)=0; | (D,E,F)(A,B,C) |
INPUT | DICTIONARY | DISAMBIGUATION RULES | OUTPUT |
---|---|---|---|
the book | [book] "22222" (POS=VER); (higher priority) [book] "11111" (POS=NOU); (lower priority) | (ART)(BLK)(VER)=0; | [book] "11111" (POS=NOU); |
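The dictionary example in the table above can be sketched as a small simulation. This is a simplified model, not the actual engine: candidates are tried in priority order, and a candidate is rejected when it completes a node sequence whose probability is 0. The function `select` and the entries for "the" and for the blank space are assumptions introduced for the illustration; only the two [book] candidates and the rule come from the table.

```python
# Toy data: the [book] candidates follow the table above (VER first, NOU second);
# the entries for "the" and the blank are assumptions made for this sketch.
CANDIDATES = {
    "the": ["ART"],
    " ": ["BLK"],
    "book": ["VER", "NOU"],
}

# Node sequences with P=0, i.e. prohibited sequences.
PROHIBITED = [("ART", "BLK", "VER")]


def select(tokens, candidates, prohibited):
    """Pick, left to right, the first candidate that completes no prohibited sequence."""
    chosen = []
    for token in tokens:
        for pos in candidates[token]:
            trial = chosen + [pos]
            # Compare the tail of the trial sequence with each prohibited sequence.
            if not any(tuple(trial[-len(rule):]) == rule for rule in prohibited):
                chosen = trial
                break
        else:
            raise ValueError(f"no admissible candidate for {token!r}")
    return chosen


print(select(["the", " ", "book"], CANDIDATES, PROHIBITED))
# prints ['ART', 'BLK', 'NOU']
```

With the rule in place, the higher-priority verb entry is skipped and the noun entry is retrieved, as in the OUTPUT column of the table.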
Non-Linear Disambiguation Rules
Non-linear disambiguation rules apply over the syntactic or the network structure to constrain the application of List-to-Tree (LT), Tree-to-Tree (TT), Tree-to-Network (TN) and Network-to-Network (NN) Transformation Rules. They have the following format:
REL1(arg1;arg2;...)REL2(arg3;arg4;...)...RELN(argx;argy;...)=P;
Where REL1, REL2 and RELN are syntactic or semantic relations, with their corresponding arguments (arg1, arg2, ...), and P is an integer (from 0 to 255).
Examples
- VS(VER;ADJ)=0;
- An adjective (ADJ) may not be a specifier (VS) of a verb (VER).
- NS(NOU;DET)=255;
- Determiners (DET) are always specifiers (NS) of nouns (NOU).
- agt(VER;ADJ)=0;
- An adjective (ADJ) may not be an agent (agt) of a verb (VER).
- agt(VER;NOU)=255;
- Agents (agt) of verbs (VER) are always nouns (NOU).
Use
INPUT | TRANSFORMATION RULES | DISAMBIGUATION RULES | OUTPUT |
---|---|---|---|
(A,B,C)(D,E,F) | (A)(D)=X(A;D); (higher priority) (A)(E)=X(E;A); (lower priority) | X(F;A)=255; | X(D,E,F;A,B,C) |
SYN(A,B,C;D,E,F) | SYN(A;D)=agt(;); (higher priority) SYN(A;E)=aoj(;); (lower priority) | agt(A;F)=0; | aoj(A,B,C;D,E,F) |
agt(A,B,C;D,E,F) | agt(A;D)=X(A;D); (higher priority) agt(A;E)=Y(A;E); (lower priority) | X(B;F)=0; | Y(A,B,C;D,E,F) |
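The second row of the table above (SYN replaced by agt or aoj) can be sketched as follows. The model is deliberately reduced: nodes are sets of features, a rule argument matches a node when its features are a subset of the node's features, and only rules with probability 0 are handled. The function `pick_relation` and the data layout are assumptions made for this illustration.

```python
def pick_relation(source, t_rules, d_rules):
    """Return the first relation, in priority order, not blocked by a P=0 rule.

    source  : (features of first argument, features of second argument)
    t_rules : list of ((condition 1, condition 2), output relation), highest priority first
    d_rules : list of (relation, condition 1, condition 2), all with P=0
    """
    first, second = source
    for (cond1, cond2), relation in t_rules:
        if not (cond1 <= first and cond2 <= second):
            continue  # this transformation rule does not apply to the input
        blocked = any(
            rel == relation and arg1 <= first and arg2 <= second
            for rel, arg1, arg2 in d_rules
        )
        if not blocked:
            return relation
    return None


source = ({"A", "B", "C"}, {"D", "E", "F"})
t_rules = [(({"A"}, {"D"}), "agt"), (({"A"}, {"E"}), "aoj")]
d_rules = [("agt", {"A"}, {"F"})]  # agt(A;F)=0;

print(pick_relation(source, t_rules, d_rules))
# prints aoj
```

The higher-priority rule would produce agt(A,B,C;D,E,F), which matches agt(A;F)=0; and is therefore discarded in favor of aoj. Rules with P=255 and rules that swap arguments, as in the first row, are outside the scope of this sketch.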
#FINAL
The feature #FINAL is used to indicate which terms are not expected to be replaced in a D-rule.
Consider, for instance, the input string:
- this book
and the dictionary:
- [this]{1}"00"(R)<eng,0,0>; (this is my book)
- [this]{2}""(D)<eng,0,0>; (this book is mine)
- [book]{4}"book"(N)<eng,0,0>;
- [book]{3}"to book"(V)<eng,0,0>;
According to the order defined in the dictionary, the input string would be tokenized as
("this",[this],[[00]],R)("book",[book],[[book]],N) (i.e., [this] = pronoun and [book] = noun)
which is not the expected result (we expect "this" to be tokenized as a determiner, rather than as a pronoun)
In order to control this process, we may create a D-rule such as:
- (R)(N)=0; (the sequence pronoun + noun is prohibited)
but the result of this would be "book" as a verb, instead of "this" as a determiner, i.e.:
("this",[this],[[00]],R)("book",[book],[[to book]],V) (i.e., [this] = pronoun and [book] = verb)
because D-rules apply from left to right, and the system will first try to replace the rightmost nodes.
In order to prevent the system from replacing the rightmost nodes, we have to assign #FINAL to the nodes to be preserved:
- (R)(N,#FINAL)=0; (there cannot be a pronoun before a noun)
i.e., the machine will first try to replace the node without #FINAL, and will then get:
("this",[this],[[]],D)("book",[book],[[book]],N)
which is the expected result.
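The behavior described in this section can be reproduced with a small simulation. It is a sketch under stated assumptions, not the engine's actual algorithm: every word starts with its first dictionary candidate, and whenever a prohibited (P=0) sequence is found, the rightmost node not marked #FINAL is moved to its next candidate. The function `apply_rule` and the representation of a rule as (feature, is_final) pairs are hypothetical.

```python
def apply_rule(words, dictionary, rule):
    """Re-tokenize until the prohibited sequence described by `rule` disappears.

    dictionary : word -> list of categories, in dictionary order
    rule       : list of (feature, is_final) pairs for a rule with P=0
    """
    choice = [0] * len(words)  # index of the current candidate of each word
    size = len(rule)
    while True:
        tags = [dictionary[w][c] for w, c in zip(words, choice)]
        # Find the first position where the prohibited sequence occurs.
        hit = next(
            (i for i in range(len(tags) - size + 1)
             if all(tags[i + k] == rule[k][0] for k in range(size))),
            None,
        )
        if hit is None:
            return tags
        # Replace the rightmost node first, skipping nodes marked #FINAL.
        for k in reversed(range(size)):
            pos = hit + k
            if not rule[k][1] and choice[pos] + 1 < len(dictionary[words[pos]]):
                choice[pos] += 1
                break
        else:
            raise ValueError("no alternative left for the prohibited sequence")


dictionary = {"this": ["R", "D"], "book": ["N", "V"]}

# (R)(N)=0;
print(apply_rule(["this", "book"], dictionary, [("R", False), ("N", False)]))
# prints ['R', 'V']

# (R)(N,#FINAL)=0;
print(apply_rule(["this", "book"], dictionary, [("R", False), ("N", True)]))
# prints ['D', 'N']
```

Without #FINAL the noun is the node that gets replaced, which yields pronoun + verb; with #FINAL on the noun, the pronoun is replaced by the determiner instead.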
Formal Syntax of Disambiguation Rules
Disambiguation rules must comply with the following syntax:
<DISAMBIGUATION RULE> ::= <NN RULE> | <TT RULE> | <LL RULE>
<NN RULE> ::= (<SEM>)+ "=" [0-255]";"
<TT RULE> ::= (<SYN>)+ "=" [0-255]";"
<LL RULE> ::= "(" <NODE> ")" ( "(" <NODE> ")" )+ "=" [0-255]";"
<SEM> ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<SYN> ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<NODE> ::= ( (<DESCRIPTION>)( "," <DESCRIPTION> )* )?
<DESCRIPTION> ::= <STRING> | <ENTRY> | <FEATURE> | <RELATION>
<STRING> ::= """<text>"""
<ENTRY> ::= "["<entry>"]"
<FEATURE> ::= <VALUE> | <ATTRIBUTE> | <ATTRIBUTE>"="<VALUE>
<RELATION> ::= <SEM>|<SYN>
<VALUE> ::= <TEXT>
<ATTRIBUTE> ::= <TEXT>
<TEXT> ::= any sequence of characters except whitespace | <REGULAR EXPRESSION>
<REGULAR EXPRESSION> ::= "/"<PERL COMPATIBLE REGULAR EXPRESSIONS>"/"
Examples
- List structures
- (ART)(BLK)(VER)=0; (an article (ART) may not precede a verb (VER))
- (ART)(BLK)(NOU)=255; (articles (ART) always precede nouns (NOU))
- Syntactic and semantic structures
- agt(VER;ADJ)=0; (an adjective (ADJ) may not be an agent (agt) of a verb (VER))
- agt(VER;NOU)=255; (agents (agt) of verbs (VER) are always nouns (NOU))
- VS(VER;ADJ)=0; (an adjective (ADJ) may not be a specifier (VS) of a verb (VER))
- NS(NOU;DET)=255; (determiners (DET) are always specifiers (NS) of nouns (NOU))
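As a rough check of the syntax above, the following sketch classifies a rule as linear (a list of at least two nodes) or non-linear (one or more relations with two arguments). It covers only a simplified subset of the grammar: nested relations inside nodes and regular expressions are not handled, and comments must be stripped beforehand. The names `classify`, `LL_RULE` and `REL_RULE` are assumptions made for this illustration.

```python
import re

# Simplified node: anything between parentheses, without nested parentheses.
NODE = r"\([^()]*\)"
# Simplified relation: a name followed by two arguments separated by ";".
REL = r"[^\s()=;]+\([^();]*;[^();]*\)"

LL_RULE = re.compile(rf"^(?:{NODE}){{2,}}=(\d{{1,3}});$")
REL_RULE = re.compile(rf"^(?:{REL})+=(\d{{1,3}});$")


def classify(rule):
    """Return "linear", "non-linear", or None if the rule fits neither shape."""
    for kind, regex in (("linear", LL_RULE), ("non-linear", REL_RULE)):
        match = regex.match(rule)
        if match and int(match.group(1)) <= 255:
            return kind
    return None


print(classify("(ART)(BLK)(VER)=0;"))
print(classify("agt(VER;NOU)=255;"))
print(classify("(ART)=0;"))
```

The last call returns None because the grammar requires a linear rule to contain at least two nodes.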