D-rule

From UNL Wiki

Revision as of 15:38, 27 August 2013

D-rules, or disambiguation rules, are used to prevent wrong lexical choices, to provoke best matches and to check the consistency of graphs, trees and lists. The set of D-rules forms the Disambiguation grammar, or D-Grammar.

Syntax

D-rules follow the general syntax:

STATEMENT=P;

Where
STATEMENT is the left side (condition) of an L-rule or an S-rule; and
P, which can range from 0 (impossible) to 255 (necessary), is the probability of occurrence of the STATEMENT.
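
The general syntax can be illustrated with a minimal Python sketch that splits a rule into STATEMENT and P and checks the range of P. The function name parse_d_rule is hypothetical and is not part of any UNL tool; the sketch checks only the outer shape STATEMENT=P; and does not validate the statement itself.

```python
import re

# Hypothetical helper (not part of any UNL tool): splits "STATEMENT=P;"
# into its two parts. The statement is matched lazily so that "=" signs
# inside nodes, such as (POS=ART), stay in the statement.
_D_RULE = re.compile(r"^\s*(?P<statement>.+?)\s*=\s*(?P<p>\d+)\s*;\s*$")


def parse_d_rule(text):
    """Return (statement, p) for a D-rule, or raise ValueError."""
    match = _D_RULE.match(text)
    if match is None:
        raise ValueError(f"not a D-rule: {text!r}")
    p = int(match.group("p"))
    if not 0 <= p <= 255:
        raise ValueError(f"P must be between 0 and 255, got {p}")
    return match.group("statement"), p
```

For instance, parse_d_rule("(ART)(VER)=0;") yields the statement (ART)(VER) with P = 0, and a rule ending in =256; is rejected.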

Types of Disambiguation Rules

There are two types of disambiguation rules:

  • Linear disambiguation rules, when the rule applies over lists of nodes
  • Non-linear disambiguation rules, when the rule applies over non-linear relations between nodes

Linear Disambiguation Rules

Linear disambiguation rules apply over the natural language list structure to constrain word selection (dictionary retrieval) or the application of both Tree-to-List (TL) and List-to-List (LL) Transformation Rules. They have the following format:

(node 1)(node 2)(...)(node n)=P;

Where (node 1), (node 2) and (node n) are nodes, and P is an integer (from 0 to 255).

Examples

(ART)(VER)=0;
An article (ART) may not precede a verb (VER).
(ART)(NOU)=255;
Articles (ART) always precede nouns (NOU).

Non-Linear Disambiguation Rules

Non-linear disambiguation rules apply over the syntactic or the network structure to constrain the application of List-to-Tree (LT), Tree-to-Tree (TT), Tree-to-Network (TN) and Network-to-Network (NN) Transformation Rules. They have the following format:

REL1(arg1;arg2;...)REL2(arg3;arg4;...)...RELN(argx;argy;...)=P;

Where REL1, REL2 and RELN are syntactic or semantic relations, with their corresponding arguments (arg1, arg2, ...), and P is an integer (from 0 to 255).

Examples

VS(VER;ADJ)=0;
An adjective (ADJ) may not be a specifier (VS) of a verb (VER).
NS(NOU;DET)=255;
Determiners (DET) are always specifiers (NS) of nouns (NOU).
agt(VER;ADJ)=0;
An adjective (ADJ) may not be an agent (agt) of a verb (VER).
agt(VER;NOU)=255;
Agents (agt) of verbs (VER) are always nouns (NOU).
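
As a rough illustration of the non-linear format, the following Python sketch breaks the statement of a non-linear rule into its relations and arguments. The name parse_relations is hypothetical, and the sketch assumes flat arguments only (no relation nested inside an argument).

```python
import re

# Hypothetical helper: a relation is a name immediately followed by a
# parenthesised, semicolon-separated argument list, e.g. agt(VER;NOU).
# Assumption: arguments are flat (no nested relations or parentheses).
_RELATION = re.compile(r"(\w+)\(([^()]*)\)")


def parse_relations(statement):
    """Return a list of (relation name, [arguments]) pairs."""
    return [(name, args.split(";")) for name, args in _RELATION.findall(statement)]
```

Under these assumptions, agt(VER;NOU) is read as the relation agt with the arguments VER and NOU, whereas a linear statement such as (ART)(VER) contains no relation name and yields an empty list.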

Scope of Disambiguation Rules

Disambiguation rules may apply:

  • Only during tokenization, in order to control the dictionary retrieval
  • Only during transformation, in order to control the application of T-rules
  • During tokenization and transformation

Tokenization

main article: tokenization

During tokenization, D-rules are used to resolve lexical ambiguities.
For instance, given the dictionary:

  • [ ]{}""(BLK)<eng,0,0>;
  • [a]{}""(POS=ART)<eng,0,0>;
  • [book]{}"to book(equ>to reserve)" (POS=VER)<eng,2,0>; (higher frequency)
  • [book]{}"book(icl>document)" (POS=NOU)<eng,1,0>; (lower frequency)

The input string

"a book"

will be tokenized as

("a",[a],[[]],POS=ART)(" ",[ ],[[]],BLK)("book",[book],[[to book(equ>to reserve)]],POS=VER)

which is not correct, because "book" should be classified as ("book",[book],[[book(icl>document)]],POS=NOU)
In order to induce the correct behavior, two types of D-rules could be used:

  • to prevent verbs from appearing after article + blank, i.e., (ART)(BLK)(VER)=0; or
  • to force possible nouns to appear after article + blank, i.e., (ART)(BLK)(NOU)=1;

In both cases the result will be:

("a",[a],[[]],POS=ART)(" ",[ ],[[]],BLK)("book",[book],[[book(icl>document)]],POS=NOU)

which is the correct one.
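
The effect of the prohibiting rule (ART)(BLK)(VER)=0; can be sketched in Python as follows. This is a simplified model and not the actual tokenization algorithm: it assumes that the candidates for each token are tried in order of dictionary frequency, and that any combination containing a prohibited sequence is discarded. The names CANDIDATES, PROHIBITED, contains and tokenize are illustrative.

```python
from itertools import product

# Candidate entries per token, highest dictionary frequency first
# (taken from the example dictionary above).
CANDIDATES = [
    [("", "ART")],
    [("", "BLK")],
    [("to book(equ>to reserve)", "VER"), ("book(icl>document)", "NOU")],
]

# Sequences with P = 0, i.e. (ART)(BLK)(VER)=0;
PROHIBITED = [("ART", "BLK", "VER")]


def contains(tags, pattern):
    """True if pattern occurs as a contiguous subsequence of tags."""
    n = len(pattern)
    return any(tuple(tags[i:i + n]) == tuple(pattern)
               for i in range(len(tags) - n + 1))


def tokenize(candidates, prohibited):
    """Return the first combination, in priority order, with no prohibited sequence."""
    for combo in product(*candidates):
        tags = [tag for _, tag in combo]
        if not any(contains(tags, pattern) for pattern in prohibited):
            return combo
    return None
```

Without rules, the sketch selects the verb reading of "book"; with the prohibited sequence it falls back to the noun reading. The rule (ART)(BLK)(NOU)=1; is not modeled here.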

Transformation

In transformation, D-rules are used to resolve syntactic and semantic ambiguities.
For instance, given the state:

("book",N)("of",P)("Peter",N)("about",P)("John",N)

And the grammar:

  1. (%x,N)(%y,P):=(NA(%x;%y),+N); (i.e., replace the sequence noun + preposition with a hyper-node containing a relation NA (noun adjunct) between them)
  2. (%x,P)(%y,N):=(PC(%x;%y),+P); (i.e., replace the sequence preposition + noun with a hyper-node containing a relation PC (preposition complement) between them)

The result of the application of the rules, in the order defined by the grammar, would be

(NA("book",N;"of",P),N)(NA("Peter",N;"about",P),N)("John",N)

which is not correct, because the relations should hold between "book" and "of Peter", and between "book" and "about John".
In order to induce the correct behavior, two types of D-rules could be used:

  • to prevent NA's from appearing before nouns, i.e., (NA(;))(N)=0;
  • to force PC's to apply first, i.e., PC(P;N)=1;

In both cases the result will be:

("book",N)(PC("of",P;"Peter",N),P)(PC("about",P;"John",N),P) (after applying rule #2 twice)
(NA("book",N;PC("of",P;"Peter",N),P),N)(PC("about",P;"John",N),P) (after applying rule #1 for the first time)
(NA(NA("book",N;PC("of",P;"Peter",N),P),N;PC("about",P;"John",N),P),N) (after applying rule #1 for the second time)

which is the correct output. The sentence corresponds to [ [ [book][of Peter] ] [about John] ].
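
The derivation can be reproduced with a small Python sketch. It models only the outcome of "forcing PC's to apply first" as a change in the order in which the two grammar rules are tried; how the engine actually weighs D-rules is not described here, so this is an assumption. The names RULES, STATE, apply_once, transform and render are illustrative.

```python
# Grammar rules as (left category, right category, relation, resulting category).
RULES = {
    1: ("N", "P", "NA", "N"),  # (%x,N)(%y,P):=(NA(%x;%y),+N);
    2: ("P", "N", "PC", "P"),  # (%x,P)(%y,N):=(PC(%x;%y),+P);
}

# Each node is a (label, category) pair.
STATE = [('"book"', "N"), ('"of"', "P"), ('"Peter"', "N"),
         ('"about"', "P"), ('"John"', "N")]


def apply_once(state, rule):
    """Apply rule to the leftmost matching pair; return None if nothing matches."""
    left, right, relation, category = rule
    for i in range(len(state) - 1):
        (x_label, x_cat), (y_label, y_cat) = state[i], state[i + 1]
        if x_cat == left and y_cat == right:
            node = (f"{relation}({x_label},{x_cat};{y_label},{y_cat})", category)
            return state[:i] + [node] + state[i + 2:]
    return None


def transform(state, order):
    """Repeatedly apply the first applicable rule, trying rules in the given order."""
    state = list(state)
    while True:
        for rule_id in order:
            new_state = apply_once(state, RULES[rule_id])
            if new_state is not None:
                state = new_state
                break
        else:
            return state  # no rule applies any more


def render(state):
    return "".join(f"({label},{category})" for label, category in state)
```

With the order [2, 1], render(transform(STATE, [2, 1])) produces the single hyper-node shown in the last derivation step above; with the grammar order [1, 2], "John" is left unattached.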


#FINAL

The feature #FINAL is used to indicate which nodes are not expected to be replaced in a D-rule.
Consider, for instance, the input string:

  • this book

and the dictionary:

  • [this]{1}"00"(R)<eng,0,0>; (this is my book)
  • [this]{2}""(D)<eng,0,0>; (this book is mine)
  • [book]{4}"book"(N)<eng,0,0>;
  • [book]{3}"to book"(V)<eng,0,0>;

According to the order defined in the dictionary, the input string would be tokenized as

  • (R)(N) (i.e., [this] = pronoun and [book] = noun)

which is not the expected result (we expect "this" to be tokenized as a determiner, rather than as a pronoun).
In order to prevent this tokenization, we may create a D-rule such as:

  • (R)(N)=0; (the sequence pronoun + noun is prohibited)

but the result of this rule would be "book" as a verb, instead of "this" as a determiner, i.e.:

  • (R)(V) (i.e., [this] = pronoun and [book] = verb)

because D-rules apply from left to right, and the system will first try to replace the rightmost nodes, if possible.
In order to prevent the system from replacing the rightmost nodes, we have to assign #FINAL to the nodes to be preserved:

  • (R)(N,#FINAL)=0; (there cannot be a pronoun before a noun)

In this case, the machine will first try to replace the node without #FINAL, and will then obtain:

  • (D)(N) (i.e., [this] = determiner and [book] = noun)

which is exactly the expected result.
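
The behaviour of #FINAL can be approximated with the following Python sketch. It is a greedy simplification and not the actual engine: when a prohibited sequence is found, the sketch replaces the rightmost node of that sequence that still has an alternative, skipping the nodes marked as #FINAL. The names CANDIDATES, find_violation and resolve are illustrative, and each rule is written as a pair (pattern, positions marked #FINAL).

```python
# Candidate categories per token, in dictionary order ("this", "book").
CANDIDATES = [["R", "D"], ["N", "V"]]


def find_violation(tags, rules):
    """Return (start, length, finals) for the first prohibited sequence found."""
    for pattern, finals in rules:
        n = len(pattern)
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == pattern:
                return start, n, finals
    return None


def resolve(candidates, rules):
    """Pick categories so that no prohibited sequence remains, or return None."""
    choice = [0] * len(candidates)
    while True:
        tags = [options[c] for options, c in zip(candidates, choice)]
        hit = find_violation(tags, rules)
        if hit is None:
            return tags
        start, n, finals = hit
        # Replace the rightmost node of the sequence that is not #FINAL
        # and still has an untried candidate.
        for offset in reversed(range(n)):
            i = start + offset
            if offset in finals:
                continue
            if choice[i] + 1 < len(candidates[i]):
                choice[i] += 1
                break
        else:
            return None  # nothing left to replace
```

Under this model, the rule (R)(N)=0; corresponds to (("R", "N"), frozenset()) and gives (R)(V), while (R)(N,#FINAL)=0; corresponds to (("R", "N"), frozenset({1})) and gives (D)(N).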

Examples

  • List structures
    • (ART)(BLK)(VER)=0; (an article (ART) may not precede a verb (VER))
    • (ART)(BLK)(NOU)=255; (articles (ART) always precede nouns (NOU))
  • Syntactic and semantic structures
    • agt(VER;ADJ)=0; (an adjective (ADJ) may not be an agent (agt) of a verb (VER))
    • agt(VER;NOU)=255; (agents (agt) of verbs (VER) are always nouns (NOU))
    • VS(VER;ADJ)=0; (an adjective (ADJ) may not be a specifier (VS) of a verb (VER))
    • NS(NOU;DET)=255; (determiners (DET) are always specifiers (NS) of nouns (NOU))

Properties

Formal Syntax of Disambiguation Rules

Disambiguation rules must comply with the following syntax:

<DISAMBIGUATION RULE> ::= <NN RULE> | <TT RULE> | <LL RULE> 
<NN RULE>             ::= (<SEM>)+ "=" [0-255]";"
<TT RULE>             ::= (<SYN>)+ "=" [0-255]";"
<LL RULE>             ::= "(" <NODE> ")" ( "(" <NODE> ")" )+ "=" [0-255]";"
<SEM>                 ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<SYN>                 ::= <TEXT> "(" <NODE> ";" <NODE> ")"
<NODE>                ::= ( (<DESCRIPTION>)( "," <DESCRIPTION> )* )?
<DESCRIPTION>         ::= <STRING> | <ENTRY> | <FEATURE> | <RELATION>
<STRING>              ::= """<text>"""
<ENTRY>               ::= "["<entry>"]"
<FEATURE>             ::= <VALUE> | <ATTRIBUTE> | <ATTRIBUTE>"="<VALUE>
<RELATION>            ::= <SEM>|<SYN>
<VALUE>               ::= <TEXT>
<ATTRIBUTE>           ::= <TEXT>
<TEXT>                ::= any sequence of characters except whitespace | <REGULAR EXPRESSION>
<REGULAR EXPRESSION>  ::= "/"<PERL COMPATIBLE REGULAR EXPRESSIONS>"/"