English grammar/Determiners
Articles ("a", "the"), demonstrative determiners ("this", "that", "same" etc.), interrogative determiners ("which", "what") and quantifiers ("all", "any" etc.) are represented in UNL as attributes: "the book" > book.@def, "this book" > book.@proximal, "all books" > book.@all, "which book" > book.@wh. Numbers are represented by the relation "qua": "two books" > qua(book,2), "second book" > qua(book,2.@ordinal). Possessive determiners ("my", "your" etc.) and possessive pronouns ("mine", "yours") are represented by the relation "pos" and the personal pro-forms of UNL (00.@1, 00.@2 and 00.@3): "my book" > pos(book,00.@1), "book of mine" > pos(book,00.@1).
UNLization
There are basically two types of UNLization rules dealing with determiners: attribute rules and relation rules. Attribute rules normally make use of the attribute "att" and its corresponding features assigned in the dictionary.
the book
INPUT (ENG): the book
OUTPUT (UNL): book.@def
DICTIONARY:
- [the]{}"" (LEX=D,POS=ART,att=@def)<eng,255,0>;
- [book]{}"book" (LEX=N,POS=NOU,NUM=SNG)<eng,0,0>;
- [ ]{}""(BLK)<eng,0,0>;
T-GRAMMAR:
- (BLK):=;
- (D,att,%x)(N,%y):=(%y,+att=%x);
D-GRAMMAR: not necessary
TRACE:
INPUT: the book
- STATE#1: [the][ ][book] (tokenization)
- STATE#2: [the][book] (blank space is deleted)
- STATE#3: [book.@def] ("the" is deleted and its attribute is copied to "book")
- OUTPUT: book.@def
DESCRIPTION
The first rule deletes the blank spaces. Rule #2 then deletes the determiner, because its node (%x) is not reproduced on the right side of the rule: in list rules, according to the UNL Grammar Specs, nodes that do not appear on the right side are deleted. Note that [book] has no "att" in the dictionary, but [the] does. The feature "+att=%x" creates an attribute "att" for [book] and assigns to it the value of the same attribute in [the]. This kind of rule only works if %x has "att"; if not, the attribute "att" will be created, but without any value.
In the English grammar, the rule is actually:
- (D,att,%x)(NB,%y):=(%y,+att=%x);
Note that %y has the feature NB instead of N. The reason is that, in English, other words can occur between the determiner and the noun: "a beautiful book", for instance. It is important to assign the value of the determiner to the head of the noun phrase (which is "book", not "beautiful"). In that case, we first process the noun phrase, generating all its intermediate projections (NBs). Only then do we resolve the determiners.
The single rule (D,att,%x)(NB,%y):=(%y,+att=%x); will apply for all the determiners having the feature "att", i.e., [a], [an], [this], [that], [same], [other] etc.
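The effect of the attribute rule can be imitated outside the UNL engine. The following is a minimal Python sketch of the simplified "the book" example, not the actual rule interpreter: the dictionary layout and the function names (tokenize, delete_blanks, copy_determiner_attribute, unlize) are assumptions made only for illustration.

```python
# Toy model of the rules (BLK):=; and (D,att,%x)(N,%y):=(%y,+att=%x);
# The data layout and all function names are illustrative assumptions.

DICTIONARY = {
    "the":  {"uw": "",     "feats": {"D", "ART"},        "att": "@def"},
    "book": {"uw": "book", "feats": {"N", "NOU", "SNG"}, "att": None},
    " ":    {"uw": "",     "feats": {"BLK"},             "att": None},
}

def tokenize(text):
    """Turn the input into a list of nodes, keeping blanks as nodes."""
    nodes = []
    for i, word in enumerate(text.split(" ")):
        if i:
            nodes.append(dict(DICTIONARY[" "], form=" "))
        nodes.append(dict(DICTIONARY[word], form=word))
    return nodes

def delete_blanks(nodes):
    """(BLK):=;  blank nodes are not reproduced, so they are deleted."""
    return [n for n in nodes if "BLK" not in n["feats"]]

def copy_determiner_attribute(nodes):
    """(D,att,%x)(N,%y):=(%y,+att=%x);  keep %y only, carrying the att of %x."""
    out, i = [], 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if nxt and "D" in cur["feats"] and cur["att"] and "N" in nxt["feats"]:
            out.append(dict(nxt, att=cur["att"]))  # the determiner is dropped
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

def unlize(text):
    nodes = copy_determiner_attribute(delete_blanks(tokenize(text)))
    return " ".join(n["uw"] + ("." + n["att"] if n["att"] else "") for n in nodes)

print(unlize("the book"))  # book.@def
```

The determiner node disappears simply because it is not copied to the output list, which mirrors the way a node missing from the right side of a list rule is deleted.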
two books
Numbers are not represented as attributes, but as relations. In this case, we cannot avoid some syntactic processing. We present below the trace and the details.
INPUT: two books
DICTIONARY:
- [two]{}"2" (LEX=U,POS=CDN,DIGIT)<eng,255,0>;
- [books]{}"book" (LEX=N,POS=NOU,NUM=PLR)<eng,0,0>;
- [ ]{}""(BLK)<eng,0,0>;
T-GRAMMAR:
- (BLK):=;
- (N,^head,^XB,^XP):=(+XB=NB);
- (DIGIT,^ORD,%y)(NB,%x):=(XB(%x;%y,+spec,+qua),+LEX=N,+XB=NB,%z); two books > XB(books,two)
- (/X[BS]/(%x;%y,spec),N,%z):=(NS(%x;%y),%z);
- /[ACDIJNPV][ACS]/(%x;%y,qua):=qua(%x;%y);
D-GRAMMAR: not necessary
TRACE:
INPUT: two books
- STATE#1: [two][ ][books] (tokenization)
- STATE#2: [two][books] (blank space is deleted)
- STATE#3: [two][books] ([books] will receive the feature NB because of rule #2)
- STATE#4: [XB(books;two)] (rule #3 is applied)
- STATE#5: [NS(books;two)] (rule #4 is applied)
- STATE#6: [qua(books;two)] (rule #5 is applied)
- OUTPUT: qua(book;"2")
DESCRIPTION:
The first rule deletes the blank space. The second rule assigns the feature NB to [books]. The third rule creates a new node containing a relation XB between the nodes "books" and "two". Rule #4 replaces this relation XB with NS, keeping the same arguments. Finally, rule #5 converts this NS into "qua", which is one of the semantic relations of UNL. Note that [books] will be automatically represented by the UW "book", and [two] by "2", because this is stated in the dictionary.
The main difficulty here is to understand why NB and XB were necessary. Why couldn't we simply write:
- (DIGIT,^ORD,%x)(N,%y):=qua(%y;%x);
The simple rule above will work in the case of "two books", but we have to be prepared to process more complicated constructions, such as "my two books", "two beautiful books" etc. Consider, for instance, the case of "my two books". The trace would be the following:
- STATE0: [my][two][books] (after tokenization and deletion of blank spaces)
- STATE1: [my] qua(books;two) (after the rule above)
Note that there is no longer any linear relation between [my] and qua(books;two). So how do we link [my] to [books], which is now inside "qua" and outside the list structure? This is the reason for the syntactic processing here. In the English grammar, we have been using X-bar theory, with some adaptations, but you are free to choose any other approach. In our case, the following is happening:
The rule:
- (N,^head,^XB,^XP):=(+XB=NB);
is part of a set of general parsing rules that projects NB (intermediate projection of the noun phrase) out of a noun (N). According to the X-bar approach, phrases (noun phrases, for instance) are projected out of the corresponding heads (nouns). There can be several different projections (with complements and adjuncts), and all of them constitute intermediate projections, which we represent generally by the attribute XB (from "x-bar") and, more specifically, by the value of the projection ("NB", from "noun-bar", is an XB that has N as its head). At the topmost level, the XB is projected, with the specifier, to form the maximal projection ("XP", from "X phrase"), which is again a general attribute whose value depends on the category of the head ("NP", from "noun phrase", is the maximal projection of a noun). The first intermediate projection of any head is the head itself. This is stated by the rule above.
After that, we apply the rule:
- (DIGIT,^ORD,%y)(NB,%x):=(XB(%x;%y,+spec,+qua),+LEX=N,+XB=NB,%z); two books > XB(books,two)
This rule states that, if there is a digit, which is not ordinal, before an NB, then we should create another intermediate projection XB between the NB and the digit, i.e., "two books" will be analyzed as:
      XB
     /  \
   NB    two
   |
 books
Note that we are not only converting a list: [two][books] into a relation: XB(books;two). We are actually creating a hyper-node which contains this relation. Compare:
- (%x)(%y):=XB(%y;%x); transforms a list into a relation
- (%x)(%y):=(XB(%y;%x),%z); replaces two nodes (%x and %y) by a third node (%z) and creates an XB relation between %x and %y inside this node.
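The difference between the two rule forms can be made concrete with a small Python sketch. It assumes a hypothetical data model in which the sentence is a Python list of nodes and the graph is a separate list of relation tuples; the function names to_relation and to_hyper_node are illustrative only.

```python
# Hypothetical data model: the "list" is a Python list of nodes and the
# "graph" is a separate list of (label, source, target) tuples.

def to_relation(nodes, i):
    """(%x)(%y):=XB(%y;%x);  both nodes leave the list and go into the graph."""
    x, y = nodes[i], nodes[i + 1]
    remaining = nodes[:i] + nodes[i + 2:]
    graph = [("XB", y, x)]
    return remaining, graph

def to_hyper_node(nodes, i):
    """(%x)(%y):=(XB(%y;%x),%z);  a new node %z wraps the relation in the list."""
    x, y = nodes[i], nodes[i + 1]
    z = {"hyper": ("XB", y, x)}
    return nodes[:i] + [z] + nodes[i + 2:]

sentence = ["my", "two", "books"]

flat_list, graph = to_relation(sentence, 1)
print(flat_list, graph)  # ['my'] [('XB', 'books', 'two')]

hyper_list = to_hyper_node(sentence, 1)
print(hyper_list)        # ['my', {'hyper': ('XB', 'books', 'two')}]
```

In the first result [my] is alone in the list and has no neighbour left; in the second it is still adjacent to a node, so a later list rule can still match it.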
Why is this necessary?
There are at least two reasons for operating this way:
- If we simply replace the nodes %x and %y by a relation, we will lose the relations that they could have with other nodes in the sentence. This is the case of [my][two][books] referred to above. If we replace [two][books] by XB(books;two), we will lose the relation between [my] and [books], because nodes inserted in graphs are removed from the list structure, i.e., we will have two isolated data structures: the graph, which contains only XB(books;two), and the list with [my]. In order to preserve this relation, we have to replace [two][books] by a new node [XB(books;two)] so that we have [my][XB(books;two)]. We will lose the relation between [my] and [two], but this relation is not really relevant.
- It is important to deal with general categories, such as XB, instead of specific categories, such as NB. In the dearborization phase of the grammar, we will transform these tree-like structures into head-driven structures. If we have a general relation such as XB, the number of rules is considerably smaller:
- XB(XB(%x;%y);%z):=XB(%x;%y)XB(%x;%z);
- XP(XB(%x;%y);%z):=XB(%x;%y)XS(%x;%z);
- XP(%x;%y):=XS(%x;%y);
Note that these rules will transform hyper-relations into simple relations. If we use specific categories (such as NB,VB,PB etc) instead of the general category XB, we will have to repeat this for all different types of relations. However, we do need to preserve the information that this intermediate projection (XB) is a projection of a noun. This is done by assigning the features +LEX=N,+XB=NB to the hyper-node so that, later on, we will be able to convert XB into NB.
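The three dearborization rules above can be read as a single recursive procedure. The sketch below is an illustration in Python under stated assumptions: relations are (label, source, target) tuples, only the source may be nested, and the function name dearborize is hypothetical. It is not the way the UNL engine applies the rules.

```python
# Nested relations are modelled as (label, source, target) tuples.
# Assumption: only the source argument can itself be a relation.

def dearborize(rel):
    """Return (head, flat_relations) for a possibly nested relation.

    XB(XB(%x;%y);%z) -> XB(%x;%y) XB(%x;%z)
    XP(XB(%x;%y);%z) -> XB(%x;%y) XS(%x;%z)
    XP(%x;%y)        -> XS(%x;%y)
    """
    label, source, target = rel
    if isinstance(source, tuple):
        head, flat = dearborize(source)   # flatten the inner projection first
    else:
        head, flat = source, []
    out_label = "XS" if label == "XP" else label
    return head, flat + [(out_label, head, target)]

print(dearborize(("XB", ("XB", "x", "y"), "z"))[1])
# [('XB', 'x', 'y'), ('XB', 'x', 'z')]
print(dearborize(("XP", ("XB", "x", "y"), "z"))[1])
# [('XB', 'x', 'y'), ('XS', 'x', 'z')]
print(dearborize(("XP", "x", "y"))[1])
# [('XS', 'x', 'y')]
```

Because the function never inspects which category the projection belongs to, the same code covers every head type, which is the economy that the general label XB is meant to provide.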
The other rules are easier:
- (/X[BS]/(%x;%y,spec),N,%z):=(NS(%x;%y),%z);
- This rule transforms the relations XB or XS into NS if their target node (%y) has the feature "spec". Note that we assigned this feature to "two" in rule #3.
- /[ACDIJNPV][ACS]/(%x;%y,qua):=qua(%x;%y);
- Finally, the relations AA, AC, AS, CA, CC, CS etc. are replaced by "qua" if their target argument has the feature "qua". We have also assigned this feature to "two" in rule #3.
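The two relabelling rules rely on regular expressions over relation names. A minimal Python sketch of that matching is given below. It assumes that the pattern must match the whole relation name (hence re.fullmatch) and that features are plain sets of strings; the function relabel and its parameters are hypothetical.

```python
import re

def relabel(label, target_feats, host_feats=frozenset()):
    """Apply rules #4 and #5 in sequence to one relation label.

    label        -- relation name, e.g. "XB"
    target_feats -- features of the target node (%y)
    host_feats   -- features of the hyper-node that contains the relation
    """
    # Rule #4: (/X[BS]/(%x;%y,spec),N,%z):=(NS(%x;%y),%z);
    if re.fullmatch(r"X[BS]", label) and "spec" in target_feats and "N" in host_feats:
        label = "NS"
    # Rule #5: /[ACDIJNPV][ACS]/(%x;%y,qua):=qua(%x;%y);
    if re.fullmatch(r"[ACDIJNPV][ACS]", label) and "qua" in target_feats:
        label = "qua"
    return label

# "two" received +spec and +qua in rule #3; the hyper-node received +LEX=N.
print(relabel("XB", {"spec", "qua"}, {"N"}))  # qua
print(relabel("XB", {"spec"}, {"N"}))         # NS
print(relabel("XB", {"qua"}, {"N"}))          # XB
```

The last call shows why rule #4 has to apply first: "XB" does not match the pattern of rule #5, so a target carrying only "qua" is left unchanged until the relation has been renamed to NS.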