S-rule
From UNL Wiki
Revision as of 16:42, 29 April 2011
S-rule (syntactic rule) is the formalism used for describing syntactic structures and syntactic operations in the UNLarium framework.
When to use S-rules
S-rules are used for:
- composition, i.e., creating compounds out of the base forms (such as "take">"take into account");
- periphrasis, i.e., generating analytic grammatical structures (such as "love">"will love");
- subcategorization, i.e., defining the number and the type of arguments of a given base form;
- case marking, i.e., defining the grammatical cases of the arguments of a given base form;
- agreement, i.e., concord between different parts of a phrase;
- distribution, i.e., defining the precedence of word forms;
- adjacency, i.e., defining the distance between syntactic branches;
- projection, i.e., projecting syntactic structures out of the constituents;
- movement, i.e., moving nodes and branches to different places in the syntactic structure; and
- mapping, i.e., defining correspondences between semantic relations and syntactic relations.
When not to use S-rules
S-rules are not used for affixation (prefixation, infixation, suffixation) or for changes that involve only sequences of words; these must be addressed by A-rules and L-rules, respectively.
Types of S-rules
There are four types of S-rules:
- Change
<CONDITION> := <RELATION>;
- Change the attributes of the constituents of the relation. The relation itself is not affected. Features are added through "+" and deleted through "-".
- VA(%head;%adjt):=VA(%head,+C;%adjt,-D); (add the feature C to the head and remove the feature D from the adjunct)
- Create
<CONDITION> := +<RELATION>;
- Create a new relation. Nodes to be created must be defined as strings (between quotes) or lemmas (between brackets) if they are not co-indexed with an existing node.
- VA(%head;%adjt):=+VC(%head;"c"); (add the relation VC between the head and "c", which is created.)
- Delete
<CONDITION> := -<RELATION>;
- Delete a relation between the head and the argument. The head and the argument are not deleted.
- VA(%head;%adjt):=-VA(%head;%adjt); (delete the relation VA between the head and its argument; the nodes are not deleted)
- Replace
<RELATION> := <RELATION>;
- Replace the relation in the left side by the one in the right side
- VA(%head;%any):=VC(%head;%any); (replace the relation VA by VC)
- Two special cases of replacement are:
- Merge
- <RELATION><RELATION> := <RELATION>;
- Replace the relations in the left side by the ones in the right side.
- VA(%head;%adjt)VC(%head;%comp):=VB(VB(%head;%adjt);%comp); (VA and VC are deleted, and VB is created)
- Divide
- <RELATION> := <RELATION><RELATION>;
- Replace the relation in the left side by those in the right side.
- VA(%head;%adjt):=VC(%head;%x)VC(%head;%y); (VA is deleted, and two VC relations are created)
Where:
- <CONDITION> (to be repeated 0 or more times) may be a tag or a <RELATION> that defines when the rule is applied. It may be empty in general cases (i.e., if the rule is always applied).
- <RELATION> (to be repeated 1 or more times) may be:
- a syntactic relation containing the <HEAD>, in case of head-only relations (VH, NH, JH, PH, IH, CH, AH, DH), or the <HEAD> and <ARGUMENT> (i.e., complement, adjunct or specifier), in case of binary relations (VA, VC, VS, VB, NA, NC, NS, etc).
- a semantic relation, containing the <SOURCE> and the <TARGET>.
- <HEAD>, <ARGUMENT>, <SOURCE> and <TARGET> may be expressed as
- a "string" (strings come between double quotes);
- a [lemma] (lemmas come between square brackets);
- a feature or a set of features, separated by comma, and extracted from the UNDLF Tagset;
- an index;
- an action, to be performed by adding features (through "+"), deleting features (through "-"), or through the right side of an A-rule (i.e., prefixation, suffixation, infixation); or
- a <RELATION> itself (i.e., rules may be recursive).
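The rule types described above can be told apart from the shape of a rule. The Python sketch below is a hypothetical helper, not part of the UNLarium: `split_relations`, `relation_name` and `classify` are illustrative names. It splits each side of a rule into its juxtaposed top-level relations and assumes that identical relation labels on both sides indicate a change rule, while different labels indicate a replace rule. Rules without ":=" and conditions made only of tags are not handled.

```python
import re

def split_relations(side: str) -> list[str]:
    """Split one side of an S-rule into its juxtaposed top-level relations."""
    relations, depth, start, quoted = [], 0, 0, False
    for i, ch in enumerate(side):
        if ch == '"':
            quoted = not quoted            # parentheses inside "strings" are ignored
        elif not quoted and ch == "(":
            depth += 1
        elif not quoted and ch == ")":
            depth -= 1
            if depth == 0:                 # a top-level relation has just closed
                relations.append(side[start:i + 1].strip())
                start = i + 1
    return relations

def relation_name(relation: str) -> str:
    """Return the relation label, skipping a leading "+" (create) or "-" (delete)."""
    match = re.match(r"[+-]?([A-Za-z]+)\(", relation)
    return match.group(1) if match else ""

def classify(rule: str) -> str:
    """Guess the type of a rule of the form LEFT:=RIGHT; from its shape alone."""
    left, right = rule.strip().rstrip(";").split(":=", 1)
    lhs, rhs = split_relations(left), split_relations(right)
    if len(rhs) == 1 and rhs[0].startswith("+"):
        return "create"
    if len(rhs) == 1 and rhs[0].startswith("-"):
        return "delete"
    if len(lhs) > 1 and len(rhs) == 1:
        return "merge"
    if len(lhs) == 1 and len(rhs) > 1:
        return "divide"
    # Assumption: same labels on both sides = change; different labels = replace.
    if [relation_name(r) for r in lhs] == [relation_name(r) for r in rhs]:
        return "change"
    return "replace"

print(classify("VA(%head;%adjt):=VC(%head;%x)VC(%head;%y);"))  # divide
```

Because the heuristic only looks at labels and relation counts, a change rule that also renames its relation would be reported as a replace; telling those apart would need the rule author's intent.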
Observations
- The <CONDITION> field may be empty in change, create and delete rules, in case of unconditional change, creation or deletion. It is obligatory in replace rules.
- VA(+C); (add the feature C to all adjuncts to the head in the verbal phrase)
- +VA("a"); (add an adjunct "a" to the head of the verbal phrase, whatever the case)
- -VA(C); (delete all adjuncts to the head of the verbal phrase that have the feature C)
- The <HEAD> and the <ARGUMENT> may be empty in case of no change. Relations with empty or missing nodes are automatically extended to their full form.
- Binary relations (?A, ?S, ?C)
- VA(); (no head nor argument: the relation is automatically extended to "VA(;);" )
- VA(;); (same as above)
- VA("a"); (argument only: the relation is automatically extended to "VA(;"a");" )
- VA("a";); (head only)
- VA("a";"b"); (head and argument)
- Unary relations (?H)
- VH(); (no head)
- VH("a"); (head)
- Relations are always juxtaposed (they must not be separated by ",")
- VS("b")VC("c")VA("d"); (correct)
- VS("b"),VC("c"),VA("d"); (incorrect)
- Order is not important between relations, but essential between constituents of the same relation
- VS("b")VC("c")VA("d") = VC("c")VA("d")VS("b") = VA("d")VC("c")VS("b")
- VA("a";"b"); ≠ VA("b";"a");
- Arguments of relations may be expressed by A-rules, but only in the right side of rules
- VA(0>"a"); (the verbal adjuncts, if any, receive an "a" as suffix)
- Rules are conservative. Features will be preserved unless explicitly deleted (through "-")
- VC(%comp,ACC):=VC(%comp,NOM); (is the same as "VC(%comp,ACC):=VC(%comp,+NOM);" i.e., add the feature "NOM" to the complements of the verb that have the feature "ACC"; the feature "ACC" will be preserved and not replaced by "NOM")
- VC(%comp,ACC):=VC(%comp,NOM,-ACC); (add the feature "NOM" and delete the feature "ACC" from the complements of the verb that have the feature "ACC")
- Features and strings should not be repeated in the right side except in case of deletion or change. Indexes may be repeated for clarity.
- VC(%comp,ACC):=VC(%comp,+NOM); (the feature "ACC" should not be repeated in the right side of the rule)
- VC(%comp,"a"):=VC(%comp,+NOM); (the string "a" should not be repeated in the right side of the rule)
- A node may have as many features as necessary, but only one string or lemma
- VC(%comp,"a"):=VC(%comp,"b"); ("a" is replaced by "b")
- Strings are represented between "quotes", while lemmas are represented between [brackets].
- The UNLarium distinguishes between strings (to be represented between "quotes") and lemmas (to be represented between [brackets]). The difference between them has to do with variance and dictionary status: if the constituent is expected to figure as an entry in the dictionary (e.g., "in", "the", "after", "love", "sense", etc.) or if it may vary (e.g., if it may be inflected, or further composed by specification, adjunction or complementation), it must be represented between brackets; if it is a full phrase whose internal structure is not relevant because it is invariant, it must come between quotes:
- VA("into account"); (the string "into account" does not vary: take > take into account, but not take into more account)
- VC([sense]); (the term "sense" may be further specified: make > make sense, make any sense, make no sense, etc.)
- Negation
- "^" is used for negation, and may be applied to features, strings or relations:
- VA(^NOU); (if the adjunct does not have the feature "NOU")
- VA(^"a"); (if the adjunct is not the string "a")
- ^VA("a"); (if there is no VA relation between the head and "a")
- S-rules always end in ";"
- VA("a"); (correct)
- VA("a") (incorrect)
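Several of the observations above (conservative rules, explicit deletion through "-", a single string or lemma per node, and "^" negation over features and strings) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the UNLarium implementation: `Node`, `matches_condition` and `apply_action` are hypothetical names, a node is reduced to a set of tags plus one optional string or lemma (stored verbatim with its quotes or brackets), strings are assumed to contain no commas, and relation-level negation such as ^VA("a") and A-rule actions are not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    features: set[str] = field(default_factory=set)
    form: str | None = None        # the node's single "string" or [lemma], verbatim

def items(spec: str) -> list[str]:
    """Split a node spec such as '%comp,NOM,-ACC' (assumes strings hold no commas)."""
    return [item.strip() for item in spec.split(",") if item.strip()]

def matches_condition(condition: str, node: Node) -> bool:
    """Left side: every item must hold for the node; '^' negates an item."""
    for item in items(condition):
        if item.startswith("%"):
            continue               # indexes name nodes; they test nothing here
        negated = item.startswith("^")
        item = item.lstrip("^")
        if item[0] in '"[':        # a "string" or a [lemma]
            found = node.form == item
        else:
            found = item in node.features
        if found == negated:       # present but negated, or absent but required
            return False
    return True

def apply_action(action: str, node: Node) -> Node:
    """Right side: conservative update, so only '-' ever removes a feature."""
    features, form = set(node.features), node.form
    for item in items(action):
        if item.startswith("%"):
            continue
        if item.startswith("-"):
            features.discard(item[1:])
        elif item[0] in '"[':
            form = item            # a new string or lemma replaces the old one
        else:
            features.add(item.lstrip("+"))  # "NOM" and "+NOM" both add NOM
    return Node(features, form)

comp = Node({"ACC"})
print(sorted(apply_action("%comp,NOM", comp).features))       # ['ACC', 'NOM']
print(sorted(apply_action("%comp,NOM,-ACC", comp).features))  # ['NOM']
```

Returning a new `Node` instead of mutating the input keeps the condition side and the action side separate, which makes the "features are preserved unless explicitly deleted" rule easy to check.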
Indexes
- Nodes are always indexed in S-rules
- Indexes (%) are used for indexing nodes, attributes and values inside and between the left (condition) and the right side of rules.
- X(%a;%b)Y(%a;%c); (the head of X is also the head of Y)
- Indexes as variables
- Indexes are features and may be used as variables
- X(%a;%b)Y(%a;%c):=Z(%b;%c); (if the head of the relation X is the head of the relation Y, delete X and Y and create Z between the arguments of X and Y)
- X(%a,A;%b,B):=X(%a;%b,+C,-B); (add the feature C to the argument of X and remove the feature B from it if the head of X has the feature A)
- If omitted, indexes are assigned by default, according to the position
- X(A;B)Y(C;D)Z(E;F); is the same as X(A,%01;B,%02)Y(C,%03;D,%04)Z(E,%05;F,%06);
- X(A;B):=X(;+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02);
- X(A;B):=X(+C,-B); is the same as X(A,%01;B,%02):=X(%01;+C,-B,%02); (same as above: the relation is automatically extended if the head is empty)
- However
- X(A;B)Y(A;C):=Z(B;C); is different from X(%a;%b)Y(%a;%c):=Z(%b;%c);
- X(A;B)Y(A;C):=Z(B;C); is the same as X(A,%01;B,%02)Y(A,%03;C,%04):=Z(B,%01;C,%02); while
- X(A,%a;B,%b)Y(A,%a;C,%c):=Z(B,%b;C,%c); is the same as X(A,%01;B,%02)Y(A,%01;C,%04):=Z(B,%02;C,%04);
- In the first case, the feature B is added to the head of X and the feature C is added to its argument; the relation Y is deleted. In the second case, the feature C is added to the argument of Y, and Z is created between the arguments of X and Y.
- If omitted, right side indexes are automatically co-indexed with the left side ones
- X(;):=Y(;); is the same as X(%01;%02):=Y(%01;%02);
- Right side indexes must be explicitly defined if the order is to be altered
- X(;):=Y(%02;%01);
- Indexes can be replaced by user-defined labels made of any sequence of alphabetic characters and underscore
- X(A,%a;B,%b)Y(C,%c;D,%d)Z(E,%e;F,%f)
- %01 = A, %02 = B, %03 = C, %04 = D, %05 = E, %06 = F and
- %a = A, %b = B, %c = C, %d = D, %e = E, %f = F
- Numeric characters cannot be used as user-defined indexes
- X(A,%03;B,%05)
- %01 = A, %02 = B (there is neither %03 nor %05)
- To avoid ambiguities, users are strongly recommended to replace default values by customized labels
- X(A,%a;B,%b)
- instead of simply X(A;B) or X(A,%01;B,%02)
- In case of sub-nodes, the parent node must be indicated through the syntax <PARENT NODE><CHILD NODE>, where <PARENT NODE> may itself be a sub-node
- X(Y(A;B);C)
- %01 = Y(A;B), %02 = C, %01%01 = A, %01%02 = B
- X(Y(Z(A;B);C);D)
- %01 = Y(Z(A;B);C), %02 = D, %01%01 = Z(A;B), %01%02 = C, %01%01%01 = A, %01%01%02 = B
- Indexation is not affected by repetition
- X(A;B)Y(A;C)Z(A;D)
- %01 = A, %02 = B, %03 = A, %04 = C, %05 = A, %06 = D (and %01 = %03 = %05)
- Empty nodes are also indexed
- X(;)
- %01 = first node of X, %02 = second node of X
- Indexes may be used both in the left and in the right side of rules
- X(%a;%b):=Y(%b;%a); (the first node of the X relation becomes the second node of the Y relation)
- X(%a;)Y(%a;):=Z(%a); (if the first node of the X relation is the first node of the Y relation then make it the single node of a Z relation)
- Indexes may also be used to transfer attribute values expressed in the format ATTRIBUTE=VALUE
- X(A,%a,ATT1=VAL1;B,%b):=X(%a;%b,ATT1=%a); (the value "VAL1" of "ATT1" of %a is copied to the node %b)
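The default, position-based indexing described above (continuous numbering across juxtaposed relations, repeated and empty nodes still counted, and sub-nodes prefixed by their parent's index) can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch only computes the positional defaults: user-defined labels such as %a are not resolved, so a node like "A,%a" is returned verbatim.

```python
import re

RELATION = re.compile(r"([A-Za-z]+)\((.*)\)", re.S)

def split_top(text: str, sep: str) -> list[str]:
    """Split text on sep, skipping separators nested in parentheses or quotes."""
    parts, depth, start, quoted = [], 0, 0, False
    for i, ch in enumerate(text):
        if ch == '"':
            quoted = not quoted
        elif quoted:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts

def split_relations(side: str) -> list[str]:
    """Split a rule side such as 'X(A;B)Y(C;D)' into its juxtaposed relations."""
    relations, depth, start, quoted = [], 0, 0, False
    for i, ch in enumerate(side):
        if ch == '"':
            quoted = not quoted
        elif not quoted and ch == "(":
            depth += 1
        elif not quoted and ch == ")":
            depth -= 1
            if depth == 0:
                relations.append(side[start:i + 1].strip())
                start = i + 1
    return relations

def nodes_of(relation: str) -> list[str]:
    """Return the nodes of a single relation, empty ones included."""
    match = RELATION.fullmatch(relation.strip())
    if match is None:
        raise ValueError(f"not a relation: {relation!r}")
    return [node.strip() for node in split_top(match.group(2), ";")]

def add_children(label: str, node: str, index: dict) -> None:
    """Sub-nodes restart at %01 and are prefixed with their parent's index."""
    if RELATION.fullmatch(node):
        for k, child in enumerate(nodes_of(node), start=1):
            child_label = f"{label}%{k:02d}"
            index[child_label] = child
            add_children(child_label, child, index)

def default_indexes(side: str) -> dict:
    """Assign positional indexes to every node of one rule side."""
    index, position = {}, 0
    for relation in split_relations(side):
        for node in nodes_of(relation):
            position += 1          # repeated and empty nodes are still counted
            label = f"%{position:02d}"
            index[label] = node
            add_children(label, node, index)
    return index

print(default_indexes("X(Y(A;B);C)"))
# {'%01': 'Y(A;B)', '%01%01': 'A', '%01%02': 'B', '%02': 'C'}
```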
Examples
Examples of S-rules:
- composition
- VA("into account"); (add the string "into account" as the adjunct of the verb)
- subcategorization
- VC(PH("in")); (the complement of the verb is a prepositional phrase headed by the preposition "in")
- agreement
- VS(ANUM,APER); (the specifier of the verb assigns number (ANUM) and person (APER) to its head)
- case marking
- VS(NOM); (the specifier of the verb receives the nominative case (NOM))
- distribution
- VA(>>); (the adjunct of the verb comes at the right side of the verb after a blank space)
- adjacency
- VA(AJ2); (the adjunct of the verb integrates the second projection of the head)
- periphrasis
- VH(%vh,FUT):=+IC([will];%vh,+INF);
- projection
- VS(%head;%spec)VB(%head;%comp):=VP(VB(%head;%comp);%spec); (integrate the two relations on the left side into a single relation)
- mapping
- agt(%source;%target):=VS(%source;%target); (the agent relation is mapped into a VS relation)
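To make the periphrasis and mapping examples concrete, the Python sketch below hand-codes those two rules over a toy graph. It is an illustration only, not the UNLarium rule engine: `Node`, `Relation`, `mapping` and `periphrasis` are hypothetical names, the lemmas are made up, and each rule is written by hand rather than interpreted. It shows the replace semantics (agt becomes VS on the same, co-indexed nodes) and the create semantics (the VH condition is kept, IC is added, and INF is added to %vh while FUT is preserved).

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(eq=False)   # identity equality: one object per (co-)indexed node
class Node:
    lemma: str
    features: set[str] = field(default_factory=set)

@dataclass
class Relation:
    name: str
    nodes: list[Node]  # [head], [head, argument] or [source, target]

def mapping(graph: list[Relation]) -> list[Relation]:
    """agt(%source;%target):=VS(%source;%target); agt becomes VS on the same nodes."""
    return [Relation("VS", r.nodes) if r.name == "agt" else r for r in graph]

def periphrasis(graph: list[Relation]) -> list[Relation]:
    """VH(%vh,FUT):=+IC([will];%vh,+INF); VH is kept, IC is created, %vh gains INF."""
    created = []
    for r in graph:
        if r.name == "VH" and "FUT" in r.nodes[0].features:
            vh = r.nodes[0]
            vh.features.add("INF")     # conservative: FUT is preserved
            created.append(Relation("IC", [Node("[will]"), vh]))
    return graph + created

love, john = Node("[love]", {"FUT"}), Node("[John]")
graph = periphrasis(mapping([Relation("VH", [love]), Relation("agt", [love, john])]))
print([r.name for r in graph])  # ['VH', 'VS', 'IC']
print(sorted(love.features))    # ['FUT', 'INF']
```

Nodes are shared objects, so the INF feature added through %vh is visible in every relation that contains that node, which mirrors how an index ties the same node across relations.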
Formal Syntax
S-rules comply with the following formal syntax:
<S-RULE> ::= <CONDITION> ":=" (<RELATION>)+";"
<CONDITION> ::= <TAG>(","<TAG>)* | (<RELATION>)*
<RELATION> ::= <SYNTACTIC RELATION> | <SEMANTIC RELATION>
<SEMANTIC RELATION> ::= <UNL RELATION> "(" <NODE> ";" <NODE> ")"
<SYNTACTIC RELATION> ::= <NL RELATION> "(" (<NODE>";")? <NODE> ")"
<UNL RELATION> ::= {one of the semantic relations defined in the UNL Specs}
<NL RELATION> ::= {one of the head-driven syntactic relations defined in the UNDLF Tagset}
<NODE> ::= <FEATURE>(","<FEATURE>)*
<FEATURE> ::= <ID>|<TAG>|"""<STRING>"""|"["<STRING>"]"|<DIRECTION>|<SYNTACTIC RELATION>|<ACTION>
<ID> ::= "%"[a-zA-Z_0-9]+
<TAG> ::= {one of the tags defined in the UNDLF Tagset}
<STRING> ::= [a..Z]+
<DIRECTION> ::= ">"|">>"|"<"|"<<"
<ACTION> ::= <PREFIXATION> | <SUFFIXATION> | <INFIXATION> | <REPLACEMENT> (cf. A-rule)
where
<a> = a is a non-terminal symbol
"a" = a is a constant
a | b = a or b
(a)? = a can occur 0 or 1 time
(a)* = a can be repeated 0 or more times
(a)+ = a can be repeated 1 or more times
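The <RELATION>, <NODE> and <FEATURE> productions can be approximated in Python with one regular expression per <FEATURE> alternative plus a small recursive parser. This is a sketch under several assumptions, not a reference parser: the kind names (including LEMMA for the "["<STRING>"]" alternative) are illustrative; <STRING> is relaxed to allow spaces, because examples such as "into account" contain them; tags are assumed to be upper-case, optionally in the ATTRIBUTE=VALUE format; and A-rule actions are crudely recognized as any feature containing ">" that is not a bare <DIRECTION>.

```python
import re

RELATION = re.compile(r"([A-Za-z]+)\((.*)\)", re.S)

# One pattern per <FEATURE> alternative, tried in this order (order matters:
# ">" must be read as a <DIRECTION> before the looser <ACTION> pattern).
FEATURE_KINDS = [
    ("DIRECTION", re.compile(r">>|>|<<|<")),
    ("ID", re.compile(r"%[A-Za-z_0-9]+")),
    ("STRING", re.compile(r'"[^"]*"')),
    ("LEMMA", re.compile(r"\[[^\]]*\]")),
    ("SYNTACTIC RELATION", RELATION),
    ("ACTION", re.compile(r"[+-].+|.*>.*")),   # +/- feature, or an A-rule such as 0>"a"
    ("TAG", re.compile(r"[A-Z][A-Z0-9]*(=%?[A-Za-z0-9_]+)?")),
]

def split_top(text: str, sep: str) -> list[str]:
    """Split text on sep, skipping separators nested in parentheses or quotes."""
    parts, depth, start, quoted = [], 0, 0, False
    for i, ch in enumerate(text):
        if ch == '"':
            quoted = not quoted
        elif quoted:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts

def parse_feature(feature: str) -> tuple:
    """<FEATURE>: return (kind, value); nested relations are parsed recursively."""
    for kind, pattern in FEATURE_KINDS:
        if pattern.fullmatch(feature):
            if kind == "SYNTACTIC RELATION":
                return kind, parse_relation(feature)
            return kind, feature
    raise ValueError(f"not a <FEATURE>: {feature!r}")

def parse_node(node: str) -> list:
    """<NODE> ::= <FEATURE>(","<FEATURE>)*  (an empty node yields [])."""
    return [parse_feature(f.strip()) for f in split_top(node, ",") if f.strip()]

def parse_relation(text: str) -> tuple:
    """<RELATION>: a label followed by one or two ';'-separated nodes."""
    match = RELATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a <RELATION>: {text!r}")
    name, body = match.groups()
    nodes = [parse_node(node) for node in split_top(body, ";")]
    if len(nodes) > 2:
        raise ValueError(f"too many nodes in {text!r}")
    return name, nodes

print(parse_relation("VH(%vh,FUT)"))  # ('VH', [[('ID', '%vh'), ('TAG', 'FUT')]])
```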