Grammar Specs
The following Grammar Specs are used for writing rules for the UNDL Foundation tools (IAN, EUGENE, SEAN, NORMA, etc.).
Basic Symbols
Symbol | Definition | Example |
---|---|---|
( ) | node | (%a) |
" " | string | "went" |
[ ] | natural language entry (headword) | [go] |
[[ ]] | UW | [[to go(icl>to move)]] |
// | regular expression | /a{2,3}/ = aa,aaa |
rel(x;y) | relation | agt(kill;Peter) |
^ | not | ^a = not a |
{ \| } | or | {a\|b} = a or b |
% | index for nodes, attributes and values | %x |
: | scope ID | :01 |
# | index for sub-NLWs | #01 |
= | attribute-value assignment | POS=NOU |
! | rule trigger | !PLR |
& | merge operator | %x&%y |
? | dictionary lookup operator | ?[a] |
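
The regular-expression row can be checked with any regex engine. The sketch below uses Python's `re` module rather than the UNL tools, and it assumes that `/a{2,3}/` must match the whole string; the slashes are treated as the grammar's delimiters, not as part of the pattern.

```python
import re

# Pattern from the table row: /a{2,3}/ (delimiting slashes removed).
pattern = re.compile(r"a{2,3}")

candidates = ["a", "aa", "aaa", "aaaa"]

# Assumption: the expression has to cover the whole string, hence fullmatch.
matches = [s for s in candidates if pattern.fullmatch(s)]
print(matches)
```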
Basic Concepts
Node
- main article: Node
A node is the most elementary unit in the graph. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Relation
- main article: Relation
In order to form a natural language sentence or a UNL graph, nodes are inter-related by relations.
Hyper-Node
- main article: Hyper-Node
A hyper-node is a sub-graph, i.e., a node containing relations between nodes.
Hyper-Relation
- main article: Hyper-Relation
A hyper-relation is a relation between relations.
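
The four concepts above can be pictured as a small data model. The following is only an illustrative sketch: the class names, field names and the `Arg` alias are assumptions made for this example and are not part of the UNL specification or of any UNDL tool.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Elementary unit: one lexical item produced by tokenization."""
    label: str


@dataclass
class Relation:
    """A labelled link rel(source;target) between two arguments."""
    name: str
    source: "Arg"
    target: "Arg"


@dataclass
class HyperNode:
    """A sub-graph: a node that contains relations between nodes."""
    scope_id: str
    relations: List[Relation] = field(default_factory=list)


# An argument may be a node, a hyper-node, or (for hyper-relations) a relation.
Arg = Union[Node, HyperNode, Relation]

# Surface level: a sentence is a list of nodes.
sentence = [Node("Peter"), Node("went")]

# Relation between two nodes, as in agt(kill;Peter) from the symbol table.
agt = Relation("agt", Node("kill"), Node("Peter"))

# Hyper-node: a scope (here :01) wrapping relations.
scope = HyperNode(":01", [agt])

# Hyper-relation: a relation whose arguments are themselves relations.
# "rel", "x" and "y" are placeholders, as in rel(x;y).
other = Relation("rel", Node("x"), Node("y"))
hyper = Relation("rel", agt, other)
```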
Rule
Grammars are sets of rules used to go from UNL into natural language, or from natural language into UNL. In the UNL framework, there are two types of rules: T-rules and D-rules.
T-rules
- main article: T-rule
T-rules are used to perform actions and follow the very general formalism
α:=β;
where the left side α is a condition statement, and the right side β is an action to be performed over α.
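
As a rough illustration of this shape, a rule string can be split into its condition and action around the `:=` operator. The function below is a hypothetical helper written for this page, not the parser used by the UNDL tools, and it makes the simplifying assumption that `:=` does not occur inside a quoted string on the left side.

```python
from typing import Tuple


def split_t_rule(rule: str) -> Tuple[str, str]:
    """Split a rule of the form 'α:=β;' into (condition, action)."""
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a T-rule must end with ';'")
    # Drop the final ';' and split at the first ':='.
    condition, sep, action = text[:-1].partition(":=")
    if not sep:
        raise ValueError("a T-rule must contain ':='")
    return condition.strip(), action.strip()


condition, action = split_t_rule('("don\'t"):=("do not");')
print(condition)
print(action)
```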
There are several special types of T-rules:
- A-rule is a specific type of T-rule used for affixation (prefixation, infixation, suffixation)
- C-rule is a specific type of T-rule used for composition (word formation in case of compounds and multiword expressions)
- L-rule is a specific type of T-rule used for handling word order
- N-rule is a specific type of T-rule used for segmenting sentences and normalizing the input text
- S-rule is a specific type of T-rule used for handling syntactic structures
Examples of T-rules
- PLR:=0>"s"; (A-rule: add "s" in case of plural, as in book>books)
- MTW:=+VA("into account",PP); (C-rule: add the prepositional phrase "into account" as an adjunct to the verbal phrase (VA) in order to form the multiword expression, as in take>take into account)
- (ART,%x)(QUA,%y):=(%y)(%x); (L-rule: reverse the order ART+QUA to QUA+ART, as in the all>all the)
- ("don't"):=("do not"); (N-rule: replace the contraction "don't" by "do not")
- (V,%x)(N,%y):=VC(%x;%y); (S-rule: replace the linear relation between a verb and a noun by the syntactic relation VC between them)
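
To make the effect of three of these examples concrete, the sketch below imitates the A-rule, the N-rule and the L-rule with plain Python functions. It does not interpret the rule syntax itself: each function hard-codes one rule, and the `Token` representation (a surface form plus a set of feature tags) is an assumption made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical token representation: (surface form, feature tags).
Token = Tuple[str, Set[str]]


def a_rule_plural(form: str, features: Set[str]) -> str:
    # Mirrors PLR:=0>"s"; append "s" when the PLR feature is present.
    return form + "s" if "PLR" in features else form


def n_rule_expand(text: str) -> str:
    # Mirrors ("don't"):=("do not"); expand the contraction.
    return text.replace("don't", "do not")


def l_rule_swap(tokens: List[Token]) -> List[Token]:
    # Mirrors (ART,%x)(QUA,%y):=(%y)(%x); swap an ART node followed by a QUA node.
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if "ART" in out[i][1] and "QUA" in out[i + 1][1]:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return out


print(a_rule_plural("book", {"PLR"}))
print(n_rule_expand("I don't know"))
print([form for form, _ in l_rule_swap([("the", {"ART"}), ("all", {"QUA"})])])
```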
D-rules
- main article: D-rule
D-rules are used to control the action of T-rules. They are used to control the dictionary retrieval (in tokenization) and to prevent or to induce the application of rules in transformation.
D-rules follow the syntax:
α=P;
where the left side α is a statement and the right side P is an integer from 0 to 255 that indicates the probability of occurrence of α.
Examples of D-rules
- (ART)(VER)=0; (there cannot be any article before a verb)
- agt(^V,^J;)=0; (the source node of an agent relation must be either a verb or an adjective)
- (D)(N)=1; (determiners may come before nouns)
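
A D-rule string can be read in a similar way: the statement is everything before the last `=`, and the value must be an integer from 0 to 255. The helper below is a hypothetical sketch, not the UNDL implementation. It splits at the last `=` because a statement may itself contain `=` (as in `POS=NOU`), and it removes only the final `;` because a statement may contain `;` (as in `agt(^V,^J;)`).

```python
from typing import Tuple


def parse_d_rule(rule: str) -> Tuple[str, int]:
    """Split a rule of the form 'α=P;' into (statement, P), with 0 <= P <= 255."""
    text = rule.strip()
    if not text.endswith(";"):
        raise ValueError("a D-rule must end with ';'")
    # Drop only the final ';' and split at the last '='.
    statement, sep, value = text[:-1].rpartition("=")
    if not sep or not statement.strip():
        raise ValueError("a D-rule must have the form 'statement=P;'")
    p = int(value)  # raises ValueError if P is not an integer
    if not 0 <= p <= 255:
        raise ValueError("P must be between 0 and 255")
    return statement.strip(), p


for example in ["(ART)(VER)=0;", "agt(^V,^J;)=0;", "(D)(N)=1;"]:
    print(parse_d_rule(example))
```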