Node
From UNL Wiki
Revision as of 15:39, 16 August 2013
A node is the most elementary unit in the grammar. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.
Elements
Any node is a vector (one-dimensional array) containing the following necessary elements:
- a string, to be represented between "quotes", which expresses the actual state of the node;
- a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
- a UW, to be represented between [[double square brackets]], which expresses the UW value of the node;
- a feature or set of features, which express the properties of the node;
- an index, preceded by the symbol %, which is used to reference the node.
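The five elements above can be pictured as a small record. The following is only a sketch in Python, not part of the UNL specification: the class name `Node`, its field names and the `render` method are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical container for the elements of a node (illustration only)."""
    string: Optional[str] = None       # actual state, written "..."
    headword: Optional[str] = None     # original dictionary value, written [...]
    uw: Optional[str] = None           # UW value, written [[...]]
    features: frozenset = frozenset()  # e.g. {"NUM", "POS=NOU"}
    index: Optional[str] = None        # reference to the node, written %...

    def render(self) -> str:
        """Write the node back in the (…) notation used on this page."""
        parts = []
        if self.string is not None:
            parts.append(f'"{self.string}"')
        if self.headword is not None:
            parts.append(f"[{self.headword}]")
        if self.uw is not None:
            parts.append(f"[[{self.uw}]]")
        parts.extend(sorted(self.features))
        if self.index is not None:
            parts.append(f"%{self.index}")
        return "(" + ",".join(parts) + ")"


went = Node(string="went", headword="go", uw="to go(icl>to move)", index="x")
print(went.render())  # ("went",[go],[[to go(icl>to move)]],%x)
```

Features are kept in a set because their order inside a node carries no meaning; `render` sorts them only to produce a stable output.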
Basic symbols
Symbol | Definition | Example |
---|---|---|
( ) | node | (%a) |
" " | string | "went" |
[ ] | natural language entry (headword) | [go] |
[[ ]] | UW | [[to go(icl>to move)]] |
// | regular expression | /a{2,3}/ = aa,aaa |
^ | not | ^a = not a |
{ \| } | or | {a\|b} = a or b |
% | index for nodes, attributes and values | %x |
# | index for sub-NLWs | #01 |
= | attribute-value assignment | POS=NOU |
! | rule trigger | !PLR |
& | merge operator | %x&%y |
? | dictionary lookup operator | ?[a] |
Examples
Examples of nodes:
- ("ing") (a node making reference only to its actual string value)
- ([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
- ([[book(icl>document)]]) (a node making reference only to its UW value)
- (NUM) (a node making reference only to one of its features)
- (POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
- (%x) (a node making reference only to its unique index)
- ("string",[headword],[[UW]],feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
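The notation in the examples above can be split into its elements mechanically. The sketch below is an assumption-laden illustration in Python, not the parser used by any UNL tool: the names `ELEMENT` and `parse_node` are hypothetical, and features written as regular expressions or as {A|B} disjunctions are deliberately not handled. It also enforces the rule, stated under Properties, that a node has at most one string, headword, UW and index.

```python
import re

# One alternative per kind of element; [[...]] must be tried before [...].
ELEMENT = re.compile(r'''
    "(?P<string>[^"]*)"         # string between double quotes
  | \[\[(?P<uw>.*?)\]\]         # UW between double square brackets
  | \[(?P<headword>[^\]]*)\]    # headword between simple square brackets
  | %(?P<index>\w+)             # index preceded by the percent sign
  | (?P<feature>[^,()\s]+)      # FEATURE or ATTRIBUTE=VALUE
''', re.VERBOSE)


def parse_node(text):
    """Split a node such as ("a",[a],[[a]],A,%a) into its elements."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed between parentheses")
    node = {"string": None, "headword": None, "uw": None,
            "index": None, "features": set()}
    for match in ELEMENT.finditer(text[1:-1]):
        kind = match.lastgroup          # name of the alternative that matched
        value = match.group(kind)
        if kind == "feature":
            node["features"].add(value)
        elif node[kind] is not None:
            raise ValueError(f"a node may not contain more than one {kind}")
        else:
            node[kind] = value
    return node


print(parse_node('([[book(icl>document)]])')["uw"])  # book(icl>document)
```

Because features are collected in a set and the other elements are stored by kind, two nodes that differ only in the order of their elements parse to the same value.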
Properties
- Nodes are enclosed between (parentheses)
- ("a") is a node
- "a" is not a node
- The elements of a node are separated by commas
- ("a",[a],[[a]],A,B,A=C,%a)
- The order of elements inside a node is not relevant.
- ("a",[a],[[a]],A,B,A=C,%a) is the same as ([[a]],B,A,"a",[a],A=C,%a)
- A node may have at most one string, one headword, one UW and one index, but may have as many features as necessary
- ("a","b") (a node may not contain more than one string)
- ([a],[b]) (a node may not contain more than one headword)
- ([[a]],[[b]]) (a node may not contain more than one UW)
- (%a,%b) (a node may not contain more than one index)
- (A,B,C,D,...,Z) (a node may contain as many features as necessary)
- A node may be referred to by any of its elements, but only the index makes it unique
- ("a") refers to all nodes where actual string = "a"
- ([a]) refers to all nodes where headword = [a]
- ([[a]]) refers to all nodes where UW = [[a]]
- (A) refers to all nodes having the feature A
- ("a",[a],[[a]],A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = [[a]]
- (%a) refers to the specific node with the index %a
- Nodes are automatically indexed according to a position-based system if no explicit index is provided (see Indexation)
- ("a")("b") is actually ("a",%01)("b",%02)
- Regular expressions may be used to make reference to any element of the node, except the index
- ("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"
- ([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"
- ([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"
- (/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
- Nodes may contain alternative (disjunctive) features enclosed between {braces} and separated by a vertical bar
- ({A|B}) refers to all nodes having the feature A OR B
- Node features may be expressed as simple attributes, or attribute-value pairs
- (MCL) - feature as an attribute: refers to all nodes having the feature MCL
- (GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
- Attribute-value pairs may be used to create co-reference between different nodes (as in agreement)
- (%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see Indexation)
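Several of the properties above (reference by any element, position-based indexation, regular expressions, and {A|B} disjunction) can be illustrated together. This is a minimal sketch in Python under stated assumptions, not the behaviour of an actual UNL engine: nodes and patterns are plain dictionaries, the function names are hypothetical, a regular expression is assumed to have to match the whole value, and a disjunction is written as a tuple of alternatives.

```python
import re


def auto_index(nodes):
    """Give position-based indexes 01, 02, ... to nodes without an explicit one."""
    for position, node in enumerate(nodes, start=1):
        node.setdefault("index", f"{position:02d}")
    return nodes


def value_matches(pattern, value):
    """Compare one element; /.../ is treated as a regular expression."""
    if value is None:
        return False
    if len(pattern) > 2 and pattern[0] == "/" and pattern[-1] == "/":
        return re.fullmatch(pattern[1:-1], value) is not None
    return pattern == value


def node_matches(pattern, node):
    """True if the node has every element named in the pattern."""
    for key in ("string", "headword", "uw"):
        if key in pattern and not value_matches(pattern[key], node.get(key)):
            return False
    # The index is compared literally: regular expressions do not apply to it.
    if "index" in pattern and pattern["index"] != node.get("index"):
        return False
    for wanted in pattern.get("features", ()):
        # A tuple such as ("A", "B") stands for the disjunction {A|B}.
        alternatives = (wanted,) if isinstance(wanted, str) else wanted
        if not any(value_matches(alt, feat)
                   for alt in alternatives
                   for feat in node.get("features", ())):
            return False
    return True


nodes = auto_index([
    {"string": "aa", "features": {"A"}},
    {"string": "b", "features": {"B"}},
    {"string": "aaaa", "features": {"C"}},
])
print([n["index"] for n in nodes])                           # ['01', '02', '03']
print(node_matches({"string": "/a{2,3}/"}, nodes[0]))        # True
print(node_matches({"string": "/a{2,3}/"}, nodes[2]))        # False
print([node_matches({"features": [("A", "B")]}, n) for n in nodes])  # [True, True, False]
```

A pattern that names only an index selects exactly one node, while a pattern that names a string, headword, UW or feature may select several.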
Strings, headwords and UW's
During the tokenization
- [a] will match the node associated to the entry [a] retrieved from the dictionary, no matter its current realization, which may be affected by other rules (the original [a] may have been replaced, for instance, by "b", but will still be indexed to the entry [a])
- "Double quotes" are always used to represent strings: "a" will match only the string "a"
- [Simple square brackets] are always used to represent natural language entries (headwords) in the dictionary
- [[Double square brackets]] are always used to represent UWs: [[a]] will match the node associated to the UW [[a]]
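The difference between matching a string and matching a headword can be shown with a node whose realization has been changed by a rule. The sketch below is only illustrative Python; the dictionary layout and the helper names `matches_string` and `matches_headword` are hypothetical.

```python
def matches_string(node, text):
    """ "a" matches only nodes whose current string is exactly the given text."""
    return node["string"] == text


def matches_headword(node, entry):
    """[a] matches nodes retrieved from the dictionary entry, whatever their string."""
    return node["headword"] == entry


# The node is created from the dictionary entry [a] ...
node = {"string": "a", "headword": "a"}
# ... and a later rule replaces its realization by "b".
node["string"] = "b"

print(matches_string(node, "a"))    # False: the string is no longer "a"
print(matches_string(node, "b"))    # True
print(matches_headword(node, "a"))  # True: the node is still indexed to [a]
```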