Node

Revision as of 16:43, 16 August 2013

A node is the most elementary unit in the grammar. It is the result of the tokenization process, and corresponds to the notion of "lexical item". At the surface level, a natural language sentence is considered a list of nodes, and a UNL graph a set of relations between nodes.

Elements

Any node is a vector (one-dimensional array) containing the following necessary elements:

a string, to be represented between "quotes", which expresses the actual state of the node;
a headword, to be represented between [square brackets], which expresses the original value of the node in the dictionary;
a UW, to be represented between [[double square brackets]], which expresses the UW value of the node;
a feature or set of features, which express the features of the node;
an index, preceded by the symbol %, which is used to reference the node.
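
The five elements above can be pictured as a small record. The following is only a minimal sketch in Python: the class name `Node`, its field names and the `render` method are hypothetical and are not part of the UNL specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical in-memory model of a node (illustrative names only)."""
    string: Optional[str] = None        # actual state, written "..."
    headword: Optional[str] = None      # original dictionary value, written [...]
    uw: Optional[str] = None            # UW value, written [[...]]
    features: frozenset = frozenset()   # e.g. {"NUM", "POS=NOU"}
    index: Optional[str] = None         # written %x

    def render(self) -> str:
        """Write the node in the parenthesized notation used on this page."""
        parts = []
        if self.string is not None:
            parts.append(f'"{self.string}"')
        if self.headword is not None:
            parts.append(f"[{self.headword}]")
        if self.uw is not None:
            parts.append(f"[[{self.uw}]]")
        # Features are sorted only to make the output deterministic.
        parts.extend(sorted(self.features))
        if self.index is not None:
            parts.append(f"%{self.index}")
        return "(" + ",".join(parts) + ")"


book = Node(string="books", headword="book", uw="book(icl>document)",
            features=frozenset({"POS=NOU"}), index="x")
print(book.render())  # ("books",[book],[[book(icl>document)]],POS=NOU,%x)
```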

Basic symbols

Basic symbols used in the UNL framework
Symbol	Definition	Example
( )	node	(%a)
" "	string	"went"
[ ]	natural language entry (headword)	[go]
[[ ]]	UW	[[to go(icl>to move)]]
/ /	regular expression	/a{2,3}/ = aa, aaa
^	not	^a = not a
{ | }	or	{a|b} = a or b
%	index for nodes, attributes and values	%x
#	index for sub-NLWs	#01
=	attribute-value assignment	POS=NOU
!	rule trigger	!PLR
&	merge operator	%x&%y
?	dictionary lookup operator	?[a]

Examples

Examples of nodes:

("ing") (a node making reference only to its actual string value)
([book]) (a node making reference only to its headword, i.e., its original state in the dictionary)
([[book(icl>document)]]) (a node making reference only to its UW value)
(NUM) (a node making reference only to one of its features)
(POS=NOU) (a node making reference only to one of its features in the attribute-value pair format)
(%x) (a node making reference only to its unique index)
("string",[headword],[[UW]],feature1,feature2,...,attribute1=value1,attribute2=value2,...,%x) (complete node)
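
To show how the notation in these examples breaks down into elements, here is a rough parsing sketch in Python. The function `parse_node` and the pattern `ELEMENT` are hypothetical helpers, not part of any UNL tool. The sketch assumes that features contain no commas or parentheses, so a regular-expression feature such as /a{2,3}/ is not handled.

```python
import re

# One alternative per kind of element; [[...]] must be tried before [...].
ELEMENT = re.compile(r'''
    "(?P<string>[^"]*)"            |
    \[\[(?P<uw>.*?)\]\]            |
    \[(?P<headword>[^\]]*)\]       |
    %(?P<index>\w+)                |
    (?P<feature>[^,()\s][^,()]*)
''', re.VERBOSE)


def parse_node(text):
    """Split a node written as (...) into its elements (illustrative only)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a node must be enclosed between parentheses")
    node = {"string": None, "headword": None, "uw": None,
            "index": None, "features": []}
    for m in ELEMENT.finditer(text[1:-1]):
        kind = m.lastgroup                 # name of the alternative that matched
        value = m.group(kind).strip()
        if kind == "feature":
            node["features"].append(value)  # any number of features is allowed
        elif node[kind] is not None:
            raise ValueError(f"a node may not contain more than one {kind}")
        else:
            node[kind] = value
    return node


complete = parse_node(
    '("string",[headword],[[UW]],feature1,feature2,attribute1=value1,%x)')
print(complete)
```

Because the elements are recognized by their delimiters rather than by position, the same function accepts them in any order.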

Properties

Nodes are enclosed between (parentheses): ("a") is a node; "a" is not a node
The elements of a node are separated by commas: ("a",[a],[[a]],A,B,A=C,%a)
The order of the elements inside a node is not relevant: ("a",[a],[[a]],A,B,A=C,%a) is the same as ([[a]],B,A,"a",[a],A=C,%a)
A node may have at most one string, one headword, one UW and one index, but as many features as necessary: ~~("a","b")~~ (a node may not contain more than one string); ~~([a],[b])~~ (a node may not contain more than one headword); ~~([[a]],[[b]])~~ (a node may not contain more than one UW); ~~(%a,%b)~~ (a node may not contain more than one index); (A,B,C,D,...,Z) (a node may contain as many features as necessary)
A node may be referred to by any of its elements, but only the index makes the reference unique: ("a") refers to all nodes where actual string = "a"; ([a]) refers to all nodes where headword = [a]; ([[a]]) refers to all nodes where UW = [[a]]; (A) refers to all nodes having the feature A; ("a",[a],[[a]],A) refers to all nodes having the feature A where string = "a" and headword = [a] and UW = [[a]]; (%a) refers to the specific node with the index %a
Nodes are automatically indexed according to a position-based system if no explicit index is provided (see Indexation): ("a")("b") is actually ("a",%01)("b",%02)
Regular expressions may be used to make reference to any element of the node, except the index: ("/a{2,3}/") refers to all nodes where string is a sequence of 2 to 3 characters "a"; ([/a{2,3}/]) refers to all nodes where headword is a sequence of 2 to 3 characters "a"; ([[/a{2,3}/]]) refers to all nodes where UW is a sequence of 2 to 3 characters "a"; (/a{2,3}/) refers to all nodes having a feature that is a sequence of 2 to 3 characters "a"
Nodes may contain alternative (disjoint) features enclosed between {braces} and separated by a vertical bar: ({A|B}) refers to all nodes having the feature A OR B
Node features may be expressed as simple attributes, or attribute-value pairs: (MCL) - feature as an attribute: refers to all nodes having the feature MCL; (GEN=MCL) - feature as an attribute-value pair, which is the same as (GEN,MCL): refers to all nodes having the features GEN and MCL.
Attribute-value pairs may be used to create co-reference between different nodes (as in agreement): (%x,GEN)(%y,GEN=%x) - the value of the attribute GEN of the node %y is the same as the value of the attribute GEN of the node %x (see Indexation)
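
Several of the properties above (reference by any element, position-based indexing, regular expressions, alternative features, and attribute-value pairs) can be sketched as a small matcher. This is only an illustration: nodes are modelled as plain Python dictionaries, and `auto_index`, `expand`, `value_matches`, `feature_matches` and `matches` are hypothetical names. It assumes that a regular expression must cover the whole element, and it uses Python's `re` dialect, which may differ from the one used in UNL tools. Co-reference such as GEN=%x is not handled.

```python
import re


def auto_index(nodes):
    """Give position-based indexes (01, 02, ...) to nodes without an index."""
    return [dict(n, index=n.get("index") or f"{pos:02d}")
            for pos, n in enumerate(nodes, start=1)]


def is_regex(pattern):
    return len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/")


def expand(features):
    """Treat an attribute-value pair such as GEN=MCL as the features GEN and MCL."""
    out = set()
    for f in features:
        if f.startswith(("/", "{")):
            out.add(f)                 # regex or {A|B}: keep as written
        else:
            out.update(f.split("="))
    return out


def value_matches(pattern, actual):
    """Compare a string, headword or UW, either literally or as /regex/."""
    if pattern is None:
        return True                    # the pattern does not mention this element
    if actual is None:
        return False
    if is_regex(pattern):
        return re.fullmatch(pattern[1:-1], actual) is not None
    return pattern == actual


def feature_matches(pattern, node_features):
    if pattern.startswith("{") and pattern.endswith("}"):
        return any(feature_matches(alt, node_features)
                   for alt in pattern[1:-1].split("|"))
    if is_regex(pattern):
        return any(re.fullmatch(pattern[1:-1], f) for f in node_features)
    return pattern in node_features


def matches(pattern, node):
    """True if the node has every element mentioned in the pattern."""
    for key in ("string", "headword", "uw"):
        if not value_matches(pattern.get(key), node.get(key)):
            return False
    if pattern.get("index") is not None and pattern["index"] != node.get("index"):
        return False                   # indexes are compared literally, never as regex
    node_features = expand(node.get("features", ()))
    return all(feature_matches(p, node_features)
               for p in expand(pattern.get("features", ())))


nodes = auto_index([{"string": "aa", "features": ["GEN=MCL"]},
                    {"string": "b", "features": ["B"]}])
print(matches({"string": "/a{2,3}/"}, nodes[0]))
print(matches({"features": ["{A|B}"]}, nodes[1]))
print(matches({"features": ["GEN", "MCL"]}, nodes[0]))
print(matches({"index": "02"}, nodes[0]))
```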

Strings, headwords and UWs

During the tokenization

"Double quotes" are always used to represent strings: "a" will match only the string "a"
[Simple square brackets] are always used to represent natural language entries (headwords) in the dictionary: [a] will match the node associated with the entry [a] retrieved from the dictionary, whatever its current realization, which may have been affected by other rules (the original string may have been replaced, for instance, by "b", but the node will still be indexed to the entry [a])
[[Double square brackets]] are always used to represent UWs: [[a]] will match the node associated with the UW [[a]]
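
The difference between matching a string and matching a headword can be shown with a toy example. This is a sketch only: the dictionary-based node and the two helper functions are hypothetical, and the values come from the "went" / [go] example in the symbol table.

```python
def match_string(node, s):
    """Compare against the actual (current) string of the node."""
    return node["string"] == s


def match_headword(node, hw):
    """Compare against the original dictionary entry of the node."""
    return node["headword"] == hw


# A node as it might look right after tokenization.
node = {"string": "go", "headword": "go"}

# A rule later changes the actual string; the headword is left untouched.
node["string"] = "went"

print(match_string(node, "go"))      # False: the string is now "went"
print(match_string(node, "went"))    # True
print(match_headword(node, "go"))    # True: still tied to the entry [go]
```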
