LSS

From UNL Wiki

Revision as of 22:21, 30 April 2013 by Martins (Talk | contribs)


Syntactic Surface Structures (SSS) are sequences of part-of-speech tags.

Methodology

SSSs are extracted by tokenizing a corpus with the enumerative dictionary, i.e., the list of word forms available for a given language. Because the enumerative dictionary may contain multiword expressions, the length of an SSS does not necessarily equal the number of isolated words in the sentence; it equals the number of dictionary entries obtained by applying the longest-first principle (at each position, the longest matching entry is chosen). Additionally, because tokenization performs no lexical disambiguation, an SSS contains all possible lexical categories of each of its components. The punctuation of the original sentence is preserved.
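The longest-first principle can be illustrated with a minimal Python sketch. The dictionary entries, the function name `longest_first`, and the handling of unknown words are assumptions made for illustration; they are not taken from the UNL tools.

```python
# Toy enumerative dictionary (hypothetical entries, for illustration only).
# Each entry is a tuple of words, so multiword expressions are allowed.
DICTIONARY = {
    ("in", "front", "of"),
    ("in",),
    ("front",),
    ("of",),
    ("the",),
    ("house",),
}
MAX_ENTRY_LEN = max(len(entry) for entry in DICTIONARY)


def longest_first(words):
    """Segment a list of words into dictionary entries, longest match first."""
    entries = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shrink it.
        for size in range(min(MAX_ENTRY_LEN, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + size])
            if candidate in DICTIONARY:
                entries.append(" ".join(candidate))
                i += size
                break
        else:
            # Assumption: a word missing from the dictionary is kept as its own entry.
            entries.append(words[i])
            i += 1
    return entries


words = "in front of the house".split()
print(longest_first(words))  # ['in front of', 'the', 'house']
```

Here five isolated words yield three dictionary entries, so the corresponding SSS would contain three tags rather than five.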

Ambiguities

An SSS records all possible categories of a given string as a single tag:

AN = a string that can be an adverb or a noun
AJN = a string that can be an adverb, an adjective or a noun
NV = a string that can be a noun or a verb

etc.
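A possible way to compose such tags is sketched below in Python. The category letters are the ones used on this page (with O for "other"); the alphabetical ordering of the letters is an assumption inferred from the tags shown here, and the function name `ambiguity_tag` is hypothetical.

```python
# Category letters as used on this page; "O" stands for "other".
CATEGORY_LETTERS = {
    "adverb": "A",
    "adjective": "J",
    "noun": "N",
    "other": "O",
    "verb": "V",
}


def ambiguity_tag(categories):
    """Concatenate the letters of all possible categories into one tag."""
    letters = {CATEGORY_LETTERS[category] for category in categories}
    # Assumption: letters are written in alphabetical order (AN, AJN, NV, ...).
    return "".join(sorted(letters))


print(ambiguity_tag({"noun", "adverb"}))               # AN
print(ambiguity_tag({"noun", "adjective", "adverb"}))  # AJN
print(ambiguity_tag({"verb", "noun"}))                 # NV
```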

Example

SENTENCE: The book is on the table.

SSS: AO JNV NOV AJNO AO NV.

Because:

"the" may be an adverb or other (determiner)
"book" may be an adjective, a noun or a verb
"is" may be a noun, a verb or other (auxiliary)
"on" may be an adverb, an adjective, a noun or other (preposition)
"table" may be a noun or a verb

Note that the punctuation is preserved in the SSS.
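The example can be reproduced with a short Python sketch. The lexicon simply restates the ambiguities listed above; the function name `sss`, the regular-expression tokenizer, and the `?` placeholder for unknown words are assumptions made for illustration, not part of the UNL tools.

```python
import re

# Toy lexicon restating the ambiguity tags listed for the example sentence.
LEXICON = {
    "the": "AO",
    "book": "JNV",
    "is": "NOV",
    "on": "AJNO",
    "table": "NV",
}


def sss(sentence):
    """Map each word to its ambiguity tag and keep punctuation as-is."""
    # Split the sentence into words and individual punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    result = ""
    for token in tokens:
        if re.fullmatch(r"\w+", token):
            # Hypothetical choice: unknown words are tagged "?".
            tag = LEXICON.get(token.lower(), "?")
            result += (" " if result else "") + tag
        else:
            # Punctuation is copied unchanged and attached to the previous tag.
            result += token
    return result


print(sss("The book is on the table."))  # AO JNV NOV AJNO AO NV.
```

This sketch looks up single words only; multiword expressions and punctuation that opens a phrase (such as an opening quotation mark) would need additional handling.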

Goal

The main goal of the SSS is to help users create grammars.
