LSS

m (moved SSS to LSS)

Revision as of 19:23, 3 July 2013

Linear Sentence Structures (LSS) are sequences of part-of-speech tags.


Methodology

LSSs are extracted from the tokenization of a corpus using the enumerative dictionary, i.e., the list of the word forms available for a given language. As the enumerative dictionary may contain multiword expressions, the length of an LSS does not necessarily correspond to the number of isolated words in the sentence, but to the number of dictionary entries obtained by applying the longest-first principle: at each position, the longest matching dictionary entry is taken. Additionally, as the tokenization does not perform any lexical disambiguation, the LSS contains all possible lexical categories of each of its components. The punctuation of the original sentence is preserved.

Categories

LSS uses the values of the attribute Lexical Category (LEX):

  • A (adverb)
  • J (adjective)
  • N (noun)
  • V (verb)
  • D (determiner)

etc. The symbol # marks words not found in the dictionary. Punctuation is preserved.

Ambiguities

An LSS lists all possible categories of a given string, concatenated into a single tag:

  • AN = a string that can be an adverb or a noun
  • AJN = a string that can be an adverb, an adjective or a noun
  • NV = a string that can be a noun or a verb

etc.
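
As a rough sketch of how such a tag could be assembled for a single string, the Python snippet below joins the category letters of a word in alphabetical order, which is the order seen in the tags on this page, and falls back to # for unknown words. The function `ambiguity_tag` and the tiny lexicon are hypothetical and only mirror the categories stated here.

```python
def ambiguity_tag(word, lexicon):
    """Return the concatenated category letters for `word`, or "#" if unknown.

    Assumption: letters are joined in alphabetical order, as in the
    tags shown on this page (AN, AJN, NV).
    """
    categories = lexicon.get(word.lower())
    if not categories:
        return "#"
    return "".join(sorted(categories))


# Hypothetical lexicon; the categories for "book" and "table" follow
# the example given on this page.
LEXICON = {
    "book": {"J", "N", "V"},
    "table": {"N", "V"},
}
```

Here `ambiguity_tag("book", LEXICON)` gives `JNV`, while a string absent from the lexicon gives `#`.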

Example

SENTENCE: The book is on the table.
LSS: AO JNV NOV AJNO AO NV.

Because:

  • "the" may be an adverb or other (determiner)
  • "book" may be an adjective, a noun or a verb
  • "is" may be a noun, a verb or other (auxiliary)
  • "on" may be an adjective, an adverb, a noun or other (preposition)
  • "table" may be a noun or a verb

Note that blank spaces and punctuation signs are preserved in the LSS.
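
The example can be reproduced with a short Python sketch under simplifying assumptions: the lexicon holds only the five word forms discussed above, words are runs of ASCII letters, and multiword expressions are ignored. The names `EXAMPLE_LEXICON` and `to_lss` are invented for this illustration.

```python
import re

# Tags copied from the example above; this is not a real dictionary.
EXAMPLE_LEXICON = {
    "the": "AO",
    "book": "JNV",
    "is": "NOV",
    "on": "AJNO",
    "table": "NV",
}


def to_lss(sentence, lexicon):
    """Replace each word by its tag and copy everything else unchanged."""
    # Split into runs of letters and single non-letter characters.
    pieces = re.findall(r"[A-Za-z]+|[^A-Za-z]", sentence)
    out = []
    for piece in pieces:
        if piece.isalpha():
            # Known words get their tag; unknown words get "#".
            out.append(lexicon.get(piece.lower(), "#"))
        else:
            # Blank spaces and punctuation signs are preserved as they are.
            out.append(piece)
    return "".join(out)


lss = to_lss("The book is on the table.", EXAMPLE_LEXICON)
```

Because every non-letter character is copied through untouched, the spacing and the final full stop of the sentence appear in the same positions in the resulting LSS.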

Goal

The main goal of the LSS is to induce disambiguation rules.

Software