UNL2010

From UNL Wiki

The specifications stated here are still experimental and tentative, and have been continuously extended and amended in order to be as comprehensive as possible. They follow the general strategies defined in the [http://www.undl.org UNL 2005 Specifications] (version of June 7th, 2005), but introduce several important changes derived from different UNLization experiences. Although formally adopted in the UNDL Foundation tools, projects and certificates, they should not yet be taken as the official specifications, as they are still under construction and have not been widely discussed with the UNL Community.

*[[Introduction to UNL]]
*[[Universal Words]]
*[[Universal Attributes]]
*[[Universal Relations]]
*[[UNL sentence|UNL sentence structure]]
*[[UNL document|UNL document structure]]

The guidelines stated here were first derived from the fully manual UNLization of the complete text of Plato's Cratylus, from English into UNL ([http://www.ronaldomartins.pro.br/cratylus the Cratylus Project]), and have since been continuously extended and amended. They were built to normalize and standardize UNLization strategies (which would otherwise be subject to undesirable variance), and to guide the development of natural language generation grammars (which would benefit from these standards as a sort of NL-to-UNL transfer grammar).

As a general UNLization policy, we have tried to follow the UNL 2005 Specifications (version of June 7, 2005, published by the [http://www.undl.org UNL Centre]) as closely as possible. This means that texts are treated as semantic networks, in which paragraphs and sentences are represented as hypernodes; hypernodes, in turn, are represented as sets of binary relations between annotated nodes, which stand either for words (simple, compound or complex), the so-called UWs, or for clauses (subordinate, embedded or coordinate), the so-called SCOPEs.

Nevertheless, these guidelines should not be taken as the UNL Specifications themselves, because 1) they are rather experimental and tentative; 2) they differ in several points from the current version of the Specifications; 3) they do not follow some of the existing UNLization policies; and 4) they were not issued, and have not yet been approved, by the UNL Centre.

== PREMISES ==
These guidelines are derived from two main premises:
;The UNL representation is an interpretation rather than a translation of a given text.
:The main goal of the UNLization process is to represent the knowledge structure of the source text, which should be detached from its verbal structure. This means that the UNL representation should not be committed to replicating the lexical and syntactic choices of the original, but should focus on representing, in a language-independent and non-ambiguous format, one of its possible readings, preferably the most conventional one.
;The UNL representation should be as semantically complete as possible.
:This means that, whenever possible, all the semantic valencies of the original text should be saturated, including anaphora, ellipses, presuppositions and implicatures. Pronouns and pro-forms, for instance, are expected to be replaced by their antecedents and should not be represented in UNL, except in cases of exophoric reference (indefinite pronouns, interrogative pronouns and personal pronouns that are not coindexed with any existing antecedent).
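The pronoun rule of the second premise can be pictured as a post-processing step over a list of relations. This is only an illustration under stated assumptions, not part of any UNL tool: relations are modeled as plain `(label, source, target)` triples, coreference is assumed to have been resolved beforehand into a dictionary, and pronoun nodes carry hypothetical instance suffixes such as `he#1` so that different occurrences of the same pronoun can point to different antecedents. The function name `saturate_pronouns` is likewise hypothetical.

```python
# Hypothetical representation: (relation label, source node, target node).
# Pronoun nodes carry a hypothetical instance suffix such as "he#1".
Relation = tuple[str, str, str]


def saturate_pronouns(
    relations: list[Relation], antecedents: dict[str, str]
) -> list[Relation]:
    """Replace every coindexed pronoun node with its antecedent.

    Pronoun nodes missing from `antecedents` (exophoric reference, such as
    interrogative or indefinite pronouns) are left untouched.
    """

    def resolve(node: str) -> str:
        return antecedents.get(node, node)

    return [(label, resolve(src), resolve(tgt)) for label, src, tgt in relations]


# "he#1" is coindexed with "Socrates"; the interrogative "who" has no antecedent.
print(saturate_pronouns(
    [("agt", "speak", "he#1"), ("obj", "ask", "who")],
    {"he#1": "Socrates"},
))
```

Here `he#1` is replaced by its antecedent, while the interrogative `who`, which has no antecedent, is kept, as the exophoric exception requires.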

== UNL EXPRESSION ==
For the time being, the network macrostructure has not been addressed, but it seems clear that the relation “nxt”, proposed by the UNL Centre to link sentences and paragraphs, is syntactic rather than semantic, and therefore not appropriate for a network that claims to be mostly semantic. Some alternatives have been considered, especially Discourse Representation Theory (DRT), proposed by Hans Kamp, and Rhetorical Structure Theory (RST), proposed by William Mann and Sandra Thompson, but they are still under investigation.

The document structure is also subject to change and is likely to move to an XML schema, which is still under development. For the moment, the syntax defined by the UNL Centre has been kept.

== THREE-LAYERED REPRESENTATION ==
The basic assumption of the UNL approach is that the meaning conveyed by natural language sentences can be formally represented through three different types of semantic units: UWs, attributes and relations. This three-layered representation model is the cornerstone of UNL and its most distinctive feature compared with other semantic networks, which normally propose only two levels: vertices and edges. Nevertheless, it poses several problems for UNLization, as the boundary between what each unit is supposed to represent is not always clear. In order to avoid overlap and to facilitate the enconversion process, we have tried to clearly delimit the scope of each unit using the following criteria:
*RELATIONS represent syntactic relations (subject, object, complement, adjunct) with their corresponding semantic values;
*UWs represent lexemes from open classes:
**nouns, including proper nouns, abbreviations and acronyms;
**adjectives;
**full verbs;
**adverbs and adverbials; and
**numbers (always represented as Arabic numerals);
*ATTRIBUTES represent bound morphemes, closed classes and context-dependent information:
**grammatical categories (gender, number, tense, aspect, mood, voice, etc.);
**determiners (articles and demonstratives);
**adpositions (prepositions, postpositions and circumpositions);
**auxiliary and quasi-auxiliary verbs (auxiliaries, modals, coverbs, preverbs);
**interjections;
**conjunctions;
**text structure (.@entry, .@topic, .@qfocus, .@emphasis, .@relative, etc.);
**speech acts (.@request, .@suggestion, .@offer, etc.);
**other context-dependent information (politeness, metaphor, irony, etc.).

Pronouns and pro-forms are represented neither as UWs nor as attributes: as stated in the premises, they are replaced by their antecedents, except in cases of exophoric reference.
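The three layers can be pictured as a small data structure: UWs are node labels, attributes are annotations attached to a node, and relations are labeled binary edges between annotated nodes. The Python sketch below is only an illustration of that reading; the class names are hypothetical, and the rendering simply follows the `rel(source, target)` and `UW.@attribute` notation used in the examples on this page (e.g. `plc(X, Y.@under)`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A UW (open-class lexeme) plus its attributes (closed classes, grammar)."""
    uw: str
    attributes: tuple[str, ...] = ()

    def render(self) -> str:
        # Each attribute is appended to the UW as ".@name".
        return self.uw + "".join(f".@{a}" for a in self.attributes)


@dataclass(frozen=True)
class Relation:
    """A labeled binary edge between two annotated nodes."""
    label: str
    source: Node
    target: Node

    def render(self) -> str:
        return f"{self.label}({self.source.render()}, {self.target.render()})"


# "X is under Y": the preposition becomes an attribute of Y, not a separate UW.
example = Relation("plc", Node("X"), Node("Y", ("under",)))
print(example.render())  # plc(X, Y.@under)
```

Keeping attributes inside the node, rather than as extra nodes, is what keeps closed-class words out of the network proper.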

The main changes with respect to the current UNL Specifications are the following:

== RELATIONS ==
The set of relations is exactly the same as defined in the UNL 2005 Specifications, with a single difference: the introduction of the relation “voc” (vocative), required by sentences like:

''[S:20] Son of Hipponicus, there is an ancient saying, that “hard is the knowledge of the good.”''

In cases like this, the attribute .@vocative cannot be applied, because the vocative phrase (“Son of Hipponicus”) seems to be semantically isolated from the main clause and cannot be linked to it by any other relation of the UNL Specifications, so there would be no node in the network to carry the attribute.

== ATTRIBUTES ==
The set of attributes has been substantially extended to represent information concerning grammatical categories, determiners, adpositions and conjunctions. The main additions are the following:
*gender: @male, @female
*degree of comparison: @more, @less, @equal, @most, @least
*demonstrative: @proximal, @medial, @distal
*preposition: @under, @below, @above, @after, @before, etc.
*conjunction: @before, @after, etc.
*relative (for the main entry of relative clauses): @relative

The decision to represent closed classes as attributes instead of UWs has led to a different way of representing several natural language phenomena:
;this X
:UNL Centre: mod(X, this)
:These guidelines: X.@proximal
;X is under Y
:UNL Centre: plc(X, under), obj(under, Y)
:These guidelines: plc(X, Y.@under)
;bigger than Y
:UNL Centre: man(big, more), bas(big, Y)
:These guidelines: bas(big.@more, Y)
etc.
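The three contrasts above can be read as a rewriting procedure from the UNL Centre style to the style of these guidelines. The Python sketch below is a minimal, hypothetical illustration of that reading: relations are `(label, source, target)` triples, the function name is invented, and the three lookup tables contain only the words used in the examples (`this`, `under`, `more`); a real converter would need the complete closed-class inventory and would have to cover the other relations these words can occur in.

```python
# Hypothetical, deliberately partial lookup tables: closed-class words that the
# UNL Centre style encodes as UWs, mapped to the attributes used on this page.
DEMONSTRATIVES = {"this": "proximal"}
PREPOSITIONS = {"under": "under"}
DEGREES = {"more": "more"}

Triple = tuple[str, str, str]  # (relation label, source node, target node)


def to_guidelines(relations: list[Triple]) -> list[str]:
    """Fold closed-class UWs into attributes and render the result."""
    attrs: dict[str, list[str]] = {}

    # First pass: remember which node each preposition takes as its object,
    # e.g. obj(under, Y) -> {"under": "Y"}.
    prep_object = {
        src: tgt
        for label, src, tgt in relations
        if label == "obj" and src in PREPOSITIONS
    }

    kept: list[Triple] = []
    for label, src, tgt in relations:
        if label == "mod" and tgt in DEMONSTRATIVES:
            # mod(X, this) -> X.@proximal
            attrs.setdefault(src, []).append(DEMONSTRATIVES[tgt])
        elif label == "man" and tgt in DEGREES:
            # man(big, more) -> big.@more
            attrs.setdefault(src, []).append(DEGREES[tgt])
        elif label == "obj" and src in prep_object:
            # obj(under, Y) is absorbed by the relation pointing at "under".
            continue
        elif tgt in prep_object:
            # plc(X, under) + obj(under, Y) -> plc(X, Y.@under)
            obj = prep_object[tgt]
            attrs.setdefault(obj, []).append(PREPOSITIONS[tgt])
            kept.append((label, src, obj))
        else:
            kept.append((label, src, tgt))

    def render(node: str) -> str:
        return node + "".join(f".@{a}" for a in attrs.get(node, []))

    lines = [f"{label}({render(s)}, {render(t)})" for label, s, t in kept]
    # Nodes that lost all their relations (e.g. "this X") are rendered alone.
    used = {node for _, s, t in kept for node in (s, t)}
    lines += [render(node) for node in attrs if node not in used]
    return lines
```

Called on the three UNL Centre examples, it yields `X.@proximal`, `plc(X, Y.@under)` and `bas(big.@more, Y)` respectively, each as a one-element list.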

Additionally, the following general principles were adopted:
*interjections, filled pauses, phatic expressions and short answers should be represented by the null UW ("00") together with the attribute indicating the corresponding speech act (.@confirmation, .@surprise, etc.);
*the attribute .@entry (mandatory in every scope, including the main one) should be placed on the left (source) side of at least one relation;
*the difference between mentioning and using a word (a quite frequent situation in a metalinguistic text such as Cratylus) should be represented by the attribute .@mention (which is not the same as "quotation");
*attributes should be written in alphabetical order (“.@entry.@past” instead of “.@past.@entry”).
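The last two principles are mechanical enough to be checked automatically. The sketch below is hypothetical and not part of the UNDL Foundation tools: it takes the relations of one scope as `(label, source, target)` triples of rendered nodes (e.g. `speak.@entry.@past`), and the function names are invented for illustration. It reports a scope with no `.@entry` node on the source side of any relation, and any node whose attributes are out of alphabetical order.

```python
Triple = tuple[str, str, str]  # (relation label, source node, target node)


def split_node(node: str) -> tuple[str, list[str]]:
    """Split a rendered node such as "speak.@entry.@past" into UW and attributes."""
    uw, *attributes = node.split(".@")
    return uw, attributes


def check_scope(relations: list[Triple]) -> list[str]:
    """Return the problems found in one scope (an empty list if none)."""
    problems: list[str] = []

    # .@entry must mark the source side of at least one relation.
    if not any("entry" in split_node(src)[1] for _, src, _ in relations):
        problems.append("no .@entry node on the source side of any relation")

    # Attributes must be written in alphabetical order.
    for _, src, tgt in relations:
        for node in (src, tgt):
            attributes = split_node(node)[1]
            message = f"attributes not in alphabetical order: {node}"
            if attributes != sorted(attributes) and message not in problems:
                problems.append(message)
    return problems


print(check_scope([("agt", "speak.@entry.@past", "Socrates")]))  # []
print(check_scope([("agt", "speak.@past.@entry", "Socrates")]))
```

The second call reports `speak.@past.@entry` as out of order, matching the “.@past.@entry” counterexample above.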

== UNIVERSAL WORDS ==
The set of Universal Words, i.e., the UNL Dictionary, has undergone the most radical change, as we have been using UNLWN30, a set of UWs automatically extracted from WordNet 3.0. In this dictionary, UWs correspond to sets of English synonyms (synsets) and may have several different headwords. They are represented as 9-digit strings with the following format:
<POS><WORDNETID>
where <POS> is one of {1, 2, 3, 4}, with 1 = noun, 2 = verb, 3 = adjective and 4 = adverb; <br />
and <WORDNETID> is the (8-digit) synset ID in WordNet 3.0.
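Because the format is fixed-width, a UNLWN30 UW can be split mechanically into its two components. The sketch below assumes exactly the format described above (one part-of-speech digit followed by the remaining eight digits of the WordNet 3.0 synset ID); the function name is hypothetical and the example ID is only illustrative.

```python
# Part-of-speech digits as listed above: 1 = noun, 2 = verb, 3 = adjective, 4 = adverb.
POS_CODES = {"1": "noun", "2": "verb", "3": "adjective", "4": "adverb"}


def parse_uw(uw_id: str) -> tuple[str, str]:
    """Split a 9-digit UNLWN30 UW into (part of speech, WordNet 3.0 synset ID)."""
    if len(uw_id) != 9 or not (uw_id.isascii() and uw_id.isdigit()):
        raise ValueError(f"expected a 9-digit string, got {uw_id!r}")
    pos_digit, synset_id = uw_id[0], uw_id[1:]
    if pos_digit not in POS_CODES:
        raise ValueError(f"unknown part-of-speech digit {pos_digit!r} in {uw_id!r}")
    return POS_CODES[pos_digit], synset_id


print(parse_uw("102084071"))  # ('noun', '02084071')
```

The synset ID is kept as a string rather than converted to an integer so that its leading zeros survive.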

== SCOPES ==
To facilitate knowledge extraction from UNL documents, we have restricted the use of scopes to cases involving semantic ambiguity, such as:
*electric [light orchestra], with scope, i.e., a "light orchestra" that is electric; or
*electric light orchestra, without scope, i.e., an orchestra that is both "light" and "electric".
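The two readings can be written out as relation lists. The sketch assumes the UNL 2005 notation for scopes, in which a relation inside a scope carries the scope ID after its label (`mod:01(...)`) and the scope as a whole is referred to as `:01`; the choice of `mod` for the modifier relation follows the `mod(X, this)` example above, and the helper name is hypothetical.

```python
# Reading 1, electric [light orchestra]: "electric" modifies the scope :01,
# i.e. the "light orchestra" as a whole (assumed UNL 2005 scope notation).
WITH_SCOPE = [
    "mod:01(orchestra.@entry, light)",
    "mod(:01.@entry, electric)",
]

# Reading 2, electric light orchestra: both adjectives modify "orchestra".
WITHOUT_SCOPE = [
    "mod(orchestra.@entry, light)",
    "mod(orchestra.@entry, electric)",
]


def modifiers_of(head: str, relations: list[str]) -> list[str]:
    """Naive triple extraction: targets of mod relations whose source UW is `head`."""
    found = []
    for rel in relations:
        label, args = rel.split("(", 1)
        source, target = (part.strip() for part in args.rstrip(")").split(","))
        if label.split(":")[0] == "mod" and source.split(".@")[0] == head:
            found.append(target)
    return found


print(modifiers_of("orchestra", WITH_SCOPE))     # ['light']
print(modifiers_of("orchestra", WITHOUT_SCOPE))  # ['light', 'electric']
```

A naive query for the modifiers of `orchestra` finds both adjectives in the flat reading but only `light` in the scoped one, where `electric` is attached to the scope as a whole. This suggests why scopes were restricted to genuinely ambiguous cases: they make some relations invisible to simple triple-based extraction.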

== INVITATION ==
Finally, we stress that the UNLization standards presented here are tentative and provisional, and subject to improvement and change as soon as they prove not to be the most adequate ones. To provide such enhancements, we invite UNL Society members and anyone else interested in UNL to criticize them, to propose alternatives and to help us build an NL-to-UNL transfer grammar that is as comprehensive as possible.

== BROWSE BY ALPHABETICAL ORDER ==
*ABBREVIATIONS
*ACRONYMS
*ACTIVE VOICE
*ADJECTIVES
*ADJUNCTS
*ADVERBS
*APPOSITION
*ARTICLES
*ASPECT
*AUXILIARY VERBS
*CAPITALIZATION
*CARDINALS
*COMMON NOUNS
*COMPARATIVE ADJECTIVE
*COMPARATIVE ADVERB
*COMPARATIVE
*COMPLEMENT
*COMPLEX WORDS
*COMPOUND WORDS
*CONDITIONAL
*CONJUNCTIONS
*CONJUNCTS
*CONTRACTIONS
*COORDINATION
*COPULA
*DEGREE
*DEMONSTRATIVES
*DETERMINERS
*DIGITS
*DISJUNCTS
*EQUATIONS
*FRACTIONS
*GENDER
*IMPERATIVE
*INDICATIVE
*INTENSIFIERS
*INTERJECTIONS
*INTERROGATIVE
*MODAL VERBS
*MOOD
*MULTIPLICATIVE
*NOMINALIZATION
*NUMBER
*NUMBERS IN NAMES
*NUMBERS IN TITLES
*NUMERALS
*OBJECT
*ORDINALS
*PASSIVE
*PERSON
*PERSONAL PRONOUN
*POSSESSIVE
*PREMODIFIERS
*PREPOSITIONS
*PRONOUNS
*PROPER NOUNS
*PUNCTUATION MARKS
*QUANTIFIERS
*RECIPROCAL
*REFLEXIVE PRONOUN
*REFLEXIVE VOICE
*RELATIVE CLAUSE
*RELATIVE PRONOUN
*SPELLING
*SUBJECT
*SUBJUNCTIVE
*SUBORDINATION
*SUPERLATIVE ADJECTIVE
*SUPERLATIVE ADVERB
*SUPERLATIVE
*TENSE
*TYPOGRAPHICAL SYMBOLS
*VERB MODIFICATION
*VERB MODIFIERS
*VERBS
*VOCATIVE
*VOICE
 
== BROWSE BY SUBJECT ==

ORTHOGRAPHY
*CAPITALIZATION
*DIGITS
*EQUATIONS
*FRACTIONS
*NUMBERS IN NAMES
*NUMBERS IN TITLES
*PUNCTUATION MARKS
*SPELLING
*TYPOGRAPHICAL SYMBOLS

MORPHOLOGY
*COMPOUND WORDS
*COMPLEX WORDS
*CONTRACTIONS
*CONVERSION (NOMINALIZATION)
*PART OF SPEECH
**ABBREVIATIONS
**ACRONYMS
**ADJECTIVES
***COMPARATIVE
***SUPERLATIVE
**ADVERBS
***COMPARATIVE
***SUPERLATIVE
**ARTICLES
**CONJUNCTIONS
**DEMONSTRATIVES
**INTERJECTIONS
**NOUNS
***COMMON NOUNS
***PROPER NOUNS
**NUMERALS
***CARDINALS
***ORDINALS
***MULTIPLICATIVE
**PREMODIFIERS
***DETERMINERS
***INTENSIFIERS
***QUANTIFIERS
**PREPOSITIONS
**PRONOUNS
***INTERROGATIVE
***PERSONAL
***POSSESSIVE
***RECIPROCAL
***REFLEXIVE
***RELATIVE
**VERBS
***AUXILIARY VERBS
***COPULA
***MODAL VERBS
**VERB MODIFIERS

GRAMMAR
*DEGREE
**COMPARATIVE
**SUPERLATIVE
*GENDER
*MOOD
**INDICATIVE
**IMPERATIVE
**CONDITIONAL
**SUBJUNCTIVE
*NUMBER
*PERSON
*TENSE
*VERB MODIFICATION
*ASPECT
*MODAL VERBS
*VOICE
**ACTIVE
**PASSIVE
**REFLEXIVE

SYNTAX
*ADVERBIAL
**ADJUNCTS
**CONJUNCTS
**DISJUNCTS
*APPOSITION
*COMPLEMENT
*COORDINATION
*OBJECT
*SUBJECT
*SUBORDINATION
*VOCATIVE

Latest revision as of 19:08, 16 August 2013