UNLization

From UNL Wiki

Revision as of 23:10, 13 December 2010

UNLization, formerly known as enconversion, is the process of "representing" a natural language structure in UNL. This representation should be understood as an interpretation rather than a translation of the source document, in the sense that it is not necessarily committed to its linguistic structure (such as lexical choice and syntax) but only to its semantic structure: it must replicate the concepts, and the relations between concepts, conveyed by the linguistic structure.

UNLization units

  • Word-driven UNLization (the UNLization does not preserve any structure of the source document except the relations between concepts)
  • Sentence-driven UNLization (the UNLization preserves the sentence structures of the source document)
  • Text-driven UNLization (the UNLization preserves the whole structure of the source document)

UNLization paradigms

The process of UNLization may follow several different paradigms:

  • Language-based UNLization (based mainly on an NL-UNL dictionary and an NL-UNL grammar)
  • Knowledge-based UNLization (based mainly on the UNL Knowledge Base)
  • Example-based UNLization (based mainly on the UNL Example Base)
  • Memory-based UNLization (based mainly on the UNL-NL UNLization Memory)
  • Statistical-based UNLization (based mainly on statistical predictions derived from UNL-NL corpora)
  • Dialogue-based UNLization (based mainly on interaction with the user)

The actual UNLization is normally hybrid and may combine several of the strategies above.

Recall

The process of UNLization may target the whole source document or only parts of it (e.g. main clauses):

  • Full UNLization (the whole source document is UNLized)
  • Partial (or chunk) UNLization (only a part of the source document is UNLized)

Precision

The process of UNLization may target the deep semantic structure of the source document (i.e., the resulting semantic structure replicates the syntactic structure of the original) or only its surface structure (the resulting semantic structure does not preserve the syntactic structure of the original):

  • Deep UNLization (the UNLization focuses on the deep semantic structure of the source document)
  • Shallow UNLization (the UNLization focuses on the surface semantic structure of the source document)

Syntactic structures are preserved in the UNL document by the use of syntactic attributes (such as @passive, @topic, etc.) or by hyper-nodes (i.e., scopes). For some purposes, such as translation, UNLization may require syntactic details; for others, such as information retrieval, syntactic structures at this level are not normally necessary:

Mary was killed by Peter
    Shallow UNLization: kill
    Deep UNLization: kill.@passive
Mary saw Peter going to Paris.
    Shallow UNLization: Mary saw Peter & Peter was going to Paris
    Deep UNLization: Mary saw [Peter going to Paris].
As for the little girl, the dog licked her.
    Shallow UNLization: the dog licked the little girl
    Deep UNLization: the dog licked [the little girl].@topic
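
The relation between the two precision levels can be sketched in code. The following Python fragment is only an illustration, not part of any UNL tool: the Node class, the SYNTACTIC_ATTRIBUTES set and the to_shallow function are hypothetical names, and the set lists only the two syntactic attributes mentioned above. It assumes that a shallow representation can be obtained from a deep one by dropping the syntactic attributes, which holds for the first example but does not cover the restructuring of scopes shown in the second.

```python
from dataclasses import dataclass

# Assumption: only the two syntactic attributes named in the text are listed.
SYNTACTIC_ATTRIBUTES = frozenset({"@passive", "@topic"})


@dataclass(frozen=True)
class Node:
    """A toy UNL node: a concept label plus a tuple of attributes."""
    concept: str
    attributes: tuple = ()

    def render(self) -> str:
        # Attributes are appended with a dot, as in "kill.@passive".
        return self.concept + "".join("." + a for a in self.attributes)


def to_shallow(node: Node) -> Node:
    """Drop syntactic attributes; keep the concept and any other attributes."""
    kept = tuple(a for a in node.attributes if a not in SYNTACTIC_ATTRIBUTES)
    return Node(node.concept, kept)


deep = Node("kill", ("@passive",))
print(deep.render())              # kill.@passive
print(to_shallow(deep).render())  # kill
```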

Level

The process of UNLization may represent literal meanings (locutionary content) or non-literal meanings (illocutionary content).

  • Locutionary (the UNLization represents only the literal meaning)
  • Illocutionary (the UNLization also represents non-literal meanings, including speech acts)

The illocutionary force may be represented by figure-of-speech and speech-act attributes:

It is as soft as concrete
    Locutionary level: it is as soft as concrete
    Illocutionary level: [it is as soft as concrete].@irony
Can you pass me the salt?
    Locutionary level: can you pass me the salt?
    Illocutionary level: [pass me the salt].@request
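
As a rough sketch of the two levels, the Python fragment below keeps the literal wording, the intended content and the attribute side by side and renders either level on demand. The Utterance class and its field names are hypothetical and chosen only for this illustration; the bracket-plus-attribute output simply mirrors the notation of the examples above and is not a full UNL graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical container for one analysed utterance."""
    literal: str   # what is literally said
    intended: str  # the content the attribute applies to
    force: str     # attribute such as "@irony" or "@request"

    def locutionary(self) -> str:
        # Literal meaning only: no attribute is attached.
        return self.literal

    def illocutionary(self) -> str:
        # The intended content is wrapped in a scope and marked with the attribute.
        return f"[{self.intended}].{self.force}"


request = Utterance("can you pass me the salt?", "pass me the salt", "@request")
irony = Utterance("it is as soft as concrete", "it is as soft as concrete", "@irony")

print(request.locutionary())    # can you pass me the salt?
print(request.illocutionary())  # [pass me the salt].@request
print(irony.illocutionary())    # [it is as soft as concrete].@irony
```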

UNLization challenges

The UNL document will not contain the semantic ambiguities of the original, and will only encode one of its possible semantic realisations, preferably the most frequent one:

The bank crashed.
    UNL is not able to preserve the lexical ambiguity of the word "bank" in the sentence above. The UNL representation will necessarily choose one of the possible concepts conveyed by the English word "bank".
The boy saw the girl with binoculars
    UNL is not able to represent the syntactic ambiguity of the sentence above. The UNL representation will necessarily choose one of the possible syntactic structures of the sentence.
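
The preference for the most frequent realisation can be pictured as a simple selection over candidate concepts. The Python sketch below is an assumption-laden illustration: the SENSE_COUNTS table, its concept labels and its counts are invented for the example and do not come from any UNL dictionary or corpus, and choose_concept is a hypothetical helper.

```python
# Invented candidate concepts and counts, for illustration only.
SENSE_COUNTS = {
    "bank": {
        "bank (financial institution)": 80,
        "bank (side of a river)": 20,
    },
}


def choose_concept(word: str) -> str:
    """Return the candidate concept with the highest count for the word."""
    candidates = SENSE_COUNTS[word]
    return max(candidates, key=candidates.get)


print(choose_concept("bank"))  # bank (financial institution)
```

Whichever concept is selected, the other reading is no longer recoverable from the UNL document, which is the loss of ambiguity described above.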
Software