Grammar

From UNL Wiki
Revision as of 19:47, 21 September 2012 by Martins (Talk | contribs)

In the UNL framework, a grammar is a set of rules used to generate UNL out of natural language, and natural language out of UNL. Along with the UNL-NL dictionaries, grammars constitute the basic resources for UNLization and NLization.

Types

In the UNL framework, we distinguish between transformation and disambiguation grammars:

  • Transformation Grammar, or T-Grammar, is the set of T-rules, which are used to transform structures
  • Disambiguation Grammar, or D-Grammar, is the set of D-rules, which are used to improve the performance of the T-rules

The syntax and behavior of T-rules and D-rules are defined in the Grammar Specs.

Direction

In the UNL framework, we distinguish between analysis and generation grammars:

  • The UNL-NL (Generation) Grammar is used to generate natural language out of UNL
  • The NL-UNL (Analysis) Grammar is used to generate UNL out of natural language

Units

The process of UNLization may have different representation units, as follows:

  • Word-driven UNLization (the source document is represented as a single network of individual concepts)
  • Sentence-driven UNLization (the source document is represented as a list of non-semantically related networks of individual concepts)
  • Text-driven UNLization (the source document is represented as a network of semantically related networks of individual concepts)

In word-driven UNLization, the sentence boundaries and the structure of the source document are ignored, and the source document is represented as a single graph, i.e., as a simple network of individual concepts. In sentence-driven UNLization, the source document is analyzed, sentence by sentence, as a list of non-semantically related hyper-graphs. Each sentence is represented separately, and the only relation holding between sentences is their order in the source document. Finally, text-driven UNLization targets the rhetorical structure of the source document, i.e., it analyzes the source document as a network of semantically related hyper-graphs. Word-driven UNLization is used mainly for information retrieval and extraction, whereas sentence- and text-driven UNLization are normally used for translation.
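The three representation units can be pictured with a small data-structure sketch. This is only an illustration, not part of the UNL specification: the class and function names, the flat edge triples, and the inter-sentence link label "sequence" are all assumptions made for this example, and the relation labels are used purely as sample edge names.

```python
from dataclasses import dataclass, field

# A labelled edge between two concepts: (relation, source, target).
# The labels below are illustrative sample values only.
Edge = tuple[str, str, str]


@dataclass
class SentenceGraph:
    """One network of individual concepts, built from a single sentence."""
    edges: list[Edge]


@dataclass
class TextGraph:
    """Sentence graphs plus hypothetical links between them."""
    sentences: list[SentenceGraph]
    # (relation, index of source sentence, index of target sentence)
    links: list[tuple[str, int, int]] = field(default_factory=list)


def word_driven(sentences: list[SentenceGraph]) -> set[Edge]:
    """Ignore sentence boundaries and pool every edge into one graph."""
    return {edge for s in sentences for edge in s.edges}


def sentence_driven(sentences: list[SentenceGraph]) -> list[SentenceGraph]:
    """Keep one graph per sentence; only document order relates them."""
    return list(sentences)


def text_driven(sentences: list[SentenceGraph],
                links: list[tuple[str, int, int]]) -> TextGraph:
    """Keep one graph per sentence and relate the graphs to each other."""
    return TextGraph(list(sentences), list(links))


s1 = SentenceGraph([("agt", "kill", "Peter"), ("obj", "kill", "Mary")])
s2 = SentenceGraph([("agt", "flee", "Peter")])

single_graph = word_driven([s1, s2])
graph_list = sentence_driven([s1, s2])
document = text_driven([s1, s2], [("sequence", 0, 1)])
```

In this simplified model the only difference between the sentence-driven and the text-driven results is the `links` field, which mirrors the distinction drawn above between a list of unrelated networks and a network of related networks.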

Paradigms

The process of UNLization may follow several different paradigms, as follows:

  • Language-based UNLization (based mainly on an NL-UNL dictionary and an NL-UNL grammar)
  • Knowledge-based UNLization (based mainly on the UNL Knowledge Base)
  • Example-based UNLization (based mainly on the UNL Example Base)
  • Memory-based UNLization (based mainly on the UNLization Memory)
  • Statistical-based UNLization (based mainly on statistical predictions derived from UNL-NL corpora)
  • Dialogue-based UNLization (based mainly on interaction with the user)

The actual UNLization is normally hybrid and may combine several of the strategies above.

Recall

The process of UNLization may target the whole source document or only parts of it (e.g. main clauses):

  • Full UNLization (the whole source document is UNLized)
  • Partial (or chunk) UNLization (only a part of the source document is UNLized)

Peter killed Mary with a knife yesterday morning.
Full UNLization: Peter killed Mary with a knife yesterday morning.
Partial UNLization: Peter killed Mary.
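
The example above can be sketched as a filter over a set of relations. This is a minimal illustration under stated assumptions: the sentence is modelled as flat (relation, source, target) triples, the relation labels are sample values, and the choice of which relations count as "core" is a hypothetical selection criterion for this example, not one prescribed by the UNL framework.

```python
# A labelled edge between two concepts: (relation, source, target).
Edge = tuple[str, str, str]

# Assumption for this sketch: partial UNLization keeps only these relations.
CORE_RELATIONS = frozenset({"agt", "obj"})


def partial_unlization(edges: list[Edge],
                       keep: frozenset = CORE_RELATIONS) -> list[Edge]:
    """Return only the edges whose relation label is in `keep`."""
    return [edge for edge in edges if edge[0] in keep]


# "Peter killed Mary with a knife yesterday morning."
full = [
    ("agt", "kill", "Peter"),
    ("obj", "kill", "Mary"),
    ("ins", "kill", "knife"),
    ("tim", "kill", "yesterday morning"),
]

# "Peter killed Mary."
partial = partial_unlization(full)
```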

Precision

The process of UNLization may target the deep semantic structure of the source document (i.e., the resulting semantic structure replicates the syntactic structure of the original) or only its surface structure (i.e., the resulting semantic structure does not preserve the syntactic structure of the original):

  • Deep UNLization (the UNLization focuses on the deep semantic structure of the source document)
  • Shallow UNLization (the UNLization focuses on the surface semantic structure of the source document)

Syntactic structures are preserved in the UNL document by the use of syntactic attributes (such as @passive, @topic, etc.) or by hyper-nodes (i.e., scopes). For some purposes, such as translation, UNLization may require syntactic details; for others, such as information retrieval, syntactic structures at this level are not normally necessary:

Mary was killed by Peter.
Shallow UNLization: Peter killed Mary
Deep UNLization: [Peter killed Mary].@passive

Mary saw Peter going to Paris.
Shallow UNLization: Mary saw Peter & Peter was going to Paris
Deep UNLization: Mary saw [Peter going to Paris].

As for the little girl, the dog licked her.
Shallow UNLization: the dog licked the little girl
Deep UNLization: the dog licked [the little girl].@topic
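
The relation between the deep and shallow results in the first example can be sketched as dropping syntactic attributes from a scope. This is a simplified illustration, not the UNL data model: the `Scope` class, the `to_shallow` and `render` helpers, and the restriction to whole-sentence scopes are assumptions of this example, and the attribute set contains only the two attributes named in the text.

```python
from dataclasses import dataclass

# Only the syntactic attributes mentioned in the text above.
SYNTACTIC_ATTRIBUTES = frozenset({"@passive", "@topic"})


@dataclass(frozen=True)
class Scope:
    """A hypothetical hyper-node: some content plus a set of attributes."""
    content: str
    attributes: frozenset = frozenset()


def to_shallow(scope: Scope) -> Scope:
    """Drop syntactic attributes, keeping the content and anything else."""
    return Scope(scope.content, scope.attributes - SYNTACTIC_ATTRIBUTES)


def render(scope: Scope) -> str:
    """Write the scope in the bracket notation used in the examples."""
    if not scope.attributes:
        return scope.content
    suffix = "".join(f".{attr}" for attr in sorted(scope.attributes))
    return f"[{scope.content}]{suffix}"


# "Mary was killed by Peter."
deep = Scope("Peter killed Mary", frozenset({"@passive"}))
shallow = to_shallow(deep)

deep_text = render(deep)
shallow_text = render(shallow)
```

Here `deep_text` keeps the `@passive` marker and `shallow_text` is the bare proposition, which is the contrast the first example illustrates.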

Level

The process of UNLization may target literal meanings (locutionary content) or non-literal meanings (illocutionary content).

  • Locutionary (the UNLization represents only the literal meaning)
  • Illocutionary (the UNLization also represents non-literal meanings, including speech acts)

The illocutionary force may be represented by figure-of-speech and speech-act attributes:

It is as soft as concrete.
Locutionary level: it is as soft as concrete
Illocutionary level: [it is as soft as concrete].@irony

Can you pass me the salt?
Locutionary level: can you pass me the salt?
Illocutionary level: [you pass me the salt].@request

Methods

Humans and machines may play different roles in UNLization methods:

  • Fully automatic UNLization (the whole process is carried out by the machine, without any intervention of the human user)
  • Human-aided machine UNLization (the process is carried out mainly by the machine, with some intervention of the human user, either as a pre-editor or as a post-editor, or during the UNLization itself, as in dialogue-based UNLization)
  • Machine-aided human UNLization (the process is carried out mainly by the human user, with some help from the machine, as in dictionary or memory lookup)
  • Fully human UNLization (the whole process is carried out by the human user, without any intervention of the machine)
Software