F-measure

From UNL Wiki

Latest revision as of 01:23, 18 February 2014

In the UNL System, the F-measure (or F1-score) is the measure of a grammar's accuracy. It combines the precision and the recall of the grammar into a single score, according to the formula:

F-measure = 2 × (precision × recall) / (precision + recall)

In the above:

  • precision is the number of correct results divided by the number of all returned results
  • recall is the number of correct results divided by the number of results that should have been returned
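
The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `f_measure` and its parameter names are hypothetical, and scoring undefined ratios (no returned or no expected results) as 0 is an assumption that the page does not state.

```python
def f_measure(correct: int, returned: int, expected: int) -> float:
    """Compute the F-measure from raw counts.

    correct  -- number of correct results
    returned -- number of all returned results
    expected -- number of results that should have been returned
    """
    if returned == 0 or expected == 0:
        return 0.0  # assumption: undefined ratios are scored as 0
    precision = correct / returned
    recall = correct / expected
    if precision + recall == 0:
        return 0.0  # no correct results at all
    return 2 * (precision * recall) / (precision + recall)


# Example: 6 correct results out of 8 returned, with 10 expected.
# precision = 6/8, recall = 6/10
print(f_measure(correct=6, returned=8, expected=10))
```

Because the score is a harmonic mean, it stays low whenever either precision or recall is low.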

UNLization

RETURNED

A result is considered "RETURNED" when:

  • the output is a graph (i.e., all the nodes are interlinked); AND
  • the output is made only of UW's (i.e., without natural language words)
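
The two RETURNED conditions can be checked roughly as follows. This is a sketch under stated assumptions, not the tool's actual implementation: relations are modelled as hypothetical `(name, source, target)` tuples, and the test for "is this node a UW?" is left to a caller-supplied predicate `is_uw`, because the page does not say how natural language words are recognised. An output with no relations is treated as not being a graph, which is also an assumption.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical representation: (relation name, source node, target node)
Relation = Tuple[str, str, str]


def is_connected(relations: Iterable[Relation]) -> bool:
    """True if all nodes are interlinked (one connected component)."""
    adjacency = {}
    for _, source, target in relations:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    if not adjacency:
        return False  # assumption: an empty output is not a graph
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:  # depth-first traversal, ignoring relation direction
        node = stack.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return len(seen) == len(adjacency)


def is_returned(relations: Iterable[Relation],
                is_uw: Callable[[str], bool]) -> bool:
    """RETURNED = connected graph AND every node is a UW."""
    relations = list(relations)
    nodes = {node for _, s, t in relations for node in (s, t)}
    return is_connected(relations) and all(is_uw(node) for node in nodes)
```

A caller would pass whatever UW test fits its data, for example a lookup against the UW dictionary in use.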

CORRECT

A result is considered "CORRECT" when:

  • The discrepancy of relations between the actual and the expected output is less than 0.3; AND
  • The discrepancy of UW's between the actual and the expected output is less than 0.3; AND
  • The overall discrepancy is less than 0.3.

WHERE

Discrepancy of relations is calculated by the formula:
(exceeding_relations + missing_relations)/total_relations
Discrepancy of UW's is calculated by the formula:
(exceeding_UW + missing_UW)/total_UW
Overall discrepancy is calculated by the formula:
((3*(exceeding_relations+missing_relations))+(2*(exceeding_UW+missing_UW))+(exceeding_attribute+missing_attribute))/((3*total_relations)+(2*total_UW)+(total_attribute))
  • exceeding_relations is the number of relations present in the actual output but absent from the expected output
  • missing_relations is the number of relations absent from the actual output but present in the expected output
  • total_relations is the sum of the total number of relations in the actual output and in the expected output
  • exceeding_UW is the number of UW's[1] present in the actual output but absent from the expected output
  • missing_UW is the number of UW's[1] absent from the actual output but present in the expected output
  • total_UW is the sum of the total number of UW's[1] in the actual output and in the expected output
  • exceeding_attribute is the number of attributes[2] present in the actual output but absent from the expected output
  • missing_attribute is the number of attributes[2] absent from the actual output but present in the expected output
  • total_attribute is the sum of the total number of attributes[2] in the actual output and in the expected output
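
The three discrepancies and the CORRECT decision can be sketched as below. The function names and the dictionary layout (`relations`, `uws`, `attributes`) are hypothetical. Items are compared as multisets, which is an assumption; the page does not say how repeated items are counted. The caller is expected to have already applied the comparison rules from the footnotes (for example, tagging each UW and attribute with its source or target position and dropping optional attributes). A zero denominator is scored as 0, again an assumption.

```python
from collections import Counter


def counts(actual, expected):
    """Return (exceeding, missing, total) for two collections of items."""
    a, e = Counter(actual), Counter(expected)
    exceeding = sum((a - e).values())  # in actual, not in expected
    missing = sum((e - a).values())    # in expected, not in actual
    total = sum(a.values()) + sum(e.values())
    return exceeding, missing, total


def ratio(numerator, denominator):
    # assumption: nothing to compare means no discrepancy
    return numerator / denominator if denominator else 0.0


def unl_discrepancies(actual, expected):
    """Return (relations, UWs, overall) discrepancies."""
    er, mr, tr = counts(actual["relations"], expected["relations"])
    eu, mu, tu = counts(actual["uws"], expected["uws"])
    ea, ma, ta = counts(actual["attributes"], expected["attributes"])
    overall = ratio(3 * (er + mr) + 2 * (eu + mu) + (ea + ma),
                    3 * tr + 2 * tu + ta)
    return ratio(er + mr, tr), ratio(eu + mu, tu), overall


def is_correct_unl(actual, expected, threshold=0.3):
    """CORRECT = all three discrepancies are below the threshold."""
    return all(d < threshold for d in unl_discrepancies(actual, expected))
```

Note the weights in the overall formula: a relation error counts three times as much as an attribute error, and a UW error twice as much.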

NLization

RETURNED

A result is considered "RETURNED" when the output is a list of natural language words (i.e., without any UW).

CORRECT

A result is considered "CORRECT" when the difference between the actual result and the expected result is less than 0.3.

The difference between the actual and the expected result is calculated by the formula:
(exceeding_words+missing_words)/(total_words)

WHERE

  • exceeding_words is the number of words present in the actual output but absent from the expected output
  • missing_words is the number of words absent from the actual output but present in the expected output
  • total_words is the sum of the total number of words in the actual output and in the expected output
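
The NLization difference can be sketched the same way. The function names are hypothetical, and two assumptions are made that the page does not confirm: words are obtained by splitting on whitespace, and repeated words are counted as a multiset.

```python
from collections import Counter


def nl_difference(actual: str, expected: str) -> float:
    """(exceeding_words + missing_words) / total_words"""
    a = Counter(actual.split())    # assumption: whitespace tokenisation
    e = Counter(expected.split())
    exceeding = sum((a - e).values())  # words only in the actual output
    missing = sum((e - a).values())    # words only in the expected output
    total = sum(a.values()) + sum(e.values())
    return (exceeding + missing) / total if total else 0.0


def is_correct_nl(actual: str, expected: str, threshold: float = 0.3) -> bool:
    return nl_difference(actual, expected) < threshold


# One missing word ("the") out of 4 + 5 = 9 words in total.
print(nl_difference("the cat eats fish", "the cat eats the fish"))
```

Under these assumptions the measure ignores word order: two outputs containing the same words in a different order have a difference of 0.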

How to calculate the F-measure

The F-measure may be automatically calculated at UNLWEB>UNLARIUM>TOOLS>F-MEASURE.
To calculate the F-measure, you have to provide the following:

  • For UNLization
    • The actual UNL output, automatically generated by IAN with your grammars and dictionaries, at the trace level NONE[3]
    • The expected UNL output for the same set of sentences
  • For NLization
    • The actual NL output, automatically generated by EUGENE with your grammars and dictionaries, at the trace level MINIMAL[3]
    • The expected NL output for the same set of sentences

References

  1. For the sake of comparison, UW's appearing in the source position are considered to be different from UW's appearing in the target position of a relation. Scopes are ignored.
  2. For the sake of comparison, attributes appearing in the source position are considered to be different from attributes appearing in the target position of a relation. Optional attributes (such as .@def and .@indef) are ignored.
  3. Select the option RANGE, with all sentences, in order to generate the output for all sentences at once.