Unified Information Gain Measures for Inference Processes

José Hernández-Orallo

Abstract

We present a definition of information gain based on descriptional complexity, which
applies to different inference processes, whether deductive or inductive, and evaluates
the relative value of new inference results, weighing the convenience of leaving them
implicit or making them explicit.

Concretely, processes as apparently different as induction and deduction can be
explained in a computational framework as inference processes that {\em just} generate an output
from an input, subject to semantic restrictions and/or selection criteria, widely studied
in philosophy of science and mathematical logic, respectively. The term information is seen as
the result of a computational effort, analogously to the way energy is seen as the result of
physical work. This raises many questions, especially how to measure this computational effort.
The {\em answer} was given by Levin in the seventies, who proved that the weighting
$LT(p) = length(p) + \log Cost(p)$ between space and time is optimal for universal
search problems. Given two objects, the effort from $x$ to $y$ in a computational system $\phi$ is then measured as the
relative Levin descriptional complexity $Kt(y \mid x) = \min \{ LT(p) : \phi(\langle p, x \rangle) = y \}$.
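
As a purely illustrative aside (not taken from the paper), the definition can be mimicked in a toy system: fix a four-symbol instruction set, let the cost of a program be its number of executed steps, and approximate $Kt(y \mid x)$ by enumerating programs in order of increasing length. The instruction set and all names below are invented for the example.

    import itertools, math

    # Toy instruction set (invented for illustration): a program is a
    # string of ops applied to a register that starts as the string x.
    #   '0', '1' : append that bit to the register
    #   'R'      : reverse the register
    #   'D'      : append a copy of the register to itself (doubling)
    OPS = "01RD"

    def run(program, x):
        """Run a toy program on input x; return (output, steps)."""
        reg, steps = x, 0
        for op in program:
            steps += 1
            if op in "01":
                reg += op
            elif op == "R":
                reg = reg[::-1]
            else:                     # op == "D"
                reg += reg
        return reg, steps

    def kt(y, x="", max_len=8):
        """Brute-force analogue of Kt(y|x): the minimum over programs p
        of length(p) + log2(steps taken by p on x)."""
        for n in range(max_len + 1):
            best = None
            for p in map("".join, itertools.product(OPS, repeat=n)):
                out, steps = run(p, x)
                if out == y:
                    lt = n + math.log2(max(steps, 1))
                    best = lt if best is None else min(best, lt)
            # Here cost equals length, so LT grows with n and the first
            # length with a hit already contains the optimum.
            if best is not None:
                return best
        return None                   # unreachable within the bound

    print(kt("0101", "01"))           # 1.0: the single op "D" suffices
    print(kt("0101"))                 # ~4.58: "01D" costs 3 + log2(3)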

The Information Gain of object $y$ with respect to object $x$ is then defined as the quotient between the effort
needed to describe $y$ from $x$ and the effort needed to describe $y$ alone.
More formally, $G(y \mid x) = Kt(y \mid x) / Kt(y)$ (see the sketch below). Some properties of this measure are shown before applying it to inference processes.
In the case of induction, information gain represents how informative the hypothesis is with respect to
the evidence (in Popper's sense), and it is compared with other
selection criteria, especially simplicity. This leads to the notion of {\em authentic learning},
quite different from Gold's identification in the limit. In the case of deduction, the measure also represents
how informative the conclusion is with respect to the premises. This establishes a generic measure of the gain
obtained from making explicit something that was implicit, provided that the system is resource-limited
rather than omniscient, so that there is a clear difference between explicit or
surface information and implicit or depth information, as highlighted by Hintikka
for first-order logic.
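
Continuing the same invented toy sketch (again illustrative, reusing kt() from above, and not the paper's actual machinery), the quotient itself becomes a one-line function:

    # Reusing kt() from the sketch above: the toy information gain is
    # just the quotient of the two efforts.
    def gain(y, x):
        """Toy analogue of G(y|x) = Kt(y|x) / Kt(y)."""
        return kt(y, x) / kt(y)

    print(gain("0101", "01"))   # ~0.22: one doubling op turns x into y
    print(gain("0101", "10"))   # ~0.65: x must be reversed first,
                                #        so it helps less

A gain close to 0 means $y$ is almost free given $x$ (little is learned), while a gain close to 1 means $x$ barely helps in describing $y$.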

We study optimal compromises between the size of a theory and its explicitness, formalising
the necessity of lemmata and the use of extensional properties in mathematical practice,
in order to avoid repeating difficult derivations that have already been done (while still keeping
the overall size of the theory under control).
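
As a loose computational analogy of this trade-off (an assumption of this note, not the paper's formalism), a memo cache behaves like a set of explicit lemmata: it enlarges what is stored but spares re-derivations.

    from functools import lru_cache

    @lru_cache(maxsize=None)   # the cache plays the role of explicit lemmata
    def derive(n):
        """A deliberately expensive 'derivation' with shared substeps."""
        if n < 2:
            return n
        return derive(n - 1) + derive(n - 2)

    # With the cache the call is immediate; without the decorator the
    # same call re-derives exponentially many substeps: the cache trades
    # theory size (stored entries) for derivation effort.
    print(derive(60))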
 

Keywords: Informativeness, Learning, Descriptional Complexity, Inference Processes, Deduction, Induction.



