Information Gain

Original article
Information Gain and Mutual Information for Machine Learning

\begin{equation} \mathbf{IG}(\mathbf{S}, a) = \mathbf{H}(\mathbf{S}) - \mathbf{H}(\mathbf{S} \mid a) \end{equation}
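A rough sketch of this decomposition in Python, assuming \(\mathbf{S}\) is a set of class labels and \(a\) a discrete feature used to split it (the helper names and toy arrays are illustrative, not from the original article):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(S) of a sequence of class labels, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values):
    """IG(S, a) = H(S) - H(S | a): entropy of the labels minus the
    weighted entropy of the labels within each value of feature a."""
    h_s = entropy(labels)
    h_s_given_a = 0.0
    for v in np.unique(feature_values):
        subset = labels[feature_values == v]
        h_s_given_a += (len(subset) / len(labels)) * entropy(subset)
    return h_s - h_s_given_a

# Toy example: a binary target split on a binary feature.
y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(information_gain(y, a))  # reduction in entropy from splitting on a
```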

Mutual information

References
Information Gain and Mutual Information for Machine Learning \
An introduction to mutual information - YouTube

Mutual information concerns the outcomes of two random variables.

If we know the value of one of the random variables in a system, there is a corresponding reduction in uncertainty in predicting the other one; mutual information measures that reduction in uncertainty.

\begin{equation} \textbf{I}(X_1 ; X_2) = \textbf{I}(X_1, X_2) = \textbf{H}(X_1) - \textbf{H}(X_1 \mid X_2) = \sum_{X_1} \sum_{X_2} P(X_1, X_2) \log \dfrac{P(X_1, X_2)}{\underbrace{P(X_1)P(X_2)}_{\text{Product of marginals}}} \end{equation}

Where \(X_1\) and \(X_2\) are random variables and \(\textbf{I}(X_1 ; X_2)\) is the mutual information for \(X_1\) and \(X_2\).
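A small sketch of the double sum above, assuming the joint distribution of \(X_1\) and \(X_2\) is given as a probability table (the function name and example tables are illustrative):

```python
import numpy as np

def mutual_information(joint):
    """I(X1; X2) = sum over x1, x2 of P(x1, x2) * log2(P(x1, x2) / (P(x1) P(x2))),
    for a joint probability table with rows indexed by X1 and columns by X2.
    Result is in bits (shannons)."""
    p_x1 = joint.sum(axis=1, keepdims=True)   # marginal P(X1)
    p_x2 = joint.sum(axis=0, keepdims=True)   # marginal P(X2)
    mask = joint > 0                          # treat 0 * log 0 as 0
    return np.sum(joint[mask] * np.log2(joint[mask] / (p_x1 * p_x2)[mask]))

# Perfectly dependent binary variables -> I = H(X1) = 1 bit.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(mutual_information(joint))  # 1.0

# Independent variables -> I = 0 bits.
joint_indep = np.outer([0.5, 0.5], [0.5, 0.5])
print(mutual_information(joint_indep))  # 0.0
```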

Bayes' Rule

\begin{equation} \underbrace{p(\mathbf{z} \mid \mathbf{x})}_{\text{Posterior}} = \underbrace{p(\mathbf{z})}_{\text{Prior}} \times \frac{\overbrace{p(\mathbf{x} \mid \mathbf{z})}^{\text{Likelihood}}}{\underbrace{\int p(\mathbf{x} \mid \mathbf{z}) \, p(\mathbf{z}) \, \mathrm{d}\mathbf{z}}_{\text{Marginal Likelihood}}} \enspace , \end{equation}

where \(\mathbf{z}\) denotes latent parameters we want to infer and \(\mathbf{x}\) denotes data.
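A minimal numerical sketch of this normalisation, using an assumed coin-flip example (the data, grid, and flat prior are made up, not from the quoted text): the marginal likelihood in the denominator is approximated by summing the likelihood times the prior over a grid of \(\mathbf{z}\).

```python
import numpy as np

# Infer a latent coin bias z from x = 7 heads in 10 flips,
# approximating the integral in the denominator by a grid sum.
z = np.linspace(0.001, 0.999, 999)         # grid over the latent parameter
prior = np.ones_like(z) / len(z)           # flat prior p(z)
likelihood = z**7 * (1 - z)**3             # p(x | z) for 7 heads, 3 tails
marginal = np.sum(likelihood * prior)      # approximates the marginal likelihood
posterior = prior * likelihood / marginal  # p(z | x), normalised over the grid
print(z[np.argmax(posterior)])             # posterior mode, roughly 0.7
```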

Glossary

bit
shannon
Sh
    A unit of information and of entropy
    defined by IEC 80000-13.

    One shannon is the information content of
    an event occurring when its probability is
    1⁄2.

    It is also the entropy of a system with
    two equally probable states.

    If a message is made of a sequence of a
    given number of bits, with all possible
    bit strings being equally likely, the
    message's information content expressed in
    shannons is equal to the number of bits in
    the sequence.
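
    A quick illustrative check of both statements (a Python sketch, not from the glossary source):

        import math

        # Information content of an event with probability 1/2: one shannon.
        print(-math.log2(0.5))                       # 1.0

        # Entropy of a system with two equally probable states: one shannon.
        p = [0.5, 0.5]
        print(-sum(pi * math.log2(pi) for pi in p))  # 1.0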

mutual information
MI
    The mutual information of two random variables;
    a dimensionless quantity.

    One of many quantities that measures how
    much one random variable tells us about
    another.

    Calculates the statistical dependence
    between two variables.

    Measures the amount of information one can
    obtain from one random variable given
    another.

    The name given to IG when applied to
    variable/feature selection.

    It is a dimensionless quantity with
    (generally) units of bits, and can be
    thought of as the reduction in uncertainty
    about one random variable given knowledge
    of another.

    A measure of the mutual dependence between
    the two variables.

    Quantifies the "amount of information"
    in bits (shannons) obtained about one
    random variable through observing the
    other random variable.

    The concept of mutual information is
    intricately linked to that of entropy of a
    random variable, a fundamental notion in
    information theory that quantifies the
    expected "amount of information" held in a
    random variable.