## Information Gain

Original article
Information Gain and Mutual Information for Machine Learning

$$\mathbf{IG}(\mathbf{S}, a) = \mathbf{H}(\mathbf{S}) - \mathbf{H}(\mathbf{S} \mid a)$$
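
As a rough illustration of the formula, the sketch below computes information gain for a small labelled dataset. It assumes that $$\mathbf{H}(\mathbf{S} \mid a)$$ is the size-weighted average entropy of the subsets produced by splitting $$\mathbf{S}$$ on attribute $$a$$, and that entropy is measured in bits (base-2 logarithm). The function names and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """IG(S, a) = H(S) - H(S | a) for one attribute a.

    H(S | a) is taken as the size-weighted entropy of the label
    subsets obtained by grouping on the attribute's values.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four samples with a binary class label.
labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]  # attribute that separates the classes exactly
useless = ["a", "b", "a", "b"]  # attribute unrelated to the class

ig_perfect = information_gain(labels, perfect)
ig_useless = information_gain(labels, useless)
print(ig_perfect, ig_useless)
```

An attribute that splits the classes cleanly recovers the full entropy of the labels, while an attribute unrelated to the class leaves the uncertainty unchanged.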

## Mutual information

References:

- Information Gain and Mutual Information for Machine Learning
- An introduction to mutual information (YouTube)

Mutual information concerns the outcomes of two random variables.

If we know the value of one random variable in a system, our uncertainty about the other is reduced. Mutual information measures that reduction in uncertainty.

$$\textbf{I}(X_1 ; X_2) = \textbf{H}(X_1) - \textbf{H}(X_1 \mid X_2) = \sum_{x_1} \sum_{x_2} P(x_1, x_2) \log \dfrac{P(x_1, x_2)}{\underbrace{P(x_1)P(x_2)}_{\text{Product of marginals}}}$$

Where $$X_1$$ and $$X_2$$ are random variables, $$x_1$$ and $$x_2$$ range over their possible values, and $$\textbf{I}(X_1 ; X_2)$$ is the mutual information for $$X_1$$ and $$X_2$$.
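
To make the double sum concrete, here is a minimal sketch that evaluates it for two discrete random variables. It assumes the joint distribution is given as a nested list with `joint[i][j]` equal to $$P(X_1 = i, X_2 = j)$$, uses a base-2 logarithm so the result is in bits, and skips zero-probability cells (treating $$0 \log 0$$ as 0). The function name and the two example tables are hypothetical.

```python
from math import log2


def mutual_information(joint):
    """I(X1; X2) in bits from a joint table joint[i][j] = P(X1=i, X2=j)."""
    p1 = [sum(row) for row in joint]        # marginal P(X1)
    p2 = [sum(col) for col in zip(*joint)]  # marginal P(X2)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * log2(p / (p1[i] * p2[j]))
    return mi


# Hypothetical joint distributions over two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]  # P(x1, x2) = P(x1) P(x2)
identical = [[0.5, 0.0], [0.0, 0.5]]        # X2 always equals X1

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
print(mi_independent, mi_identical)
```

When the variables are independent, every log term is $$\log 1 = 0$$, so knowing one variable tells us nothing about the other. When one variable determines the other, the mutual information equals the entropy of either variable.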

### Bayes’ Rule

$$\underbrace{p(\mathbf{z} \mid \mathbf{x})}_{\text{Posterior}} = \underbrace{p(\mathbf{z})}_{\text{Prior}} \times \frac{\overbrace{p(\mathbf{x} \mid \mathbf{z})}^{\text{Likelihood}}}{\underbrace{\int p(\mathbf{x} \mid \mathbf{z}) \, p(\mathbf{z}) \, \mathrm{d}\mathbf{z}}_{\text{Marginal Likelihood}}} \enspace ,$$

where $$\mathbf{z}$$ denotes latent parameters we want to infer and $$\mathbf{x}$$ denotes data.
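
The rule can be sketched for the simplest case, where $$\mathbf{z}$$ takes only a finite set of values, so the integral in the marginal likelihood becomes a sum. The function name, the two-coin scenario, and all the numbers below are hypothetical and serve only to illustrate the formula.

```python
def posterior(prior, likelihood):
    """Return p(z | x) for every z, for one fixed observation x.

    prior[z] is p(z) and likelihood[z] is p(x | z). With a discrete z
    the marginal likelihood is a sum rather than an integral.
    """
    evidence = sum(prior[z] * likelihood[z] for z in prior)
    return {z: prior[z] * likelihood[z] / evidence for z in prior}


# Hypothetical example: a coin is either fair or biased towards heads,
# and a single flip comes up heads.
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.9}  # p(heads | z)

post = posterior(prior, likelihood)
print(post)
```

Dividing by the marginal likelihood is what makes the posterior sum to 1 over all values of $$\mathbf{z}$$.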

## Glossary

**bit, shannon (Sh)**

- A unit of information and of entropy defined by IEC 80000-13.
- One shannon is the information content of an event occurring when its probability is 1⁄2.
- It is also the entropy of a system with two equally probable states.
- If a message is made of a sequence of a given number of bits, with all possible bit strings being equally likely, the message's information content expressed in shannons is equal to the number of bits in the sequence.

**mutual information (MI)**

...of two random variables. [dimensionless quantity]

- One of many quantities that measures how much one random variable tells us about another.
- Calculates the statistical dependence between two variables.
- Measures the amount of information one can obtain from one random variable given another.
- The name given to IG when applied to variable/feature selection.
- It is a dimensionless quantity with (generally) units of bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another.
- A measure of the mutual dependence between the two variables.
- Quantifies the "amount of information" in bits (shannons) obtained about one random variable through observing the other random variable.
- The concept of mutual information is intricately linked to that of entropy of a random variable, a fundamental notion in information theory that quantifies the expected "amount of information" held in a random variable.