Shannon's Entropy

In 1948 Shannon introduced a general uncertainty measure on random variables which takes the different probabilities of the states into account [40, pp. 392-396]. Today this measure is well known as ``Shannon's Entropy'' [26, pp. 112-116], [28, pp. 153-167], etc. Let X be a random variable and P the space of all finite probability distributions:

\begin{displaymath}P := \{ f \vert \func{f}{dom(X)}{[0,1]},\;
dom(X)=\{s_1,\ldots,s_n\},\; n=1,2,\ldots,\;
\textstyle\sum_{x \in dom(X)} f(x) = 1 \} \end{displaymath}


\begin{displaymath}\func{H}{P}{[0,\infty)} \end{displaymath}


\begin{displaymath}H(X) = H( f(x) \vert x \in dom(X)) :=
- \sum_{x \in dom(X)} f(x) \log_2 f(x) \mbox{ bits}
\end{displaymath} (8)

where dom(X) is the set of values of variable X, $x \in dom(X)$ a specific value, and f the probability distribution of X.
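
To make the definition concrete, here is a minimal Python sketch of Eq. (8); the function name entropy and the example distributions are illustrative choices, not part of the original text:

\begin{verbatim}
import math

def entropy(f):
    """Shannon entropy H(X) in bits, with f given as a list of the
    probabilities f(x) over dom(X)."""
    # States with f(x) = 0 contribute nothing, since p*log2(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in f if p > 0)

# Example: a biased three-state variable and a certain outcome
print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits
print(entropy([1.0, 0.0, 0.0]))     # 0.0 bits: no uncertainty
\end{verbatim}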

The conditional entropy of a variable Y knowing variable X is defined as the average of the entropies of Y for each value $x \in dom(X)$, weighted according to the probability that x occurs:

\begin{displaymath}H(Y\vert X) := \sum_{x \in dom(X)} f(x) \cdot \left(
- \sum_{y \in dom(Y)} f(y\vert x) \log_2 f(y\vert x) \right)
\end{displaymath} (9)

where f(y|x) denotes the conditional probability of $y \in dom(Y)$ when variable X is in state x. The conditional entropy expresses how uncertain we are of Y on average when we know X (which could be any of the values $x \in dom(X)$).
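
Under the assumption that the joint distribution f(x,y) is available as a nested table, Eq. (9) can be sketched as follows (the names cond_entropy and joint are hypothetical, chosen only for this example):

\begin{verbatim}
import math

def cond_entropy(joint):
    """H(Y|X) in bits; joint[x][y] holds the joint probability f(x, y)."""
    h = 0.0
    for row in joint.values():              # one row per state x of X
        fx = sum(row.values())              # marginal probability f(x)
        for fxy in row.values():            # joint probabilities f(x, y)
            if fxy > 0:
                f_y_x = fxy / fx            # conditional probability f(y|x)
                h += fx * (-f_y_x * math.log2(f_y_x))
    return h

# Example: Y is fully determined by X, so knowing X removes all uncertainty
joint = {'x1': {'y1': 0.5}, 'x2': {'y2': 0.5}}
print(cond_entropy(joint))   # 0.0 bits
\end{verbatim}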

Shannon's entropy is an important measure for evaluating structure and patterns in our data. The lower the entropy (uncertainty), the more structure is already present in the relation. Its usefulness becomes more apparent by looking at some properties of Shannon's entropy:

1.
H=0 if and only if f(s)=1 for one $s \in dom(X)$ and f(x)=0 for all other $x \in dom(X)$. This means the entropy H is zero only if we are certain about the outcome.
2.
For any number of states N=|dom(X)| the entropy H is maximal and equal to $\log_2(N)$ if all states $x \in dom(X)$ have equal probability (the Hartley information). This is the situation in which the distribution has no structure and we are most uncertain.
3.
Any change toward equalization of the probabilities increases H. The more equally likely the states are to occur, the less structure we have and the higher the uncertainty.
4.
The uncertainty of two independent variables (X,Y) is the sum of their respective uncertainties. This conforms with the initial comments we made about the Hartley information measure. Knowing X gives us no information about Y; therefore the conditional entropy of Y knowing X equals the entropy of Y:

\begin{displaymath}H(X, Y) = H(X) + H(Y), \quad \mbox{X,Y independent} \end{displaymath}


\begin{displaymath}H_X(Y) = H(Y) \end{displaymath}

5.
The uncertainty of two dependent variables (X,Y) is less than the sum of the individual uncertainties. This is caused by the information (structure) contained in the correlation of the two variables. Because of the structure relating Y and X, the conditional entropy of Y knowing X is smaller than the `a priori' entropy of Y:

\begin{displaymath}H(X, Y) < H(X) + H(Y), \quad \mbox{X,Y dependent} \end{displaymath}


\begin{displaymath}H_X(Y) < H(Y) \end{displaymath}

6.
The uncertainty of two variables (X,Y) is the sum of the uncertainty of one variable X and the conditional uncertainty of the other variable Y knowing X. This also shows that the uncertainty of a variable Y is never increased by knowledge of X (a numerical sketch of properties 5 and 6 follows this list):

\begin{displaymath}H(X, Y) = H(X) + H_X(Y) \end{displaymath}


\begin{displaymath}H_X(Y) \le H(Y) \end{displaymath}
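
The following numerical sketch checks properties 5 and 6 for one illustrative joint distribution, reusing the entropy and cond_entropy sketches above (the distribution and all numbers are only for this example):

\begin{verbatim}
# Dependent pair: X uniform on {x1, x2}; Y copies X with probability 0.9.
joint = {'x1': {'y1': 0.45, 'y2': 0.05},
         'x2': {'y1': 0.05, 'y2': 0.45}}

f_x = [sum(row.values()) for row in joint.values()]                # f(x)
f_y = [sum(row.get(y, 0.0) for row in joint.values())
       for y in ('y1', 'y2')]                                      # f(y)
f_xy = [p for row in joint.values() for p in row.values()]         # f(x, y)

h_xy = entropy(f_xy)                        # H(X,Y) ~ 1.469 bits
h_x, h_y = entropy(f_x), entropy(f_y)       # 1.0 bit each
h_y_given_x = cond_entropy(joint)           # H_X(Y) ~ 0.469 bits

print(abs(h_xy - (h_x + h_y_given_x)) < 1e-9)   # property 6: H(X,Y) = H(X) + H_X(Y)
print(h_xy < h_x + h_y)                         # property 5: dependent variables
print(h_y_given_x <= h_y)                       # knowing X never increases uncertainty
\end{verbatim}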

Several related measures are defined in connection with Shannon's entropy.

The relative uncertainty of a variable, also called the normalized uncertainty, is obtained by dividing its entropy by the maximum uncertainty $\log_2(\vert dom(X)\vert)$:

 \begin{displaymath}
H_{relative}(X) :=
\frac{H(X)}{\log_2(\vert dom(X)\vert)} = \frac{H(X)}{I(X)}
\end{displaymath} (10)
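
A short sketch of Eq. (10), reusing the entropy function above; the name relative_uncertainty is an illustrative choice, and it assumes f lists the probabilities of all states in dom(X):

\begin{verbatim}
import math

def relative_uncertainty(f):
    """H_relative(X) = H(X) / log2(|dom(X)|), a value between 0 and 1."""
    return entropy(f) / math.log2(len(f))

print(relative_uncertainty([0.5, 0.25, 0.25]))    # 1.5 / log2(3) ~ 0.946
print(relative_uncertainty([1/3, 1/3, 1/3]))      # 1.0: maximal uncertainty
\end{verbatim}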

