Hartley Information

In 1928 Hartley [15] introduced a simple measure of information. When one message is chosen from a finite set of equally likely choices, the number of possible choices, or any monotonic function of this number, can be regarded as a measure of information. Hartley pointed out that the logarithmic function is the most ``natural'' measure. It is also more useful in practice, because time, bandwidth, etc. tend to vary linearly with the logarithm of the number of possibilities: adding one relay to a group doubles the number of possible states of the relays. In this sense the logarithm is also the more intuitive measure: two identical channels should have twice the capacity for transmitting information of a single one.

Today the logarithm to base 2 is usually chosen, and the resulting information units are called binary digits, or bits. One relay or flip-flop, which can be in either of two stable positions, therefore holds 1 bit of information. N such devices can store N bits, since the total number of possible states is $2^N$ and $I= \log_2(2^N) = N$ (adapted from [40]):

\begin{displaymath}S_n=\{s_1,\ldots,s_n\}, \vert S_n\vert=n,
\mathcal{S}=\{ S_n\vert n=1,2,\ldots \;\} \end{displaymath}


\begin{displaymath}I: \mathcal{S} \rightarrow [0,\infty) \end{displaymath}


\begin{displaymath}I(S_n) := \log_2( \vert S_n\vert ) = \log_2(n) \mbox{ bits}
\end{displaymath} (4)

where $\mathcal{S}$ is the set of all finite sets of equally likely states.
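
As a small numerical sketch of this definition (illustrative only; the helper name hartley_information is not from the text):

\begin{verbatim}
from math import log2

def hartley_information(n):
    # Hartley information I(S_n) = log2(n) of n equally likely states, in bits.
    return log2(n)

# N binary relays have 2**N equally likely joint states and therefore hold N bits:
N = 8
print(hartley_information(2 ** N))  # -> 8.0
\end{verbatim}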

To evaluate the information content of knowing a current state $s \in S_n$, we compare the a priori information (the set $S_n$) with the a posteriori information (the set $\{s\}$). In general our a posteriori knowledge may be any subset $U \subseteq S_n$. Our knowledge about a particular subset $U$ or state $\{s\}$ is expressed by the conditional Hartley information $I(\mbox{a posteriori set} \vert \mbox{a priori set})$:

\begin{displaymath}I(U\vert S_n) := I(S_n)-I(U) = \log_2\left(\frac{\vert S_n\vert}{\vert U\vert}\right) \mbox{ bits}
\end{displaymath} (5)

If $U=\{s\}$ then $\vert U\vert=1$ and $I(U\vert S_n)= I(S_n)$. Therefore we usually identify the information of knowing a particular state $s \in S_n$ with the information content of the set $S_n$.
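
A corresponding sketch for the conditional measure (again illustrative; the name conditional_hartley is chosen here):

\begin{verbatim}
from math import log2

def conditional_hartley(size_u, size_sn):
    # I(U | S_n) = log2(|S_n| / |U|) in bits, for a subset U of S_n.
    return log2(size_sn / size_u)

print(conditional_hartley(1, 8))  # singleton U = {s}: 3.0 bits = I(S_8)
print(conditional_hartley(4, 8))  # knowing s lies in a 4-element subset: 1.0 bit
\end{verbatim}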

Note that the information we have (in bits) if we know the state can be interpreted as the uncertainty (in bits) if we don't know the state. In the above example our uncertainty is N bits if we don't know the states of N (binary) relays, and thus we are uncertain about N bits of information.

Sometimes the set of states is partitioned into clusters. Let $C_k$ be a partition of $S_n$:

\begin{displaymath}C_k=\{c_1,c_2,\ldots,c_k\}, \quad c_i \subset S_n, \quad c_i \cap c_j = \emptyset \; (i \neq j),
\quad \bigcup_{j=1}^k c_j = S_n \end{displaymath}

Now two information measures are of interest: $I(\{s\}\vert S_n)=I(S_n)$, the information content of a state $s \in S_n$, and $I(\{c\}\vert C_k)=I(C_k)$, the information content of a cluster $c \in C_k$. The relative Hartley information of the partition with respect to the overall state set is defined as follows:

\begin{displaymath}
I_{relative}(C_k,S_n) := \frac{\log_2(\vert C_k\vert)}{\log_2(\vert S_n\vert)} =
\frac{\log_2(k)}{\log_2(n)} = \frac{I(C_k)}{I(S_n)}
\end{displaymath} (6)

It describes the information contained in the clustering relative to the information of knowing the exact state.
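
A minimal sketch of equation (6) (the function name relative_hartley is illustrative):

\begin{verbatim}
from math import log2

def relative_hartley(k, n):
    # I_relative(C_k, S_n) = log2(k) / log2(n) for n states partitioned into k clusters.
    return log2(k) / log2(n)

# Example: 16 states grouped into 4 clusters.
print(relative_hartley(4, 16))  # -> 0.5, the clustering carries half the information
\end{verbatim}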

The relative information is also used to evaluate the (statistical) significance of distributions. Here the single data entries are the states and their respective variable values are the corresponding clusters. For example, if there are 400 data entries $D=\{d_1,\ldots,d_{400}\}$, 200 with value $v_1 \in V$ and 200 with value $v_2 \in V$, then $I_{relative}(V,D)=\log(2)/\log(400)=0.1157$. If there are only two data entries $D=\{d_1,d_2\}$, one with value $v_1$ and one with $v_2$, then $I_{relative}(V,D)=\log(2)/\log(2)=1$, which is the highest possible relative information for a partition. We know that estimated probabilities of $\tilde{f}(x_1)=\tilde{f}(x_2)=0.5$ are much more significant in the first case than in the second. Therefore a lower relative information among the values indicates a higher significance of the probability approximation.
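
The two numbers in this example can be checked directly (a sketch reproducing only the arithmetic above):

\begin{verbatim}
from math import log2

# 400 entries split evenly over two values, versus only 2 entries:
print(log2(2) / log2(400))  # ~0.1157 -> low relative information, high significance
print(log2(2) / log2(2))    # 1.0     -> maximal relative information, low significance
\end{verbatim}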



 