next up previous contents
Next: Join-procedure: Up: Reconstructability Analysis (RA) Previous: Informal Introduction:

Formal Definition:

Let me now formally define Reconstructability Analysis formally. Define $V = \{x_1, \ldots, x_n\}$ to be the set of all dimensions. The domain of V is the cartesian product of the domains of its dimensions: $X :=dom(V)=dom(x_1) \times dom(x_2) \times \ldots \times dom(x_n)$. On this domain the database induces a count-table and a probability-distribution $f(\vec{x}),
\vec{x} \in dom(V)$ (Section 2.1) which represents the information and relationship among the variables, P denotes the set of all probability distributions defined on dom(V):

\begin{displaymath}\func{f}{\mathcal{P}(dom(V))}{[0,1]} \end{displaymath}


\begin{displaymath}P = \{ f \vert \func{f}{\mathcal{P}(dom(V))}{[0,1]} \} \end{displaymath}

A projection as described in Section 2.3.1 of f onto $\{x_1,x_2\}$ is denoted $\pi_{\{x_1,x_2\}}(f)$ , $\vec{x} \succ \vec{x'}$ means vector-inclusion:

\begin{displaymath}\func{\pi_{V_i}}{P}{P'}, \qquad V_i \subseteq V, \qquad
P' = \{ f \vert \func{f}{\mathcal{P}(dom(V_i))}{[0,1]} \} \end{displaymath}


\begin{displaymath}\pi_{V_i}: f \longmapsto f', \qquad f'(\vec{x'})=
\sum_{\ve...
...vec{x}) ,\qquad
\vec{x'} \in dom(V_i), \: \vec{x} \in dom(V)
\end{displaymath}

A model of V, the overall variable-set, is defined as a set of subsets of V

\begin{displaymath}M= \{V_1,\ldots,V_m\}, \qquad V_i \subseteq V,\; i=1,\ldots,m \end{displaymath}

such that:

\begin{displaymath}\begin{array}{l@{)\qquad}l@{\qquad}l}
1 & \cup_{i=1}^m V_i =...
...ot\subseteq V_j & \mbox{(irredundancy condition)}
\end{array} \end{displaymath}

The projection of f onto a model M is called a ``structure system'' and consist of the set of distributions:

\begin{displaymath}\pi_M(f) = \{\pi_{V_1}(f),\ldots,\pi_{V_m}(f) \} \end{displaymath}

As a structure system, $\pi_M(f)$ is a simplified representation of the overall system f, it corresponds to and could be projected from a set of possible overall systems. The set of overall probability distributions which are compatible with a given structure-system $\pi_M(f)$ is called its ``reconstruction family'':

\begin{displaymath}E( \pi_M(f) ) = \{ f' \in P \vert \;\pi_M(f') = \pi_M(f) \} \end{displaymath}

The maximum entropy reconstruction of a structure-system $\pi_M(f)$ is the unique overall distribution which can be rebuilt without adding any extra knowledge (unbiased) and is defined as $J(\pi_M(f) ) \in E( \pi_M(f) ) $such that

\begin{displaymath}H( J(\pi_M(f) ) ) = \max_{f' \in E( \pi_M(f) )} H(f'), \qquad
H(f)= - \sum_{s \in dom(V)} f(s) \log_2 f(s) \end{displaymath}

This maximum entropy reconstruction can be obtained by a series of relational join operations ( $i=1, \ldots,(m-1)\;$) in which we sequentially add the knowledge of the subrelations to form the overall system.


next up previous contents
Next: Join-procedure: Up: Reconstructability Analysis (RA) Previous: Informal Introduction:
Thomas Prang
1998-06-07