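The training of the output units tries to minimize the sum-squared error E:

$$E = \frac{1}{2} \sum_{p} \sum_{o} (y_{po} - t_{po})^2$$

where $t_{po}$ is the desired and $y_{po}$ is the observed output of the output unit o for a pattern p. The error E is minimized by gradient descent using

$$e_{po} = (y_{po} - t_{po}) \, f'_p(net_o)$$

$$\frac{\partial E}{\partial w_{io}} = \sum_{p} e_{po} \, I_{ip}$$

where $f'_p$ is the derivative of the activation function of an output unit o, evaluated for pattern p, and $I_{ip}$ is the value of an input unit or a hidden unit i for a pattern p. $w_{io}$ denotes the connection between an input or hidden unit i and an output unit o.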
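The output-unit gradient described above can be sketched in NumPy. This is a minimal illustration, not the original implementation: the logistic activation, the function name `output_error_and_gradient`, and the array layout (patterns in rows) are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    # Logistic activation; assumed here, the text does not fix the function.
    return 1.0 / (1.0 + np.exp(-x))

def output_error_and_gradient(I, W, T):
    """Sum-squared error and its gradient for the output weights.

    I: (P, n_in)  values I_ip of input/hidden units for each pattern p
    W: (n_in, n_out) weights w_io
    T: (P, n_out) desired outputs t_po
    """
    Y = sigmoid(I @ W)                 # observed outputs y_po
    E = 0.5 * np.sum((Y - T) ** 2)     # E = 1/2 sum_p sum_o (y_po - t_po)^2
    e = (Y - T) * Y * (1.0 - Y)        # e_po = (y_po - t_po) f'_p(net_o)
    grad = I.T @ e                     # dE/dw_io = sum_p e_po I_ip
    return E, e, grad
```

A gradient-descent step would then be `W -= lr * grad` for some small, hypothetical learning rate `lr`.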
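After the training phase the candidate units are adapted so that the correlation C between the value $y_{pj}$ of a candidate unit j and the residual error $e_{po}$ of an output unit o becomes maximal. The correlation is given by Fahlman as:

$$C = \sum_{o} \left| \sum_{p} (y_{pj} - \bar{y}_j)(e_{po} - \bar{e}_o) \right| = \sum_{o} \left| \sum_{p} y_{pj} \, (e_{po} - \bar{e}_o) \right|$$

where $\bar{y}_j$ is the average activation of a candidate unit and $\bar{e}_o$ is the average error of an output unit over all patterns p. The second form follows because $\sum_p (e_{po} - \bar{e}_o) = 0$. The maximization of C proceeds by gradient ascent using

$$\delta_p = \sum_{o} \sigma_o \, (e_{po} - \bar{e}_o) \, f'_p$$

$$\frac{\partial C}{\partial w_{ij}} = \sum_{p} \delta_p \, I_{ip}$$

where $\sigma_o$ is the sign of the correlation between the candidate unit's output and the residual error at output o, $f'_p$ is the derivative of the candidate unit's activation function for pattern p, $I_{ip}$ is the value of an input or hidden unit i for pattern p, and $w_{ij}$ denotes the connection from unit i to the candidate unit j.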
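The candidate-unit step can be sketched in the same way. This is an illustrative sketch under stated assumptions: a tanh candidate activation, residual errors held fixed while the candidate is trained, and a hypothetical function name `candidate_correlation_and_gradient`.

```python
import numpy as np

def candidate_correlation_and_gradient(I, w, e):
    """Correlation C and its gradient for a single candidate unit j.

    I: (P, n_in)  values I_ip of input/hidden units feeding the candidate
    w: (n_in,)    candidate input weights w_ij
    e: (P, n_out) residual errors e_po, treated as constants
    """
    y = np.tanh(I @ w)                   # candidate activations y_pj (tanh assumed)
    fprime = 1.0 - y ** 2                # f'_p for tanh
    e_centered = e - e.mean(axis=0)      # e_po - mean error of output o
    cov = (y - y.mean()) @ e_centered    # per-output sum over patterns
    C = np.sum(np.abs(cov))              # C = sum_o |...|
    sigma = np.sign(cov)                 # sigma_o
    delta = (e_centered @ sigma) * fprime  # delta_p
    grad = I.T @ delta                   # dC/dw_ij = sum_p delta_p I_ip
    return C, grad
```

Because C is maximized, the update moves along the gradient: `w += lr * grad`, with `lr` again a hypothetical step size.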