In Chapter 9, section 9.1.6, Raul Rojas describes how a committee of networks can reduce the prediction error: $N$ networks of the same architecture are trained on the same task and their outputs are averaged.
If $f_i$ are the functions approximated by the $N$ neural nets, then:
$$ Q=\left|\frac{1}{N}(1,1, \ldots, 1) \mathbf{E}\right|^{2}=\frac{1}{N^{2}}(1,1, \ldots, 1) \mathbf{E} \mathbf{E}^{\mathrm{T}}(1,1, \ldots, 1)^{\mathrm{T}}\tag{9.4}\label{9.4} $$ is the quadratic error of the committee (the average of the $N$ networks), where $$ \mathbf{E}=\left(\begin{array}{cccc} e_{1}^{1} & e_{2}^{1} & \cdots & e_{m}^{1} \\ \vdots & \vdots & \ddots & \vdots \\ e_{1}^{N} & e_{2}^{N} & \cdots & e_{m}^{N} \end{array}\right), $$ and the $i$-th row of $\mathbf{E}$ contains the residuals of the $i$-th network over the whole training set, i.e. $e_{j}^{i} = f_i(\mathbf{x}^{j}) - t_j$ for each of the input-output pairs $\left(\mathbf{x}^{1}, t_{1}\right), \ldots,\left(\mathbf{x}^{m}, t_{m}\right)$ used in training.
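To make Eq. \eqref{9.4} concrete, here is a minimal numerical sketch (my own, not from the book): it builds an error matrix $\mathbf{E}$ whose $i$-th row holds network $i$'s residuals on the $m$ training pairs, computes the committee's quadratic error $Q$, and compares it with the networks' average individual quadratic error. The zero-mean Gaussian residuals are an assumption used purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 5, 100                      # N networks, m training pairs

# E[i, j] = f_i(x^j) - t_j : residual of network i on training pair j
# (here drawn as independent Gaussians, i.e. uncorrelated across networks)
E = rng.normal(0.0, 1.0, size=(N, m))

ones = np.ones(N)
Q = np.sum((ones @ E) ** 2) / N**2            # Eq. (9.4): |(1/N)(1,...,1) E|^2
Q_individual = np.mean(np.sum(E**2, axis=1))  # average single-network quadratic error

print(f"committee error Q     : {Q:.2f}")
print(f"avg individual error  : {Q_individual:.2f}")
print(f"ratio (about 1/N when residuals are uncorrelated): {Q / Q_individual:.2f}")
```

With uncorrelated residuals the ratio comes out close to $1/N$, which is exactly the reduction the committee argument promises.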
Is there a way to ensure that the errors of one network are uncorrelated with the errors of the others?
Raul Rojas says that the assumption of uncorrelated residual errors holds only as long as $N$ is not too large (i.e. $N < 4$). Why is that?
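For context, here is a hedged sketch (again my own illustration, not from Rojas) of why the correlation matters: if the residuals have pairwise correlation $\rho$, the committee-to-individual error ratio is roughly $(1 + (N-1)\rho)/N$ rather than $1/N$, so the benefit of averaging shrinks as the networks' errors become correlated. The construction of correlated residuals via a shared component is an assumption for the demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 5, 100_000

for rho in (0.0, 0.3, 0.8):
    # Residuals with unit variance and pairwise correlation rho,
    # built from a shared component plus independent noise.
    shared = rng.normal(size=m)
    E = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.normal(size=(N, m))

    Q = np.sum((np.ones(N) @ E) ** 2) / N**2      # committee error, Eq. (9.4)
    Q_individual = np.mean(np.sum(E**2, axis=1))  # average single-network error
    print(f"rho={rho:.1f}  Q/individual = {Q / Q_individual:.2f}  "
          f"(theory: {(1 + (N - 1) * rho) / N:.2f})")
```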