How does:
$$\text{Var}(y) \approx \sigma^2 + \frac{1}{T}\sum_{t=1}^T f^{\hat{W}_t}(x)^T f^{\hat{W}_t}(x) - E(y)^T E(y)$$
approximate variance?
I'm currently reading "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", and the authors give the above formula as an approximate estimate of the variance. I'm confused about how it approximates $\frac{\sum(y-\bar{y})^2}{N-1}$.

In the above equation, they're using a Bayesian neural network to quantify uncertainty. $\sigma^2$ is the predictive variance (I'm kind of confused about how they get this), $x$ is the input, and $y$ is the label for the classification. $f^{\hat{W}_t}(\cdot)$ outputs the mean of a Gaussian distribution, with $\sigma$ being the standard deviation of that distribution, and $T$ is a predefined number of samples, since the expectation is evaluated using Monte Carlo sampling.
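For reference, here is a minimal NumPy sketch of how I understand this quantity would be computed for a scalar output. The array `preds`, the variable `sigma`, and the toy numbers are my own for illustration and not from the paper; I'm assuming $T$ stochastic forward passes through the network with dropout kept active at test time.

```python
import numpy as np

# Toy illustration (numbers made up): T stochastic forward passes
# through the same network, dropout kept on at test time.
T = 5
preds = np.array([2.1, 1.9, 2.3, 2.0, 1.8])  # f^{W_t}(x) for t = 1..T
sigma = 0.5                                   # assumed noise standard deviation

# Predictive mean: E(y) ~= (1/T) * sum_t f^{W_t}(x)
mean_y = preds.mean()

# Second moment minus squared mean of the T predictions:
# (1/T) * sum_t f^{W_t}(x)^T f^{W_t}(x) - E(y)^T E(y)
epistemic_var = (preds ** 2).mean() - mean_y ** 2

# Adding sigma^2 gives the total predictive variance in the formula
var_y = sigma ** 2 + epistemic_var

print(mean_y, epistemic_var, var_y)
```

If I'm reading it right, the last two terms are just the second moment minus the squared mean of the $T$ sampled predictions, i.e. the sample variance of the predictions with a $1/T$ normalizer rather than $1/(N-1)$, and then $\sigma^2$ is added on top. Part of my confusion is how this corresponds to the usual $\frac{\sum(y-\bar{y})^2}{N-1}$ estimator.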