Questions tagged [variance]

6 questions
3
votes
1 answer

Proving the convergence rate of estimators (machine learning)

I want to estimate a quantity and have two choices for estimators (they both sample from the same distribution). I suspect one of them has a higher variance and thus a slower convergence rate. I want to mathematically prove this, but I don't know…
3
votes
1 answer

Is there a bias-variance equivalent in unsupervised learning?

In supervised learning, bias and variance are pretty easy to calculate with labeled data. I was wondering if there's something equivalent in unsupervised learning, or a way to estimate such things? If not, how do we calculate loss functions in…
user98235
3
votes
0 answers

Why does this formula $\sigma^2 + \frac{1}{T}\sum_{t=1}^Tf^{\hat{W_t}}(x)^Tf^{\hat{W_t}}(x_t)-E(y)^TE(y)$ approximate the variance?

How does $$\text{Var}(y) \approx \sigma^2 + \frac{1}{T}\sum_{t=1}^Tf^{\hat{W_t}}(x)^Tf^{\hat{W_t}}(x_t)-E(y)^TE(y)$$ approximate the variance? I'm currently reading "What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?", and the…
2
votes
3 answers

Why are my classification results correlated with the proportions of my data?

I'm facing a problem. I'm working on a mixed-data model with a neural network (MLP and word embeddings). My results are not very good, and I observed that the proportions of my data are correlated with my classification results. To explain: As you can see, I have…
1
vote
1 answer

How is the variance for a diffusion kernel derived for a diffusion model?

So I'm watching this video tutorial from CVPR this year on diffusion models, and I am confused by the variance term in the distribution on the left in the video. I understand that in the forward process, we can track intermediate…
0
votes
1 answer

How do high-entropy targets relate to less variance in the gradient between training cases?

I've been trying to understand the "Distilling the Knowledge in a Neural Network" paper by Hinton et al., but I cannot fully understand this: When the soft targets have high entropy, they provide much more information per training case than hard…