
From what I understand, the VC dimension is what establishes the feasibility of learning for infinite hypothesis sets, the only kind we would use in practice.

But, the literature (i.e. Learning from Data) states that VC gives a loose bound, and that in real applications, learning models with lower VC dimension tend to generalize better than those with higher VC dimension. So, a good rule of thumb is to require a number of examples at least 10 times the VC dimension in order to get decent generalization.
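As a small worked illustration of that rule of thumb (using the standard result that a perceptron in $\mathbb{R}^d$ has VC dimension $d + 1$; the numbers are just an example, not from the book):

$$d = 9 \;\Rightarrow\; d_{\mathcal{VC}} = d + 1 = 10 \;\Rightarrow\; N \ge 10 \times d_{\mathcal{VC}} = 100 \text{ training examples}.$$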

I am having trouble interpreting what a "loose bound" means here. Is the VC generalization bound loose due to its universality? Meaning, its results apply to all hypothesis sets, learning algorithms, input spaces, probability distributions, and binary target functions.


1 Answer


But, the literature (i.e. Learning from Data) states that VC gives a loose bound and that in real applications, learning models with lower VC dimension tend to generalize better than those with higher VC dimension.

It's true that people often use techniques such as regularisation to avoid over-parametrized models. However, I think it's dangerous to say that, in real applications, models with a lower VC dimension really generalize better, given that you typically assess their generalization ability with a finite validation dataset (possibly selected in a biased way). Moreover, note that the validation dataset is often ignored in certain bounds, and the only quantities taken into account are the expected risk and the empirical risk (on the training data).
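To make that last point concrete, these are the two quantities that such bounds typically relate (a standard formulation for binary classification; the notation is mine, not the book's):

$$R(h) = \mathbb{E}_{(x, y) \sim P}\big[\,\mathbb{1}\{h(x) \neq y\}\,\big] \quad \text{(expected risk)}, \qquad \hat{R}_N(h) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{h(x_i) \neq y_i\} \quad \text{(empirical risk on the $N$ training examples)}.$$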

In any case, the $\mathcal{VC}$ bounds may be "loose" (one typical form of the bound is sketched right after this list) because

  • the $\mathcal{VC}$ dimension is often expressed with the big-$\mathcal{O}$ notation (e.g. see this answer)
  • the bounds often involve probabilities and uncertainties (e.g. see this other answer)
  • there aren't tighter bounds (i.e. no one has found better bounds yet)
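For concreteness, this is the VC generalization bound roughly as it is stated in Learning from Data (I'm writing it from memory, so treat the exact constants as indicative): with probability at least $1 - \delta$,

$$E_{\text{out}}(g) \;\le\; E_{\text{in}}(g) + \sqrt{\frac{8}{N} \ln \frac{4\, m_{\mathcal{H}}(2N)}{\delta}}, \qquad m_{\mathcal{H}}(2N) \le (2N)^{d_{\mathcal{VC}}} + 1.$$

Plugging in, say, $d_{\mathcal{VC}} = 10$, $N = 100$ (i.e. the "10 times the VC dimension" rule of thumb) and $\delta = 0.05$ gives a slack term of about $2.1$, which is vacuous for a classification error that lives in $[0, 1]$. That is one concrete sense in which the bound is "loose", even in a regime where the rule of thumb works well in practice.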

Is the VC generalization bound loose due to its universality? Meaning, its results apply to all hypothesis sets, learning algorithms, input spaces, probability distributions, and binary target functions.

I think the answer is yes. In fact, often, the specific task or learning algorithm is ignored in these analyses, but, in general, you may be able to exploit your knowledge of the problem to tighten them, e.g. to reduce the number of examples that the learning algorithm needs.

    Great. A comment about "(i.e. no one found better bounds yet)": "_There exist lower bounds on the rate of uniform convergence where the order of magnitude is close to the order of magnitude obtained for the upper bounds (Vapnik and Chervonenkis, 1974)_" – OmG Apr 19 '20 at 17:21