
Let's start with a typical definition of the VC dimension (as described in this book)

Definition 3.10 (VC-dimension) The VC-dimension of a hypothesis set $\mathcal{H}$ is the size of the largest set that can be shattered by $\mathcal{H}$: $$\operatorname{VCdim}(\mathcal{H}) = \max \left\{m: \Pi_{\mathcal{H}}(m) = 2^{m}\right\}$$

So, if there exists some set of size $d$ that $\mathcal{H}$ can shatter, and $\mathcal{H}$ cannot shatter any set of size $d+1$, then $\operatorname{VCdim}(\mathcal{H}) = d$.
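
To make the "there exists a set" / "for all labellings" structure of this definition concrete, here is a minimal brute-force sketch in Python (my own illustration, not from the book). The `realizable` argument is a hypothetical callback that answers whether some hypothesis in $\mathcal{H}$ produces a given labelling on a given finite set of points:

```python
from itertools import product

def is_shattered(points, realizable):
    """Return True iff every one of the 2^m labellings of `points` is realizable."""
    m = len(points)
    return all(realizable(points, labels) for labels in product([0, 1], repeat=m))
```

With this, $\operatorname{VCdim}(\mathcal{H}) \geq d$ as soon as `is_shattered` returns `True` for a *single* set of $d$ points, and $\operatorname{VCdim}(\mathcal{H}) < d+1$ only if it returns `False` for *every* set of $d+1$ points.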

Now, my question is: why would we be just interested in the existence of some set of size $d$ and not all sets of size $d$?

For instance, if you consider one of the typical examples that are used to illustrate the concept of the VC dimension, i.e. $\mathcal{H}$ is the set of all (axis-aligned) rectangles, then we can show that $\operatorname{VCdim}(\mathcal{H}) = d = 4$, because there is a configuration of $d=4$ points such that, for every possible labelling of those points, there is a hypothesis in $\mathcal{H}$ that classifies them correctly. However, we can also easily show that, if the 4 points are collinear, there is some labelling of them (e.g. the 1st and 3rd are of colour $A$, while the 2nd and 4th are of colour $B \neq A$) that no rectangle can classify correctly.
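
Continuing the sketch above for axis-aligned rectangles: a labelling is realizable if and only if the bounding box of the positively labelled points contains no negatively labelled point (a sketch under that assumption, not code from any book):

```python
def rect_realizable(points, labels):
    """Can some axis-aligned rectangle contain exactly the points labelled 1?"""
    pos = [p for p, y in zip(points, labels) if y == 1]
    neg = [p for p, y in zip(points, labels) if y == 0]
    if not pos:  # a rectangle placed away from all points handles the all-negative labelling
        return True
    xmin, xmax = min(x for x, _ in pos), max(x for x, _ in pos)
    ymin, ymax = min(y for _, y in pos), max(y for _, y in pos)
    return not any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in neg)

diamond   = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # one configuration of 4 points
collinear = [(0, 0), (1, 0), (2, 0), (3, 0)]    # 4 points on a line

print(is_shattered(diamond, rect_realizable))    # True:  this 4-point set is shattered
print(is_shattered(collinear, rect_realizable))  # False: e.g. labels (1, 0, 1, 0) fail
```

The first configuration witnesses $\operatorname{VCdim}(\mathcal{H}) \geq 4$; the second shows that shattering fails for *some* sets of four points, which the definition simply does not care about.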

So, the class of all rectangles can shatter some sets of points, but not all of them, so we would need another class of hypotheses to classify all sets of four points correctly. The VC dimension does not seem to provide any intuition about which class of hypotheses would do the trick.

So, do you know why the VC dimension wasn't defined for all configurations of $d$ points? Was this just a need of Vapnik and Chervonenkis for the work they were developing (VC theory), or could they have defined it differently? If you know the rationale behind this specific definition, feel free to provide an answer. References to relevant work by Vapnik and Chervonenkis are also appreciated.

nbro
  • For people interested in answering this question, the paper [On the uniform convergence of relative frequencies of events to their probabilities](https://courses.engr.illinois.edu/ece544na/fa2014/vapnik71.pdf) (1971) by VC is probably the one that you should read first. – nbro May 19 '21 at 11:06

2 Answers


The measure that you are talking about actually has a name. It is called the "Popper dimension" -- it was introduced by Karl Popper in his _The Logic of Scientific Discovery_.

Popper's idea of falsifiability was, as Vladimir Vapnik himself admits, the inspiration behind their work on the VC dimension. The VC dimension of a hypothesis set $\mathcal{H}$ measures the "complexity" of the set by looking at how easy it would be to falsify it. Hypothesis sets with higher VC dimension should be harder to falsify, with the limiting case of infinite VC dimension being unfalsifiable for almost any data.

The VC dimension and the Popper dimension differ in exactly the way you are describing. This quote from the philosophy paper that Vapnik co-authored ("Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions") states it rather succinctly:

> The VC-dimension is the largest number of points one can shatter, the Popper dimension is one less than the smallest number of points one can not shatter.
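
In symbols (my own paraphrase of that quote, using the question's notation):

$$\operatorname{VCdim}(\mathcal{H}) = \max \left\{m: \text{some set of } m \text{ points is shattered by } \mathcal{H}\right\}$$

$$\operatorname{PopperDim}(\mathcal{H}) = \min \left\{m: \text{some set of } m \text{ points is not shattered by } \mathcal{H}\right\} - 1$$

Since every subset of a shattered set is itself shattered, the second quantity equals the largest $m$ such that *every* set of $m$ points is shattered, which is exactly the "all configurations" variant asked about in the question.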

This paper goes on to try to reconcile the two definitions, but I didn't find their exposition satisfactory (or even meaningful). A much harsher take on this can be found in the "Learning from Data" book (second edition, Section 4.7):

> Now we can contrast the VC dimension and Popper’s dimension, and conclude that Popper’s definition is not meaningful, as it does not lead to any useful conditions for generalization. In fact, for linear estimators the Popper’s dimension is at most 2, regardless of the problem dimensionality, as a set of hyperplanes cannot shatter three collinear points.

Which looks fair to me.
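
As a quick sanity check of the hyperplane claim in that quote, here is a small sketch (my own; it uses scipy's `linprog` as a linear-separability oracle, which is not something either book does) showing that no 2-D linear classifier $\operatorname{sign}(w \cdot x + b)$ realizes the labelling $(+,-,+)$ on three collinear points, while other labellings are fine:

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Is there (w, b) with y_i * (w . x_i + b) >= 1 for all i?  (LP feasibility check)"""
    X = np.asarray(points, dtype=float)
    y = np.where(np.asarray(labels) == 1, 1.0, -1.0)
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])  # variables: [w1, w2, b]
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.status == 0  # 0 = a feasible point was found

collinear = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(linearly_separable(collinear, [1, 0, 1]))  # False: (+, -, +) cannot be realized
print(linearly_separable(collinear, [1, 1, 0]))  # True:  a threshold along the line works
```

Here a labelling counts as realizable when the margin constraints $y_i (w \cdot x_i + b) \geq 1$ are feasible, which coincides with strict linear separability up to rescaling of $(w, b)$.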

Kostya
  • I finally had the opportunity to read the paper "_Falsificationism and Statistical Learning Theory: Comparing the Popper and Vapnik-Chervonenkis Dimensions_" that you mention. There, Vapnik does not admit that the VC dimension was inspired by Popper's dimension, which Popper calls the "_**characteristic number** of a theory_". I agree with your conclusion that this paper and the authors' arguments are not very meaningful or, rather, not very clear. It's not even clear how Popper's definition of a **characteristic number** of a scientific theory is related to the VC dimension. – nbro Nov 28 '21 at 02:05
  • Popper writes about a theory as a set of statements; then talks about statements in that theory... It's not clear how the authors of this paper drew all those conclusions from the description of a characteristic number (section 3.2). It might be that, if I read the paper again, I understand more their points. As you point out, they say the VC dimension of a hyperplane can be $d+1$, so the Popper's dimension would be $d$ in this case. However, in another example, they say that the Popper's dimension is $2$, rather than $3$, which is one less than the VC dimension in that other example. – nbro Nov 28 '21 at 02:16
  • So, the paper seems to be inconsistent, or maybe they are just showing us how Popper was actually inconsistent, but, to me, their interpretations of Popper's quotes are not clear. One example that they give that is consistent with the concept of "falsification (of a theory)" is that of women's and men's measurements, where, after you have found a single man or woman with different measurements than the ones you believed men and women had, then you immediately falsify your theory. That makes sense to me. – nbro Nov 28 '21 at 02:22
  • In any case, after having read that paper, your statement "_Popper's idea of *falsifiability* was, as Vladimir Vapnik himself admits, the inspiration behind their work on the VC dimension. The VC dimension of the hypothesis set $\mathcal{H}$ measures the "complexity" of the set by looking at how easy would it be to falsify it. The hypothesis sets with higher VC dimension should be harder to falsify._" seems misleading because the definition of "falsification" that you're giving here, in the case of VC dimension, is different than the definition of falsification by Popper. – nbro Nov 28 '21 at 11:55
  • More specifically, if you cannot shatter one configuration of $n$ points, it's fine, as long as you are able to shatter at least one configuration of $n$ points. If you're not able to shatter any configuration of $n+1$ points, then, in a way, you falsify this hypothesis set, in the sense that you show that, for every configuration of $n+1$ points, there is some labelling that no hypothesis can realize. So, the VC dimension is $n$. Saying this does not imply that you will be able to solve all classification problems on $n$ points! – nbro Nov 28 '21 at 11:58
  • So, this notion of falsification is really different than Popper's notion of falsification of a theory, where, as far as I understand, a single (or a set of) contradictory statement(s) to your theory would falsify it (although he talks about d-tuples of statements). So, I think that saying the VC dimension is related to falsification, especially in the context of Popper's ideas, is misleading, because Popper had a different idea of falsification, which is stronger/stricter than the VC dimension, but I am not even sure if it makes sense to compare Popper's ideas with the VC dimension. – nbro Nov 28 '21 at 12:00
  • I would upvote this answer because Popper's ideas and the paper that you provided are relevant to what I was asking, but I don't think saying that the last quote makes sense is fair. To me, it doesn't make sense. Popper was talking about a different problem, i.e. the one of falsifying a theory as a way of proving that it's wrong, and he thought that you could not prove that a general theory is true. So, why would Popper's dimension not be meaningful? It's meaningful, but in another context. People seem to just misinterpret Popper and I am not even sure why Popper pops up in the context of ML. – nbro Nov 28 '21 at 12:19
  • @nbro It is hard reading such a long aside across several long comments. As I understand, this is not how this site is supposed to work. I trust that I did give a helpful answer. You might disagree with it on philosophical grounds. If you can formulate a pointed follow-up question(s) - I (or somebody else) would try and answer it. Otherwise, this doesn't sound too constructive and useful. – Kostya Nov 29 '21 at 09:32
  • I don't have any follow-up question right now. My direct feedback to you is that, although it's relevant, your answer doesn't fully address my question and I disagree with something that you wrote, so I cannot upvote/accept it. If you're willing to read my comments above, then maybe you would be able to change your answer to fix what I think is wrong. Anyway, it's true, comments are not meant for long discussions, and I could have summarised what I wanted to say in fewer comments. Later, I will move these comments into a chat. But note that comments can be used to ask for clarifications. – nbro Nov 29 '21 at 11:46
  • @nbro Did you have a chance to look at the "Learning from data. Ch. 4.7" reference I've provided? It looks like it gives a detailed exposé on the matters you are trying to formulate in your comments. – Kostya Nov 29 '21 at 13:17
  • I don't have access to that book right now. – nbro Nov 29 '21 at 17:23

I would say that the meaning of the VC dimension is the possibility of realizing every labelling on the given number of points for *some* configuration, not the ability to express every labelling on *every* set of $n$ points.

Yes, you are right that this definition, unfortunately, is not very useful in practice.

Say, the family of functions $\operatorname{sign}(\sin(ax))$ has infinite VC dimension: there exists an infinite sequence of points $x_n$ such that, by tuning the parameter $a$, one can realize any labelling $0\,1\,0 \ldots 1$ on these points. However, this doesn't make this class of functions an outstanding ML algorithm.
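
To make that concrete, here is a small numerical sketch (my own, following the standard construction usually attributed to Vapnik; the points $x_i = 10^{-i}$ and the formula for $a$ come from that construction, not from anything stated above):

```python
import numpy as np

def fit_a(labels):
    """For labels in {+1, -1} on the points x_i = 10^-i, return a value of the
    parameter a such that sign(sin(a * x_i)) equals labels[i-1] for every i."""
    return np.pi * (1 + sum(10**i for i, y in enumerate(labels, start=1) if y == -1))

labels = np.array([+1, -1, -1, +1, -1, +1])              # an arbitrary target labelling
x = np.array([10.0**-i for i in range(1, len(labels) + 1)])
a = fit_a(labels)

print(np.sign(np.sin(a * x)))                            # [ 1. -1. -1.  1. -1.  1.]
print(np.array_equal(np.sign(np.sin(a * x)), labels))    # True
```

The same recipe works for any number of points and any labelling (up to floating-point limits), which is exactly what infinite VC dimension means here; it says nothing about how well such a classifier would generalize.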

  • Unfortunately, this does not really answer my question, which is: _**why** did VC define the VC dimension in this way?_ The answer can probably be found in the paper I am mentioning in the comment under my post above. – nbro May 20 '21 at 09:21
  • @nbro They defined it in this way because they could obtain some analytical results about it. The definition you are asking for may be more useful, but it seems that nobody has succeeded in developing a generic theory for it. – spiridon_the_sun_rotator May 20 '21 at 09:27
  • Could you please point me to the relevant part of the mentioned paper or any other resource that discusses this, i.e. the why? I didn't read the mentioned paper, but after having skimmed through it, I noticed that it mentions the _growth function_. It's probably also a good idea for you to edit your answer so that you focus on answering my question (if you really know the answer). – nbro May 20 '21 at 09:30