Convert a PAC-learning algorithm into another one which requires no knowledge of the parameter

Question

This is part of the exercise 2.13 in the book Foundations of Machine Learning (page 28). You can refer to chapter 2 for the notations.

Consider a family of concept classes $\left\{\mathcal{C}_{s}\right\}_{s}$ where $\mathcal{C}_{s}$ is the set of concepts in $\mathcal{C}$ with size at most $s.$ Suppose we have a PAC-learning algorithm $\mathcal{A}$ that can be used for learning any concept class $\mathcal{C}_{s}$ when $s$ is given. Can we convert $\mathcal{A}$ into a PAC-learning algorithm $\mathcal{B}$ that does not require the knowledge of $s ?$ This is the main objective of this problem.

To do this, we first introduce a method for testing a hypothesis $h,$ with high probability. Fix $\epsilon>0, \delta>0,$ and $i \geq 1$ and define the sample size $n$ by $n=\frac{32}{\epsilon}\left[i \log 2+\log \frac{2}{\delta}\right].$ Suppose we draw an i.i.d. sample $S$ of size $n$ according to some unknown distribution $\mathcal{D}.$ We will say that a hypothesis $h$ is accepted if it makes at most $3 / 4 \epsilon$ errors on $S$ and that it is rejected otherwise. Thus, $h$ is accepted iff $\widehat{R}(h) \leq 3 / 4 \epsilon$

(a) Assume that $R(h) \geq \epsilon .$ Use the (multiplicative) Chernoff bound to show that in that case $\mathbb{P}_{S \sim D^{n}}[h \text { is accepted}] \leq \frac{\delta}{2^{i+1}}$

(b) Assume that $R(h) \leq \epsilon / 2 .$ Use the (multiplicative) Chernoff bounds to show that in that case $\mathbb{P}_{S \sim \mathcal{D}^{n}}[h \text { is rejected }] \leq \frac{\delta}{2^{i+1}}$

(c) Algorithm $\mathcal{B}$ is defined as follows: we start with $i=1$ and, at each round $i \geq 1,$ we guess the parameter size $s$ to be $\widetilde{s}=\left\lfloor 2^{(i-1) / \log \frac{2}{\delta}}\right\rfloor .$ We draw a sample $S$ of size $n$ (which depends on $i$ ) to test the hypothesis $h_{i}$ returned by $\mathcal{A}$ when it is trained with a sample of size $S_{\mathcal{A}}(\epsilon / 2,1 / 2, \widetilde{s}),$ that is the sample complexity of $\mathcal{A}$ for a required precision $\epsilon / 2,$ confidence $1 / 2,$ and size $\tilde{s}$ (we ignore the size of the representation of each example here). If $h_{i}$ is accepted, the algorithm stops and returns $h_{i},$ otherwise it proceeds to the next iteration. Show that if at iteration $i,$ the estimate $\widetilde{s}$ is larger than or equal to $s,$ then $\mathbb{P}\left[h_{i} \text { is accepted}\right] \geq 3 / 8$

Question (a) and (b) are easy to prove, but I have trouble with the question (c). More specifically, I don't know how to use the condition that $\widetilde{s} \geq s$. Can anyone help?

Convert a PAC-learning algorithm into another one which requires no knowledge of the parameter

0 Answers0