Let $H_1, H_2, \ldots$ be a sequence of hypothesis classes for binary classification.
Assume that there is a learning algorithm that implements the ERM rule in the realizable case, such that the hypothesis it outputs for each class $H_n$ depends on only $O(n)$ of the examples in the training set. Furthermore, assume that such a hypothesis can be computed from these $O(n)$ examples in time $O(n)$, and that the empirical risk of each such hypothesis on a training set of size $m$ can be evaluated in time $O(mn)$.
For example, if $H_n$ is the class of axis-aligned rectangles in $\mathbb{R}^n$, we saw that in the realizable case it is possible to find an ERM hypothesis that is determined by at most $2n$ examples.
Prove that in such cases, it is possible to find an ERM hypothesis for $H_n$ in the unrealizable case in time $O\bigl(mn \cdot m^{O(n)}\bigr)$.
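The intended construction is to enumerate all subsets of at most $O(n)$ training examples (there are $m^{O(n)}$ of them), run the realizable-case procedure on each subset to obtain a candidate hypothesis, evaluate its empirical risk on the full sample in time $O(mn)$, and return the candidate with the smallest risk. The following sketch illustrates this for the simplest instance of the rectangle example: intervals on the line ($1$-dimensional axis-aligned rectangles), where each hypothesis is determined by at most $2$ examples, so the enumeration ranges over the $O(m^2)$ pairs of sample points. The function name and the toy data are illustrative, not from the exercise.

```python
from itertools import combinations

def erm_intervals_agnostic(xs, ys):
    """Brute-force agnostic ERM for intervals on the line.

    Each candidate hypothesis "predict +1 iff lo <= x <= hi" is determined
    by at most 2 training points, so we enumerate all subsets of size <= 2,
    build the interval each subset defines, and keep the hypothesis with
    the smallest empirical risk on the full sample.
    Labels are +1 / -1. Returns (empirical risk, lo, hi).
    """
    m = len(xs)

    def risk(lo, hi):
        # Empirical risk of the interval hypothesis [lo, hi].
        errors = sum((lo <= x <= hi) != (y == 1) for x, y in zip(xs, ys))
        return errors / m

    # The empty subset yields the all-negative hypothesis (empty interval).
    best = (risk(1, 0), 1, 0)
    # Subsets of size 1: degenerate single-point intervals.
    for i in range(m):
        best = min(best, (risk(xs[i], xs[i]), xs[i], xs[i]))
    # Subsets of size 2: the interval spanned by the two points.
    for i, j in combinations(range(m), 2):
        lo, hi = sorted((xs[i], xs[j]))
        best = min(best, (risk(lo, hi), lo, hi))
    return best
```

The running time matches the claimed bound for $n = 1$: there are $O(m^2)$ candidate subsets, and each risk evaluation costs $O(m)$, for $O(m \cdot m^{O(1)})$ total.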