I am reading the Wikipedia article on gradient boosting. It says:
Unfortunately, choosing the best function $h$ at each step for an arbitrary loss function $L$ is a computationally infeasible optimization problem in general. Therefore, we restrict our approach to a simplified version of the problem.
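For context, the step I understand this to refer to is the following: at iteration $m$, with current model $F_{m-1}$, training data $(x_i, y_i)$ for $i = 1, \dots, n$, and some class of candidate functions $\mathcal{H}$ (these symbols are my reading of the article, not part of the quote), the "best" next function would be

$$F_m(x) = F_{m-1}(x) + \arg\min_{h \in \mathcal{H}} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + h(x_i)\big).$$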
How would the "best function" be constructed if there were no computational limitations?
Because the wiki gives me the impression that the model could be better, but that compromises have been made.