
In the Deep Learning book by Goodfellow et al., section 11.4.5 (p. 438), the following claims can be found:

> Currently, we cannot unambiguously recommend Bayesian hyperparameter optimization as an established tool for achieving better deep learning results or for obtaining those results with less effort. Bayesian hyperparameter optimization sometimes performs comparably to human experts, sometimes better, but fails catastrophically on other problems. It may be worth trying to see if it works on a particular problem but is not yet sufficiently mature or reliable.

Personally, I have never used Bayesian hyperparameter optimization. I prefer the simplicity of grid search and random search.

As a first approximation, I'm considering easy AI tasks, such as multi-class classification problems with DNNs and CNNs.

In which cases should I consider it? Is it worth it?

nbro

1 Answer


Efficiently integrating HPO frameworks into an existing project is non-trivial. Most common datasets/tasks already have established architectures and hyperparameters, leaving only a few additional parameters to tune. In this case, the benefits (assuming they exist) of Bayesian HPO techniques lag behind the development time they cost, and this simplicity is one dominant reason users prefer grid search or random search over more sophisticated Bayesian optimization techniques.
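To make the "simplicity" point concrete, here is a minimal sketch (assuming scikit-learn and SciPy are available) of the two baselines: grid search exhaustively tries every listed value, while random search samples from a prior over the range. The model, parameter name, and ranges are purely illustrative.

```python
# Illustrative sketch: grid search vs. random search for a single
# hyperparameter (the regularization strength C of a linear classifier).
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Grid search: evaluates every value in the list with 3-fold CV.
grid = GridSearchCV(model, {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3).fit(X, y)

# Random search: draws 8 values of C from a log-uniform prior.
rand = RandomizedSearchCV(
    model, {"C": loguniform(1e-3, 1e2)}, n_iter=8, cv=3, random_state=0
).fit(X, y)

print(grid.best_params_, rand.best_params_)
```

Both fit into a project in a few lines, which is exactly the development-time advantage described above.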

For a very large-scale problem, or an entirely new task (or dataset) for which the user has no intuition about "good hyperparameters", Bayesian HPO may be a good option to consider over random or grid search. This is because there may be a very large number of hyperparameters to explore (due to insufficient domain knowledge), and integrating Bayesian HPO into the project may take far less time than grid search over all possible hyperparameter combinations.
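The core idea can be sketched by hand (assuming scikit-learn and SciPy): fit a Gaussian-process surrogate to the evaluations so far, then pick the next point by maximizing an expected-improvement acquisition. The 1-D objective below is a made-up stand-in for a real validation loss; everything here is an illustration, not a production implementation.

```python
# Hand-rolled sketch of Bayesian HPO: GP surrogate + expected improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Stand-in for "validation loss as a function of one hyperparameter".
    return (x - 1.5) ** 2 + 0.1 * np.sin(5 * x)

candidates = np.linspace(-3, 3, 400).reshape(-1, 1)
X = np.array([[-2.0], [0.0], [2.0]])  # a few initial evaluations
y = objective(X).ravel()

for _ in range(10):
    # Surrogate model of the loss surface from the points seen so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    # Expected improvement: favors low predicted loss or high uncertainty.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

print("best x:", X[np.argmin(y)][0], "best loss:", y.min())
```

With many hyperparameters, each "evaluation" is an expensive training run, so spending a surrogate fit to choose the next trial intelligently can beat enumerating a grid.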

rhdxor
    You claim "Efficiently integrating HPO frameworks into an existing project is non-trivial", but you do not explain **why**. There are many hyper-parameter optimization packages, so I am not sure why you're saying it's non-trivial to integrate BO into your project. Compared to a random search or grid search, yes, sure, BO is slightly more complicated, because RS and GS are super trivial, but that doesn't make BO really difficult to integrate into your project, so I don't think that this is the main reason why BO is worth it or not. – nbro Dec 19 '20 at 18:41
  • Moreover, in many cases (especially in RL), to achieve the best performance, you really need to perform hyper-parameter optimization, because, even if there are already default hyper-parameters' values, they are typically not optimal for your specific problem. – nbro Dec 19 '20 at 18:43
  • OP is considering "easy AI tasks such as multi-class classification". In such cases, HPO is definitely overkill. There are plenty of resources with already-optimized HPs. Of course, running additional searches may increase overall performance by 0.xx%, but compared to adapting some DL tutorial or writing a training script, even using HPO frameworks would take significantly longer, as suggested in the first paragraph. In more complicated scenarios (second paragraph / RL as you suggest), one could and should integrate HPO frameworks. I think my answer is an adequate fit for what the OP is considering. – rhdxor Dec 20 '20 at 02:24