I am training a convolutional neural network for object detection. Apart from the learning rate, what are the other hyperparameters that I should tune? And in what order of importance? Besides, I read that doing a grid search for hyperparameters is not the best way to go about training and that random search is better in this case. Is random search really that good?
- 1. Learning rate schedule (decay rate, cyclic, etc.) 2. Momentum, if used (including 0) 3. Augmentation (random transformation values), if used – mirror2image Jan 15 '20 at 15:03
- @mirror2image Momentum is fixed to 0.9 (it appears to be common practice to fix it to this value if you want to focus on other hyperparameters). Augmentations are done, but I'm mostly asking about other hyperparameters: regularization, weight decay... AND MOST IMPORTANTLY: where to start. – S.E.K. Jan 15 '20 at 15:14
- Weight decay is useless in most cases. There are a lot of different regularizations; use common sense for them. Fixed momentum is bad practice: it's always a good idea to check momentum 0, 0.6, and 0.9. Start with different learning rate *schedules*. – mirror2image Jan 15 '20 at 15:18
- @mirror2image So you're saying: I start by trying out different learning rate scheduling policies, and for each policy I do a grid search over the learning rate, momentum, and the regularization parameters? – S.E.K. Jan 15 '20 at 16:32
- First, tune the schedule and learning rate with grid search. After that, tune momentum and regularization separately – it's highly likely they are independent. – mirror2image Jan 15 '20 at 17:55
- This playlist by Andrew Ng helped me understand the different nuances of hyperparameter tuning. I'm sure it'll help you as well. https://www.youtube.com/watch?v=1waHlpKiNyY&list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&ab_channel=DeepLearningAI – Uday Sai Nov 12 '21 at 12:20
2 Answers
Firstly, when you say an object detection CNN, there are a huge number of model architectures available. Assuming you have already narrowed down your model architecture, a CNN will have a few common layers like the ones below, each with hyperparameters you can tweak:
- Convolution Layer:- number of kernels, kernel size, stride length, padding
- MaxPooling Layer:- kernel size, stride length, padding
- Dense Layer:- size
- Dropout:- Percentage to keep/drop
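As a minimal sketch of how the convolution and pooling hyperparameters above interact, the standard output-size formula shows what kernel size, stride, and padding each do to a feature map (the layer sizes used here are illustrative assumptions, not part of any particular architecture):

```python
def conv_output_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution or max-pooling layer:
    out = (in - kernel + 2*padding) // stride + 1"""
    return (in_size - kernel + 2 * padding) // stride + 1

# Hypothetical stack on a 224x224 input:
size = 224
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv, "same" padding -> 224
size = conv_output_size(size, kernel=2, stride=2, padding=0)  # max pool -> 112
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv -> 112
size = conv_output_size(size, kernel=2, stride=2, padding=0)  # max pool -> 56
print(size)  # 56
```

Changing any of these values changes the size (and receptive field) of every downstream layer, which is why they are usually fixed as part of the architecture rather than tuned during training.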

keshav
- Thanks for your answer! Actually, I do not intend to change the architecture, so the kernel size, the stride, etc. are not concerned. The main goal of my question is HOW TO GO ABOUT THE TRAINING. What to start from? – S.E.K. Jan 23 '20 at 14:15
You can play around with batch size, number of epochs, and learning rate.
Start with a batch size of 8 and gradually double it: 8, 16, 32.
Start with a higher learning rate to make fast progress toward the minimum, then reduce the learning rate to avoid bouncing around and let training settle into a minimum.
Start with a low number of epochs to get quick feedback on the rest of the hyperparameters. Once you have a promising direction for the hyperparameters, move to a higher number of epochs.
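The advice above can be sketched as a simple tuning loop: double the batch size across runs, and decay the learning rate over epochs. The `step_decay` schedule and all constants here are assumptions for illustration, not a prescribed recipe:

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs,
    so training starts fast and then takes smaller steps near the minimum."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

batch_sizes = [8, 16, 32]  # doubling across runs, as suggested
for bs in batch_sizes:
    for epoch in (0, 10, 20):
        lr = step_decay(0.1, epoch)
        print(f"batch={bs:2d} epoch={epoch:2d} lr={lr:.4f}")
```

In practice you would wrap this around short training runs first (few epochs) to compare configurations cheaply, then rerun the best ones for longer.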

Marib Sultan