I am training a convolutional neural network for object detection. Apart from the learning rate, what are the other hyperparameters that I should tune? And in what order of importance? Besides, I read that doing a grid search for hyperparameters is not the best way to go about training and that random search is better in this case. Is random search really that good?
- 1. Learning rate schedule (decay rate, cyclic, etc.) 2. Momentum, if used (including 0) 3. Augmentation (random transformation values), if used – mirror2image Jan 15 '20 at 15:03
- @mirror2image Momentum is fixed to 0.9 (it appears to be common practice to fix it to this value if you want to focus on other hyperparameters). Augmentations are done, but I'm mostly asking about other hyperparameters: regularization, weight decay... AND MOST IMPORTANTLY: where to start. – S.E.K. Jan 15 '20 at 15:14
- Weight decay is useless in most cases. There are a lot of different regularizations; use common sense for them. Fixed momentum is bad practice: it's always a good idea to check momentum 0, 0.6, and 0.9. Start with different learning rate *schedules*. – mirror2image Jan 15 '20 at 15:18
- @mirror2image So you're saying: I start by trying out different learning rate scheduling policies, and for each policy I do a grid search over the learning rate, momentum, and the regularization parameters? – S.E.K. Jan 15 '20 at 16:32
- First, tune the schedule and learning rate with grid search. After that, tune momentum and regularization separately – it's highly likely they are independent. – mirror2image Jan 15 '20 at 17:55
- This playlist by Andrew Ng helped me understand the different nuances of hyperparameter tuning. I'm sure it'll help you as well. https://www.youtube.com/watch?v=1waHlpKiNyY&list=PLkDaE6sCZn6Hn0vK8co82zjQtt3T2Nkqc&ab_channel=DeepLearningAI – Uday Sai Nov 12 '21 at 12:20
2 Answers
Firstly, when you say an object detection CNN, there are a huge number of model architectures available. Assuming you have already narrowed down your model architecture, a CNN will have a few common layers like the ones below, each with hyperparameters you can tweak:
- Convolution Layer:- number of kernels, kernel size, stride length, padding
- MaxPooling Layer:- kernel size, stride length, padding
- Dense Layer:- size
- Dropout:- Percentage to keep/drop
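As a minimal sketch of how the convolution and pooling hyperparameters above interact, the standard output-size formula shows what kernel size, stride, and padding each do to a feature map (the layer sizes used here are illustrative assumptions, not part of any particular architecture):

```python
def conv_output_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution or max-pooling layer:
    out = (in - kernel + 2*padding) // stride + 1"""
    return (in_size - kernel + 2 * padding) // stride + 1

# Hypothetical stack on a 224x224 input:
size = 224
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv, "same" padding -> 224
size = conv_output_size(size, kernel=2, stride=2, padding=0)  # max pool -> 112
size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv -> 112
size = conv_output_size(size, kernel=2, stride=2, padding=0)  # max pool -> 56
print(size)  # 56
```

Changing any of these values changes the size (and receptive field) of every downstream layer, which is why they are usually fixed as part of the architecture rather than tuned during training.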

keshav
- Thanks for your answer! Actually, I do not intend to change the architecture, so the kernel size, the stride, etc. are not concerned. The main goal of my question is HOW TO GO ABOUT THE TRAINING. What to start from? – S.E.K. Jan 23 '20 at 14:15
You can play around with batch size, number of epochs, and learning rate.
Start with a batch size of 8 and gradually double it: 8, 16, 32.
Start with a higher learning rate to make fast progress toward the minimum, then reduce the learning rate to avoid bouncing around and let training settle into a minimum.
Start with a low number of epochs to get quick feedback on the rest of the hyperparameters. Once you have a promising direction for the hyperparameters, move to a higher number of epochs.
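The advice above can be sketched as a simple tuning loop: double the batch size across runs, and decay the learning rate over epochs. The `step_decay` schedule and all constants here are assumptions for illustration, not a prescribed recipe:

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs,
    so training starts fast and then takes smaller steps near the minimum."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

batch_sizes = [8, 16, 32]  # doubling across runs, as suggested
for bs in batch_sizes:
    for epoch in (0, 10, 20):
        lr = step_decay(0.1, epoch)
        print(f"batch={bs:2d} epoch={epoch:2d} lr={lr:.4f}")
```

In practice you would wrap this around short training runs first (few epochs) to compare configurations cheaply, then rerun the best ones for longer.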

Marib Sultan