Questions tagged [hyperparameter-optimization]

For questions related to the concept of hyper-parameter optimization, that is, the task of finding the best hyper-parameters for a particular learning algorithm (e.g. gradient descent) or model (e.g. a multi-layer neural network) using an optimization method (e.g. Bayesian optimization or genetic algorithms).

For more info, see e.g. https://en.wikipedia.org/wiki/Hyperparameter_optimization.
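As a rough illustration of what such an optimization method does, here is a minimal random-search sketch; the `train_and_validate` function and the two hyper-parameters are hypothetical placeholders standing in for a real training and validation pipeline.

```python
import random

# Hypothetical stand-in for "train the model with these hyper-parameters
# and return a validation score"; replace with real training/validation code.
def train_and_validate(learning_rate, num_hidden_units):
    return -(learning_rate - 0.01) ** 2 - 1e-5 * (num_hidden_units - 64) ** 2

best_score, best_params = float("-inf"), None
for _ in range(50):  # 50 random trials
    params = {
        "learning_rate": 10 ** random.uniform(-4, -1),            # log-uniform sample
        "num_hidden_units": random.choice([16, 32, 64, 128, 256]),
    }
    score = train_and_validate(**params)
    if score > best_score:
        best_score, best_params = score, params

print("best hyper-parameters found:", best_params)
```

Bayesian optimization and genetic algorithms follow the same loop but pick the next trial based on the scores seen so far rather than sampling blindly.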

122 questions
65
votes
4 answers

How to select number of hidden layers and number of memory cells in an LSTM?

I am trying to find some existing research on how to select the number of hidden layers and the size of these layers for an LSTM-based RNN. Is there an article that investigates this problem, i.e., how many memory cells should one use? I assume it…
33
votes
4 answers

How to find the optimal number of neurons per layer?

When you're writing your algorithm, how do you know how many neurons you need per layer? Are there any methods for finding the optimal number, or is it just a rule of thumb?
24
votes
3 answers

How to choose an activation function for the hidden layers?

I choose the activation function for the output layer depending on the output that I need and the properties of the activation function that I know. For example, I choose the sigmoid function when I'm dealing with probabilities, a ReLU when I'm…
18
votes
2 answers

How do I decide the optimal number of layers for a neural network?

How do I decide the optimal number of layers for a neural network (feedforward or recurrent)?
16
votes
1 answer

Will parameter sweeping on one split of data followed by cross validation discover the right hyperparameters?

Let's call our dataset splits train/test/evaluate. We're in a situation where we require months of data, so we prefer to use the evaluation dataset as infrequently as possible to avoid polluting our results. Instead, we do 10-fold cross-validation…
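A minimal sketch of the setup this question describes, assuming scikit-learn and an SVC purely for illustration: the hyper-parameter sweep uses 10-fold cross-validation on the training split only, and the evaluation split is touched once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for the slowly collected real dataset.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.2, random_state=0)

# Sweep hyper-parameters with 10-fold cross-validation on the training split only.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(SVC(), param_grid, cv=10)
search.fit(X_train, y_train)
print("selected hyper-parameters:", search.best_params_)

# The evaluation split is used once, after the sweep, for the final estimate.
print("held-out score:", search.score(X_eval, y_eval))
```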
16
votes
2 answers

How can I automate the choice of the architecture of a neural network for an arbitrary problem?

Assume that I want to solve an issue with a neural network that either I can't fit to existing architectures (perceptron, Kohonen, etc.), or I'm simply not aware of the existence of those, or I'm unable to understand their mechanics and I rely on my…
8
votes
2 answers

Why should the number of neurons in a hidden layer be a power of 2?

I have read somewhere on the web (I lost the reference) that the number of units (or neurons) in a hidden layer should be a power of 2 because it helps the learning algorithm to converge faster. Is this a fact? If it is, why is this true? Does it…
7
votes
2 answers

How do we choose the kernel size depending on the problem?

Obviously, finding suitable hyper-parameters for a neural network is a complex task and problem- or domain-specific. However, there should be at least some "rules" that hold most of the time for the size of the filter (or kernel)! In most cases, intuition…
7
votes
3 answers

How to determine the embedding size?

When we are training a neural network, we need to determine the embedding size used to convert categorical (in NLP, for instance) or continuous (in computer vision or voice) information into hidden vectors (or embeddings), but I wonder if there…
7
votes
1 answer

How do we decide which membership function to use?

In classical set theory, there are two options for an element. It is either a member of a set or not. But in fuzzy set theory, there are membership functions to define the "rate" of an element being a member of a set. In other words, classical logic…
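For a concrete picture of such a "rate" of membership, here is a minimal triangular membership function; the triangular shape and the temperature example are assumptions, and trapezoidal or Gaussian functions are equally common choices.

```python
def triangular_membership(x, a, b, c):
    """Degree of membership in a fuzzy set shaped as a triangle with
    feet at a and c and peak at b; returns a value in [0, 1]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# "Warm" as a fuzzy set over temperatures: 25 °C is fully warm, 18 °C or 32 °C not at all.
print(triangular_membership(22.0, a=18.0, b=25.0, c=32.0))  # partial membership, about 0.57
```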
7
votes
1 answer

An intuitive explanation of Adagrad, its purpose and its formula

It (Adagrad) adapts the learning rate to the parameters, performing smaller updates (i.e. low learning rates) for parameters associated with frequently occurring features, and larger updates (i.e. high learning rates) for parameters associated…
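A minimal NumPy sketch of the per-parameter update described in this excerpt, with illustrative default values for the learning rate and epsilon:

```python
import numpy as np

def adagrad_update(theta, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad step: accumulate squared gradients per parameter and
    shrink the effective learning rate where gradients have been large."""
    accum += grad ** 2
    theta -= lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

# Toy usage: the first parameter receives frequent large gradients,
# so its effective step size shrinks faster than the others'.
theta, accum = np.zeros(3), np.zeros(3)
for grad in [np.array([1.0, 0.1, 0.0]), np.array([1.0, 0.0, 0.0])]:
    theta, accum = adagrad_update(theta, grad, accum)
print(theta)
```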
6
votes
2 answers

How to shorten the development time of a neural network?

I am developing an LSTM for sequence tagging. During development, I make various changes to the system, for example, adding new features or changing the number of nodes in the hidden layers. After each change, I check the accuracy using…
6
votes
2 answers

When training a CNN, what are the hyperparameters to tune first?

I am training a convolutional neural network for object detection. Apart from the learning rate, what are the other hyperparameters that I should tune? And in what order of importance? Besides, I read that doing a grid search for hyperparameters is…
6
votes
3 answers

What is a "surrogate model"?

In the following paragraph from the book Automated Machine Learning: Methods, Systems, Challenges (by Frank Hutter et al.): "In this section we first give a brief introduction to Bayesian optimization, present alternative surrogate models used in it,…"