Questions tagged [hyper-parameters]

For questions related to the hyper-parameters of AI models and algorithms, which are parameters that are set before the learning process begins. For example, the number of hidden layers in a feed-forward neural network is usually a hyper-parameter.

93 questions
65 votes, 4 answers

How to select the number of hidden layers and the number of memory cells in an LSTM?

I am trying to find existing research on how to select the number of hidden layers, and the size of each, in an LSTM-based RNN. Is there an article that investigates this problem, i.e., how many memory cells one should use? I assume it…
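One quantity that helps when comparing candidate LSTM sizes is the number of trainable parameters per layer. A minimal sketch, assuming a standard LSTM without peephole connections and with one bias vector per gate (some libraries, e.g. PyTorch, keep two bias vectors per gate, which adds another 4·h per layer); the helper names are hypothetical:

```python
from typing import Sequence


def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Trainable parameters of one LSTM layer (no peepholes, one bias per gate).

    Each of the 4 gates (input, forget, cell candidate, output) has an
    input weight matrix (hidden x input), a recurrent weight matrix
    (hidden x hidden) and a bias vector (hidden).
    """
    h, d = hidden_size, input_size
    return 4 * (h * d + h * h + h)


def stacked_lstm_params(input_size: int, hidden_sizes: Sequence[int]) -> int:
    """Total parameters of stacked LSTM layers; each layer's output feeds the next."""
    total, d = 0, input_size
    for h in hidden_sizes:
        total += lstm_layer_params(d, h)
        d = h
    return total


print(lstm_layer_params(10, 20))          # one layer with 20 memory cells
print(stacked_lstm_params(10, [20, 20]))  # two stacked layers
```

Since the recurrent matrix is hidden × hidden, the count grows quadratically with the number of memory cells but only linearly with the number of stacked layers.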
33 votes, 4 answers

How to find the optimal number of neurons per layer?

When you're writing your algorithm, how do you know how many neurons you need per single layer? Are there any methods for finding the optimal number of them, or is it a rule of thumb?
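A common, if brute-force, practice is to treat the layer width as a hyper-parameter and choose it on held-out data. A minimal sketch of a grid search, assuming a hypothetical `validation_loss(width)` function that would, in practice, train a model with that many neurons and return its validation loss; the quadratic stand-in exists only so the snippet runs:

```python
from typing import Callable, Sequence


def select_width(widths: Sequence[int],
                 validation_loss: Callable[[int], float]) -> int:
    """Return the candidate width with the lowest validation loss."""
    # Each call would normally train a fresh model with `width` neurons
    # and evaluate it on held-out data, so this is expensive in practice.
    return min(widths, key=validation_loss)


def fake_validation_loss(width: int) -> float:
    """Hypothetical stand-in: narrow layers underfit, very wide ones overfit."""
    return (width - 40) ** 2 / 1000 + 0.1


print(select_width([8, 16, 32, 64, 128], fake_validation_loss))  # 32
```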
24 votes, 3 answers

How to choose an activation function for the hidden layers?

I choose the activation function for the output layer depending on the output that I need and the properties of the activation function that I know. For example, I choose the sigmoid function when I'm dealing with probabilities, a ReLU when I'm…
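For reference, the three activations most often discussed for hidden layers, written out in plain Python; the comments note the output range and saturation behavior, which are among the properties the question weighs. This is an illustration, not a recommendation of which one to use:

```python
import math


def sigmoid(z: float) -> float:
    # Output in (0, 1); saturates for large |z|, so gradients can vanish.
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    # Output in (-1, 1) and zero-centered; also saturates for large |z|.
    return math.tanh(z)


def relu(z: float) -> float:
    # Output in [0, inf); gradient is 1 for z > 0 and 0 for z < 0.
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```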
14 votes, 2 answers

How large should the replay buffer be?

I'm learning the DDPG algorithm by following the OpenAI Spinning Up documentation on DDPG, where it is written: In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…
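The buffer size typically enters as a fixed capacity: once the buffer is full, each new transition evicts the oldest one, so the capacity controls how far back in the agent's history the sampled data can reach. A minimal sketch using `collections.deque`; the class and method names are hypothetical and this is not Spinning Up's own code:

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer: once full, the oldest transitions are dropped."""

    def __init__(self, capacity: int, seed: int = 0):
        self.storage = deque(maxlen=capacity)  # capacity is the hyper-parameter in question
        self.rng = random.Random(seed)

    def store(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size: int):
        # Uniform sampling without replacement from whatever is currently stored.
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.store(t, 0, 0.0, t + 1, False)
print(len(buf), [tr[0] for tr in buf.storage])  # 3 [2, 3, 4]
```

With a small capacity the samples are dominated by recent experience; with a large one they also include transitions from much older versions of the policy.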
9 votes, 1 answer

What causes a model to require a low learning rate?

I've pondered this for a while without developing an intuition for the math behind the cause of this. So what causes a model to need a low learning rate?
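One standard way to build intuition is a quadratic loss $f(w) = \frac{a}{2}w^2$: each gradient-descent step multiplies $w$ by $(1 - \eta a)$, so the iteration converges only if $0 < \eta < 2/a$. The larger the curvature $a$ (roughly, the sharper the loss surface), the smaller the learning rate has to be. A minimal sketch of this toy case:

```python
def gradient_descent(curvature: float, lr: float, w0: float = 1.0, steps: int = 50) -> float:
    """Minimize f(w) = curvature / 2 * w**2, whose gradient is curvature * w."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # equivalent to w *= (1 - lr * curvature)
    return w


a = 10.0  # curvature; the stability threshold is lr < 2 / a = 0.2
print(gradient_descent(a, lr=0.05))  # factor 0.5 per step: shrinks toward 0
print(gradient_descent(a, lr=0.25))  # factor -1.5 per step: blows up
```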
8 votes, 2 answers

Why should the number of neurons in a hidden layer be a power of 2?

I have read somewhere on the web (I lost the reference) that the number of units (or neurons) in a hidden layer should be a power of 2 because it helps the learning algorithm to converge faster. Is this a fact? If it is, why is this true? Does it…
7 votes, 2 answers

How do we choose the kernel size depending on the problem?

Obviously, finding suitable hyper-parameters for a neural network is a complex, problem- or domain-specific task. However, there should be at least some "rules" that hold most of the time for the size of the filter (or kernel)! In most cases, intuition…
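One rule that does hold independently of the problem is how kernel sizes compose: stacking stride-1, undilated convolutions with kernel sizes $k_1, \dots, k_L$ gives a receptive field of $1 + \sum_i (k_i - 1)$. This is why two 3×3 layers can stand in for one 5×5 layer: the receptive field is the same, but the weight count is $18C^2$ instead of $25C^2$ when every layer has $C$ input and output channels. A small sketch under those assumptions (stride 1, no dilation, biases ignored):

```python
from typing import Sequence


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field (per spatial dimension) of stacked stride-1, undilated convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def conv_weights(kernel_sizes: Sequence[int], channels: int) -> int:
    """Weights (no biases) of stacked square 2-D convolutions, each with `channels` in and out."""
    return sum(k * k * channels * channels for k in kernel_sizes)


for ks in ([5], [3, 3], [3, 3, 3]):
    print(ks, receptive_field(ks), conv_weights(ks, channels=64))
```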
7 votes, 3 answers

How to determine the embedding size?

When training a neural network, we have to choose the embedding size used to convert categorical (in NLP, for instance) or continuous (in computer vision or voice) information into hidden vectors (or embeddings), but I wonder if there…
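One commonly quoted rule of thumb for a starting point is the fourth root of the number of categories. It is a heuristic, not a derived result, and treating it as an assumption to be tuned like any other hyper-parameter is safer. A minimal sketch:

```python
def embedding_dim(num_categories: int) -> int:
    """Fourth-root rule of thumb for an initial embedding size (an assumption, not a law)."""
    return max(1, round(num_categories ** 0.25))


for n in (50, 10_000, 1_000_000):
    print(n, embedding_dim(n))
```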
6 votes, 1 answer

Should I be decaying the learning rate and the exploration rate in the same manner?

Should I be decaying the learning rate and the exploration rate in the same manner? What counts as too slow or too fast a decay for the exploration rate and the learning rate? Or is it model-specific?
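Nothing forces the two schedules to match; they are separate hyper-parameters. A minimal sketch of two independent exponential decays, each with its own floor; the start values, floors, and decay rates are hypothetical and would need tuning per problem:

```python
import math


def exp_decay(start: float, end: float, decay_rate: float, step: int) -> float:
    """Exponentially decay from `start` toward the floor `end`."""
    return end + (start - end) * math.exp(-decay_rate * step)


# Hypothetical settings: exploration decays quickly, the learning rate slowly.
for step in (0, 100, 1000, 10000):
    epsilon = exp_decay(1.0, 0.05, decay_rate=1e-2, step=step)
    lr = exp_decay(1e-3, 1e-4, decay_rate=1e-4, step=step)
    print(step, round(epsilon, 4), lr)
```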
6 votes, 1 answer

Is this idea to calculate the required number of hidden neurons for a single hidden layer neural network correct?

I have an idea for finding the optimal number of hidden neurons required in a neural network, but I'm not sure how accurate it is. Assume that it has only one hidden layer and that it solves a classification problem with one output node (so it's a binary…
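Whatever rule is used to pick the hidden-layer size, it helps to know how the number of trainable parameters grows with it. With $n$ inputs, $h$ hidden neurons, and one output node, all with biases, there are $n h + h + h + 1$ parameters. A small helper, with hypothetical sizes in the usage line:

```python
def single_hidden_layer_params(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights plus biases of a fully connected n_inputs -> n_hidden -> n_outputs network."""
    hidden = n_inputs * n_hidden + n_hidden    # input-to-hidden weights + hidden biases
    output = n_hidden * n_outputs + n_outputs  # hidden-to-output weights + output bias
    return hidden + output


print(single_hidden_layer_params(4, 8))  # 4*8 + 8 + 8*1 + 1 = 49
```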
5 votes, 1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function is as follows: $$J(w) = -\sum_{i=1}^{n} \left[ y^{(i)}\log\left(\phi(z^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-\phi(z^{(i)})\right) \right]$$ Then the regularization term is added ($\lambda$ is a…
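Assuming the usual penalty $\frac{\lambda}{2}\lVert w \rVert^2$ (the excerpt is cut off before the term is given), the gradient gains an extra $\lambda w$, so each gradient-descent step becomes $w \leftarrow (1-\eta\lambda)\,w - \eta\,\nabla J(w)$: the weights are multiplied by a factor below 1 before the usual update. A minimal 1-D sketch (one weight, no bias) on a hypothetical toy dataset, comparing training with and without the penalty:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, linearly separable 1-D data: label 1 for positive x.
XS = [-2.0, -1.0, 1.0, 2.0]
YS = [0.0, 0.0, 1.0, 1.0]


def grad_cross_entropy(w: float) -> float:
    """dJ/dw for J(w) = -sum[y log(phi(wx)) + (1 - y) log(1 - phi(wx))]."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(XS, YS))


def train(lam: float, lr: float = 0.1, steps: int = 100) -> float:
    """Plain gradient descent on J(w) + (lam / 2) * w**2, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        # The penalty adds lam * w to the gradient, i.e. the step first
        # scales w by (1 - lr * lam) ("weight decay"), then applies -lr * dJ/dw.
        w -= lr * (grad_cross_entropy(w) + lam * w)
    return w


w_plain = train(lam=0.0)
w_l2 = train(lam=1.0)
print(f"no penalty: w = {w_plain:.3f}, with L2: w = {w_l2:.3f}")
```

On separable data like this, the unpenalized weight keeps growing with more steps, while the penalized one settles where the data gradient and $\lambda w$ balance.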
5 votes, 2 answers

What are the best hyper-parameters to tune in reinforcement learning?

Obviously, this is somewhat subjective, but which hyper-parameters typically have the most significant impact on an RL agent's ability to learn? Candidates include the replay buffer size, the learning rate, the entropy coefficient, etc. For example, in "normal"…
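Whichever hyper-parameters turn out to matter most, a simple way to explore them jointly is random search. A minimal sketch of the sampling side; the ranges and choices are illustrative assumptions, and training and evaluating an agent for each configuration is left out:

```python
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one candidate configuration (ranges are illustrative assumptions)."""
    return {
        # Learning rates span orders of magnitude, so sample them log-uniformly.
        "learning_rate": 10 ** rng.uniform(-5, -3),
        "buffer_size": rng.choice([10_000, 100_000, 1_000_000]),
        "entropy_coef": 10 ** rng.uniform(-4, -1),
        "gamma": rng.choice([0.95, 0.99, 0.995]),
    }


rng = random.Random(42)
for cfg in (sample_config(rng) for _ in range(3)):
    print(cfg)  # in practice: train an agent with cfg and record its return
```

Log-uniform sampling spends as many trials between 1e-5 and 1e-4 as between 1e-4 and 1e-3, which uniform sampling over the raw range would not.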
5 votes, 1 answer

How do I design a neural network that breaks a 5-letter word into its corresponding syllables?

I am going to design a neural network that will be able to break a 5-letter word into its corresponding syllables (hybrid syllables, i.e., it will not strictly adhere to grammatical syllable rules but will be based on some training sets I…
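One possible framing (an assumption, not the asker's design) is a fixed-size input of five one-hot letter vectors (5 × 26 = 130 values) and four binary outputs, one per gap between adjacent letters, set to 1 where a syllable boundary falls. Hypothetical helpers that build such training pairs:

```python
import string

LETTERS = string.ascii_lowercase  # assumes lowercase a-z only


def encode_word(word: str) -> list:
    """One-hot encode a 5-letter word into a flat vector of length 5 * 26."""
    if len(word) != 5 or any(c not in LETTERS for c in word):
        raise ValueError("expected exactly 5 lowercase letters")
    vec = [0] * (5 * len(LETTERS))
    for pos, ch in enumerate(word):
        vec[pos * len(LETTERS) + LETTERS.index(ch)] = 1
    return vec


def encode_split(syllables: list) -> list:
    """Mark the 4 gaps between letters: 1 where a syllable boundary falls."""
    if len("".join(syllables)) != 5:
        raise ValueError("syllables must join to a 5-letter word")
    boundaries = [0] * 4
    pos = 0
    for syl in syllables[:-1]:
        pos += len(syl)
        boundaries[pos - 1] = 1  # gap between letter pos - 1 and letter pos
    return boundaries


x = encode_word("hello")
y = encode_split(["hel", "lo"])
print(len(x), sum(x), y)  # 130 5 [0, 0, 1, 0]
```

With this framing the task becomes four independent binary predictions, so a sigmoid output per gap fits naturally.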
5 votes, 1 answer

How many weights does the max-pooling layer have?

How many weights does the max-pooling layer have? For example, if there are 10 inputs, a pooling filter of size 2, stride 2, how many weights, including bias, does a max-pooling layer have?
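Max pooling has no trainable weights and no bias: each output is simply the maximum of its window, so nothing is learned. A pure-Python sketch of the configuration in the question (10 inputs, window 2, stride 2):

```python
def max_pool_1d(xs, size: int = 2, stride: int = 2) -> list:
    """1-D max pooling; note that there are no weights or biases anywhere."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, stride)]


inputs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # 10 inputs
print(max_pool_1d(inputs))  # [3, 4, 9, 6, 5]
```

During backpropagation the gradient of each output is routed only to the input that was the maximum of its window, which again involves no parameters.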