Highest Voted 'hyper-parameters' Questions - Artificial Intelligence Stack Exchange

65

votes

4 answers

How to select number of hidden layers and number of memory cells in an LSTM?

I am trying to find some existing research on how to select the number of hidden layers and their sizes in an LSTM-based RNN. Is there an article that investigates this problem, i.e., how many memory cells one should use? I assume it…

asked Apr 14 '17 at 13:35

Stephen Johnson

969
2
8
9

33

votes

4 answers

How to find the optimal number of neurons per layer?

When you're writing your algorithm, how do you know how many neurons you need per single layer? Are there any methods for finding the optimal number of them, or is it a rule of thumb?

neural-networks hyperparameter-optimization artificial-neuron hyper-parameters layers

asked Aug 02 '16 at 15:41

kenorb

10,423
3
43
91

24

votes

3 answers

How to choose an activation function for the hidden layers?

I choose the activation function for the output layer depending on the output that I need and the properties of the activation function that I know. For example, I choose the sigmoid function when I'm dealing with probabilities, a ReLU when I'm…

neural-networks deep-learning activation-functions hyperparameter-optimization hyper-parameters

asked Jul 09 '18 at 00:06

gvgramazio

696
2
7
19

14

votes

2 answers

How large should the replay buffer be?

I'm learning the DDPG algorithm by following the OpenAI Spinning Up document on DDPG, which says: In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…

reinforcement-learning deep-rl hyper-parameters ddpg experience-replay

asked Apr 04 '19 at 14:40

ycenycute

341
1
2
6
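
For the replay buffer question above, a minimal sketch of the fixed-capacity buffer whose size is being asked about. The class name `ReplayBuffer`, the `capacity` parameter, and the transition tuple layout are hypothetical choices for illustration, not the Spinning Up implementation. The sketch only shows the mechanism that makes size matter: once the buffer is full, each new transition evicts the oldest one.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest item when full.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


# Tiny capacity so the eviction is visible; real buffers are far larger.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Assumed layout: (state, action, reward, next_state, done).
    buffer.add((step, 0, 0.0, step + 1, False))

oldest_state = buffer.storage[0][0]  # transitions from steps 0 and 1 are gone
batch = buffer.sample(2)
```

A small `capacity` keeps only the most recent experience, while a large one keeps older transitions available for sampling; which trade-off suits a given task is exactly what the question asks.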

9

votes

1 answer

What causes a model to require a low learning rate?

I've pondered this for a while without developing an intuition for the math behind it. So what causes a model to need a low learning rate?

machine-learning models hyper-parameters learning-rate

asked Mar 31 '19 at 21:07

JohnAllen

217
1
6

8

votes

2 answers

Why should the number of neurons in a hidden layer be a power of 2?

I have read somewhere on the web (I lost the reference) that the number of units (or neurons) in a hidden layer should be a power of 2 because it helps the learning algorithm to converge faster. Is this a fact? If it is, why is this true? Does it…

deep-learning optimization hyperparameter-optimization hyper-parameters hidden-layers

asked Feb 22 '18 at 16:56

dsfx3d

205
2
7

7

votes

2 answers

How do we choose the kernel size depending on the problem?

Obviously, finding suitable hyper-parameters for a neural network is a complex task that is problem- or domain-specific. However, there should be at least some "rules" that hold most of the time for the size of the filter (or kernel)! In most cases, intuition…

convolutional-neural-networks image-recognition hyperparameter-optimization hyper-parameters filters

asked May 16 '17 at 10:47

daniel451

256
1
4
9

7

votes

3 answers

How to determine the embedding size?

When training a neural network, we have to determine the embedding size used to convert categorical (in NLP, for instance) or continuous (in computer vision or voice) information into hidden vectors (or embeddings), but I wonder if there…

deep-learning hyperparameter-optimization hyper-parameters embeddings

asked Jul 07 '21 at 13:26

Lerner Zhang

877
1
7
19

6

votes

1 answer

How should we choose the dimensions of the encoding layer in auto-encoders?

neural-networks autoencoders hyperparameter-optimization variational-autoencoder hyper-parameters

asked Dec 27 '18 at 17:26

Neha soni

101
3

6

votes

1 answer

Should I be decaying the learning rate and the exploration rate in the same manner?

Should I be decaying the learning rate and the exploration rate in the same manner? What counts as too slow or too fast a decay for the exploration rate and the learning rate? Or does it vary from model to model?

reinforcement-learning deep-rl hyper-parameters learning-rate exploration-strategies

asked Sep 11 '18 at 10:47

rtz

91
6
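
For the decay question above, a small sketch of two independent schedules. It assumes exponential decay toward a floor value, which is only one common choice; the function names and every constant are hypothetical and not taken from the question. The point is only that the learning rate and the exploration rate are separate hyper-parameters, so nothing in the mechanics forces them to share a decay rate or even a decay form.

```python
def exponential_decay(initial, floor, rate, step):
    """Decay from `initial` toward `floor`, shrinking the gap by `rate` each step."""
    return floor + (initial - floor) * rate ** step


# Hypothetical settings: same functional form, different constants.
def exploration_rate(step):
    return exponential_decay(initial=1.0, floor=0.05, rate=0.995, step=step)


def learning_rate(step):
    return exponential_decay(initial=1e-3, floor=1e-5, rate=0.999, step=step)


for step in (0, 100, 1000):
    print(step, round(exploration_rate(step), 4), learning_rate(step))
```

Whether a given pair of rates is "too slow" or "too fast" still has to be judged empirically, for example by comparing learning curves across a few settings.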

6

votes

1 answer

Is this idea to calculate the required number of hidden neurons for a single hidden layer neural network correct?

I have an idea to find the optimal number of hidden neurons required in a neural network, but I'm not sure how accurate it is. Assuming that it has only 1 hidden layer, it is a classification problem with 1 output node (so it's a binary…

neural-networks deep-learning hyperparameter-optimization hidden-layers hyper-parameters

asked Oct 27 '19 at 19:58

w13rfed

205
1
5

5

votes

1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function is shown below. $$J(w) = -\displaystyle\sum_{i=1}^{n} \left[ y^{(i)}\log\left(\phi(z^{(i)})\right)+(1-y^{(i)})\log\left(1-\phi(z^{(i)})\right) \right]$$ And the regularization term is added. ($\lambda$ is a…

machine-learning proofs hyper-parameters regularization l2-regularization

asked Sep 23 '18 at 03:24

Riddle Aaron

65
3
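
For the $L_2$ regularization question above, a short worked derivation. The excerpt is cut off before it states the regularization term, so this assumes the common form $\frac{\lambda}{2}\lVert w\rVert^2$; the symbols $J_{\text{reg}}$ and $\eta$ (the gradient-descent step size) are introduced here for illustration and do not appear in the question.

$$J_{\text{reg}}(w) = J(w) + \frac{\lambda}{2}\lVert w\rVert^2$$

$$\nabla J_{\text{reg}}(w) = \nabla J(w) + \lambda w$$

$$w \leftarrow w - \eta\left(\nabla J(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla J(w)$$

Under these assumptions, and with $0 < \eta\lambda < 1$, every update first scales $w$ by a factor smaller than 1 and then applies the usual data-driven step, which is why the penalty pulls the weights toward zero.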

5

votes

2 answers

What are the best hyper-parameters to tune in reinforcement learning?

Obviously, this is somewhat subjective, but what hyper-parameters typically have the most significant impact on an RL agent's ability to learn? For example, the replay buffer size, learning rate, entropy coefficient, etc. For example, in "normal"…

reinforcement-learning deep-rl hyperparameter-optimization hyper-parameters proximal-policy-optimization

asked May 28 '21 at 11:21

Dylan Kerler

243
2
7

5

votes

1 answer

How do I design a neural network that breaks a 5-letter word into its corresponding syllables?

I am going to design a neural network that can break a 5-letter word into its corresponding syllables (hybrid syllables; that is, it will not strictly adhere to grammatical syllable rules but will be based on some training sets I…

neural-networks hyperparameter-optimization hyper-parameters feedforward-neural-networks network-design

asked Dec 31 '16 at 15:57

Programmer

164
6

5

votes

1 answer

How many weights does the max-pooling layer have?

How many weights does the max-pooling layer have? For example, if there are 10 inputs and a pooling filter of size 2 with stride 2, how many weights, including biases, does a max-pooling layer have?

convolutional-neural-networks hyper-parameters filters pooling max-pooling

asked Oct 27 '19 at 20:53

Tibby

53
5
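
For the max-pooling question above, a minimal sketch of the example it describes: 10 inputs, a filter of size 2, and a stride of 2, assuming 1-D inputs and no padding. The function name `max_pool_1d` and the input values are made up for illustration. Max pooling only takes the maximum over each window, so the operation involves no weights and no bias.

```python
def max_pool_1d(values, size=2, stride=2):
    """Max-pool a 1-D sequence without padding; nothing here is learned."""
    last_start = len(values) - size
    return [max(values[i:i + size]) for i in range(0, last_start + 1, stride)]


inputs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # 10 made-up input values
pooled = max_pool_1d(inputs, size=2, stride=2)

# Output length for the no-padding case: floor((n - size) / stride) + 1.
num_outputs = (len(inputs) - 2) // 2 + 1

# The function has no parameters to train: only the fixed size and stride.
num_trainable_parameters = 0
```

Here `size` and `stride` are hyper-parameters chosen up front, not values updated during training.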
