Questions tagged [regularization]

For questions about the application of regularization techniques.

In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
https://en.wikipedia.org/wiki/Regularization_(mathematics)

61 questions
10 votes · 1 answer

Can someone explain the R1 regularization function in simple terms?

I'm trying to understand the R1 regularization function, both the abstract concept and every symbol in the formula. According to the article, the definition of R1 is: it penalizes the discriminator for deviating from the Nash equilibrium via…
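For context, here is a minimal PyTorch sketch of the R1 gradient penalty as it is commonly implemented; the `discriminator` argument and the default `gamma` are illustrative assumptions, not taken from the question:

```python
import torch

def r1_penalty(discriminator, real_images, gamma=10.0):
    """R1 term: (gamma / 2) * E[ ||grad_x D(x)||^2 ], on real data only."""
    real_images = real_images.detach().requires_grad_(True)
    scores = discriminator(real_images)
    # Gradient of the summed discriminator scores w.r.t. the real inputs
    (grads,) = torch.autograd.grad(outputs=scores.sum(),
                                   inputs=real_images,
                                   create_graph=True)
    # Squared L2 norm per sample, averaged over the batch
    return 0.5 * gamma * grads.pow(2).flatten(1).sum(dim=1).mean()
```

Penalizing the gradient norm on real data pushes the discriminator toward zero gradients at the data distribution, which is where the Nash equilibrium of the GAN game sits.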
10 votes · 3 answers

Are there any rules of thumb for having some idea of what capacity a neural network needs to have for a given problem?

To give an example, let's just consider the MNIST dataset of handwritten digits. Here are some things which might have an impact on the optimum model capacity: there are 10 output classes; the inputs are 28x28 grayscale pixels (I think this…
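There is no single rule, but one pragmatic way to probe capacity empirically is a small width sweep; a sketch in Keras, assuming MNIST as in the question (the widths and epoch count are arbitrary placeholders):

```python
import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0  # scale pixels to [0, 1]

# Sweep hidden-layer widths and watch where validation accuracy saturates.
for width in (16, 64, 256):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=3,
                        validation_data=(x_val, y_val), verbose=0)
    print(width, history.history["val_accuracy"][-1])
```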
9 votes · 1 answer

What is "early stopping" in machine learning?

What is early stopping in machine learning and, in general, artificial intelligence? What are the advantages of using this method? How does it help exactly? I'd be interested in perspectives and links to recent research.
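As a concrete illustration, in Keras early stopping is a callback that halts training once a monitored validation metric stops improving; the `patience` value below is an arbitrary placeholder:

```python
import tensorflow as tf

# Stop when validation loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=5,
                                              restore_best_weights=True)

# Usage (with a compiled `model` and held-out validation data):
# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=100,
#           callbacks=[early_stop])
```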
7 votes · 3 answers

How should we regularize an LSTM model?

There are five parameters from an LSTM layer for regularization, if I am correct. To deal with overfitting, I would start with reducing the layers, reducing the hidden units, and applying dropout or regularizers. There are kernel_regularizer,…
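For reference, a Keras LSTM layer does expose several regularization hooks at once; a sketch with illustrative coefficients:

```python
from tensorflow.keras import layers, regularizers

lstm = layers.LSTM(
    64,
    kernel_regularizer=regularizers.l2(1e-4),     # input-to-hidden weights
    recurrent_regularizer=regularizers.l2(1e-4),  # hidden-to-hidden weights
    bias_regularizer=regularizers.l2(1e-4),       # bias vectors
    activity_regularizer=regularizers.l1(1e-5),   # layer outputs
    dropout=0.2,                                  # dropout on the inputs
    recurrent_dropout=0.2,                        # dropout on the recurrent state
)
```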
6 votes · 2 answers

Why is dropout favoured compared to reducing the number of units in hidden layers?

Why is dropout favored compared to reducing the number of units in hidden layers for convolutional networks? If a large set of units leads to overfitting and dropping out "averages" the response units, why not just suppress units? I have read…
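To make the contrast concrete, a Keras sketch: dropout keeps the full layer width but randomly zeroes units during training, rather than permanently removing them (the layer sizes and rate are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(512, activation="relu"),
    # Each unit is zeroed with probability 0.5 per training step, so the
    # network behaves like an average over many thinner sub-networks.
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
```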
5 votes · 1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function looks like below. $$J(w) = -\sum_{i=1}^{n} \left[y^{(i)}\log\left(\phi(z^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-\phi(z^{(i)})\right)\right]$$ And the regularization term is added. ($\lambda$ is a…
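A one-line derivation (standard, though not part of the excerpt) shows the shrinkage directly: adding $\frac{\lambda}{2}\lVert w\rVert^2$ to the cost changes each gradient-descent step to $$w \leftarrow w - \eta\left(\nabla J(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla J(w),$$ so for $0 < \eta\lambda < 1$ the weights are multiplicatively scaled toward zero before the usual data-driven update is applied.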
4 votes · 2 answers

Why did the L1/L2 regularization technique not improve my accuracy?

I am training a multilayer neural network with 146 samples (97 for the training set, 20 for the validation set, and 29 for the testing set). I am using: automatic differentiation, SGD method, fixed learning rate + momentum term, logistic…
4 votes · 1 answer

Combine multiple losses with gradient descent

I am optimizing a neural network with Adam using 3 different losses. Their scales are very different, and the current method is either to sum the losses and clip the gradient or to weight them manually within the sum. Something like:…
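A minimal PyTorch sketch of the pattern the question describes, i.e. a manually weighted sum plus gradient clipping; the weights, clip value, and `losses` interface are placeholders:

```python
import torch

W1, W2, W3 = 1.0, 0.1, 0.01  # hand-tuned scale factors (placeholders)

def training_step(model, optimizer, batch, losses):
    # `losses` is assumed to be three callables: loss_fn(model, batch) -> scalar
    loss1, loss2, loss3 = (fn(model, batch) for fn in losses)
    total = W1 * loss1 + W2 * loss2 + W3 * loss3  # weighted sum
    optimizer.zero_grad()
    total.backward()
    # Clip the combined gradient so no single loss can dominate a step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return total.item()
```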
4 votes · 2 answers

How does Regularization Reduce Overfitting?

As I understand, this is the general summary of the Regularization-Overfitting Problem: The classical "Bias-Variance Tradeoff" suggests that complicated models (i.e. models with more parameters, e.g. neural networks with many layers/weights) are…
4 votes · 0 answers

When is using weight regularization bad?

Regularization of weights (e.g. L1 or L2) keeps them small and standardized, which can help reduce data overfitting. From this article, regularization sounds favorable in many cases, but is it always encouraged? Are there scenarios in which it…
4 votes · 1 answer

Why does L1 regularization yield sparse features?

In contrast to L2 regularization, L1 regularization usually yields sparse feature vectors and most feature weights are zero. What's the reason for the above statement - could someone explain it mathematically, and/or provide some intuition (maybe…
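A standard piece of intuition (not from the excerpt): for a single weight with a quadratic fit term, minimizing $\frac{1}{2}(w - w_0)^2 + \lambda\lvert w\rvert$ yields the soft-thresholding solution $$w^{*} = \operatorname{sign}(w_0)\,\max\left(\lvert w_0\rvert - \lambda,\ 0\right),$$ so any weight whose unregularized optimum $w_0$ lies within $\lambda$ of zero is set exactly to zero. The $L_2$ analogue, $w^{*} = w_0/(1 + \lambda)$, only shrinks weights and never makes them exactly zero.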
4 votes · 1 answer

Is there a way to ensure that my model is able to recognize an unseen example?

My question is more theoretical than practical. Let's say that I am training my cat classifier with a dataset that I feel is pretty representative of cat images in general. But then a new breed of cat is created that is distinct from other cats and…
3 votes · 2 answers

Should I apply normalization to the observations in deep reinforcement learning?

I am new to DRL and trying to implement my custom environment. I want to know if normalization and regularization techniques are as important in RL as in deep learning. In my custom environment, the state/observation values lie in different ranges…
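For illustration, a common pattern in DRL is to normalize observations with running statistics; a hypothetical helper, not something from the question:

```python
import numpy as np

class RunningNorm:
    """Normalize observations with streaming mean/variance estimates."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps

    def __call__(self, obs):
        # Update the running mean and variance with the new observation
        self.count += 1.0
        delta = obs - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (obs - self.mean) - self.var) / self.count
        # Return the standardized observation
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)
```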
3 votes · 1 answer

Does adding a model complexity penalty to the loss function allow you to skip cross-validation?

It's my understanding that selecting for small models, i.e. having a multi-objective function where you're optimizing for both model accuracy and simplicity, automatically takes care of the danger of overfitting the data. Do I have this right? It…
3 votes · 0 answers

Enforcing sparsity constraints that make use of spatial contiguity

I have a deep learning network that outputs grayscale image reconstructions. In addition to good reconstruction performance (measured through mean squared error or some other measure like PSNR), I want to encourage these outputs to be sparse through…
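One way to encode "sparse and spatially contiguous" (a suggestion, not something the question specifies) is to pair an L1 penalty with a total-variation term, which prefers the remaining non-zero mass to form contiguous patches; a PyTorch sketch:

```python
import torch

def sparse_tv_loss(img, l1_weight=1e-3, tv_weight=1e-3):
    """img: (batch, channels, H, W) reconstruction; weights are placeholders."""
    l1 = img.abs().mean()  # plain sparsity: push pixel values toward zero
    # Total variation: penalize differences between neighboring pixels, so
    # surviving non-zero pixels tend to cluster into contiguous regions.
    tv = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean() \
       + (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return l1_weight * l1 + tv_weight * tv

# e.g. total = reconstruction_mse + sparse_tv_loss(reconstruction)
```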