Questions tagged [regularization]

For questions about the application of regularization techniques.

In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.
https://en.wikipedia.org/wiki/Regularization_(mathematics)

61 questions
10 votes · 1 answer

Can someone explain the R1 regularization function in simple terms?

I'm trying to understand the R1 regularization function, both the abstract concept and every symbol in the formula. According to the article, the definition of R1 is: it penalizes the discriminator for deviating from the Nash equilibrium via…
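For context, here is a minimal PyTorch sketch of the R1 gradient penalty as it is commonly implemented; the `discriminator` argument and the default `gamma` are illustrative assumptions, not taken from the question:

```python
import torch

def r1_penalty(discriminator, real_images, gamma=10.0):
    """R1 term: (gamma / 2) * E[ ||grad_x D(x)||^2 ], on real data only."""
    real_images = real_images.detach().requires_grad_(True)
    scores = discriminator(real_images)
    # Gradient of the summed discriminator scores w.r.t. the real inputs
    (grads,) = torch.autograd.grad(outputs=scores.sum(),
                                   inputs=real_images,
                                   create_graph=True)
    # Squared L2 norm per sample, averaged over the batch
    return 0.5 * gamma * grads.pow(2).flatten(1).sum(dim=1).mean()
```

Penalizing the gradient norm on real data pushes the discriminator toward zero gradients at the data distribution, which is where the Nash equilibrium of the GAN game sits.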
10 votes · 3 answers

Are there any rules of thumb for having some idea of what capacity a neural network needs to have for a given problem?

To give an example, let's just consider the MNIST dataset of handwritten digits. Here are some things which might have an impact on the optimum model capacity: there are 10 output classes; the inputs are 28x28 grayscale pixels (I think this…
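There is no single rule, but one pragmatic way to probe capacity empirically is a small width sweep; a sketch in Keras, assuming MNIST as in the question (the widths and epoch count are arbitrary placeholders):

```python
import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0  # scale pixels to [0, 1]

# Sweep hidden-layer widths and watch where validation accuracy saturates.
for width in (16, 64, 256):
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=3,
                        validation_data=(x_val, y_val), verbose=0)
    print(width, history.history["val_accuracy"][-1])
```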
9 votes · 1 answer

What is "early stopping" in machine learning?

What is early stopping in machine learning and, in general, artificial intelligence? What are the advantages of using this method? How does it help exactly? I'd be interested in perspectives and links to recent research.
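As a concrete illustration, in Keras early stopping is a callback that halts training once a monitored validation metric stops improving; the `patience` value below is an arbitrary placeholder:

```python
import tensorflow as tf

# Stop when validation loss has not improved for `patience` epochs,
# and roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                              patience=5,
                                              restore_best_weights=True)

# Usage (with a compiled `model` and held-out validation data):
# model.fit(x_train, y_train,
#           validation_data=(x_val, y_val),
#           epochs=100,
#           callbacks=[early_stop])
```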
7 votes · 3 answers

How should we regularize an LSTM model?

There are five parameters from an LSTM layer for regularization, if I am correct. To deal with overfitting, I would start with reducing the layers, reducing the hidden units, and applying dropout or regularizers. There are kernel_regularizer,…
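For reference, a Keras LSTM layer does expose several regularization hooks at once; a sketch with illustrative coefficients:

```python
from tensorflow.keras import layers, regularizers

lstm = layers.LSTM(
    64,
    kernel_regularizer=regularizers.l2(1e-4),     # input-to-hidden weights
    recurrent_regularizer=regularizers.l2(1e-4),  # hidden-to-hidden weights
    bias_regularizer=regularizers.l2(1e-4),       # bias vectors
    activity_regularizer=regularizers.l1(1e-5),   # layer outputs
    dropout=0.2,                                  # dropout on the inputs
    recurrent_dropout=0.2,                        # dropout on the recurrent state
)
```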
6 votes · 2 answers

Why is dropout favoured compared to reducing the number of units in hidden layers?

Why is dropout favored compared to reducing the number of units in hidden layers for convolutional networks? If a large set of units leads to overfitting and dropping out "averages" the response units, why not just suppress units? I have read…
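To make the contrast concrete, a Keras sketch: dropout keeps the full layer width but randomly zeroes units during training, rather than permanently removing them (the layer sizes and rate are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(512, activation="relu"),
    # Each unit is zeroed with probability 0.5 per training step, so the
    # network behaves like an average over many thinner sub-networks.
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
```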
5 votes · 1 answer

How does L2 regularization make weights smaller?

I'm learning logistic regression and $L_2$ regularization. The cost function looks like below. $$J(w) = -\sum_{i=1}^{n} \left[y^{(i)}\log\left(\phi(z^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-\phi(z^{(i)})\right)\right]$$ And the regularization term is added. ($\lambda$ is a…
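A one-line derivation (standard, though not part of the excerpt) shows the shrinkage directly: adding $\frac{\lambda}{2}\lVert w\rVert^2$ to the cost changes each gradient-descent step to $$w \leftarrow w - \eta\left(\nabla J(w) + \lambda w\right) = (1 - \eta\lambda)\,w - \eta\,\nabla J(w),$$ so for $0 < \eta\lambda < 1$ the weights are multiplicatively scaled toward zero before the usual data-driven update is applied.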
4 votes · 2 answers

Why did the L1/L2 regularization technique not improve my accuracy?

I am training a multilayer neural network with 146 samples (97 for the training set, 20 for the validation set, and 29 for the testing set). I am using: automatic differentiation, SGD method, fixed learning rate + momentum term, logistic…
4 votes · 1 answer

Combine multiple losses with gradient descent

I am optimizing a neural network with Adam using 3 different losses. Their scales are very different, and the current method is either to sum the losses and clip the gradient or to weight them manually within the sum. Something like:…
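A minimal PyTorch sketch of the pattern the question describes, i.e. a manually weighted sum plus gradient clipping; the weights, clip value, and `losses` interface are placeholders:

```python
import torch

W1, W2, W3 = 1.0, 0.1, 0.01  # hand-tuned scale factors (placeholders)

def training_step(model, optimizer, batch, losses):
    # `losses` is assumed to be three callables: loss_fn(model, batch) -> scalar
    loss1, loss2, loss3 = (fn(model, batch) for fn in losses)
    total = W1 * loss1 + W2 * loss2 + W3 * loss3  # weighted sum
    optimizer.zero_grad()
    total.backward()
    # Clip the combined gradient so no single loss can dominate a step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return total.item()
```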
4 votes · 2 answers

How does Regularization Reduce Overfitting?

As I understand, this is the general summary of the Regularization-Overfitting Problem: The classical "Bias-Variance Tradeoff" suggests that complicated models (i.e. models with more parameters, e.g. neural networks with many layers/weights) are…
4 votes · 0 answers

When is using weight regularization bad?

Regularization of weights (e.g. L1 or L2) keeps them small and standardized, which can help reduce data overfitting. From this article, regularization sounds favorable in many cases, but is it always encouraged? Are there scenarios in which it…
4 votes · 1 answer

Why does L1 regularization yield sparse features?

In contrast to L2 regularization, L1 regularization usually yields sparse feature vectors and most feature weights are zero. What's the reason for the above statement - could someone explain it mathematically, and/or provide some intuition (maybe…
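A standard piece of intuition (not from the excerpt): for a single weight with a quadratic fit term, minimizing $\frac{1}{2}(w - w_0)^2 + \lambda\lvert w\rvert$ yields the soft-thresholding solution $$w^{*} = \operatorname{sign}(w_0)\,\max\left(\lvert w_0\rvert - \lambda,\ 0\right),$$ so any weight whose unregularized optimum $w_0$ lies within $\lambda$ of zero is set exactly to zero. The $L_2$ analogue, $w^{*} = w_0/(1 + \lambda)$, only shrinks weights and never makes them exactly zero.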
4 votes · 1 answer

Is there a way to ensure that my model is able to recognize an unseen example?

My question is more theoretical than practical. Let's say that I am training my cat classifier with a dataset that I feel is pretty representative of cat images in general. But then a new breed of cat is created that is distinct from other cats and…
3 votes · 2 answers

Should I apply normalization to the observations in deep reinforcement learning?

I am new to DRL and trying to implement my custom environment. I want to know if normalization and regularization techniques are as important in RL as in deep learning. In my custom environment, the state/observation values lie in different ranges…
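For illustration, a common pattern in DRL is to normalize observations with running statistics; a hypothetical helper, not something from the question:

```python
import numpy as np

class RunningNorm:
    """Normalize observations with streaming mean/variance estimates."""
    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = eps

    def __call__(self, obs):
        # Update the running mean and variance with the new observation
        self.count += 1.0
        delta = obs - self.mean
        self.mean = self.mean + delta / self.count
        self.var = self.var + (delta * (obs - self.mean) - self.var) / self.count
        # Return the standardized observation
        return (obs - self.mean) / np.sqrt(self.var + 1e-8)
```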
3 votes · 1 answer

Does adding a model complexity penalty to the loss function allow you to skip cross-validation?

It's my understanding that selecting for small models, i.e. having a multi-objective function where you're optimizing for both model accuracy and simplicity, automatically takes care of the danger of overfitting the data. Do I have this right? It…
3 votes · 0 answers

Enforcing sparsity constraints that make use of spatial contiguity

I have a deep learning network that outputs grayscale image reconstructions. In addition to good reconstruction performance (measured through mean squared error or some other measure like PSNR), I want to encourage these outputs to be sparse through…
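One way to encode "sparse and spatially contiguous" (a suggestion, not something the question specifies) is to pair an L1 penalty with a total-variation term, which prefers the remaining non-zero mass to form contiguous patches; a PyTorch sketch:

```python
import torch

def sparse_tv_loss(img, l1_weight=1e-3, tv_weight=1e-3):
    """img: (batch, channels, H, W) reconstruction; weights are placeholders."""
    l1 = img.abs().mean()  # plain sparsity: push pixel values toward zero
    # Total variation: penalize differences between neighboring pixels, so
    # surviving non-zero pixels tend to cluster into contiguous regions.
    tv = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean() \
       + (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return l1_weight * l1 + tv_weight * tv

# e.g. total = reconstruction_mse + sparse_tv_loss(reconstruction)
```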