
There are a few ways to regularise a neural network, for example dropout or L1 regularisation. Both of these methods, and probably most other regularisation methods, tend to remove parts of, or otherwise simplify, the neural network: dropout deactivates nodes, L1 shrinks the weights of the model, and so on.
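For concreteness, a minimal sketch of the two methods I have in mind (PyTorch assumed; the layer sizes and the L1 coefficient are arbitrary placeholders):

    import torch.nn as nn

    # Small network with dropout between the layers: during training, dropout
    # randomly zeroes half of the hidden activations on each forward pass.
    model = nn.Sequential(
        nn.Linear(20, 64),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(64, 1),
    )

    def l1_penalty(model, l1_lambda=1e-4):
        # L1 regularisation: the sum of absolute weight values, added to the
        # training loss so that many weights are pushed towards zero.
        return l1_lambda * sum(p.abs().sum() for p in model.parameters())

    # During training: loss = criterion(model(x), y) + l1_penalty(model)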

The main argument in favour of regularising a neural network is that simplifying the model forces it to learn more general functions, making it less prone to overfitting and more robust to noisy input.

Once you have trained one model with regularisation and one without, you can compare their performance by calculating error metrics on their outputs. This shows whether or not the regularised model performs better than the standard model.
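For example, the comparison I have in mind looks something like this (PyTorch again; model_std, model_reg, x_test and y_test are placeholders for the two trained models and a held-out test set):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def test_mse(model, x_test, y_test):
        # Mean squared error of the model's predictions on held-out data.
        return F.mse_loss(model(x_test), y_test).item()

    print("standard MSE:   ", test_mse(model_std, x_test, y_test))
    print("regularised MSE:", test_mse(model_reg, x_test, y_test))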

However, given that the regularised model achieved better error metrics, how can I prove that the weights of the regularised model have less variance (are simpler) than those of the standard neural network?
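The naive thing I can do is compute the variance of all the weights in each network and compare the two numbers, something like the sketch below (same placeholder models), but I am not sure that this alone is a rigorous way to show the regularised model is simpler.

    import torch

    def weight_variance(model):
        # Flatten every parameter tensor into one long vector and take its variance.
        flat = torch.cat([p.detach().flatten() for p in model.parameters()])
        return flat.var().item()

    print("standard weight variance:   ", weight_variance(model_std))
    print("regularised weight variance:", weight_variance(model_reg))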

Marcus
  • This question doesn't have an answer unless you define what you mean by complexity. If by complexity you mean the number of parameters, then both regularization methods you mentioned do nothing for complexity. If by complexity you mean the variance in parameters you might have more luck, but I imagine that dropout wouldn't show any reduced complexity. – Recessive May 07 '21 at 06:28
  • @Recessive Thank you, I updated my question. But how does dropout not reduce complexity if it deactivates nodes during the training process? Having fewer nodes seems like a simpler model. – Marcus May 07 '21 at 06:39
  • It inconsistently deactivates nodes during training, which means you need just as many parameters, and then at testing time it doesn't deactivate them at all. – Recessive May 09 '21 at 10:02
  • Hello. I still think that this question is not clear enough because of this part: "how to prove that the weights of the regularised model have less variance (simpler) than the standard neural network?". It seems that you're using "simpler" as a synonym for having less variance, but what do you mean by "less variance" here? You need to clarify this part because, once we have 2 neural networks, we have 2 specific sets of weights, so the only thing that you can compute is really the variance of the values of these parameters. – nbro May 11 '21 at 09:45
  • So, are you looking for a theoretical result that shows that, if using e.g. L2 regularization, your parameters will be limited to a smaller set of values, so will have less variance generally? – nbro May 11 '21 at 09:46

0 Answers