I thought L2 regularization decreases the length of the weight vector by shrinking many weights to nearly zero. So I expected the output neurons' activations to become sparse, and therefore to suffer less from co-adaptation, similar to the effect of dropout. However, I read somewhere that L1/L2 regularization does not prevent co-adaptation. Why is that?
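For concreteness, here is a minimal PyTorch sketch of the two techniques I am comparing (the layer sizes and hyperparameters are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

# Toy network: dropout randomly zeroes activations during training,
# which is the mechanism usually credited with breaking co-adaptation.
model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout applied to hidden activations
    nn.Linear(50, 10),
)

# L2 regularization via weight decay: shrinks all weights toward zero,
# i.e. penalizes the squared norm of the weight vector.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```

My intuition was that the weight-decay term alone should already produce an effect similar to the dropout layer, but apparently it does not.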