I thought L2 regularization decreases the length of the weight vector by shrinking many weights to almost zero. So I expected the output neurons' activations to become sparse and therefore less affected by co-adaptation, similar to what dropout achieves. But I read somewhere that L1/L2 regularization does not prevent co-adaptation. Why is that?
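To make the question concrete, here is a minimal NumPy sketch (my own illustration, not from any particular framework) contrasting a single L2 weight-decay step with a dropout mask. The L2 gradient term `2 * lam * w` rescales every weight by the same factor, so no weight actually becomes zero, whereas dropout hard-zeroes activations at random:

```python
import numpy as np

rng = np.random.default_rng(0)

# L2 regularization adds lam * ||w||^2 to the loss; its gradient
# contribution is 2 * lam * w. A pure weight-decay step therefore
# shrinks every weight by the same factor -- it does NOT zero any
# weight out, so activations stay dense.
w = rng.normal(size=5)
lam, lr = 0.1, 0.5
w_decayed = w - lr * (2 * lam * w)   # equals 0.9 * w here

print(np.count_nonzero(w_decayed))   # all 5 weights survive, just smaller
print(np.linalg.norm(w_decayed) < np.linalg.norm(w))  # norm did shrink

# Dropout instead zeroes each activation independently at random
# (inverted-dropout scaling keeps the expected value unchanged),
# so a unit cannot rely on any particular co-activated partner.
a = rng.normal(size=5)
keep = rng.random(5) > 0.5           # drop probability 0.5
a_dropped = np.where(keep, a / 0.5, 0.0)
```

The contrast is the point of my question: L2 changes the *magnitude* of weights smoothly, while dropout breaks the *availability* of specific units during training.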

김동완