
I don't know much about deep learning, so my question might be silly. However, I was wondering whether there are NN architectures with hard constraints on the weights of some layers. For example, let $(W^k_{ij})_{ij}$ be the weights of the (dense) $k$-th layer. Are there architectures where something like $$ \sum_{i, j} (W^k_{ij})^2 = 1 $$ is imposed (namely, the flattened vector of weights is constrained to stay on a sphere), or where the $W^k_{ij}$ are equivalence classes $\bmod\ K$ for some number $K>0$?
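To make the sphere constraint concrete, here is a minimal sketch of the kind of thing I have in mind, using PyTorch's parametrization utilities (the class name `UnitFrobeniusNorm` is just something I made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class UnitFrobeniusNorm(nn.Module):
    """Reparametrize a weight matrix so that sum_ij W_ij^2 = 1 always holds."""
    def forward(self, W):
        # Divide by the Frobenius norm; the clamp avoids division by zero.
        return W / W.norm(p="fro").clamp_min(1e-12)

layer = nn.Linear(64, 32)
# The layer's effective weight now always lies on the unit sphere,
# while the optimizer updates an unconstrained underlying tensor.
parametrize.register_parametrization(layer, "weight", UnitFrobeniusNorm())

x = torch.randn(8, 64)
y = layer(x)
print(layer.weight.pow(2).sum())  # approximately 1.0
```

With this reparametrization the constraint is satisfied exactly at every step of training, rather than only being encouraged the way a penalty such as weight decay would.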

Then, of course, one should probably think about proper activation functions for these cases, but it's probably not a big obstacle.

Putting constraints of this kind would prevent the weights from growing indefinitely, and maybe it could also help prevent over-fitting?

  • Hello. Welcome to Artificial Intelligence Stack Exchange. I put what I think is your main specific question in the title. Make sure that's correct. Having said that, you tagged your post with [tag:weight-normalization], so I suppose you're aware of those techniques. Right? – nbro Nov 25 '21 at 22:00
  • Hi @nbro and thanks for the correction! Not really! I used that tag because it was recommended when I started typing "weight" – Onil90 Nov 26 '21 at 07:07

0 Answers