I don't know much about Deep Learning, so my question might be silly. However, I was wondering whether there are NN architectures with hard constraints on the weights of some layers. For example, let $(W^k_{ij})_{ij}$ be the weights of the (dense) $k$-th layer. Are there architectures where something like $$ \sum_{i, j} (W^k_{ij})^2 = 1 $$ is imposed (i.e. the flattened vector of weights is constrained to lie on a sphere), or where the $W^k_{ij}$ are equivalence classes $\bmod\, K$ for some number $K>0$?
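To make concrete what I mean by the sphere constraint, here is a minimal sketch (I picked PyTorch just for illustration; the layer sizes and data are made up) that projects the weights back onto the unit sphere after each optimizer step:

```python
import torch
import torch.nn as nn

layer = nn.Linear(16, 8)  # the "k-th" dense layer whose weights I want to constrain
model = nn.Sequential(layer, nn.Tanh(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(32, 16), torch.randn(32, 1)  # dummy data

for step in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    # Projection step: enforce sum_{i,j} (W^k_{ij})^2 = 1 by dividing the
    # weight matrix by its Frobenius norm after the gradient update.
    with torch.no_grad():
        layer.weight.div_(layer.weight.norm())
```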
Then, of course, one would also have to think about proper activation functions for these cases, but that is probably not a big obstacle.
Putting constraints of this kind would prevent the weights from growing indefinitely, and maybe could prevent over-fitting?