I am working on dynamical systems using optimal control theory and trying to find the connection between that field and machine learning. Consider a simple 2-layer neural network (NN) whose activation function is $y = x + x^2$ (I intentionally ignore the bias terms and use this activation function only for illustration purposes). If the output of the first layer is $y_1 = w_i x_i + w_i^2 x_i^2$, then the input-to-output relation is
$$y_o = w_o\left(w_1(w_i x_i + w_i^2 x_i^2) + w_1^2(w_i x_i + w_i^2 x_i^2)^2\right).$$
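For concreteness, here is a minimal Python sketch of this forward pass (the function names and example values are my own, purely illustrative):

```python
def act(z):
    # the assumed polynomial activation: y = z + z^2
    return z + z**2

def forward(x, w_i, w_1, w_o):
    # width-1 layers, no biases, as in the question
    y_1 = act(w_i * x)    # first layer:  y_1 = w_i x + w_i^2 x^2
    y_2 = act(w_1 * y_1)  # second layer applied to y_1
    return w_o * y_2      # linear output weight

# illustrative evaluation
print(forward(0.5, w_i=0.3, w_1=0.7, w_o=1.2))
```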
Expanding the terms (keeping $w_i$, $w_1$, and $w_o$ distinct) gives
$$y_o = w_o w_1 w_i\, x_i + w_o w_1 w_i^2 (1 + w_1)\, x_i^2 + 2 w_o w_1^2 w_i^3\, x_i^3 + w_o w_1^2 w_i^4\, x_i^4.$$
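The expansion can be checked symbolically; a short SymPy sketch (variable names are mine):

```python
import sympy as sp

x, w_i, w_1, w_o = sp.symbols('x w_i w_1 w_o')

act = lambda z: z + z**2      # the assumed activation
y_1 = act(w_i * x)            # first-layer output
y_o = w_o * act(w_1 * y_1)    # network output

# group powers of x to read off the coefficients of x, x^2, x^3, x^4
print(sp.collect(sp.expand(y_o), x))
```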
A general relation of this form has four independent coefficients $\alpha_1, \alpha_2, \alpha_3, \alpha_4$,
$$y_o = \alpha_1 x_i + \alpha_2 x_i^2 + \alpha_3 x_i^3 + \alpha_4 x_i^4,$$
but the NN structure offers only three trainable parameters $w_i$, $w_1$, $w_o$. So the number of degrees of freedom is smaller than what the polynomial form of the input-to-output relation suggests, and the coefficients $\alpha_k$ cannot be chosen independently.
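One way to see this concretely (my own derivation, not from any reference): eliminating the three weights from the four coefficient expressions above leaves a single algebraic constraint, $\alpha_2 = 2\alpha_1\alpha_4/\alpha_3 + \alpha_3^2/(4\alpha_4)$, so the realizable coefficient vectors form a three-dimensional surface in $\mathbb{R}^4$. A quick numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(3):
    w_i, w_1, w_o = rng.uniform(0.5, 2.0, size=3)
    # coefficients of x, x^2, x^3, x^4 from the expansion above
    a1 = w_o * w_1 * w_i
    a2 = w_o * w_1 * w_i**2 * (1 + w_1)
    a3 = 2 * w_o * w_1**2 * w_i**3
    a4 = w_o * w_1**2 * w_i**4
    # the two printed values should agree for every weight choice
    print(a2, 2 * a1 * a4 / a3 + a3**2 / (4 * a4))
```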
Will this lead to any inefficiency in the NN model? Could you please suggest some references that mathematically study the structure of NNs?