Consider the following excerpt from a paragraph in Chapter 10, "Sequence Modeling: Recurrent and Recursive Nets," of the textbook Deep Learning by Ian Goodfellow et al., regarding the advantages of an RNN over a traditional fully connected MLP.
To go from multilayer networks to recurrent networks, we need to take advantage of one of the early ideas found in machine learning and statistical models of the 1980s: sharing parameters across different parts of a model. Parameter sharing makes it possible to extend and apply the model to examples of different forms (different lengths, here) and generalize across them. If we had separate parameters for each value of the time index, we could not generalize to sequence lengths not seen during training, nor share statistical strength across different sequence lengths and across different positions in time. Such sharing is particularly important when a specific piece of information can occur at multiple positions within the sequence.
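To make the parameter-sharing contrast concrete, here is a minimal sketch (my own illustration, not from the book) comparing parameter counts for a network with a separate weight matrix per time step versus a recurrent layer that reuses one set of weights at every step. The sizes d, h, and T are arbitrary toy values I chose for the example.

```python
# Minimal sketch (not from the book): parameter counts for a
# per-timestep MLP vs. a recurrent layer with shared weights.
# Toy sizes: input dim d, hidden dim h, sequence length T.

d, h = 50, 100  # input and hidden sizes (arbitrary)

def mlp_params(T):
    # A separate weight matrix (h x d) and bias (h) for every time step:
    # the count grows linearly with T, and a model trained on length-T
    # sequences has no weights at all for positions beyond T.
    return T * (h * d + h)

def rnn_params(T):
    # One input-to-hidden matrix W (h x d), one hidden-to-hidden matrix
    # U (h x h), and one bias (h), reused at every step: the count is
    # independent of T, so any sequence length can be processed.
    return h * d + h * h + h

for T in (10, 100, 1000):
    print(f"T={T:5d}  per-timestep MLP: {mlp_params(T):9,d}"
          f"  shared RNN: {rnn_params(T):7,d}")
```

With shared weights, whatever is learned about a pattern at one position applies at every other position, which is what I understand the quoted passage to be driving at.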
The authors use the phrase "statistical strength". Do they mean the strength of an RNN in learning the embedding of a word based on its context rather than its position in the input, when that word occurs in several inputs? Or do they mean that an RNN uses fewer parameters and therefore generalizes better than a traditional MLP? Or do they mean something else?