I'm reading the ImageNet Classification with Deep Convolutional Neural Networks paper by Krizhevsky et al, and came across these lines in the Intro paragraph:
Their (convolutional neural networks') capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feedforward neural networks with similarly-sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically-best performance is likely to be only slightly worse.
What's meant by "stationarity of statistics" and "locality of pixel dependencies"? Also, what's the basis of saying that CNN's theoretically best performance is only slightly worse than that of feedforward NN?