Let's consider a deep convolutional network. It seems that there is some consensus on the following notions:
1. Shallow layers tend to recognise more low-level features such as edges and curves.
2. Deeper layers tend to recognise more high-level features (whatever this means).
While I often come across online articles and blogs that state this, none of them cite literature that supports the claim. I am not asking why this phenomenon happens; I only want to know whether it has actually been tested experimentally and documented. I have also barely been able to find any peer-reviewed literature that provides evidence for it on sites such as Google Scholar or ResearchGate.
Could anyone point me in the right direction?