
Let's consider a deep convolutional network. It seems that there is some consensus on the following notions:

1. Shallow layers tend to recognise more low-level features such as edges and curves.

2. Deeper layers tend to recognise more high-level features (whatever this means).

While I often come across online articles and blog posts that state this, none of them cites literature that supports the claim. I am not asking why this phenomenon happens; I am only asking whether it has actually been experimented on and documented. I have also barely been able to find any peer-reviewed literature providing evidence for this on sites such as Google Scholar or ResearchGate.

Could anyone point me in the right direction?

nbro
mesllo
    https://arxiv.org/abs/2106.14587 is the most exhaustive, most fundamental work on this - it connects the architecture of NNs with neural manifolds (manifolds of activities) and with categories of theories and languages. All is one system; layers (graphs, neural manifolds, languages-theories) are category-theoretic fibrations. Just define learnability metrics on this structure and this can solve the problem of finding the optimal architecture for the task. – TomR Aug 22 '22 at 07:45

2 Answers


It is assumed that NNs build up a hierarchical representation, whereby each layer combines features from the lower-level layers. The layers could be understood as representing a cascade of stacked features:

edges -> texture -> patterns -> parts -> objects

So the network moves from lower-level patterns to more abstract, concept-like representations at the higher levels. This Distill article is, as far as I can tell, one of the most-cited sources on the topic (740 citations) and provides an in-depth explanation of the features and how to visualize them. The journal is peer-reviewed.

The post also points to some older references, such as: this, this or this. The website of Chris Olah, one of the authors of the Distill article, is also a great source of visualizations for different deep learning architectures.
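One mechanical reason deeper layers can represent "larger" features is that stacked convolutions grow the receptive field: ignoring the nonlinearity between layers, two stacked 3x3 convolutions compute the same linear map as a single 5x5 filter, so each deeper unit sees a wider patch of the input. A minimal NumPy sketch (the `conv_full` helper is purely illustrative, not from the answer):

```python
import numpy as np

def conv_full(a, b):
    """Full 2D convolution of two kernels: the single filter that two
    stacked (linear) conv layers with kernels a and b are equivalent to."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1,
                    a.shape[1] + b.shape[1] - 1))
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i:i + b.shape[0], j:j + b.shape[1]] += a[i, j] * b
    return out

k = np.ones((3, 3)) / 9.0    # a 3x3 averaging (blur) kernel
eff = conv_full(k, k)        # the effective filter of two stacked 3x3 layers
print(eff.shape)             # (5, 5)
```

With the nonlinearities back in, the composition is no longer a single linear filter, but the receptive-field growth is the same: n stacked k x k layers see an (n(k-1)+1)-wide input patch.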

Mariusmarten

You won’t find literature on this point because it’s true by definition. Low-level features are simple statistics of the raw input. High-level features are statistics of lower-level features. In a convolutional (or any feedforward) network the shallowest layers compute statistics directly on the input, so they create the lowest-level features. Deeper layers operate on the features of shallower layers, so they create higher-level features.

As an example, edges might be computed at the first (lowest) convolutional layer, then corners at the second layer, by looking for two perpendicular edges. The highest layers may detect, say, whole faces, which is a complex, i.e. high-order, statistic.
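This edges-then-corners story can be sketched in a few lines of NumPy (toy kernels and a toy image, purely illustrative): a first "layer" of horizontal and vertical edge filters, and a second "layer" that fires only where both edge maps are active at once, i.e. a crude corner detector built from the lower-level edge features.

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2D cross-correlation, the core operation of a conv layer."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Layer 1: two 3x3 edge detectors (horizontal and vertical gradients).
k_h = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=float)
k_v = k_h.T

# A toy image containing a bright square: its border has edges, and its
# top-left corner has a horizontal and a vertical edge meeting.
img = np.zeros((9, 9))
img[3:7, 3:7] = 1.0

h_edges = np.maximum(conv2d(img, k_h), 0)  # ReLU: keep positive responses
v_edges = np.maximum(conv2d(img, k_v), 0)

# "Layer 2": a unit that is active only where both edge maps respond,
# i.e. a higher-level corner feature composed of lower-level edge features.
corners = h_edges * v_edges
```

The strongest response in `corners` lands near the square's top-left corner, while the pure edge maps respond along entire sides: the second layer has built a more specific feature out of the first layer's outputs.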

If you want a visual example of what ‘low-level’ and ‘high-level’ features look like in a convolutional network, check out Google’s Deep Dream Generator, which emphasizes what the different layers ‘see’.

C deeply
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://ai.stackexchange.com/help/whats-reputation) you will be able to [comment on any post](https://ai.stackexchange.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/22169) – Rob Aug 21 '22 at 15:01