
[Image: screenshot showing a discrete 2D convolution expressed as a linear transformation (matrix-vector product)]

The image above, a screenshot from this article, describes discrete 2D convolutions as linear transforms. The idea, as far as I understand it, is to represent the 2-dimensional $n \times n$ input grid as a vector of length $n^2$, and the $m \times m$ output grid as a vector of length $m^2$. I don't see why this couldn't be generalised to higher-dimensional convolutions, since a transformation matrix can be constructed from any vector to any other, no matter their lengths (right?)

My question: Aren't all discrete convolutions (not just 2D) linear transforms?

Are there cases where such a transformation matrix cannot be found?
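
For instance, here is a minimal sketch of what I mean, using SciPy's `correlate2d` as the CNN-style "convolution" (the sizes are arbitrary): the transformation matrix is recovered by feeding in unit impulses, and multiplying it with the flattened input reproduces the convolution output.

```python
import numpy as np
from scipy.signal import correlate2d

n, k = 4, 3                      # input is n x n, kernel is k x k
m = n - k + 1                    # "valid" output is m x m
kernel = np.random.randn(k, k)

# Build the (m^2 x n^2) transformation matrix column by column:
# column j is the flattened response to the j-th unit impulse.
T = np.zeros((m * m, n * n))
for j in range(n * n):
    impulse = np.zeros(n * n)
    impulse[j] = 1.0
    T[:, j] = correlate2d(impulse.reshape(n, n), kernel, mode="valid").ravel()

x = np.random.randn(n, n)
direct = correlate2d(x, kernel, mode="valid").ravel()
via_matrix = T @ x.ravel()
print(np.allclose(direct, via_matrix))   # True -> it acts as a linear map
```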

stoic-santiago
  • Are you sure you want to talk about the continuous case? Continuous operations cannot be represented by matrices, and would thus be non-linear by your definition, but linear by the usual definition. –  Mar 30 '20 at 14:40
  • What do you mean by "higher-dimensional convolutions"? Can you provide a link to a definition of them? – nbro Mar 30 '20 at 18:16
  • Could you elaborate, @DuttaA? Maybe post it as an answer. – stoic-santiago Mar 31 '20 at 03:46
  • @nbro I would think of a higher-dimensional convolution as starting with a 3D grid of input, say $n \times n \times n$, and mapping it to a smaller 3D grid using a kernel, the same as in the case of 2D grids. This may generalise to even higher dimensions. – stoic-santiago Mar 31 '20 at 03:48
  • @jughead.ai Well, I had edited the question to discrete convolutions, but you edited it back to convolution. Convolution is a very broad term, and its properties can't be fitted into one answer. –  Mar 31 '20 at 07:08
  • I would suggest you ask this question on SignalProcessing.SE; my answer will probably not be the entire picture. But to give you an overview, the convolution of CNNs is not actually the convolution used in digital signal processing. In digital signal processing, convolutions, as the name "digital" suggests, will always be matrix operations (any operation in a computer is a matrix operation). But whether it can be called linear or not is a different matter altogether. There are also different types of discrete convolution. Only a person with a solid DSP background can provide your answer, hence you should ask this on SP.SE. –  Mar 31 '20 at 11:52
  • Also, the definition of linearity is a bit different. What you mean by linearity is that it's a 1st-degree polynomial; what we mean by linearity when it comes to convolution is that it satisfies $f(a+b) = f(a) + f(b)$. Surprisingly, an equation like $a \cdot x + b$ is not linear according to this definition (where $x$ is the variable). We do convolution to find the output of a system which is Linear Time Invariant, and hence, as you can see, convolution is a linear operation (it can be proven). –  Mar 31 '20 at 12:02
  • Convolution has a strict mathematical definition in signal processing, which results in interesting properties when convolution is taken to the frequency domain of a signal. So what I am trying to say is that actual convolution is a lot different from the convolution defined in CNNs. It's an abuse of the term to use it for CNNs. –  Mar 31 '20 at 12:03
  • Just a question: what do you mean by "Are there cases where such a transformation matrix cannot be found?" –  Apr 10 '20 at 01:10
  • Hi! I have thoroughly updated the answer and removed a lot of useless junk; let me know if you still have doubts or if the explanations are insufficient. –  Apr 13 '20 at 01:01

2 Answers


Convolution has become a rather misused term in recent times, with the advent of CNNs.

Short answer: convolution is a linear operator (check here), but what you are describing in the context of CNNs is not convolution; it is cross-correlation, which is also linear in the case of images (a dot product).

Convolution:

Computers work only on discrete values, so discrete-time convolution is defined as: $$ y[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k] $$

which has the nice property of

$$Y(e^{j\omega}) = X(e^{j \omega})H(e^{j \omega})$$

where each $A(e^{j\omega})$ is the discrete-time Fourier transform of the corresponding $a[n]$. So this is the basic idea of discrete convolution. The point to illustrate here is that the convolution operation has some nice properties and is very useful.
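
As a quick sanity check (a minimal NumPy sketch; the signals are arbitrary), the sum above agrees with `np.convolve`, and zero-padded FFTs verify the $Y(e^{j\omega}) = X(e^{j\omega})H(e^{j\omega})$ property at the DFT sample points:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
h = np.array([0.5, -1.0, 2.0, 0.25])

# Direct evaluation of y[n] = sum_k x[k] h[n-k]
N = len(x) + len(h) - 1
y = np.zeros(N)
for n in range(N):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y[n] += x[k] * h[n - k]

print(np.allclose(y, np.convolve(x, h)))        # True

# Convolution theorem: multiply the (zero-padded) spectra, invert
Y = np.fft.fft(x, N) * np.fft.fft(h, N)
print(np.allclose(y, np.fft.ifft(Y).real))      # True
```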

Now, there are 2 main types of convolution (as far as I know): linear convolution and circular convolution. For a $2D$ signal, linear convolution is defined as: $$y[m,n] = \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} x[i,j]\, h[m-i,n-j]$$

Circular convolution is the same, except that the input sequence is finite, from $0$ to $N-1$, giving rise to periodicity in the frequency domain. The convolution operation is a little different due to this periodicity.
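
To make the $2D$ definition concrete, here is a direct (if slow) evaluation of the double sum over finite supports, checked against `scipy.signal.convolve2d` (a sketch; the array sizes are arbitrary):

```python
import numpy as np
from scipy.signal import convolve2d

x = np.random.randn(4, 4)
h = np.random.randn(3, 3)

# Direct evaluation of y[m,n] = sum_i sum_j x[i,j] h[m-i, n-j]
M, N = x.shape[0] + h.shape[0] - 1, x.shape[1] + h.shape[1] - 1
y = np.zeros((M, N))
for m in range(M):
    for n in range(N):
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                if 0 <= m - i < h.shape[0] and 0 <= n - j < h.shape[1]:
                    y[m, n] += x[i, j] * h[m - i, n - j]

print(np.allclose(y, convolve2d(x, h)))   # True ("full" linear convolution)
```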

So these are the actual definitions of convolution. It has the linearity property (obvious, since it is used to calculate the output of an LTI system), and it can also be written in terms of matrix multiplications, since we want the computer to do these calculations for us. I don't know a lot, but there are many clever manipulations, e.g. the FFT algorithm, an indispensable tool in signal processing (used to convert signals to the frequency domain for a certain sampling rate). Similarly, you can define convolutions in terms of circulant matrices and the like (but only if you want to process in the $n$ domain; it is much easier to process in the frequency domain).

For example, a $1D$ circular convolution $y_n = h_n \circledast x_n$ between 2 signals in the $n$ domain can be written in matrix form as follows: $$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} h_0 & h_3 & h_2 & h_1\\ h_1 & h_0 & h_3 & h_2\\ h_2 & h_1 & h_0 & h_3 \\ h_3 & h_2 & h_1 & h_0 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

The same can be done quite easily by converting the signals into the frequency domain, multiplying, and converting back to the time domain.
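
Here is a sketch of exactly this $4$-point example: `scipy.linalg.circulant` builds the matrix above (its first column is $(h_0, h_1, h_2, h_3)$), and the frequency-domain route gives the same answer:

```python
import numpy as np
from scipy.linalg import circulant

h = np.array([1.0, 2.0, -1.0, 0.5])      # h_0 .. h_3
x = np.array([4.0, -3.0, 2.0, 1.0])      # x_0 .. x_3

# Matrix form of circular convolution: y = C(h) x
y_matrix = circulant(h) @ x

# Frequency-domain form: multiply the DFTs, then invert
y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real

print(np.allclose(y_matrix, y_fft))      # True
```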

When we move to the $2D$ domain, a similar equation is formed, except that (in the $n$ domain) instead of the scalar $h_0$ we will have a circulant block $H_0$, and the full operator has a Kronecker-product-like block structure (I don't know the justification or proofs of these equations; it probably satisfies the convolution theorem and is fast when run on computers). Again, it is much easier to work in the frequency domain. A sketch of this block construction follows the next paragraph.

When we move to multiple dimensions, it is called multidimensional discrete convolution, as per this Wikipedia article. As the article suggests, the frequency-domain property $$Y(k_1,k_2,\ldots,k_M)=H(k_1,k_2,\ldots,k_M)\,X(k_1,k_2,\ldots,k_M)$$ still holds. When we do convolutions in the $n$ domain, things get tricky and we have to do clever matrix manipulations, as shown in the example above. As stated, things get much easier in the frequency domain.
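
As a sketch of the $2D$ circular case just mentioned (small $N$, arbitrary signals): in the $n$ domain the operator is a block-circulant matrix with circulant blocks, and applying it to the flattened input matches the frequency-domain product:

```python
import numpy as np
from scipy.linalg import circulant

N = 3
x = np.random.randn(N, N)
h = np.random.randn(N, N)

# Block-circulant matrix with circulant blocks: block (m, i) is
# circulant(h[(m - i) % N, :]), acting on the row-major flattening of x.
blocks = [circulant(h[k, :]) for k in range(N)]
A = np.block([[blocks[(m - i) % N] for i in range(N)] for m in range(N)])

y_matrix = (A @ x.ravel()).reshape(N, N)

# Frequency-domain form: Y(k1, k2) = H(k1, k2) X(k1, k2)
y_fft = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(x)).real

print(np.allclose(y_matrix, y_fft))   # True
```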

It's counter-intuitive that a picture has a frequency domain, and in general it is not actually temporal frequency (it is spatial frequency). But in DSP we use a lot of filters whose mathematical properties are similar to those of filters in the traditional frequency sense, and hence the calculations are the same as in a frequency setting, as shown in the first example with the $1D$ signal.

The point is that convolution, by definition, is a linear operation. Check here if the explanations are unconvincing. There are Linear Time-Varying systems whose output may be determined by convolution, but then the equation is used at a certain point in time, i.e.: $$Y(e^{j\omega}, t) = H(e^{j\omega}, t)\,X(e^{j\omega})$$

Now, whether it can be represented by matrix products in the $n$ domain or not, I cannot say in general. But, generalising, it should be possible; it will just involve increasingly complex matrix properties and operations.

Cross Correlation:

Cross-correlation is not the same thing as convolution; it has a bunch of interesting properties of its own, which is why I do not like the two being mixed together. In signal processing it is mainly related to finding the energy of a signal (auto-correlation), and it has other applications in communication channels.

In statistics, meanwhile, it is related to finding the correlation (a misused term, since it can take any value, whereas the correlation coefficient takes values between $-1$ and $1$) between 2 stochastic processes at different time instants, where we have multiple samplings, or ensembles, of the signal over which the expectation is taken (see here). In CNNs, one way to see this would be: the image dataset is an ensemble of a stochastic process, while the filter is another, fixed process, and moving the filter over the image is the time delay.

In the context of digital image processing, cross-correlation is just a dot product, and hence it can be represented by matrices. Whether cross-correlation is linear or not in general is difficult to tell (at least I don't know), but in the context of image processing it seems linear. The bottom line is that it can definitely be implemented with matrices, as it is a dot product; whether it is linear in the true mathematical sense is doubtful.
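
To illustrate the dot-product view, and the fact that cross-correlation is just convolution with a flipped kernel (a sketch; any small image and filter will do):

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

img = np.random.randn(5, 5)
w = np.random.randn(3, 3)

# CNN-style "convolution" at one position: a dot product between
# the filter and the patch under it.
patch = img[1:4, 1:4]
print(np.isclose(np.sum(patch * w),
                 correlate2d(img, w, mode="valid")[1, 1]))        # True

# Cross-correlation is convolution with the kernel flipped on both axes.
print(np.allclose(correlate2d(img, w, mode="valid"),
                  convolve2d(img, w[::-1, ::-1], mode="valid")))  # True
```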

The more intuitive way, in the context of CNNs, would be to view the filters as just a bunch of shared weights for better regularisation, rather than as cross-correlation or convolution.

  • I would suggest you ask this question on SignalProcessing.SE. They will provide better insights than this answer. –  Apr 07 '20 at 03:15
  • I think it's inaccurate or misleading to say that convolutional neural networks are not doing a convolution. You can say that they are doing cross-correlation or whatever. Actually, it doesn't really matter whether you say CNNs are doing convolution or cross-correlation, because the kernels are learned! They are **actually** doing a convolution-like operation, so the term isn't really a misnomer, in my opinion. – nbro Apr 10 '20 at 01:00
  • Also, how does this answer the question? – nbro Apr 10 '20 at 01:03
  • @nbro I have a strict definition of convolution, due to its properties. I think it does answer the question, since there is not really anything to answer. I am not sure what the OP means by "Are there cases where such a transformation matrix cannot be found?"; I'll clarify. –  Apr 10 '20 at 01:05
  • If there's a question, there's something to answer. Also, given that you like strict definitions, maybe you shouldn't say that CNNs use a "bunch of shared weights for better regularisation". That only makes sense if you think of CNNs in terms of "neurons", which is, IMHO, less appropriate (although not completely wrong) and less intuitive than thinking of them as a "bunch of cross-correlations/convolutions with learnable kernels". – nbro Apr 10 '20 at 01:15
  • @nbro Umm... yes... what other sense is there to make? So convolution is a linear operation; now, whether it can be represented as a matrix is a different story. I could have written "it's a linear operation, end of story". But when the OP says convolution which is not actually convolution, I have to show why it's not. It doesn't appeal to my intuition; I have tried for a long time to make sense of CNNs, and had posted a similar question a long time back asking whether feedforward NNs are also special cases of CNNs. So the shared-weights explanation makes the most sense to me. –  Apr 10 '20 at 01:18
  • @nbro Also, I have mentioned in multiple comments that this is much more suitable for DSP.SE. There are literally hundreds of books written on the theory and applications of convolution, its different types, and the uses of each; only a person with a huge amount of experience can answer the question in a truly concise manner. I have just tried to string together my knowledge to form a coherent answer. –  Apr 10 '20 at 01:32

Convolutions are linear transformations. However, in typical applications, a non-linear activation function like ReLU is applied after the convolution to provide non-linearity; otherwise, a convolutional neural network would just amount to a single net linear transformation.
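
A quick numerical illustration (a sketch using SciPy and arbitrary arrays): the convolution itself satisfies $f(a x_1 + b x_2) = a f(x_1) + b f(x_2)$, while convolution followed by a ReLU does not:

```python
import numpy as np
from scipy.signal import convolve2d

kernel = np.random.randn(3, 3)

def conv(x):
    return convolve2d(x, kernel, mode="valid")

def relu(x):
    return np.maximum(x, 0.0)

x1, x2 = np.random.randn(5, 5), np.random.randn(5, 5)
a, b = 2.0, -0.5

# Convolution is linear ...
print(np.allclose(conv(a * x1 + b * x2),
                  a * conv(x1) + b * conv(x2)))            # True

# ... but convolution followed by ReLU is not.
print(np.allclose(relu(conv(a * x1 + b * x2)),
                  a * relu(conv(x1)) + b * relu(conv(x2))))  # False (almost surely)
```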

Gerry P