
I'm trying to understand whether a 3D convolution, of the kind performed in a convolutional layer of a CNN, is associative. Specifically, is the following true:

$$ X \otimes(W \cdot Q)=(X \otimes W) \cdot Q, $$

where

  • $\otimes$ is a convolution,
  • $X$ is a 3D input to a convolution layer,
  • $W$ is a 4D weight tensor reshaped into a 2D matrix,
  • and $Q$ is a PCA transformation matrix.

To elaborate: say I take my 512 convolutional filters of shape ($3 \times 3 \times 512$), flatten each one across these three dimensions to give a ($4608 \times 512$) matrix $W$ (since $3 \times 3 \times 512 = 4608$), and perform PCA on that matrix, reducing it to, say, ($4608 \times 400$), before reshaping it back into $400$ 3D filters and performing the convolution.
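
A minimal NumPy sketch of the reshaping bookkeeping described above. It assumes the filters are stored as (out_channels, kh, kw, in_channels) and uses random weights and a random $512 \times 400$ matrix with orthonormal columns as hypothetical stand-ins for the trained filters and the PCA projection $Q$; only the shapes and the flatten/project/reshape order are the point here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained layer: 512 filters, each 3 x 3 x 512,
# stored as (out_channels, kh, kw, in_channels).
filters = rng.standard_normal((512, 3, 3, 512))

# Flatten each filter into one column: W has shape (3*3*512, 512) = (4608, 512).
W = filters.reshape(512, -1).T

# Stand-in for the PCA projection (assumption): a random 512 x 400 matrix
# with orthonormal columns, obtained from a reduced QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((512, 400)))

# Project: (4608, 512) @ (512, 400) -> (4608, 400).
W_reduced = W @ Q

# Reshape back into 400 filters of shape 3 x 3 x 512.
# Filter k is the linear combination sum_j Q[j, k] * filters[j].
reduced_filters = W_reduced.T.reshape(400, 3, 3, 512)

print(W.shape, Q.shape, W_reduced.shape, reduced_filters.shape)
# (4608, 512) (512, 400) (4608, 400) (400, 3, 3, 512)
```

Each reduced filter is therefore a fixed linear combination of the original 512 filters, with coefficients taken from one column of $Q$.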

Is this the same as convolving $X$ with $W$ and then applying PCA to that output, using the same transformation matrix as before?

I know that matrix multiplication is associative, i.e. $A(BC)=(AB)C$, and I have found that convolution operations can be rewritten as matrix multiplications.

So my question is: if I rewrite the convolution as a matrix multiplication, is it associative with respect to the PCA transformation (another matrix multiplication)?

For example, does $X' \cdot (W' \cdot Q) = (X' \cdot W') \cdot Q$, where $X'$ and $W'$ are the matrices needed to compute the convolution as a matrix multiplication?
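
A minimal NumPy sketch of that check, assuming stride 1, no padding, CNN-style cross-correlation (no kernel flip) and channels-last arrays. Here $X'$ is built with the common im2col construction (one flattened input patch per row), which is one way, not the only way, to write the convolution as a matrix product. The helpers `im2col` and `conv_direct` and all sizes are hypothetical, and $Q$ is a plain matrix with no mean subtraction:

```python
import numpy as np

def im2col(x, k):
    """Patch matrix X' for a stride-1, unpadded k x k convolution.

    One row per output position (row-major over positions), each row a
    flattened k x k x C patch in (kh, kw, C) order.
    """
    h, w, _ = x.shape
    rows = [x[i:i + k, j:j + k, :].ravel()
            for i in range(h - k + 1)
            for j in range(w - k + 1)]
    return np.array(rows)

def conv_direct(x, filters):
    """Reference CNN-style convolution (cross-correlation), stride 1, no padding.

    filters: (n_filters, k, k, C); returns (H-k+1, W-k+1, n_filters).
    """
    n_out, k = filters.shape[0], filters.shape[1]
    h, w, _ = x.shape
    out = np.zeros((h - k + 1, w - k + 1, n_out))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = x[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 6, 4))           # hypothetical 6 x 6 input, 4 channels
filters = rng.standard_normal((5, 3, 3, 4))  # hypothetical 5 filters of 3 x 3 x 4
W = filters.reshape(5, -1).T                 # W': (36, 5), one column per filter
Q = rng.standard_normal((5, 2))              # any purely linear 5 -> 2 map

Xp = im2col(x, 3)                            # X': (16, 36)
# X' W' reproduces the convolution output, flattened over positions.
assert np.allclose(Xp @ W, conv_direct(x, filters).reshape(-1, 5))

lhs = Xp @ (W @ Q)  # X' (W' Q): convolve with the 2 projected filters
rhs = (Xp @ W) @ Q  # (X' W') Q: convolve with all 5 filters, then mix channels
print(np.allclose(lhs, rhs))  # True

# Same thing without im2col: reshape W' Q into 2 filters and convolve directly.
reduced = (W @ Q).T.reshape(2, 3, 3, 4)
print(np.allclose(conv_direct(x, reduced).reshape(-1, 2), rhs))  # True
```

Because $X'$ depends only on the input and the filter size, padding or a different stride only changes which patches become rows of $X'$; the product stays linear in the weights.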

To try to figure it out, I looked at how convolutions can be represented as matrix multiplications, since I know matrix multiplication is associative. I've seen a few posts/sites explaining how 2D convolutions can be rewritten as matrix multiplications using Toeplitz matrices (e.g. in this GitHub repository or this AI SE post); however, I'm having trouble extending that to my question.
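
A minimal sketch of the single-channel 2D case, in the spirit of the Toeplitz constructions mentioned above but not taken from them; `conv_matrix` is a hypothetical helper, and it assumes stride 1, no padding and CNN-style cross-correlation. Each row of `T` holds the kernel shifted to one output position, which gives `T` a banded, doubly block-Toeplitz structure. With several input channels, one such block per channel is placed side by side, and each additional filter adds another block row:

```python
import numpy as np

def conv_matrix(kernel, h, w):
    """Matrix T with T @ x.ravel() equal to the valid (unpadded, stride-1)
    cross-correlation of an h x w input x with `kernel`, flattened row-major."""
    kh, kw = kernel.shape
    oh, ow = h - kh + 1, w - kw + 1
    T = np.zeros((oh * ow, h * w))
    for i in range(oh):
        for j in range(ow):
            for a in range(kh):
                for b in range(kw):
                    # Output (i, j) picks up input pixel (i + a, j + b) with weight kernel[a, b].
                    T[i * ow + j, (i + a) * w + (j + b)] = kernel[a, b]
    return T

rng = np.random.default_rng(2)
x = rng.standard_normal((5, 5))
kernel = rng.standard_normal((3, 3))
T = conv_matrix(kernel, 5, 5)  # shape (9, 25)

# Direct sliding-window computation for comparison.
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kernel) for j in range(3)]
                   for i in range(3)])
print(np.allclose(T @ x.ravel(), direct.ravel()))  # True
```

Since every entry of `T` is an entry of the kernel, `T` is itself linear in the weights, which is what lets a linear map applied to the filters be moved through the convolution.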

I've also coded up simple convolutions with a $W$ matrix of $4 \times 3$ and an $X$ matrix of $4 \times 2$, using sklearn's PCA to reduce $W$ to $4 \times 2$. If I do this both ways, the outputs are not the same, which leads me to think this kind of associativity does not hold. But how can I explain this with linear algebra?
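
One possible factor worth checking in an experiment like this: scikit-learn's `PCA.transform` (with `whiten=False`) is not a pure matrix product. It first subtracts the `mean_` learned during `fit` and then multiplies by `components_.T`, so it is an affine map. The sketch below reproduces that behaviour by hand with NumPy's SVD instead of calling scikit-learn, using hypothetical random matrices ($W$ is $4 \times 3$, $X'$ is $5 \times 4$) rather than the original code:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 3))    # hypothetical 4 x 3 weight matrix
Xp = rng.standard_normal((5, 4))   # hypothetical 5 x 4 input matrix X'

# Hand-rolled stand-in for PCA(n_components=2).fit(W) (assumption: mirrors
# sklearn with whiten=False): centre the rows, keep the top 2 right-singular vectors.
mean = W.mean(axis=0)
_, _, Vt = np.linalg.svd(W - mean, full_matrices=False)
Q = Vt[:2].T                       # 3 x 2, plays the role of components_.T

def pca_transform(A):
    """Mimics PCA.transform: subtract the fitted mean, then project."""
    return (A - mean) @ Q

# The purely linear part is associative.
print(np.allclose(Xp @ (W @ Q), (Xp @ W) @ Q))                     # True
# The fitted transform is affine, so moving it past X' changes the result.
print(np.allclose(Xp @ pca_transform(W), pca_transform(Xp @ W)))   # False
```

Writing $\mu$ for the fitted mean and $\mathbf{1}$ for a vector of ones, the two sides of the second check differ by $(\mathbf{1} - X'\mathbf{1})\,\mu^\top Q$, which vanishes only in special cases.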

Can anyone explain, in linear-algebra terms, whether or not this is the case?

nbro
HereItIs
