
Recently, I was reading the paper _Skeleton-Based Action Recognition Using Spatio-Temporal Graph Convolution_. In this paper, the authors claim (below equation $\eqref{9}$) that we can perform graph convolution with the following formula

$$ \mathbf{f}_{o u t}=\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{f}_{i n} \mathbf{W} \label{9}\tag{9} $$

using a standard 2D convolution with kernels of shape $1 \times \Gamma$ (where $\Gamma$ is defined under equation 6 of the paper), and then multiplying the resulting tensor with the normalised adjacency matrix

$$\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}}$$
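
To make the question concrete, here is a minimal sketch, in PyTorch, of how I understand the claimed implementation (the shapes are made up, `A_hat` just stands in for $\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}}$, and I set $\Gamma = 1$, as in their code):

```python
import torch
import torch.nn as nn

# Made-up sizes; only the layout matters.
# N = batch, C = channels (joint coordinates), T = frames, V = joints.
N, C_in, C_out, T, V, Gamma = 8, 3, 64, 100, 25, 1

x = torch.randn(N, C_in, T, V)  # input feature map f_in
A_hat = torch.eye(V)            # stand-in for the normalised adjacency matrix

# Step 1: implement f_in W as a standard 2D convolution whose kernel
# covers 1 joint and Gamma frames (for Gamma > 1, the time axis would
# need padding to keep T unchanged).
conv = nn.Conv2d(C_in, C_out, kernel_size=(Gamma, 1))
y = conv(x)                     # shape (N, C_out, T, V)

# Step 2: multiply with the normalised adjacency matrix on the joint axis.
out = torch.einsum('nctv,vw->nctw', y, A_hat)  # shape (N, C_out, T, V)
```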

For the past few days, I have been thinking about this claim, but I can't work it out. Has anyone read this paper and can help me figure it out, please?

  • Isn't this a duplicate of https://ai.stackexchange.com/q/22650/2444? Or are you interested only in the 1d convolution? If yes, maybe read my answer https://ai.stackexchange.com/a/21824/2444. I talk about 1d convolutions there, but I don't remember anymore the level of detail I used. – nbro Jul 24 '20 at 19:58
  • I am interested in (1xD) convolution specifically. But can you please explain your answer? – Swakshar Deb Jul 24 '20 at 20:01
  • Have you read my answer: https://ai.stackexchange.com/a/21824/2444? (Btw, I edited the comment above to include a link to this answer). I don't know if it answers your question because I don't remember what I wrote there, but I think it should give you some intuition behind 1d convolutions, as far as I remember :P – nbro Jul 24 '20 at 20:02
  • Yes, I went through the answer you gave previously. I think you were more focused on FCNs and segmentation, but I am trying to find out whether a standard 1xD convolution followed by multiplication with the adjacency matrix is the same as graph convolution on skeleton-based graphs. – Swakshar Deb Jul 24 '20 at 20:07
  • It's been a while since I read something about graph NNs, but can you point me exactly to the part of the paper where the author claims that, so that I have some context? – nbro Jul 24 '20 at 20:14
  • Sure: section 3.6, after equation 9. – Swakshar Deb Jul 24 '20 at 20:17
  • So, if I understand correctly, they are performing a 2d convolution between $\mathbf{f}_{i n}$ and $\mathbf{W}$ with kernels of shape $1 \times \Gamma$ to implement the multiplication $\mathbf{f}_{i n} \mathbf{W}$. What is $\mathbf{f}_{i n}$? The features at the nodes? I think so because they say "_In practice, under the spatial temporal cases, we can represent the input feature map as a tensor of $(C, V, T)$ dimensions_" – nbro Jul 24 '20 at 20:30
  • Yes, you're right. $f_{in}$ is the feature vector of the nodes. – Swakshar Deb Jul 24 '20 at 20:33
  • I can't think about it right now (cause I don't have much time), but, hopefully, this discussion and clarification were somewhat useful. Maybe the next step to understand is what $C$, $V$ and $T$ are (if you haven't already) and why the feature map can be represented as a 3d tensor of dimensions $(C, V, T)$. If I remember correctly, algorithms that apply the so-called "graph convolution" are actually _message-passing_ algorithms, where you "flow the information between nodes" for some time steps, and that should explain the parameter $\Gamma$. – nbro Jul 24 '20 at 20:36
  • It's okay. $C$ = coordinates of a particular joint $(x, y, z)$, $V$ = vertices (in this case, joints), $T$ = time steps. – Swakshar Deb Jul 24 '20 at 20:39
  • Ok. I just want to clarify one thing. They are apparently not talking about 1d convolutions (as I initially thought from your explanation), but 2d convolutions with kernels of shape $1 \times \Gamma$, i.e. rather than having the usual $3 \times 3$ (or $5 \times 5$), you have a $1 \times \Gamma$ kernel. If they really apply a 2d convolution, then the actual kernel will have the same $1 \times \Gamma \times N$, where $N$ is the depth of the feature map, which should be $T$, from what I understand, so the kernel should be $1 \times \Gamma \times T$. – nbro Jul 24 '20 at 20:43
  • Then you apply (i.e. convolve) this kernel $1 \times \Gamma \times T$ to the spatial dimensions of the input, where the spatial dimensions (height and width, or vice-versa) should be $C$ and $V$. I don't know which values $\Gamma$ is supposed to take, but you may want to try to solve values, like $3$, and see why the graph convolution would be equivalent to this 2d convolution with kernel $1 \times 3 \times T$. (I hope I didn't confuse you more). – nbro Jul 24 '20 at 20:44
  • Yes, you are right. I initially talked about 1D convolution since, in their code, the $\Gamma$ parameter is set to 1. – Swakshar Deb Jul 24 '20 at 20:48
  • I wrote "kernel will have the same", but I meant "kernel will have the _shape_" and I wrote "want to try to solve values, like 3", but I meant "want to try to solve this with values like $\Gamma = 3$". – nbro Jul 24 '20 at 20:51
  • Ha, so they set $\Gamma = 1$. I see that you wrote that only now. Well, in that case, you're really applying a $1\times 1 \times T$ kernel to the input feature map (i.e. 1d convolution). The application of a $1 \times 1$ kernel to the input produces a tensor with the same spatial dimensions as the input feature map. So, if you apply only one $1 \times 1 \times T$ kernel to an input feature map of shape $(C, V, T)$, then you will get an output feature map of shape $(C, V)$. (There is a quick shape check of this below the comments.) – nbro Jul 24 '20 at 20:59
  • What are the dimensions of the normalized adjacency matrix, $\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}}$? – nbro Jul 24 '20 at 21:02
  • After the convolution operation, they got 4-dimensional tensor, where extra 2 dimensions are batch size and number of filter. The normalized adjacency matrix is a 3 dimensional tensor – Swakshar Deb Jul 24 '20 at 21:29
  • What are the 3rd and 4th dimensions? – nbro Jul 24 '20 at 21:46
  • After the convolution, the dimension of the feature map is $(N,C,V,F)$, where $N$ = batch size, $F$ = number of filters, and $C, V$ are the coordinates and joints. The dimension of the normalized adjacency matrix is $(K,V,V)$, where $V$ = number of joints; I am not sure about $K$. – Swakshar Deb Jul 24 '20 at 22:05
  • Then that's consistent with what I said, and it somehow seems to be consistent with what is written below that equation 9, i.e. "_and multiplies the resulting tensor with the normalized adjacency matrix **on the second dimension**_" – nbro Jul 24 '20 at 22:08
  • I think $K=1$, so the dimension of the adjacency matrix is $(1, V, V)$. – Swakshar Deb Jul 24 '20 at 22:17
  • Ha, wait, I misinterpreted your last comment or you changed it. A tensor of shape $(N, C, V, F)$ needs to be multiplied with a tensor of shape $(K, V, V)$, on the second dimension. That looks problematic, actually, because the dimensions don't seem to match for matrix multiplication. – nbro Jul 24 '20 at 22:20
  • My bad, the input tensor will be $(N,F,C,V)$. They are doing the multiplication like this: `torch.einsum('nkctv,kvw->nctw', (x, A))` (unpacked in a sketch below the comments). – Swakshar Deb Jul 24 '20 at 22:25
  • I am not familiar with that function (although I once used PyTorch). I would need to check the documentation, but I don't have time now. – nbro Jul 26 '20 at 15:26
  • Check this paper out: _Graph Classification with 2D Convolutional Neural Networks_, https://arxiv.org/abs/1708.02218. It addresses your issue. – Arpit Maclay Nov 01 '20 at 01:29
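
To sanity-check the $1 \times 1 \times T$ shape claim from the comments: a minimal PyTorch sketch, treating $T$ as the depth of the feature map and $(C, V)$ as its spatial dimensions, as in nbro's framing (the sizes are made up).

```python
import torch
import torch.nn as nn

# Treat T as the depth and (C, V) as the spatial dimensions,
# so the input feature map is a (1, T, C, V) tensor (batch of 1).
C, V, T = 3, 25, 100
f_in = torch.randn(1, T, C, V)

# A single 1 x 1 kernel of depth T.
one_kernel = nn.Conv2d(in_channels=T, out_channels=1, kernel_size=1)

out = one_kernel(f_in)
print(out.shape)  # torch.Size([1, 1, 3, 25]), i.e. one (C, V) map
```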
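
And to unpack the `torch.einsum` call quoted above: a minimal sketch whose shapes are inferred from the einsum string itself, with made-up sizes ($K$ is the number of adjacency matrices, apparently $1$ here).

```python
import torch

# x: (N, K, C, T, V) feature map, A: (K, V, V) stack of adjacency matrices.
N, K, C, T, V = 8, 1, 64, 100, 25
x = torch.randn(N, K, C, T, V)
A = torch.randn(K, V, V)

# 'nkctv,kvw->nctw': for every batch, channel and frame, multiply the
# joint axis by A[k] and sum over the K adjacency matrices.
out = torch.einsum('nkctv,kvw->nctw', x, A)
print(out.shape)  # torch.Size([8, 64, 100, 25]) = (N, C, T, V)

# An equivalent explicit loop, to see what the einsum does:
ref = sum(x[:, k] @ A[k] for k in range(K))  # (N, C, T, V) @ (V, V)
print(torch.allclose(out, ref, atol=1e-5))   # True
```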

0 Answers