Recently I was reading this paper Skeleton Based Action RecognitionUsing Spatio Temporal Graph Convolution. In this paper, the authors claim (below equation (\ref{9})) that we can perform graph convolution with the following formula
$$ \mathbf{f}_{o u t}=\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}} \mathbf{f}_{i n} \mathbf{W} \label{9}\tag{9} $$
using the standard 2d convolution with kernels of shape $1 \times \Gamma$ (where $\Gamma$ is defined under equation 6 of the paper), and then multiplying it with the normalised adjacency matrix
$$\mathbf{\Lambda}^{-\frac{1}{2}}(\mathbf{A}+\mathbf{I}) \mathbf{\Lambda}^{-\frac{1}{2}}$$
For the past few days, I was thinking about his claim but I can't find an answer. Does anyone read this paper and can help me to find it out, please?