
I am trying to implement this paper for unsupervised video anomaly detection.

The gist of the paper seems to be:

  • Create a dataset for the unsupervised setting by mixing the normal training videos with the anomalous videos (section 4).
  • Divide each video $i$ into $p\,(=16)$ segments $s_{ij}$, $j = 1, 2, \dots, p$.
  • Using a feature extractor (ResNext), compute a $d$-dimensional feature vector $f_{ij}$ for each $s_{ij}$ (section 3.1).
  • Use these features to (pre)train the Generator G (a simple autoencoder); a minimal sketch of my understanding follows this list. Once G is (pre)trained, it is used to generate pseudo labels, which are then used to (pre)train the Discriminator D (section 3.3).
  • Pretrained G and D are then put into a collaborative loop, where pseudo labels from D are used to improve G and vice versa.
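Here is a minimal PyTorch sketch of how I currently read the G pretraining step. The layer sizes, $d = 2048$ (the ResNext feature dimension I'm assuming), and the optimizer settings are my own guesses, not values from the paper:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully connected autoencoder over the per-segment feature vectors f_ij."""
    def __init__(self, d: int = 2048, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d, 1024), nn.ReLU(),
            nn.Linear(1024, hidden), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 1024), nn.ReLU(),
            nn.Linear(1024, d),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(f))

# Pretraining: minimize the reconstruction (MSE) loss over all segments.
g = Generator(d=2048)
opt = torch.optim.Adam(g.parameters(), lr=1e-4)  # lr is a guess
features = torch.randn(64, 2048)                 # one dummy batch of f_ij
for _ in range(10):                              # a few dummy steps
    opt.zero_grad()
    loss = nn.functional.mse_loss(g(features), features)
    loss.backward()
    opt.step()
```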

I am having trouble with generating the pseudo labels using G. The paper clearly mentions in section 3.3: "Once G is pretrained, it is used to generate pseudo labels".

So does pseudo label generation only come into play once pretraining is done? That is, after G is ready, do we pass the dataset through it again, record the per-segment reconstruction losses, and threshold them to generate the labels?

Or are they generated during training?

(Section 3.2.2 mentions the pseudo label generation and section 4 mentions the threshold selection).
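To make the first interpretation concrete, this is roughly what I have in mind. `reconstruction_errors` is my own helper name, and the percentile threshold is a placeholder for whatever selection rule section 4 actually prescribes:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reconstruction_errors(g: nn.Module, features: torch.Tensor) -> torch.Tensor:
    """Per-segment reconstruction MSE under the (pre)trained generator."""
    return ((g(features) - features) ** 2).mean(dim=1)

# Dummy stand-ins: a trivial autoencoder and random segment features f_ij.
g = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 2048))
features = torch.randn(1024, 2048)

errors = reconstruction_errors(g, features)
threshold = torch.quantile(errors, 0.90)      # placeholder rule; see section 4
pseudo_labels = (errors > threshold).long()   # 1 = pseudo-anomalous, 0 = pseudo-normal
```

If this is right, the same pass would then be repeated with D's predictions once the collaborative loop starts, but I'd like to confirm the timing for G first.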

