
I have a little perplexity about the NT-Xent loss employed in self-supervised contrastive learning.

$$\ell_{i,j} = -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1}^{2N} \mathbf{1}_{\{k \neq i\}} \exp\!\big(\mathrm{sim}(z_i, z_k)/\tau\big)}$$

What we are essentially doing is maximizing the similarity between the two augmented views of each image while minimizing the similarity with all other instances in the batch. If $j$ is the index of the positive example for index $i$, then I don't understand why the positive pair gets included in the denominator. Shouldn't we use the indicator function $\mathbf{1}_{\{k \neq i\} \cap \{k \neq j\}}$ instead?
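To make the question concrete, here is a small NumPy sketch (my own illustration, not from any particular library) that computes NT-Xent for a toy batch in two ways: with the standard denominator indicator $\mathbf{1}_{\{k \neq i\}}$, and with the variant $\mathbf{1}_{\{k \neq i\} \cap \{k \neq j\}}$ that also drops the positive. The function name `nt_xent` and the pairing convention (rows $2m$ and $2m+1$ are the two views of image $m$) are assumptions for the example:

```python
import numpy as np

def nt_xent(z, tau=0.5, exclude_positive=False):
    """NT-Xent over 2N embeddings; rows (2m, 2m+1) are a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> dot = cosine sim
    sim = z @ z.T / tau
    n2 = z.shape[0]
    loss = 0.0
    for i in range(n2):
        j = i + 1 if i % 2 == 0 else i - 1   # index of i's positive view
        mask = np.ones(n2, dtype=bool)
        mask[i] = False                       # standard: drop only k = i
        if exclude_positive:
            mask[j] = False                   # variant: also drop k = j
        denom = np.exp(sim[i, mask]).sum()
        loss += -np.log(np.exp(sim[i, j]) / denom)
    return loss / n2

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))  # N = 2 images -> 2N = 4 augmented views
standard = nt_xent(z)                        # denominator includes the positive
variant = nt_xent(z, exclude_positive=True)  # denominator excludes the positive
print(standard, variant)
```

Running this, `standard` is always strictly larger than `variant`, since excluding the positive only shrinks the denominator; the two losses differ by a term that still depends on the positive similarity, so they are not equivalent objectives.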

James Arten
