I am working on a modified version of the triplet loss function introduced with SBERT, where instead of the Euclidean distance we use the cosine similarity. The formula to minimize is

$$\max\left(\frac{|s_a \cdot s_p|}{\|s_a\|\,\|s_p\|} - \frac{|s_a \cdot s_n|}{\|s_a\|\,\|s_n\|} + \epsilon,\ 0\right)$$

where $s_a$ is the embedding of the anchor sentences (the context), $s_p$ is the embedding of the positive sentence (correct continuation) and $s_n$ is the embedding of the negative sentence (wrong continuation).
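
To make the expression concrete, here is a minimal pure-Python sketch that transcribes it literally, including the absolute value in the numerator and the order of the two terms. The names `cosine_term` and `triplet_expression` and the default margin `eps=0.5` are placeholders chosen for illustration and do not come from SBERT or any library. The sketch assumes non-zero embeddings given as plain lists of floats.

```python
import math

def cosine_term(u, v):
    """|u . v| / (||u|| * ||v||), as written in the formula above.

    Assumes u and v are non-zero vectors of equal length.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)

def triplet_expression(s_a, s_p, s_n, eps=0.5):
    """Literal transcription: max(term(a, p) - term(a, n) + eps, 0).

    eps is a hypothetical margin value picked only for this example.
    """
    return max(cosine_term(s_a, s_p) - cosine_term(s_a, s_n) + eps, 0.0)

# Toy 2-D "embeddings", made up purely for illustration.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]
negative = [0.0, 1.0]
print(triplet_expression(anchor, positive, negative))
```

Evaluating the function on a few hand-made triplets like this one is a quick way to see which configurations the expression rewards and which it penalizes.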
I would like to check that the function I came up with makes sense from a theoretical point of view. Where should I look to find out which properties a loss function should satisfy?
Motivation for the question: I'm getting my hands dirty with contrastive loss functions, and this is an easy variation I came up with.