
A FaceNet learns to cluster images of the same face together. I want to take a pre-trained FaceNet that was trained for this and now have it learn to cluster faces together, i.e. to cluster the clusters of images. More specifically, now that the network knows how to group images of the same face, I want it to pick up on the characteristics of each face to group similar faces. I have a dataset specifying which faces should cluster together, and I don't think the latent-space distances from a network trained only on facial similarity would work as-is.

If I create a triplet loss function that uses the mean of the squared Euclidean distances between each point in two clusters (as in torch.cdist(x, y).square().mean()), then, given enough faces in the batch, the distance consistently tends towards 1 (which is also the margin). I think this is because the network is not penalized as much for its inconsistencies, since they are softened when the distances are averaged.
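For concreteness, a minimal sketch of the loss described above, assuming each of the three inputs is an (N, D) batch of embeddings for one identity (the function name and the ReLU-hinge form are my choices, not from a specific paper):

```python
import torch

def cluster_triplet_loss(anchor, positive, negative, margin=1.0):
    # Each argument is an (N, D) cluster of embeddings for one identity.
    # The positive/negative distances are the mean squared pairwise
    # distances between two clusters, as in the question.
    d_pos = torch.cdist(anchor, positive).square().mean()
    d_neg = torch.cdist(anchor, negative).square().mean()
    # Standard triplet hinge: penalize when the positive cluster is not
    # at least `margin` closer than the negative cluster.
    return torch.relu(d_pos - d_neg + margin)
```

Averaging inside `d_pos` and `d_neg` is exactly where the softening happens: one badly placed embedding barely moves the mean once N is large.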

Fast Approximated Triplet Loss proposes a solution that essentially draws a circle around the embeddings of the positive and negative images in the latent space and measures the distance between the anchor and their centres.
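A rough sketch of that centre-based idea, assuming a single anchor embedding and (N, D) positive/negative clusters. Note this keeps only the centroid distances; the actual Fast Approximated Triplet Loss formulation also involves the spread of each cluster, so this is an illustration, not the paper's exact loss:

```python
import torch

def centroid_triplet_loss(anchor, positive, negative, margin=1.0):
    # Collapse each cluster to its centroid, then measure the anchor's
    # squared distance to each centroid instead of averaging over all
    # pairwise distances.
    p_centre = positive.mean(dim=0)
    n_centre = negative.mean(dim=0)
    d_pos = (anchor - p_centre).square().sum()
    d_neg = (anchor - n_centre).square().sum()
    return torch.relu(d_pos - d_neg + margin)
```

Because each cluster is reduced to a single point before the distance is taken, the gradient pushes the whole cluster's centre rather than averaging many pairwise penalties.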

Either way, given that a FaceNet is mostly made of convolutional layers that tend to summarise patterns in the data, is it even reasonable to expect it to learn to group faces that look different, but vary in a (probably) predictable way?

Will I need to add fully-connected layers at the end to process the extracted features into a new embedding? Would the ReLU layers in the backbone of the FaceNet (that being an Inception-ResNet) be enough to learn with?
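One version of the fully-connected idea, sketched as a small projection head on top of a frozen backbone. The dimensions and layer sizes here are hypothetical placeholders (512 matches the common Inception-ResNet FaceNet embedding size, but yours may differ):

```python
import torch
import torch.nn as nn

class ClusterHead(nn.Module):
    """Hypothetical MLP head placed after a (frozen) FaceNet backbone:
    maps the per-face embedding into a new space where clusters of
    faces, rather than images of one face, are grouped."""
    def __init__(self, in_dim=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x):
        z = self.net(x)
        # L2-normalize, FaceNet-style, so distances live on the unit sphere.
        return nn.functional.normalize(z, dim=-1)
```

Training only this head (with the backbone's weights frozen) would let you test whether the extracted features already carry the information needed to separate your face groups, before deciding whether to fine-tune the backbone itself.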

