
I have a model that outputs an N-dimensional latent embedding for each data point. It is trained so that data points from the same class cluster together and are separated from the clusters of other classes.

The N-dimensional embedding is projected down to 2D using UMAP. At each epoch, I wish to test the clustering capability of the model on these 2D projections for use as validation accuracy. I have the labels for each class.

How should I proceed?

[Figure: UMAP projection]

nbro
jaeger6

2 Answers


You can compute the Silhouette Coefficient for this purpose. It ranges from -1 to 1:

1: Clusters are well separated from each other and clearly distinguished.

0: Clusters overlap, i.e. the distance between clusters is not significant.

-1: Points are assigned to the wrong clusters.
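As a rough sketch of how this could look with the known class labels used as the cluster assignment, the snippet below calls scikit-learn's `silhouette_score` on 2D points. The arrays `embedding_2d` and `class_labels` are hypothetical stand-ins generated with `make_blobs`; in practice they would be the UMAP projections and the labels from the question.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic stand-in for the 2D UMAP projections and their class labels
# (assumption: replace with the real projections and labels).
embedding_2d, class_labels = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

# Mean silhouette over all points, using the class labels as cluster ids.
mean_silhouette = silhouette_score(embedding_2d, class_labels)

# Per-class mean silhouette, to see which classes are poorly separated.
sample_scores = silhouette_samples(embedding_2d, class_labels)
per_class_silhouette = {
    int(c): float(sample_scores[class_labels == c].mean())
    for c in sorted(set(class_labels))
}

print(f"mean silhouette: {mean_silhouette:.3f}")
print(per_class_silhouette)
```

Tracking `mean_silhouette` once per epoch would give a single validation number, and the per-class breakdown can hint at which classes are still mixing.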

Since you have the class labels, other measures, such as purity and mutual information, are also possible. They rely on

an external criterion that evaluates how well the clustering matches the gold standard classes.
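One possible way to compute such external criteria is sketched here: cluster the 2D points (k-means is used purely as an assumption, any clustering algorithm could be swapped in), then compare the predicted clusters with the true classes. `purity_score` is a hypothetical helper built on scikit-learn's `contingency_matrix`, and the data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix


def purity_score(y_true, y_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    # Rows are true classes, columns are predicted clusters.
    table = contingency_matrix(y_true, y_pred)
    return table.max(axis=0).sum() / table.sum()


# Synthetic stand-in for the 2D UMAP projections and their class labels.
embedding_2d, class_labels = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 10], [10, -10]],
    cluster_std=1.0,
    random_state=0,
)

# Assumption: one cluster per class, found with k-means.
n_classes = len(set(class_labels))
kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(embedding_2d)

purity = purity_score(class_labels, cluster_ids)
nmi = normalized_mutual_info_score(class_labels, cluster_ids)
ari = adjusted_rand_score(class_labels, cluster_ids)

print(f"purity={purity:.3f}  NMI={nmi:.3f}  ARI={ari:.3f}")
```

All three scores are invariant to how the cluster ids are numbered, so no matching between cluster ids and class labels is needed.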

nbro
OmG

Another popular metric for this is the Davies-Bouldin score, where lower values indicate better-separated clusters.

You can also take a look at the clustering metrics in the scikit-learn documentation.
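A minimal sketch of the Davies-Bouldin score with scikit-learn's `davies_bouldin_score`, again using the class labels as the cluster assignment. The two synthetic datasets (`tight_2d`, `loose_2d`) are hypothetical and only illustrate that the score drops as classes separate.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

centers = [[-10, -10], [0, 10], [10, -10]]

# Well-separated classes (small spread around each center).
tight_2d, tight_labels = make_blobs(
    n_samples=300, centers=centers, cluster_std=1.0, random_state=0
)

# Heavily overlapping classes (large spread around the same centers).
loose_2d, loose_labels = make_blobs(
    n_samples=300, centers=centers, cluster_std=8.0, random_state=0
)

# Lower is better; 0 is the minimum possible value.
db_tight = davies_bouldin_score(tight_2d, tight_labels)
db_loose = davies_bouldin_score(loose_2d, loose_labels)

print(f"well separated: {db_tight:.3f}")
print(f"overlapping:    {db_loose:.3f}")
```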

nbro
Abhishek Verma