I need to cluster my points into an unknown number of clusters, given a minimum Euclidean distance R between any two clusters. Any two clusters that are closer than this minimum distance should be merged and treated as one.

I could implement a loop that starts with two clusters and increases the count until I observe a pair of clusters that are closer to each other than my minimum distance. The upper bound of the loop is the number of points to be clustered.
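A minimal sketch of that loop, to make the idea concrete. It assumes scikit-learn's KMeans rather than FAISS, measures the distance between two clusters as the distance between their centroids, and uses a hypothetical helper name `estimate_k` and synthetic test data; none of these come from my actual setup.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans


def estimate_k(points, min_dist, random_state=0):
    """Hypothetical helper: grow k until two centroids are closer than min_dist."""
    n = len(points)
    # k = 1 always satisfies the constraint, so it is the fallback result.
    best = KMeans(n_clusters=1, n_init=10, random_state=random_state).fit(points)
    for k in range(2, n + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(points)
        # pdist gives all pairwise Euclidean distances between the centroids.
        if pdist(km.cluster_centers_).min() < min_dist:
            break  # this k splits too finely; keep the previous clustering
        best = km
    return best.n_clusters, best.cluster_centers_


# Synthetic data (an assumption for illustration): three well-separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers])

k, centroids = estimate_k(points, min_dist=5.0)
print(k, centroids.shape)
```

The loop stops at the first k that violates the distance constraint, so in the worst case it runs k-means once for every k up to the number of points.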

Are there any well-known algorithms or approaches to estimate the approximate number of centroids from the set of points and the required minimum distance between centroids?

I am currently using FAISS under Python, but with the right idea I could also implement it in C myself.

h22

2 Answers

Yes, the silhouette method (implemented in sklearn as silhouette_score) is commonly used to assess the quality of clusters produced by any clustering algorithm (including $k$-means or any hierarchical clustering algorithm). Roughly, you compute the silhouette value for several values of $k$ and then pick the $k$ with the highest silhouette value.
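As a rough sketch of that procedure, assuming $k$-means from scikit-learn as the clustering algorithm and synthetic data; the helper name `best_k_by_silhouette` and the range of candidate $k$ values are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(points, k_values, random_state=0):
    """Hypothetical helper: return the k with the highest mean silhouette value."""
    scores = {}
    for k in k_values:
        labels = KMeans(
            n_clusters=k, n_init=10, random_state=random_state
        ).fit_predict(points)
        # silhouette_score requires at least 2 and at most n_samples - 1 clusters.
        scores[k] = silhouette_score(points, labels)
    return max(scores, key=scores.get), scores


# Synthetic data (an assumption for illustration): three well-separated blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in blob_centers])

best_k, scores = best_k_by_silhouette(points, range(2, 8))
print(best_k)
```

By default silhouette_score uses Euclidean distance, which matches the distance in the question.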

nbro

If you look at Kaufman & Rousseeuw (1990), Finding Groups in Data, they describe an algorithm to evaluate the quality of clusters in agglomerative clustering. You run the clustering algorithm with a specific value k for the number of clusters you want, and that routine then gives you a score to reflect the cohesion of the clustering. If you then cluster again with a different value for k, you will get another score. You repeat this process until you have found a maximum score, and then you have the clustering with the optimum number of clusters.

Oliver Mason