Questions tagged [clustering]

For questions related to clustering (a common unsupervised learning technique).

63 questions
4 votes, 1 answer

How to define machine learning to cover clustering, classification, and regression?

How to define machine learning to cover clustering, classification, and regression? What unites these problems?
4 votes, 2 answers

Which metric should I use to assess the quality of the clusters?

I have a model that outputs a latent N-dimensional embedding for all data points, trained so that data points from the same class cluster together while staying separated from the clusters of other classes. The…
jaeger6
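If the goal is compact, well-separated clusters in the embedding, one commonly used internal measure is the silhouette coefficient. It compares each point's mean distance to its own cluster with its mean distance to the nearest other cluster; values near 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster. Below is a minimal pure-Python sketch that assumes Euclidean distance and one cluster label per point. `sklearn.metrics.silhouette_score` computes the same quantity for larger data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters get s = 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

Because the class labels are known here, a label-based score such as the adjusted Rand index (`sklearn.metrics.adjusted_rand_score`), computed against a clustering of the embeddings, can complement this label-free measure.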
3 votes, 2 answers

What techniques to explore for dynamic clustering of documents (emails)?

I have a dataset of unlabelled emails that fall into distinct categories (around a dozen). I want to be able to classify them, as well as new ones that arrive in the future, in a dynamic manner. I know that there are dynamic clustering techniques that…
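A simple baseline for assigning a stream of documents to clusters without fixing their number in advance is single-pass "leader" (threshold-based online) clustering: each new item joins the most similar existing cluster if the similarity is high enough, otherwise it starts a new cluster. The sketch below is illustrative only. The bag-of-words representation, the hypothetical `LeaderClustering` class, and its `threshold` value are assumptions, not part of the question; a real email pipeline would more likely use TF-IDF or learned embeddings.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LeaderClustering:
    """Single-pass online clustering with a similarity threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        # Summed word counts per cluster; cosine similarity ignores scale,
        # so a sum points in the same direction as a mean centroid.
        self.centroids = []

    def add(self, text):
        """Assign one document and return its cluster index."""
        vec = bag_of_words(text)
        best, best_sim = None, -1.0
        for idx, cen in enumerate(self.centroids):
            sim = cosine(vec, cen)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # Counter.update adds counts
            return best
        self.centroids.append(Counter(vec))  # start a new cluster
        return len(self.centroids) - 1
```

Each call to `add` returns a cluster index immediately, so new emails can be routed as they arrive; raising `threshold` produces more, tighter clusters.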
3 votes, 1 answer

Is it normal that a SOM clusters the "versicolor" instances into multiple different BMUs?

I have trained a SOM network (with different sizes, learning rates, and numbers of epochs) to cluster the Iris dataset. The instances of the setosa class have mainly been mapped to 1-2 BMUs. In the case of virginica, the instances have also be…
3 votes, 0 answers

What is meant by subspace clustering in MFA?

The basic idea of MFA is to perform subspace clustering by assuming a covariance structure of the form $\Sigma_i = \Lambda_i \Lambda_i^T + \Psi_i$ for each component, where $\Lambda_i \in \mathbb{R}^{D\times d}$ is the factor loadings matrix…
stoic-santiago
3 votes, 6 answers

How can I cluster this data frame with several features and observations?

How can I cluster the data frame below with several features and observations? And how would I go about determining the quality of those clusters? Is k-NN appropriate for this?
id  Name  Gender  Dob  Age  Address
1  MUHAMMAD JALIL …
3 votes, 0 answers

Neural network to extract correlated columns

I want to use a neural network to find correlated columns in a .csv file and give them as an output. The input .csv file has multiple columns containing 0s and 1s (like Booleans). The file records which interests are assigned to which people. Example…
3 votes, 2 answers

How to compute the number of centroids for the K-means clustering algorithm given a minimal distance?

I need to cluster my points into an unknown number of clusters, given the minimal Euclidean distance R between two clusters. Any two clusters that are closer than this minimal distance should be merged and treated as one. I could implement a loop…
h22
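Merging any two clusters closer than R is essentially agglomerative clustering stopped at a distance threshold; the number of clusters left at the end can then serve as k for K-means (or their centroids as its initial means). Below is a minimal sketch that assumes centroid linkage and Euclidean distance. The function name `merge_until_separated` is hypothetical, and the naive loop is roughly cubic in the number of points, so it only suits small inputs. scikit-learn's `AgglomerativeClustering` offers a similar stopping rule via `n_clusters=None` and `distance_threshold`, though with ward, complete, average, or single linkage rather than centroid linkage.

```python
import math


def merge_until_separated(points, r):
    """Centroid-linkage agglomerative clustering with a distance threshold.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose centroids are closest, as long as that distance is below r.
    Returns a list of clusters (lists of point indices); its length is the
    resulting number of centroids.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0]) if points else 0

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        # Find the closest pair of cluster centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= r:
            break  # every pair of centroids is now at least r apart
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return clusters
```

Recomputing all centroids on every pass keeps the logic obvious; a priority queue or a library implementation is the better choice once the point count grows.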
3 votes, 1 answer

What is graph clustering?

There are several (families of) algorithms that can be used to cluster a set of $d$-dimensional points: for example, k-means, k-medoids, and hierarchical clustering (agglomerative or divisive). What is graph-based clustering? Are we clustering the nodes…
nbro
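In graph clustering the objects being grouped are usually the nodes. The graph is either given (e.g., a social network) or built from the points, and a cluster is a set of nodes that are densely connected to each other. The simplest illustration, sketched below, builds an epsilon-neighborhood graph with Euclidean distance and treats each connected component as a cluster. The function names and the epsilon-graph construction are assumptions for illustration; methods such as spectral clustering or modularity-based community detection go further and cut weak edges inside a component.

```python
import math
from collections import deque


def epsilon_graph(points, eps):
    """Adjacency lists: connect two points if their distance is <= eps."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj):
    """Label each node with the id of its connected component (BFS)."""
    labels = {}
    next_label = 0
    for start in adj:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels
```

`connected_components` works on any adjacency dictionary, so the same function clusters the nodes of a graph that was given directly rather than built from points.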
2 votes, 1 answer

How do we know the classification boundaries of the data?

Consider an image classification problem. Conceptually, we have some high-dimensional space in which all images can be represented as points, and, given a large enough labeled dataset, we can build a classifier. But how do we know that our data…
2 votes, 1 answer

How to tackle human error in labeling datasets for classification tasks like facial expression recognition?

I am working on the facial expression recognition task. One of the most challenging problems I faced was human error in labeling the datasets (e.g., FER2013). Are there any ways to handle incorrect labeling of datasets in the classification…
2 votes, 1 answer

What clustering algorithms work best for datasets with only binary categorical features?

I have a dataset with a lot of binary categorical features and a single continuous target value. I would like to cluster them, but I am not quite sure what to use. In the past, I have used DBSCAN for something similar and it worked well, but that…
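For purely binary features, one option is k-modes. It replaces the Euclidean distance of k-means with the Hamming distance (the number of mismatched features) and replaces cluster means with per-feature majority votes, so every cluster center is itself a valid binary vector. Below is a minimal sketch; the function names are hypothetical, and initialising the modes from the first k rows is an arbitrary assumption made to keep the result deterministic. DBSCAN can also be run on such data with a Hamming or Jaccard metric, and the third-party `kmodes` package provides a fuller k-modes implementation.

```python
def hamming(a, b):
    """Number of positions where two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, max_iter=100):
    """k-modes for 0/1 feature vectors (Hamming distance, majority-vote modes).

    Initial modes are the first k rows (an arbitrary choice for this sketch;
    random or density-based initialisation is more common).
    Returns (labels, modes).
    """
    modes = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode by Hamming distance.
        new_labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: per-feature majority vote (ties go to 0).
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old mode if a cluster becomes empty
                modes[c] = [int(2 * sum(col) > len(members)) for col in zip(*members)]
    return labels, modes
```

Because each mode is an actual 0/1 pattern, it can be read directly as the "typical" feature combination of its cluster.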
2 votes, 1 answer

Perform clustering on high-dimensional data

Recently I trained a BYOL model on a set of images to learn an embedding space where similar vectors are close by. The performance was fantastic when I performed approximate K-nearest neighbours search. Now the next task, where I am facing a problem…
2 votes, 0 answers

What would be a reasonable clustering option for an unknown number of clusters and a lot of outliers?

I am implementing a CV detection pipeline using SIFT and a KNN matcher. Matching the image keypoints to the query keypoints produces the following result [image not included]: the matched objects have many keypoints on them, and there are some false matches. I…
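DBSCAN is a common choice when the number of clusters is unknown and outliers are present: it grows clusters outward from dense "core" points and labels sparse points as noise, which fits isolated false keypoint matches. Below is a minimal pure-Python sketch. `eps` (for keypoint coordinates it would be in pixels) and `min_samples` are assumed tuning parameters, and the all-pairs neighbor search is quadratic in the number of points. `sklearn.cluster.DBSCAN` implements the same algorithm and likewise marks noise with the label -1.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns one label per point, NOISE (-1) for outliers.

    A point is a core point if at least min_samples points (itself included)
    lie within eps of it; clusters grow outward from core points.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        seeds = list(neighbors[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                seeds.extend(neighbors[j])  # j is a core point: keep expanding
        cluster += 1
    return labels
```

Applied to matched keypoint coordinates, each non-noise label would correspond to one candidate object, and points labelled -1 could be discarded as false matches.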
2 votes, 1 answer

Is there a clustering algorithm that can make n clusters plus an (n+1)-th "others" cluster?

As far as I know, all clustering algorithms assume that every data point must be assigned to a cluster. My question is: is there an algorithm that could focus only on n clusters (a number stated by the user) and try to dismiss the rest of the points…
GKozinski
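One simple way to get n clusters plus an "others" group is to run k-means with k = n but refuse to assign any point whose nearest centroid is farther than a cutoff, and leave such points out of the centroid updates. This is a sketch of that idea, not a standard named algorithm: `max_dist` and the function name are hypothetical, and initialising from the first n points is an arbitrary choice. Trimmed k-means follows a related idea but discards a fixed fraction of the points instead of using a distance cutoff.

```python
import math

OTHERS = -1


def kmeans_with_others(points, n, max_dist, iters=50):
    """k-means with n clusters plus an OTHERS label for far-away points.

    Each iteration assigns a point to its nearest centroid only if that
    centroid is within max_dist; otherwise the point is labelled OTHERS and
    ignored when the centroids are recomputed. Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:n]]
    labels = []
    for _ in range(iters):
        # Assignment step with a distance cutoff.
        labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            best = min(range(n), key=dists.__getitem__)
            labels.append(best if dists[best] <= max_dist else OTHERS)
        # Update step using inliers only.
        new_centroids = []
        for c in range(n):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids
```

The cutoff makes the size of the "others" group data-dependent; if a fixed number of dismissed points is wanted instead, the farthest points could be trimmed at each iteration.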