
According to my lecture, Fuzzy c-Means tries to minimize the following objective function:

$$J(X,B,U)=\sum_{i=1}^c\sum_{j=1}^n u_{ij}^w \, d^2(\vec{\beta_i},\vec{x_j})$$

where $X$ are the data points, $B$ are the cluster-'prototypes', and $U$ is the matrix containing the fuzzy membership degrees. $d$ is a distance measure.

A constraint is that the membership degrees of a single data point with respect to all clusters sum to $1$: $\sum_{i=1}^c u_{ij}=1$ for every $j$.
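To make the objective concrete, here is a minimal NumPy sketch that evaluates $J$ for given $X$, $B$, $U$ and exponent $w$. It assumes $d$ is the Euclidean distance (the question leaves $d$ general) and stores $U$ as a $c \times n$ array with `U[i, j]` $= u_{ij}$. The name `fcm_objective` and the toy data are illustrative only:

```python
import numpy as np

def fcm_objective(X, B, U, w):
    """J(X, B, U) = sum_i sum_j u_ij^w * d^2(beta_i, x_j).

    X: (n, p) data points x_j as rows
    B: (c, p) prototypes beta_i as rows
    U: (c, n) memberships, U[i, j] = u_ij
    w: fuzzifier exponent (w >= 1)
    Assumes d is the Euclidean distance.
    """
    X, B, U = (np.asarray(a, dtype=float) for a in (X, B, U))
    # Constraint: for each point (column), memberships over the c clusters sum to 1.
    if not np.allclose(U.sum(axis=0), 1.0):
        raise ValueError("memberships of each point must sum to 1 over the clusters")
    # D2[i, j] = squared Euclidean distance between prototype beta_i and point x_j
    D2 = ((B[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return float((U ** w * D2).sum())

# Illustrative 1-D example: three points, two prototypes.
X = [[0.0], [1.0], [3.0]]
B = [[0.0], [3.0]]
U = [[0.8, 0.6, 0.1],
     [0.2, 0.4, 0.9]]  # each column sums to 1
print(fcm_objective(X, B, U, w=2))
```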

Now, in the first equation, what is the role of the exponent $w$? I read that one could use any convex function instead of $(\cdot)^w$. But why use anything at all? Why don't we just use the membership degrees directly? My lecture says that using the fuzzifier is necessary, but doesn't explain why.

nbro

1 Answer

It's not required: you can have $m=1$ (I use $m$ for the exponent your lecture calls $w$), and in fact it can be any number $\geq 1$.
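A short argument shows what $m=1$ actually gives. For fixed prototypes, each inner sum over $i$ is a weighted average of the $c$ squared distances from $\vec{x_j}$, with nonnegative weights $u_{ij}$ that sum to $1$, so it can never be smaller than the smallest of those distances:

$$J(X,B,U)=\sum_{j=1}^n\sum_{i=1}^c u_{ij}\, d^2(\vec{\beta_i},\vec{x_j}) \;\geq\; \sum_{j=1}^n \min_{1\le i\le c} d^2(\vec{\beta_i},\vec{x_j})$$

Equality is reached by putting all of $\vec{x_j}$'s membership on its nearest prototype ($u_{ij}=1$ for that $i$, $0$ otherwise), so with $m=1$ an optimal $U$ is crisp and the method reduces to hard c-means (k-means). With $m>1$ the weight $u^m$ is strictly convex in $u$, and it is this convexity that lets the minimizer spread membership over several clusters.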

The better question, then, is why have it at all. The answer is that it adds a smoothing effect. Let's look at each of the limits, $m \rightarrow 1$ and $m \rightarrow \infty$.
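For reference, minimizing $J$ over $U$ for fixed prototypes under the sum-to-one constraint (Lagrange multipliers, $m>1$, all distances nonzero) gives the standard FCM membership update, from which both limits can be read off:

$$u_{ij}=\left(\sum_{k=1}^{c}\left(\frac{d(\vec{\beta_i},\vec{x_j})}{d(\vec{\beta_k},\vec{x_j})}\right)^{\frac{2}{m-1}}\right)^{-1}$$

As $m\to\infty$ the exponent $\frac{2}{m-1}\to 0$, every ratio raised to it tends to $1$, and $u_{ij}\to\frac{1}{c}$. As $m\to 1^{+}$ the exponent grows without bound: the sum tends to $1$ if $\vec{\beta_i}$ is the (unique) nearest prototype to $\vec{x_j}$ and blows up otherwise, so $u_{ij}\to 1$ for the nearest prototype and $u_{ij}\to 0$ for the rest.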

Towards $\infty$, it drives every $u_{ij}$ to $\frac{1}{c}$, so each point has equal membership in every cluster regardless of its distances to the prototypes. From the optimization perspective, every weight $u_{ij}^m$ then goes to $0$, so the loss is $0$ (its global minimum) no matter where the prototypes are: the problem is solved trivially and says nothing about the data.

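In the other limit, $m \rightarrow 1$, the memberships become crisp: each point gets membership $1$ in the cluster with the nearest prototype and $0$ in the others, which is ordinary hard c-means. Between the two limits the memberships are graded; for the common choice $m=2$ they are inversely proportional to the squared distance, normalized so that they sum to $1$ for each point. This makes intuitive sense: the membership is high if a point is (relatively) close to a prototype and low if it is not.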
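The two limits can be checked numerically with a small sketch of the membership update $u_{ij}=1/\sum_k (d_{ij}/d_{kj})^{2/(m-1)}$. It works on squared distances (so the exponent becomes $1/(m-1)$), assumes all distances are nonzero, and uses a hypothetical point at distance $1$ and $2$ from two prototypes; `fcm_memberships` is an illustrative name:

```python
import numpy as np

def fcm_memberships(D2, m):
    """Membership update u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)).

    D2: (c, n) array of squared distances d^2(beta_i, x_j), assumed > 0
    m:  fuzzifier, m > 1
    Returns a (c, n) array whose columns sum to 1.
    """
    D2 = np.asarray(D2, dtype=float)
    # ratio[i, k, j] = d2_ij / d2_kj; (d_ij/d_kj)^(2/(m-1)) == (d2_ij/d2_kj)^(1/(m-1))
    ratio = D2[:, None, :] / D2[None, :, :]
    return 1.0 / (ratio ** (1.0 / (m - 1.0))).sum(axis=1)

# Hypothetical point: distance 1 to prototype 1 and distance 2 to prototype 2,
# i.e. squared distances 1 and 4.
D2 = [[1.0], [4.0]]
for m in (1.05, 2.0, 3.0, 100.0):
    u = fcm_memberships(D2, m)
    print(f"m = {m:>6}: u = {u.ravel().round(4)}")
```

Near $m=1$ the first membership is essentially $1$ (crisp); at $m=2$ the memberships are proportional to $1/d^2$; and by $m=100$ both are close to $\frac{1}{2}$.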

So why do we have $m$? It's for control. It lets us choose, and experiment with, how much weight each distance carries in the membership. One example where a larger $m$ may be useful is when the data isn't clean and you don't want to rely so heavily on Euclidean distance for the membership, so you deliberately add a smoothing effect.
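To see this control in a full run, here is a minimal alternating-optimization sketch of fuzzy c-means in NumPy. It assumes squared Euclidean distance, random initial memberships, the usual prototype update $\vec{\beta_i}=\sum_j u_{ij}^m\vec{x_j}/\sum_j u_{ij}^m$, and a simple stopping rule; the function name, parameters and the toy data (two tight groups plus one ambiguous point) are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means by alternating optimization (sketch).

    X: (n, p) data; c: number of clusters; m: fuzzifier (> 1).
    Returns prototypes B (c, p) and memberships U (c, n).
    Assumes squared Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each column sums to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Prototype update: beta_i = sum_j u_ij^m x_j / sum_j u_ij^m
        B = (W @ X) / W.sum(axis=1, keepdims=True)
        # Squared distances D2[i, j]; a small floor avoids division by zero.
        D2 = np.fmax(((B[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), 1e-12)
        # Membership update: u_ij = 1 / sum_k (d2_ij / d2_kj)^(1/(m-1))
        U_new = 1.0 / ((D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Prototypes consistent with the final memberships.
    W = U ** m
    B = (W @ X) / W.sum(axis=1, keepdims=True)
    return B, U

# Illustrative 1-D data: two tight groups plus one ambiguous point at 2.7.
X_demo = [[0.0], [0.2], [0.4], [5.0], [5.2], [5.4], [2.7]]
for m in (1.5, 2.0, 4.0):
    B, U = fuzzy_c_means(X_demo, c=2, m=m)
    # Smallest membership of each of the six clear points, averaged:
    # near 0 means crisp assignments, closer to 1/c = 0.5 means smoother ones.
    print(m, np.sort(B.ravel()).round(3), round(float(U.min(axis=0)[:6].mean()), 3))
```

Raising $m$ leaves the two groups recognizable but makes every point share more of its membership with the other cluster, which is the smoothing effect described above.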

mshlis
  • Assume I have two clusters, then for a point $x_1$ the $u_{ij}$ must sum to $1$. So I could have for example $u_{11}=0.8$ and $u_{12}=0.2$. If I now let $m\rightarrow\infty$, then $0.8^\infty=0$ and $0.2^\infty=0$, not $\frac{1}{2}$. How did you come up with the $\frac{1}{C}$? – user9007131 Jul 23 '19 at 19:46
  • The u’s are a function of m as well – mshlis Jul 23 '19 at 20:11
  • I don't understand. It's still $0.8^m \rightarrow 0$ and not $0.8^m\rightarrow\frac{1}{2}$ for $m\rightarrow\infty$ – user9007131 Jul 24 '19 at 17:29
  • The value is still 0, but the membership is $\frac{1}{2}$. As $m \rightarrow \infty$ the problem becomes meaningless, because the membership no longer depends on any of the centroids and is guaranteed to be equal for each. – mshlis Jul 24 '19 at 17:30
  • I adjusted my answer slightly to reflect that. – mshlis Jul 24 '19 at 17:32
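The point of this exchange can be checked with the membership update for a single point $\vec{x_j}$ and two clusters. Assume (hypothetically) that $\vec{x_j}$ is at distance $1$ from $\vec{\beta_1}$ and distance $2$ from $\vec{\beta_2}$; the standard FCM update then gives memberships that are themselves functions of $m$:

$$u_{1j}=\frac{1}{1+\left(\frac{1}{2}\right)^{\frac{2}{m-1}}},\qquad u_{2j}=\frac{1}{1+2^{\frac{2}{m-1}}}$$

$$m=2:\ (u_{1j},u_{2j})=(0.8,\ 0.2),\qquad m=3:\ \left(\tfrac{2}{3},\ \tfrac{1}{3}\right),\qquad m\to\infty:\ \left(\tfrac{1}{2},\ \tfrac{1}{2}\right)$$

With these hypothetical distances, $m=2$ reproduces the $0.8$ and $0.2$ from the first comment, but those numbers are not fixed: as $m$ grows they move toward $\frac{1}{2}$, while the weights $u_{ij}^m$ that enter $J$ still go to $0$.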