Introduction

A cluster refers to a collection of data points aggregated together because of certain similarities.

Fuzzy c-means

Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition. It is based on minimization of the following objective function:
$J_m = \sum_{i=1}^{N}\sum_{j=1}^{C}u_{ij}^m\|x_i-c_j\|^2,\,\,\,1 < m < \infty$
where $m$ is any real number greater than 1, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, $x_i$ is the $i$-th $d$-dimensional data point, $c_j$ is the $d$-dimensional center of cluster $j$, and $\|\cdot\|$ is any norm expressing the similarity between a data point and a center.
Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the update of membership $u_{ij}$ and the cluster centers $c_j$ by:
$\begin{array}{l} u_{ij} = \frac{1}{\sum_{k=1}^{C}\left(\frac{\|x_i-c_j\|}{\|x_i-c_k\|}\right)^{\frac{2}{m-1}}} \\ \\ c_j=\frac{\sum_{i=1}^{N}u_{ij}^m\cdot x_i}{\sum_{i=1}^{N}u_{ij}^m} \end{array}$
The iteration stops when $\text{max}_{ij}\left\{|u_{ij}^{(k+1)}-u_{ij}^{(k)}|\right\}<\epsilon$, where $\epsilon$ is a termination threshold between 0 and 1 and $k$ is the iteration index. This procedure converges to a local minimum or a saddle point of $J_m$.
The algorithm is composed of the following steps:

Step 1: Initialize the membership matrix $U=[u_{ij}]$ as $U^{(0)}$.
Step 2: At step $k$, calculate the center vectors $C^{(k)}=[c_j]$ from $U^{(k)}$:
$c_{j}=\frac{\sum_{i=1}^{N} u_{i j}^{m} \cdot x_{i}}{\sum_{i=1}^{N} u_{i j}^{m}}$
Step 3: Update $U^{(k)}$ to $U^{(k+1)}$:
$u_{i j}=\frac{1}{\sum_{k=1}^{C}\left(\frac{\left\|x_{i}-c_{j}\right\|}{\left\|x_{i}-c_{k}\right\|}\right)^{\frac{2}{m-1}}}$
Step 4: If $\left\|U^{(k+1)}-U^{(k)}\right\|<\varepsilon$, then STOP; otherwise return to step 2.
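The membership update and the stopping test on $U$ can be sketched directly from the formulas above. This is a minimal illustration, not the listing that follows: the function names `update_memberships` and `has_converged` are hypothetical, memberships are stored as an N-by-C array (one row per data point), and the sketch assumes no data point coincides exactly with a center.

```python
import numpy as np

def update_memberships(X, centers, m=2.0):
    """Return u[i, j], the membership of point i in cluster j."""
    # d[i, j] = ||x_i - c_j|| (Euclidean norm)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    # ratio[i, j, k] = ||x_i - c_j|| / ||x_i - c_k||
    ratio = d[:, :, None] / d[:, None, :]
    return 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)

def has_converged(U_new, U_old, eps=1e-5):
    """Stopping test: largest absolute change in any membership is below eps."""
    return np.max(np.abs(U_new - U_old)) < eps

# Toy data: two well-separated groups on a line (illustrative values only)
X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 0.0], [10.0, 0.0]])
centers = np.array([[0.5, 0.0], [9.5, 0.0]])
U = update_memberships(X, centers)
print(np.round(U, 4))
```

Each row of `U` sums to 1, which is the constraint the update formula enforces for every data point.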

import numpy as np
import os
#import matplotlib.pyplot as plt

def fcm(data, cluster_n, options=None):
    data_n = data.shape[0]
    # Change the following to set default options
    default_options = [2,  # exponent for the partition matrix U
        100,    # max. number of iteration
        1e-5,   # min. amount of improvement
        1]     # info display during iteration
    if options is None:
        options = default_options
    else:
        # If "options" is not fully specified, pad it with default values.
        if len(options) < 4:
            tmp = list(default_options)  # copy so the defaults are not mutated
            tmp[0:len(options)] = options
            options = tmp

    expo = options[0]      # Exponent for U
    max_iter = options[1]  # Max. iteration
    min_impro = options[2] # Min. improvement
    display = options[3]   # Display info or not

    obj_fcn = np.zeros([max_iter, 1])  # Array for objective function

    U = initfcm(cluster_n, data_n)     # Initial fuzzy partition
    # Main loop
    for i in range(max_iter):
        U, center, obj_fcn[i] = stepfcm(data, U, cluster_n, expo)
        if display:
            print('Iteration count = %d, obj. fcn = %f' % (i + 1, obj_fcn[i, 0]))
        # check termination condition
        if i > 0:
            if np.abs(obj_fcn[i, 0] - obj_fcn[i-1, 0]) < min_impro:
                break
    obj_fcn = obj_fcn[:i+1]  # keep only the iterations actually run
    return center, U, obj_fcn


def stepfcm(data, U, cluster_n, expo):
    """
    %STEPFCM One step in fuzzy c-mean clustering.
    %   [U_NEW, CENTER, ERR] = STEPFCM(DATA, U, CLUSTER_N, EXPO)
    %   performs one iteration of fuzzy c-mean clustering, where
    %
    %   DATA: matrix of data to be clustered. (Each row is a data point.)
    %   U: partition matrix. (U(i,j) is the MF value of data j in cluster i.)
    %   CLUSTER_N: number of clusters.
    %   EXPO: exponent (> 1) for the partition matrix.
    %   U_NEW: new partition matrix.
    %   CENTER: center of clusters. (Each row is a center.)
    %   ERR: objective function for partition U.
    %
    %   Note that the situation of "singularity" (one of the data points is
    %   exactly the same as one of the cluster centers) is not checked.
    %   However, it hardly occurs in practice.
    %
    """
    mf = np.power(U, expo)  # MF matrix after exponential modification
    center = mf.dot(data) / np.sum(mf,1,keepdims=True).dot(np.ones([1,data.shape[1]]))  #new center
    dist = distfcm(center, data)   # fill the distance matrix
    obj_fcn = np.sum(dist**2*mf)   # objective function
    tmp = dist**(-2/(expo-1))      # calculate new U, suppose expo != 1
    U_new = tmp/(np.ones([cluster_n, 1])*np.sum(tmp,0,keepdims=True))
    return U_new, center, obj_fcn


def distfcm(center, data):
    """
    %DISTFCM Distance measure in fuzzy c-mean clustering.
    %   OUT = DISTFCM(CENTER, DATA) calculates the Euclidean distance
    %   between each row in CENTER and each row in DATA, and returns a
    %   distance matrix OUT of size M by N, where M and N are row
    %   dimensions of CENTER and DATA, respectively, and OUT(I, J) is
    %   the distance between CENTER(I,:) and DATA(J,:).
    """
    out = np.zeros([center.shape[0], np.shape(data)[0]])
    # fill the output matrix
    if center.shape[1] > 1:
        for k in np.arange(0, center.shape[0]):
            out[k, :] = np.sqrt(np.sum(((data-np.ones([data.shape[0], 1]).dot(center[k, :].reshape([1,-1])))**2), 1))
    else: # 1-D data
        for k in np.arange(0, center.shape[0]):
            out[k, :] = np.transpose(np.abs(center[k]-data))
    return out


def initfcm(cluster_n, data_n):
    """
    %INITFCM Generate initial fuzzy partition matrix for fuzzy c-means clustering.
    %   U = INITFCM(CLUSTER_N, DATA_N) randomly generates a fuzzy partition
    %   matrix U that is CLUSTER_N by DATA_N, where CLUSTER_N is number of
    %   clusters and DATA_N is number of data points. The summation of each
    %   column of the generated U is equal to unity, as required by fuzzy
    %   c-means clustering. 
    """
    U = np.random.rand(cluster_n, data_n)
    col_sum = np.sum(U, axis=0,keepdims=True)
    U = U / np.repeat(col_sum, [cluster_n], axis=0)
    return U


def main():
    # Synthetic data only: two blobs of 50 points each, so no working directory is needed
    data1 = np.random.rand(50,2)+np.array([[7,8]])
    data2 = np.random.rand(50,2)+np.array([[3,3]])
    data = np.concatenate([data1,data2], axis=0)
    #data = np.random.rand(100,2)
    center, U, obj_fcn = fcm(data, 2)
    #plt.plot(data[:,1], data[:,2],'o');
    maxU = np.max(U, axis=0)
    # Find the data points with highest grade of membership in cluster 1
    index1 = U[0,:] == maxU
    # Find the data points with highest grade of membership in cluster 2
    index2 = U[1,:] == maxU
    #fig = plt.figure()
    #plt.plot(data[index1,0],data[index1,1], color='green', marker='*')
    #plt.plot(data[index2,0],data[index2,1], color='red', marker='*')
    # Plot the cluster centers
    #plt.plot([center([0,1],1)],[center([0,1],2)],'*','color','k')
    #fig.savefig('fcm.png', dpi=fig.dpi)
    print(index1)
    print(index2)
    

if __name__ == '__main__':
    main()

K-Means

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms.

Problem description

Given a set of observations ($x_1$, $x_2$, …, $x_n$), where each observation is a $d$-dimensional real vector, $k$-means clustering aims to partition the $n$ observations into $k$ ($\leq n$) sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-cluster sum of squares (WCSS), i.e. the variance. Formally, the objective is to find:
$\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 = \underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^k |S_i| \operatorname{Var} S_i$
where $\mu_i$ is the mean of points in $S_i$ . This is equivalent to minimizing the pairwise squared deviations of points in the same cluster:
$\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \, \frac{1}{2 |S_i|} \, \sum_{\mathbf{x}, \mathbf{y} \in S_i} \left\| \mathbf{x} - \mathbf{y} \right\|^2$
The equivalence can be deduced from the identity
$\sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 = \frac{1}{2 |S_i|} \sum_{\mathbf{x}, \mathbf{y} \in S_i} \left\| \mathbf{x} - \mathbf{y} \right\|^2$. Because the total variance is constant, this is also equivalent to maximizing the sum of squared deviations between points in different clusters (between-cluster sum of squares, BCSS), which follows from the law of total variance.
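The equivalence between the centroid form and the pairwise form of the objective is easy to check numerically for a single cluster. The sketch below uses an arbitrary random cluster; the variable names and the data are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))      # one cluster: 20 points in 3 dimensions

# Centroid form: sum of squared distances to the cluster mean
mu = S.mean(axis=0)
wcss = np.sum((S - mu) ** 2)

# Pairwise form: sum over all ordered pairs (x, y), scaled by 1 / (2 |S|)
diffs = S[:, None, :] - S[None, :, :]
pairwise = np.sum(diffs ** 2) / (2 * len(S))

print(wcss, pairwise)             # the two values agree up to rounding error
```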

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Two synthetic blobs: 50 points in (-2, 0]^2 and 50 points in [1, 3)^2
X = -2 * np.random.rand(100, 2)
X1 = 1 + 2 * np.random.rand(50, 2)
X[50:100, :] = X1
plt.scatter(X[:, 0], X[:, 1], s=50, c=10*np.ones([X.shape[0]]))
plt.show()

# Fit k-means with two clusters and show the learned centers
Kmean = KMeans(n_clusters=2)
Kmean.fit(X)
print(Kmean.cluster_centers_)

# Plot each cluster with its own marker and color, and the centers in black
plt.scatter(X[Kmean.labels_==0, 0], X[Kmean.labels_==0, 1], marker='s', s=50, color='green')
plt.scatter(X[Kmean.labels_==1, 0], X[Kmean.labels_==1, 1], marker='o', s=50, color='red')
plt.scatter(Kmean.cluster_centers_[0, 0], Kmean.cluster_centers_[0, 1], marker='s', s=200, color='black')
plt.scatter(Kmean.cluster_centers_[1, 0], Kmean.cluster_centers_[1, 1], marker='o', s=200, color='black')
plt.show()
print(Kmean.labels_)

# Predict the cluster of a new sample (predict expects a 2-D array)
sample_test = np.array([-3.0, -3.0])
second_test = sample_test.reshape(1, -1)
print(Kmean.predict(second_test))

Rough K-Means


Illustrative Example

The following table shows an example information system with real-valued conditional attributes. It consists of six objects/genes and two features/samples. The number of clusters is $k = 2$, the weight of the lower approximation is $\mathrm{W}_{\text{lower}}=0.7$, the weight of the upper approximation is $\mathrm{W}_{\text{upper}} = 0.3$, and the relative threshold is 2.

$\text{Example dataset for Rough K-Means}\\ \begin{array}{|c|c|c|} \hline \mathbf{U} & \mathbf{X} & \mathbf{Y} \\ \hline 1 & 0 & 3 \\ \hline 2 & 1 & 3 \\ \hline 3 & 3 & 1 \\ \hline 4 & 3 & 0.5 \\ \hline 5 & 5 & 0 \\ \hline 6 & 6 & 0 \\ \hline \end{array}$

Step 1: Randomly assign each data object to exactly one lower approximation.

$\begin{array}{l} \mathrm{K}_{1}=\{(0,3),(1,3),(3,1)\} \\ \mathrm{K}_{2}=\{(3,0.5),(5,0),(6,0)\} \end{array}$
Step 2: In this case $\underline{U}(K)\neq \emptyset$ and $\overline{U}(K)-\underline{U}(K) = \emptyset$, so we compute the centroids using $\mathrm{C}_{\mathrm{j}}=\sum_{x_i \in \underline{U}(K)} \frac{x_{i}}{|\underline{U}(K)|}$:

$\begin{array}{l} C_{1}=\left(\frac{0+1+3}{3}, \frac{3+3+1}{3}\right)=(1.33,2.33) \\ C_{2}=\left(\frac{3+5+6}{3}, \frac{0.5+0+0}{3}\right)=(4.67,0.17) \end{array}$
Find the distance from each centroid to each point using the Euclidean distance,
$D_i=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

$d_{1}\left(X, C_{1}\right)$ :
$\begin{array}{l} (0,3)(1.33,2.33) \Rightarrow \sqrt{(1.33-0)^{2}+(2.33-3)^{2}}=1.49 \\ (1,3)(1.33,2.33) \Rightarrow \sqrt{(1.33-1)^{2}+(2.33-3)^{2}}=0.75 \\ (3,1)(1.33,2.33) \Rightarrow \sqrt{(1.33-3)^{2}+(2.33-1)^{2}}=2.13 \\ (3,0.5)(1.33,2.33) \Rightarrow \sqrt{(1.33-3)^{2}+(2.33-0.5)^{2}}=2.48 \\ (5,0)(1.33,2.33) \Rightarrow \sqrt{(1.33-5)^{2}+(2.33-0)^{2}}=4.35 \\ (6,0)(1.33,2.33) \Rightarrow \sqrt{(1.33-6)^{2}+(2.33-0)^{2}}=5.22 \end{array}$
$d_{2}\left(X, C_{2}\right)$ :
$\begin{array}{l} (0,3)(4.67,0.17) \Rightarrow \sqrt{(4.67-0)^{2}+(0.17-3)^{2}}=5.46 \\ (1,3)(4.67,0.17) \Rightarrow \sqrt{(4.67-1)^{2}+(0.17-3)^{2}}=4.63 \\ (3,1)(4.67,0.17) \Rightarrow \sqrt{(4.67-3)^{2}+(0.17-1)^{2}}=1.86 \\ (3,0.5)(4.67,0.17) \Rightarrow \sqrt{(4.67-3)^{2}+(0.17-0.5)^{2}}=1.70 \\ (5,0)(4.67,0.17) \Rightarrow \sqrt{(4.67-5)^{2}+(0.17-0)^{2}}=0.37 \\ (6,0)(4.67,0.17) \Rightarrow \sqrt{(4.67-6)^{2}+(0.17-0)^{2}}=1.34 \end{array}$
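The centroid and distance calculations above can be reproduced with a few lines of NumPy. This is a sketch under the example's own data; the variable names are assumptions, and because the centroids are kept unrounded here, the last digit of a distance may differ slightly from the hand calculation.

```python
import numpy as np

# The six objects from the example table (columns X and Y)
points = np.array([[0, 3], [1, 3], [3, 1], [3, 0.5], [5, 0], [6, 0]], dtype=float)

# Initial assignment from Step 1: first three objects in K1, last three in K2
K1, K2 = points[:3], points[3:]
centroids = np.vstack([K1.mean(axis=0), K2.mean(axis=0)])

# dist[i, j] = Euclidean distance from object i to centroid j
dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

print(np.round(centroids, 2))
print(np.round(dist, 2))
```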

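Step 3: Assign each object to the lower approximation $\underline{U}(K)$ or the upper approximation $\overline{U}(K)$ of a cluster. For each object $X$ with nearest centroid $C_j$ and other centroid $C_i$, check whether $d(X,C_i)/d(X,C_j)\leq \epsilon$, where $\epsilon$ is the relative threshold (2 here). If the ratio exceeds the threshold, $X$ goes to the lower approximation of its nearest cluster; otherwise it goes to the upper approximations of both clusters.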

$\Rightarrow \mathrm{d}_{2} / \mathrm{d}_{1}=5.46 / 1.49=3.66443 \nleq 2$ . So, $x_1$ will be part of $\underline{K}_1$
$\Rightarrow 4.63 / 0.75=6.173 \nleq 2$ . So, $x_2$ will be a part of $\underline{K}_1$
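$\Rightarrow 2.13/1.86=1.145 < 2$ . So, $x_3$ is in neither $\underline{K}_1$ nor $\underline{K}_2$; it belongs to both upper approximations $\overline{K}_1$ and $\overline{K}_2$
$\Rightarrow 2.48/1.70=1.458 < 2$ . So, $x_4$ is in neither $\underline{K}_1$ nor $\underline{K}_2$; it belongs to both upper approximations $\overline{K}_1$ and $\overline{K}_2$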
$\Rightarrow 4.35/0.37=11.756\nleq 2$ . So, $x_5$ will be a part of $\underline{K}_2$
$\Rightarrow 5.22/1.34=3.895 \nleq 2$ . So, $x_6$ will be a part of $\underline{K}_2$

Now we have the clusters
$\underline{K}_1=\{(0,3), (1,3)\}\,\,\,\,\overline{K}_1=\{ (0,3), (1,3), (3,1), (3,0.5)\}$
$\underline{K}_2=\{(5,0), (6,0)\}\,\,\,\,\overline{K}_2=\{ (5,0), (6,0), (3,1), (3,0.5)\}$
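The assignment rule of Step 3 can be sketched for the two-cluster case as follows. The dictionaries `lower` and `upper` and the 1-based object numbering are illustrative assumptions; the sketch also assumes no object sits exactly on a centroid, since the ratio would then divide by zero.

```python
import numpy as np

points = np.array([[0, 3], [1, 3], [3, 1], [3, 0.5], [5, 0], [6, 0]], dtype=float)
centroids = np.array([[4 / 3, 7 / 3], [14 / 3, 0.5 / 3]])  # unrounded C1, C2
threshold = 2.0                                            # relative threshold

dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

lower = {0: [], 1: []}   # object numbers in each lower approximation
upper = {0: [], 1: []}   # object numbers in each upper approximation
for i, d in enumerate(dist):
    nearest = int(np.argmin(d))
    other = 1 - nearest
    upper[nearest].append(i + 1)          # every object is in its nearest upper approximation
    if d[other] / d[nearest] <= threshold:
        upper[other].append(i + 1)        # ambiguous: boundary of both clusters
    else:
        lower[nearest].append(i + 1)      # clearly closest to one cluster

print(lower)
print(upper)
```

With the example data, objects 3 and 4 fall in the boundary region of both clusters, while objects 1, 2 and 5, 6 land in the lower approximations of the first and second cluster respectively.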

Here $\underline{U}(K)\neq \emptyset$ and $\overline{U}(K)-\underline{U}(K)\neq\emptyset$, so the new centroids are computed with the weighted equation
$C_j=W_{\text{lower}}\times\sum_{x_i\in\underline{U}(K)} \frac{x_i}{|\underline{U}(K)|} + W_{\text{upper}} \times \sum_{x_i\in \overline{U}(K)-\underline{U}(K)}\frac{x_i}{|\overline{U}(K)-\underline{U}(K)|}$
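Applying the weighted centroid equation to the sets obtained above gives the next centroids. The helper `rough_centroid` is a hypothetical name, and the sketch covers only the two cases that occur in this example (boundary region empty or non-empty, with a non-empty lower approximation).

```python
import numpy as np

W_LOWER, W_UPPER = 0.7, 0.3

def rough_centroid(lower, boundary):
    """Centroid from a lower approximation and a boundary region (upper minus lower)."""
    lower = np.asarray(lower, dtype=float)
    boundary = np.asarray(boundary, dtype=float)
    if len(boundary) == 0:
        # Empty boundary: plain mean of the lower approximation
        return lower.mean(axis=0)
    # Weighted combination of the two means
    return W_LOWER * lower.mean(axis=0) + W_UPPER * boundary.mean(axis=0)

boundary = [(3, 1), (3, 0.5)]                      # shared by both clusters
C1 = rough_centroid([(0, 3), (1, 3)], boundary)
C2 = rough_centroid([(5, 0), (6, 0)], boundary)
print(C1, C2)
```

Under these assumptions the new centroids come out as approximately (1.25, 2.325) and (4.75, 0.225); the algorithm would then repeat the distance and assignment steps with them.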

Cluster algorithm