1. Some materials are taken from the machine learning course of Victor Kitov
$$ \text{Rand}(\hat{\pi},\pi^*) = \frac{a + d}{a + b + c + d} \text{,}$$ where
$a$ - number of pairs that are grouped together both in $\hat{\pi}$ and $\pi^*$
$d$ - number of pairs that are separated both in $\hat{\pi}$ and $\pi^*$
$b$ ($c$) - number of pairs that are separated in $\hat{\pi}$ ($\pi^*$), but grouped in $\pi^*$ ($\hat{\pi}$)
$$ \text{Rand}(\hat{\pi},\pi^*) = \frac{tp + tn}{tp + fp + fn + tn} \text{,}$$ where
$tp$ - number of pairs that are grouped together both in $\hat{\pi}$ and $\pi^*$
$tn$ - number of pairs that are separated both in $\hat{\pi}$ and $\pi^*$
$fp$ ($fn$) - number of pairs that are grouped in $\hat{\pi}$ ($\pi^*$), but separated in $\pi^*$ ($\hat{\pi}$)
Adjusted Rand Index
$$\text{ARI}(\hat{\pi},\pi^*) = \frac{\text{Rand}(\hat{\pi},\pi^*) - \text{Expected}}{\text{Max} - \text{Expected}}$$
Here $\text{Expected}$ is the expected Rand index for random partitions with the same cluster sizes and $\text{Max}$ is its maximum value, so a random clustering scores near 0 and a perfect match scores 1. Check Wikipedia for the exact combinatorial formula =)
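A quick sanity check: both indices are available in scikit-learn. The two label arrays below are made-up toy partitions:

```python
from sklearn.metrics import rand_score, adjusted_rand_score

pi_hat = [0, 0, 1, 1, 2, 2]   # made-up predicted clustering
pi_star = [0, 0, 1, 1, 1, 2]  # made-up reference partition
print(rand_score(pi_star, pi_hat))           # plain Rand index
print(adjusted_rand_score(pi_star, pi_hat))  # ARI: ~0 for random labels, 1 for a perfect match
```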
For each object $x_{i}$ define:
$s_{i}$ - average distance from $x_{i}$ to the other objects of its own cluster
$d_{i}$ - average distance from $x_{i}$ to the objects of the nearest different cluster
Silhouette coefficient for $x_{1},\dots,x_{N}$: $$ \text{Silhouette}=\frac{1}{N}\sum_{i=1}^{N}\frac{d_{i}-s_{i}}{\max\{d_{i},s_{i}\}} $$
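A minimal sketch of computing the silhouette with scikit-learn; the three-blob dataset below is a stand-in:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# made-up data: three well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0, 0], [3, 0], [0, 3])])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))  # mean of (d_i - s_i) / max(d_i, s_i); close to 1 here
```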
Advantages
Disadvantages
It can be accurately modelled with an existing parametric family - the Normal distribution:
$$ p(x | \theta) = \mathcal{N}(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left({-\frac{(x-\mu)^2}{2\sigma^2}}\right) $$ or, in the multivariate case ($x \in \mathbb{R}^{D}$), $$ p(x | \theta) = \mathcal{N}(x|\mu, \Sigma) = \frac{1}{(2\pi)^{D/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right) $$
$$ L(X) = \prod\limits_{i=1}^N\mathcal{N}(x_i|\mu, \Sigma)\rightarrow \max\limits_{\mu, \Sigma} $$
$$\mu_{ML} = \frac 1 N \sum_{i=1}^N x_i, \quad \mathbf{\Sigma}_{ML} = \frac 1 N \sum_{i=1}^N (x_i - \mu_{ML}) (x_i - \mu_{ML})^T$$
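These closed-form estimates take a couple of lines of numpy; the sample `X` below is a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))              # stand-in sample, one row per object x_i

mu_ml = X.mean(axis=0)                     # closed-form ML mean
centered = X - mu_ml
sigma_ml = centered.T @ centered / len(X)  # biased (1/N) ML covariance, as in the formula above
```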
$$ p(x)=\sum_{z=1}^{Z}\phi_{z}p(x|\theta_{z}), \qquad \phi_{z}\ge0,\quad \sum_{z=1}^{Z}\phi_{z}=1 $$
The component densities $p(x|\theta_{z})$ may come from a single parametric family or from different ones.
Gaussians model continuous r.v. on $(-\infty,+\infty)$.
$$p(x|\theta_{z})=N(x|\mu_{z},\Sigma_{z}),\,\theta_{z}=\{\mu_{z},\Sigma_{z}\}$$
$$ p(x)=\sum_{z=1}^{Z}\phi_{z}N(x|\mu_{z},\Sigma_{z}) $$
$$ L(X) = \sum_{i=1}^N \log p(x_i) = \sum_{i=1}^N\log\left(\sum_{z=1}^{Z}\phi_{z}N(x_i|\mu_{z},\Sigma_{z})\right) \rightarrow \max\limits_{\mu_z, \Sigma_z, \phi_z} $$
EM estimates for each object $x_{i}$ the posterior probabilities (responsibilities) $w_{iz}=p(z|x_{i})$ of belonging to each component, which satisfy $$ \sum_{z=1}^{Z} w_{iz} = 1 $$
We can make it a hard clustering using $z_{i}=\arg\max_{z}w_{iz}$, as in the sketch below.
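A short sketch with scikit-learn's `GaussianMixture`, which fits the mixture by EM; the two-blob data is made-up. `predict_proba` returns the soft assignments $w_{iz}$ and `predict` the corresponding hard clustering:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(150, 2))
               for c in ([0, 0], [4, 0])])   # made-up two-blob data

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # fitted by EM
w = gmm.predict_proba(X)   # soft assignments w_iz; each row sums to 1
z = gmm.predict(X)         # hard clustering z_i = argmax_z w_iz
print(np.allclose(w.sum(axis=1), 1.0), (z == w.argmax(axis=1)).all())
```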
EM clustering becomes K-means clustering when all components share a fixed spherical covariance $\Sigma_{z}=\sigma^{2}I$ and $\sigma\rightarrow0$: in this limit the responsibilities $w_{iz}$ collapse onto the nearest mean, so the soft assignments become the hard nearest-centroid assignments of K-means.
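A tiny numpy sketch of this limit (the data and the fixed means below are made-up, and equal priors are assumed): with shared spherical covariance, the E-step responsibilities $w_{iz}\propto\exp(-\|x_{i}-\mu_{z}\|^{2}/2\sigma^{2})$ approach one-hot vectors as $\sigma\to0$, which is exactly the K-means assignment step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
mu = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])  # hypothetical fixed means

def responsibilities(X, mu, sigma):
    # E-step for equal-weight spherical Gaussians: w_iz ∝ exp(-||x_i - mu_z||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    logw = -d2 / (2 * sigma ** 2)
    logw -= logw.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)

print(responsibilities(X[:3], mu, sigma=1.0).round(2))   # soft assignments
print(responsibilities(X[:3], mu, sigma=0.01).round(2))  # nearly one-hot: the K-means assignment step
```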