1. Some materials are taken from the machine learning course of Victor Kitov.
[Figure: two objects described by their (lat, lon) coordinates, motivating the choice of a distance function]
$$ \rho(x_i, x_j) = \sqrt{\sum\limits_{d=1}^{D}(x^d_i - x^d_j)^2} \text{: euclidean distance} $$
$$ \rho(x_i, x_j) = \sum\limits_{d=1}^{D}|x^d_i - x^d_j| \text{: manhattan distance} $$
$$ \rho(x_i, x_j) = 1 - \frac{\langle x_i,x_j \rangle}{||x_i||_2\cdot||x_j||_2} \text{: cosine distance} $$
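Each of these metrics is a one-liner in numpy; a minimal sketch (scipy.spatial.distance provides equivalent euclidean, cityblock and cosine functions):

```python
import numpy as np

def euclidean(xi, xj):
    # sqrt of summed squared coordinate differences
    return np.sqrt(np.sum((xi - xj) ** 2))

def manhattan(xi, xj):
    # sum of absolute coordinate differences
    return np.sum(np.abs(xi - xj))

def cosine(xi, xj):
    # one minus the cosine of the angle between the vectors
    return 1 - xi @ xj / (np.linalg.norm(xi) * np.linalg.norm(xj))

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(euclidean(xi, xj), manhattan(xi, xj), cosine(xi, xj))
```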
$$ D(i,j)=\begin{cases} 0, & i=0,\ j=0\\ i, & j=0,\ i>0\\ j, & i=0,\ j>0\\ \min\left\{\begin{array}{l} D(i,j-1)+1\\ D(i-1,j)+1\\ D(i-1,j-1)+m(S_{1}[i],S_{2}[j]) \end{array}\right\}, & i>0,\ j>0 \end{cases} $$ where $m(a,b)=0$ if $a=b$ and $1$ otherwise (the edit, or Levenshtein, distance between strings $S_1$ and $S_2$).
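A standard dynamic-programming implementation of this recurrence:

```python
def edit_distance(s1, s2):
    # D[i][j] = edit distance between s1[:i] and s2[:j]
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # delete all of s1[:i]
    for j in range(m + 1):
        D[0][j] = j                      # insert all of s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if s1[i - 1] == s2[j - 1] else 1   # m(a, b)
            D[i][j] = min(D[i][j - 1] + 1,                  # insertion
                          D[i - 1][j] + 1,                  # deletion
                          D[i - 1][j - 1] + mismatch)       # substitution/match
    return D[n][m]

print(edit_distance('kitten', 'sitting'))  # 3
```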
Consider training sample $\left(x_{1},y_{1}\right),...,\left(x_{N},y_{N}\right)$ with $y_{n}\in\{1,2,...,C\}$.
Training: calculate the centroid of each class $c=1,2,...,C$: $$ \mu_{c}=\frac{1}{N_{c}}\sum_{n=1}^{N}x_{n}\mathbb{I}[y_{n}=c],\qquad N_{c}=\sum_{n=1}^{N}\mathbb{I}[y_{n}=c] $$
Classification: assign $x$ to the class with the nearest centroid: $$ \widehat{y}(x)=\arg\min_{c}\rho(x,\mu_{c}) $$
interact(plot_centroid_class)
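The function plot_centroid_class is defined elsewhere in the notebook. As a hedged sketch of what the centroid rule computes (the names fit_centroids, predict_centroid, X_train, y_train are illustrative assumptions, not notebook code):

```python
import numpy as np

def fit_centroids(X_train, y_train):
    # mu_c: mean of the training objects belonging to class c
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_centroid(X, classes, centroids):
    # assign every object to the class of its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```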
Classification:
plt.scatter(X_moons[:,0], X_moons[:,1], c=y_moons, cmap=plt.cm.Spectral)
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')
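X_moons and y_moons are not created in this excerpt; a plausible setup, assuming they come from sklearn's make_moons (the sample size and noise level are guesses):

```python
from sklearn.datasets import make_moons

# two interleaving half-circles: a classic nonlinear toy classification dataset
X_moons, y_moons = make_moons(n_samples=200, noise=0.2, random_state=0)
```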
interact(plot_knn_class, k=IntSlider(min=1, max=10, value=1))
Regression:
plt.plot(x_true, y_true, c='g', label='$f(x)$')
plt.scatter(x, y, label='actual data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc=2)
plot_linreg()
interact(plot_knn, k=IntSlider(min=1, max=10, value=1))
When several classes receive the same number of votes (a tie), we can assign the object to:
* None (i.e. refuse to classify)
Advantages:
Feature scaling: $$ \tilde{x}^{j}=\frac{x^{j}-\mu_{j}}{\sigma_{j}} \qquad\text{or}\qquad \tilde{x}^{j}=\frac{x^{j}-L_{j}}{U_{j}-L_{j}}, $$ where $\mu_{j},\,\sigma_{j},\,L_{j},\,U_{j}$ are the mean value, standard deviation, minimum and maximum value of the $j$-th feature.
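Both scalings are one line of numpy each; a minimal sketch operating column-wise on an object-feature matrix:

```python
import numpy as np

def standardize(X):
    # (x_j - mu_j) / sigma_j for every feature j
    return (X - X.mean(axis=0)) / X.std(axis=0)

def minmax_scale(X):
    # (x_j - L_j) / (U_j - L_j) for every feature j
    L, U = X.min(axis=0), X.max(axis=0)
    return (X - L) / (U - L)
```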
[Figures: distributions of pairwise distances between random points for $D=2$ and for $D=2\dots100$]
$$ \lim_{D \rightarrow \infty} \frac{\text{dist}_{max} - \text{dist}_{min}}{\text{dist}_{min}} = 0$$
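This is easy to observe empirically; a small simulation (uniform random points, euclidean distance, sizes chosen arbitrarily) shows the relative contrast shrinking as $D$ grows:

```python
import numpy as np

rng = np.random.RandomState(0)
for D in [2, 10, 100, 1000]:
    X = rng.rand(1000, D)                      # 1000 random points
    q = rng.rand(D)                            # a random query point
    d = np.linalg.norm(X - q, axis=1)          # distances to the query
    print(D, (d.max() - d.min()) / d.min())    # relative contrast
```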
Consider for object $x$ its $K$ nearest training objects $x_{i_{1}},...,x_{i_{K}}$, sorted by increasing distance to $x$: $\rho(x,x_{i_{1}})\le...\le\rho(x,x_{i_{K}})$.
Classification: $$\begin{align*} g_{c}(x) & =\sum_{k=1}^{K}\mathbb{I}[y_{i_{k}}=c],\quad c=1,2,...C.\\ \widehat{y}(x) & =\arg\max_{c}g_{c}(x) \end{align*} $$
Regression: $$ \widehat{y}(x)=\frac{1}{K}\sum_{k=1}^{K}y_{i_{k}} $$
Weighted classification: $$\begin{align*} g_{c}(x) & =\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))\mathbb{I}[y_{i_{k}}=c],\quad c=1,2,...C.\\ \widehat{y}(x) & =\arg\max_{c}g_{c}(x) \end{align*} $$
Weighted regression: $$ \widehat{y}(x)=\frac{\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))y_{i_{k}}}{\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))} $$
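A direct, brute-force implementation of the unweighted rules above (euclidean metric; X_train, y_train and the function name are assumptions of this sketch):

```python
import numpy as np

def knn_predict(x, X_train, y_train, K, classify=True):
    # indices of the K training objects nearest to x
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:K]
    if classify:
        # g_c(x): votes per class; predict the class with the most votes
        values, counts = np.unique(y_train[idx], return_counts=True)
        return values[np.argmax(counts)]
    return y_train[idx].mean()                 # regression: neighbor average
```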
Index dependent weights: $$ w_{k}=\alpha^{k},\quad\alpha\in(0,1) $$ $$ w_{k}=\frac{K+1-k}{K} $$
Distance dependent weights (here $z_{k}=x_{i_{k}}$ denotes the $k$-th nearest neighbor of $x$; both weighting families are sketched in code after the formulas):
$$ w_{k}=\begin{cases} \frac{\rho(z_{K},x)-\rho(z_{k},x)}{\rho(z_{K},x)-\rho(z_{1},x)}, & \rho(z_{K},x)\ne\rho(z_{1},x)\\ 1 & \rho(z_{K},x)=\rho(z_{1},x) \end{cases} $$ $$ w_{k}=\frac{1}{\rho(z_{k},x)} $$
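A sketch of these weighting schemes, assuming rho is a numpy array $[\rho(z_{1},x),...,\rho(z_{K},x)]$ sorted in increasing order:

```python
import numpy as np

def index_weights(K, alpha=0.5):
    k = np.arange(1, K + 1)
    return alpha ** k                          # alternative: (K + 1 - k) / K

def linear_distance_weights(rho):
    if rho[-1] == rho[0]:                      # all neighbors equally far
        return np.ones_like(rho)
    return (rho[-1] - rho) / (rho[-1] - rho[0])

def inverse_distance_weights(rho):
    return 1.0 / rho                           # breaks down when rho = 0
```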
interact(plot_knn_class_kernel, k=IntSlider(min=1, max=10, value=1),
h=FloatSlider(min=0.05, max=5, value=1, step=0.05))
Important hyperparameters of K-NN: the number of neighbors $K$, the distance metric $\rho(\cdot,\cdot)$ and the weighting scheme $w(k,\rho)$.
Output depends on feature scaling.
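To illustrate the scaling sensitivity: the same K-NN classifier on the same data, before and after standardization, once one feature's scale is inflated (a toy demonstration, not notebook code):

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_moons(noise=0.2, random_state=0)
X[:, 1] *= 1000                                # inflate one feature's scale
knn = KNeighborsClassifier(n_neighbors=5)
print(knn.fit(X, y).score(X, y))               # distances dominated by x2
Xs = StandardScaler().fit_transform(X)
print(knn.fit(Xs, y).score(Xs, y))             # balanced feature influence
```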