Data Analysis

Metric-based models¹

Nearest Centroid, K-NN

¹ Some materials are taken from the machine learning course of Victor Kitov

Let's recall previous lecture¶

• Machine learning algorithms reconstruct relationship between features $x$ and outputs $y$.
• Design matrix $X=[\mathbf{x}_{1},...\mathbf{x}_{N}]^{T}$, $Y=[y_{1},...y_{N}]^{T}$.
• Relationship is reconstructed by optimal function $\widehat{y}=f_{\widehat{\theta}}(x)$ from function class $\{f_{\theta}(x),\,\theta\in\Theta\}$.
• Empirical risk approximation
• Hold-out validation
• Cross-validation
• A/B testing

Empirical risk¶

• Want to minimize expected risk: $$\int\!\!\int\mathcal{L}(f_{\theta}(\mathbf{x}),y)\,p(\mathbf{x},y)\,d\mathbf{x}\,dy\to\min_{\theta}$$
• In fact we have only $X$,$Y$ (Known set) and $X'$ (Test set)

• Can minimize empirical risk $$L(\theta|X,Y)=\frac{1}{N}\sum_{n=1}^{N}\mathcal{L}(f_{\theta}(\mathbf{x}_{n}),\,y_{n})$$

• Method of empirical risk minimization: $$\widehat{\theta}=\arg\min_{\theta}L(\theta|X,Y)$$

• How to get realistic estimate of $L(\widehat{\theta}|X',Y')$?
• separate validation set
• cross-validation
• leave-one-out method
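The hold-out idea above can be sketched in pure Python (function names and the toy data are illustrative, not from the lecture): split the known sample, fit on the training part only, and estimate the risk on the held-out part.

```python
import random

def holdout_split(X, Y, test_frac=0.25, seed=0):
    """Randomly split a known sample into training and validation parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(X) * test_frac)
    val, train = idx[:n_val], idx[n_val:]
    return ([X[i] for i in train], [Y[i] for i in train],
            [X[i] for i in val], [Y[i] for i in val])

def empirical_risk(model, X, Y, loss):
    """L(theta | X, Y) = (1/N) * sum of per-object losses."""
    return sum(loss(model(x), y) for x, y in zip(X, Y)) / len(X)

# Toy example: a constant "mean" model with squared loss.
X = [[1.0], [2.0], [3.0], [4.0]]
Y = [1.0, 2.0, 3.0, 4.0]
Xtr, Ytr, Xval, Yval = holdout_split(X, Y, test_frac=0.5)
mean_model = lambda x, m=sum(Ytr) / len(Ytr): m   # fitted on the training part only
risk = empirical_risk(mean_model, Xval, Yval, lambda p, y: (p - y) ** 2)
```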

Separate validation set¶

• Known sample $X,Y$: $(\mathbf{x}_{1},y_{1}),...(\mathbf{x}_{N},y_{N})$
• Test sample $X',Y'$: $(\mathbf{x}_{1}',y_{1}'),...(\mathbf{x}_{K}',y_{K}')$

Separate validation set¶

Divide known set randomly or randomly with stratification:

Cross-validation¶

4-fold cross-validation example¶

Divide the training set into $K$ parts, referred to as folds (here $K=4$).

Variants:

• randomly
• randomly with stratification (w.r.t target value or feature value)
• randomly with respect to time domain
• etc

4-fold cross-validation example¶

Use folds 1,2,3 for model estimation and fold 4 for model evaluation.

4-fold cross-validation example¶

• Denote
• $k(n)$ - the fold to which observation $(\mathbf{x}_{n},y_{n})$ belongs, i.e. $n\in I_{k(n)}$.
• $\widehat{\theta}^{-k}$ - parameter estimation using observations from all folds except fold $k$.

Cross-validation empirical risk estimation

$$\widehat{L}_{total}=\frac{1}{N}\sum_{n=1}^{N}\mathcal{L}(f_{\widehat{\theta}^{-k(n)}}(x_{n}),\,y_{n})$$
• For $K$-fold CV we have:

• $K$ parameters $\widehat{\theta}^{-1},...\widehat{\theta}^{-K}$
• $K$ models $f_{\widehat{\theta}^{-1}}(\mathbf{x}),...f_{\widehat{\theta}^{-K}}(\mathbf{x}).$
• $K$ estimations of empirical risk: $\widehat{L}_{k}=\frac{1}{\left|I_{k}\right|}\sum_{n\in I_{k}}\mathcal{L}(f_{\widehat{\theta}^{-k}}(\mathbf{x}_{n}),\,y_{n}),\,k=1,2,...K.$

• can estimate variance & use statistics!

• When number of folds $K$ is equal to number of objects $N$, this is called leave-one-out method.
• Cross-validation uses the i.i.d.(independent and identically distributed) property of observations
• Stratification by target $y$ helps for imbalanced/rare classes.
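The per-fold and total risk estimates $\widehat{L}_{k}$ and $\widehat{L}_{total}$ can be sketched in pure Python (the round-robin fold assignment and all names are illustrative):

```python
def kfold_cv_risk(X, Y, K, fit, loss):
    """For each fold k: fit theta^{-k} on the other folds,
    then average the loss on fold k. Returns (list of L_k, L_total)."""
    folds = [list(range(k, len(X), K)) for k in range(K)]  # round-robin split
    per_fold, total = [], 0.0
    for k in range(K):
        held = set(folds[k])
        X_tr = [X[i] for i in range(len(X)) if i not in held]
        Y_tr = [Y[i] for i in range(len(X)) if i not in held]
        model = fit(X_tr, Y_tr)                            # theta^{-k}
        losses = [loss(model(X[i]), Y[i]) for i in folds[k]]
        per_fold.append(sum(losses) / len(losses))         # L_k
        total += sum(losses)
    return per_fold, total / len(X)                        # L_total

# Sanity check: a mean predictor on constant targets has zero risk.
fit_mean = lambda X_tr, Y_tr: (lambda x, m=sum(Y_tr) / len(Y_tr): m)
per_fold, total = kfold_cv_risk(list(range(8)), [1.0] * 8, K=4,
                                fit=fit_mean, loss=lambda p, y: (p - y) ** 2)
```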

A/B testing¶

A/B testing¶

• Observe test set after the models were built.
• A/B testing procedure:
1. divide test objects randomly into two groups - A and B.
2. apply base model to A
3. apply modified model to B
4. compare final results

Modelling Pipelines¶

General Modelling Pipeline¶

If evaluation gives poor results we may return to each of preceding stages.

Cluster hypothesis (Compactness hypothesis)¶

• the more similar the features of $x$ are to those of $x_i$, the more likely $\hat{y}=y_i$

Cluster hypothesis examples¶

• Objects: Families, households
• Features: address, zip code, nearest marketplace... $\rightarrow$ geo-coordinates (lat, lon)
• Target feature: race (classification)

Cluster hypothesis examples¶

• Objects: Houses
• Features: address... $\rightarrow$ geo-coordinates (lat, lon)
• Target feature: house price (regression)
• Objects: DNA strings
• Features: ??
• Target features: Gene function (classification)

• Objects: Documents, texts, articles
• Features: Word counts
• Target: Document category (classification)

Similarity (distance) measures¶

Similarity measures¶

• How do we find similar objects?
• Utilize some similarity measure (which could be a metric)
$$\rho(x_i, x_j) = \sqrt{\sum\limits_{d=1}^{D}(x^d_i - x^d_j)^2} \text{: euclidean distance}$$

$$\rho(x_i, x_j) = \sum\limits_{d=1}^{D}|x^d_i - x^d_j| \text{: manhattan distance}$$

$$\rho(x_i, x_j) = 1 - \frac{\langle x_i,x_j \rangle}{||x_i||_2\cdot||x_j||_2} \text{: cosine distance}$$
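The three distances above, written directly in Python for equal-length numeric vectors (a sketch; function names are illustrative):

```python
from math import sqrt

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine of the angle between a and b."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = sqrt(sum(ai * ai for ai in a))
    nb = sqrt(sum(bi * bi for bi in b))
    return 1 - dot / (na * nb)
```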

Illustration¶

String Similarity¶

• Edit distance
• The number of insertions, replacements and deletions required to transform string $S_1$ into string $S_2$
• Denote $D( i , j )$ as edit distance between substrings $S_1[:i]$ and $S_2[:j]$.
• Use dynamic programming approach to compute $\rho(S_1, S_2):$
$$D(i,j)=\begin{cases} 0, & i=0,\ j=0\\ i, & j=0,\ i>0\\ j, & i=0,\ j>0\\ \min\left\{D(i,j-1)+1,\ D(i-1,j)+1,\ D(i-1,j-1)+m(S_{1}[i],S_{2}[j])\right\}, & i>0,\ j>0 \end{cases}$$

where $m(a,b) = 0$, if $a = b$ and $1$ otherwise
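The recurrence translates directly into a dynamic-programming table; a minimal sketch (the function name is illustrative):

```python
def edit_distance(s1, s2):
    """Levenshtein distance computed with the DP recurrence above."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                  # delete all i characters
    for j in range(m + 1):
        D[0][j] = j                  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1   # m(S1[i], S2[j])
            D[i][j] = min(D[i][j - 1] + 1,             # insertion
                          D[i - 1][j] + 1,             # deletion
                          D[i - 1][j - 1] + sub)       # substitution / match
    return D[n][m]
```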

Edit distance example¶

Similarity between sets¶

• Suppose that objects are represented as sets
• Client $a$: {french fries, big-mac, coffee, muffin}
• Client $b$: {french fries, cheese sauce, cheeseburger, coffee, cherry pie}
• Jaccard distance: $$\rho(a,b) = 1 - \frac{|a \cap b|}{|a \cup b|}$$
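The Jaccard distance for the two clients above, as a short sketch:

```python
def jaccard_distance(a, b):
    """rho(a, b) = 1 - |a intersect b| / |a union b| for two sets."""
    return 1 - len(a & b) / len(a | b)

client_a = {"french fries", "big-mac", "coffee", "muffin"}
client_b = {"french fries", "cheese sauce", "cheeseburger", "coffee", "cherry pie"}
d = jaccard_distance(client_a, client_b)   # shared: {french fries, coffee}
```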

Nearest Centroid¶

Nearest centroids algorithm¶

• Consider training sample $\left(x_{1},y_{1}\right),...\left(x_{N},y_{N}\right)$ with

• $N_{1}$ representatives of 1st class
• $N_{2}$ representatives of 2nd class
• etc.
• Training: Calculate the centroid of each class $c=1,2,...C:$ $$\mu_{c}=\frac{1}{N_{c}}\sum_{n=1}^{N}x_{n}\mathbb{I}[y_{n}=c]$$

• Classification:

• For object $x$ find the closest centroid: $$c=\arg\min_{i}\rho(x,\mu_{i})$$
• Assign $x$ the class of the closest centroid: $$\widehat{y}(x)=c$$
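A minimal sketch of both steps (names are illustrative; Euclidean $\rho$ assumed):

```python
from math import dist

def fit_centroids(X, Y):
    """Training: mu_c = mean of the training vectors of class c."""
    centroids = {}
    for c in sorted(set(Y)):
        members = [x for x, y in zip(X, Y) if y == c]
        D = len(members[0])
        centroids[c] = tuple(sum(m[d] for m in members) / len(members)
                             for d in range(D))
    return centroids

def predict_centroid(centroids, x):
    """Classification: assign the class of the closest centroid."""
    return min(centroids, key=lambda c: dist(centroids[c], x))

X_toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
y_toy = [0, 0, 1, 1]
mu = fit_centroids(X_toy, y_toy)   # {0: (0.0, 0.5), 1: (10.0, 10.5)}
```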

Decision boundaries for 4-class nearest centroids¶

In [5]:
interact(plot_centroid_class)

Out[5]:
<function __main__.plot_centroid_class>

Questions¶

• What are discriminant functions $g_{c}(x)$ for nearest centroid?
• What is the complexity for:
• training?
• prediction?
• What would be the shape of class separating boundary?
• Can we use similar ideas for regression?
• Is it always possible to have centroids?

K-nearest neighbours (KNN)¶

K-nearest neighbours algorithm¶

Classification:

• Find the $k$ closest objects to $x$ in the training set.
• Assign to $x$ the most frequent class among its $k$ neighbours.
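The classification rule can be sketched in a few lines (names are illustrative; Euclidean distance assumed):

```python
from collections import Counter
from math import dist

def knn_classify(X_train, y_train, x, k=3):
    """Find the k closest training objects and take a majority vote."""
    neighbours = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

X_cls = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_cls = [0, 0, 0, 1, 1, 1]
```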
In [8]:
plt.scatter(X_moons[:,0], X_moons[:,1], c=y_moons, cmap=plt.cm.Spectral)
plt.xlabel('$x_1$')
plt.ylabel('$x_2$')

Out[8]:
Text(0,0.5,'$x_2$')
In [10]:
interact(plot_knn_class, k=IntSlider(min=1, max=10, value=1))

Out[10]:
<function __main__.plot_knn_class>

K-nearest neighbours algorithm¶

Regression:

• Find the $k$ closest objects to $x$ in the training set.
• Assign to $x$ the average output of its $k$ neighbours.
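The regression variant differs only in the aggregation step; a sketch under the same assumptions (illustrative names, Euclidean distance):

```python
from math import dist

def knn_regress(X_train, y_train, x, k=3):
    """Average the outputs of the k closest training objects."""
    neighbours = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))[:k]
    return sum(y for _, y in neighbours) / k

X_reg = [(0.0,), (1.0,), (2.0,), (10.0,)]
y_reg = [0.0, 1.0, 2.0, 10.0]
```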
In [12]:
plt.plot(x_true, y_true, c='g', label='$f(x)$')
plt.scatter(x, y, label='actual data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc=2)

Out[12]:
<matplotlib.legend.Legend at 0x1096a32b0>
In [14]:
plot_linreg()

In [16]:
interact(plot_knn, k=IntSlider(min=1, max=10, value=1))

Out[16]:
<function __main__.plot_knn>

• K nearest neighbours algorithm is abbreviated as K-NN.
• $k=1$: nearest neighbour algorithm
• Base assumption of the method:
• similar objects yield similar outputs
• which is simpler - to train a K-NN model or to apply it?

Dealing with similar rank¶

When several classes receive the same number of votes (a tie), we can assign $x$ to the class:

• with higher prior probability
• having closest representative
• having closest mean of representatives (among nearest neighbours)
• which is more compact, having nearest most distant representative

Parameters and modifications¶

• Parameters:
• none


• Hyperparameters
• the number of nearest neighbours $K$
• distance metric $\rho(x,x')$
• Modifications:
• forecast rejection option (propose a rule, under what conditions to apply rejection in a) classification b) regression)
• variable $K$ (propose a method of K-NN with adaptive variable K in different parts of the feature space)

Properties¶

• only similarity between objects is needed, not exact feature values.
• so it may be applied to objects with arbitrarily complex feature descriptions
• simple to implement
• interpretable (kind of.. case based reasoning)
• does not need training
• may be applied in online scenarios
• cross-validation may be replaced with LOO.
• slow classification with complexity $O(NDK)$
• accuracy deteriorates as feature space dimensionality grows (curse of dimensionality)

Special properties¶

Normalization of features¶

• Does feature scaling affect the predictions of K-NN?

Normalization of features¶

• Does feature scaling affect the predictions of K-NN?
• It sure does! We need to normalize
• Equal scaling - equal impact of features
• Non-equal scaling - non-equal impact of features
• Typical normalizations:
• z-scoring (autoscaling): $$x_{j}'=\frac{x_{j}-\mu_{j}}{\sigma_{j}}$$
• range scaling: $$x_{j}'=\frac{x_{j}-L_{j}}{U_{j}-L_{j}}$$

where $\mu_{j},\,\sigma_{j},\,L_{j},\,U_{j}$ are mean value, standard deviation, minimum and maximum value of the $j$-th feature.
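Both normalizations as per-feature sketches (applied column by column; the population standard deviation is assumed for z-scoring; names are illustrative):

```python
from math import sqrt

def zscore(column):
    """x' = (x - mu) / sigma for one feature column."""
    mu = sum(column) / len(column)
    sigma = sqrt(sum((x - mu) ** 2 for x in column) / len(column))
    return [(x - mu) / sigma for x in column]

def range_scale(column):
    """x' = (x - L) / (U - L), mapping the feature to [0, 1]."""
    L, U = min(column), max(column)
    return [(x - L) / (U - L) for x in column]
```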

The curse of dimensionality¶

• A phenomenon that occurs in various fields and has various consequences (mainly negative)
• The curse of dimensionality: with growing $D$ data distribution becomes sparse and insufficient.

The curse of dimensionality¶

• At what rate should the training set size grow with $D$ to compensate for the curse of dimensionality?

The curse of dimensionality¶

(Figure: distributions of pairwise distances for $D=2$ and $D=2 \dots 100$.)
$$\lim_{D \rightarrow \infty} \frac{\text{dist}_{max} - \text{dist}_{min}}{\text{dist}_{min}} = 0$$

Curse of dimensionality¶

• Case of K-nearest neighbours:
• assumption: objects are distributed uniformly in unit feature space
• ball of radius $R$ has volume $V(R)=CR^{D}$, where $C=\frac{\pi^{D/2}}{\Gamma(D/2+1)}$.
• ratio of volumes of unit cube and included ball: $$\frac{V(0.5)}{1}=\frac{0.5^{D}\pi^{D/2}}{(D/2)!}\stackrel{D\to\infty}{\longrightarrow}0$$
• most of volume concentrates on the corners of the cube
• nearest neighbours stop being close by distance
• Good news: in real tasks the true dimensionality of the data is often less than $D$ and objects belong to the manifold with smaller dimensionality.
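The vanishing volume ratio above can be checked numerically (a sketch; `ball_to_cube_ratio` is an illustrative name):

```python
from math import pi, gamma

def ball_to_cube_ratio(D):
    """Volume of the ball of radius 0.5 inscribed in the unit cube:
    V(0.5) = C * 0.5**D with C = pi**(D/2) / Gamma(D/2 + 1)."""
    return (pi ** (D / 2) / gamma(D / 2 + 1)) * 0.5 ** D

# The ratio collapses quickly: most of the cube's volume sits in the corners.
ratios = [ball_to_cube_ratio(D) for D in (2, 10, 20)]
```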

Weighted account¶

Equal voting¶

• Consider for object $x$:

• $x_{i_{1}}$ - closest neighbour, $x_{i_{2}}$ - second closest neighbour, etc. $$\rho(x,x_{i_{1}})\le\rho(x,x_{i_{2}})\le...\le\rho(x,x_{i_{N}})$$
• Classification: \begin{align*} g_{c}(x) & =\sum_{k=1}^{K}\mathbb{I}[y_{i_{k}}=c],\quad c=1,2,...C.\\ \widehat{y}(x) & =\arg\max_{c}g_{c}(x) \end{align*}

• Regression: $$\widehat{y}(x)=\frac{1}{K}\sum_{k=1}^{K}y_{i_{k}}$$

Weighted voting¶

• Weighted classification: \begin{align*} g_{c}(x) & =\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))\mathbb{I}[y_{i_{k}}=c],\quad c=1,2,...C.\\ \widehat{y}(x) & =\arg\max_{c}g_{c}(x) \end{align*}

• Weighted regression: $$\widehat{y}(x)=\frac{\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))y_{i_{k}}}{\sum_{k=1}^{K}w(k,\,\rho(x,x_{i_{k}}))}$$

Commonly chosen weights¶

Index dependent weights: $$w_{k}=\alpha^{k},\quad\alpha\in(0,1)$$ $$w_{k}=\frac{K+1-k}{K}$$

Distance dependent weights (with $z_{k}=x_{i_{k}}$ denoting the $k$-th nearest neighbour):

$$w_{k}=\begin{cases} \frac{\rho(z_{K},x)-\rho(z_{k},x)}{\rho(z_{K},x)-\rho(z_{1},x)}, & \rho(z_{K},x)\ne\rho(z_{1},x)\\ 1 & \rho(z_{K},x)=\rho(z_{1},x) \end{cases}$$$$w_{k}=\frac{1}{\rho(z_{k},x)}$$
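A sketch of weighted K-NN regression with the inverse-distance weights $w_{k}=1/\rho(z_{k},x)$ (a small epsilon guards against zero distance; names are illustrative):

```python
from math import dist

def knn_weighted_regress(X_train, y_train, x, k=3, rho=dist):
    """Weighted K-NN regression: predictions of closer neighbours count more."""
    nb = sorted(zip(X_train, y_train), key=lambda p: rho(p[0], x))[:k]
    eps = 1e-12                                    # guard against rho = 0
    w = [1.0 / (rho(xi, x) + eps) for xi, _ in nb]
    return sum(wi * yi for wi, (_, yi) in zip(w, nb)) / sum(w)

X_w = [(0.0,), (2.0,)]
y_w = [0.0, 2.0]
```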

Kernels¶

• $K(\rho, h)$ - some function decreasing in $\rho$
• $h \geq 0$ - parameter (window width)
• gaussian kernel $$K(\rho, h) \propto \exp(- \frac{\rho(x, x')^2}{2h^2})$$
• tophat kernel $$K(\rho, h) \propto 1 \text{ if } \rho(x, x') < h$$
• epanechnikov kernel $$K(\rho, h) \propto 1 - \frac{\rho(x, x')^2}{h^2}$$
• exponential kernel $$K(\rho, h) \propto \exp(-\rho(x, x')/h)$$
• linear kernel $$K(\rho, h) \propto 1 - \rho(x, x')/h \text{ if } \rho(x, x') < h$$
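The kernels above as short Python functions (unnormalized, as the $\propto$ signs suggest; here $r = \rho(x, x')$ and names are illustrative):

```python
from math import exp

def gaussian(r, h):
    return exp(-r * r / (2 * h * h))

def tophat(r, h):
    return 1.0 if r < h else 0.0

def epanechnikov(r, h):
    return max(0.0, 1 - r * r / (h * h))   # zero outside the window

def exponential(r, h):
    return exp(-r / h)

def linear(r, h):
    return max(0.0, 1 - r / h)             # zero outside the window
```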

Kernels¶

In [18]:
interact(plot_knn_class_kernel, k=IntSlider(min=1, max=10, value=1),
h=FloatSlider(min=0.05, max=5, value=1, step=0.05))

Out[18]:
<function __main__.plot_knn_class_kernel>

Summary¶

• Important hyperparameters of K-NN:

• $K$: controls model complexity
• $\rho(x,x')$
• Output depends on feature scaling.

• scaling to equal / non-equal scatter possible.
• Prone to curse of dimensionality.
• Fast training but long prediction.
• some efficiency improvements are possible though (e.g. KD-trees, ball trees)
• Weighted account for objects possible.