1. Some materials are taken from the machine learning course of Victor Kitov.
Regression: $\widehat{y}(x)=F(x)$
Binary classification: $score(y|x)=F(x),\,\widehat{y}(x)= sign(F(x))$
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...n$; number of additive weak classifiers $M$, a family of weak classifiers $h(x)\in\{+1,-1\}$, trainable on weighted datasets.
ALGORITHM:
initialize observation weights $w_{i}=1/n,\,i=1,2,...n$
for $m=1,2,...M$:
train weak classifier $h^{m}(x)$ on the dataset weighted by $w_{i}$
compute the weighted error $\varepsilon_{m}=\sum_{i:\,h^{m}(x_{i})\ne y_{i}}w_{i}$
set the classifier weight $\alpha_{m}=\frac{1}{2}\ln\frac{1-\varepsilon_{m}}{\varepsilon_{m}}$
update $w_{i}\leftarrow w_{i}\exp\left(-\alpha_{m}y_{i}h^{m}(x_{i})\right)$ and renormalize so that $\sum_{i}w_{i}=1$
Output: composite classifier $f(x)=sign\left(\sum_{m=1}^{M}\alpha_{m}h^{m}(x)\right)$
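The loop above can be sketched from scratch (helper names below are mine, not from the notebook). Note that sklearn's SAMME variant used later reports $\alpha_{m}=\ln\frac{1-\varepsilon_{m}}{\varepsilon_{m}}$, twice the classical value; since every weight is scaled by the same factor, the sign of the ensemble is unchanged.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=3):
    """Classical (discrete) AdaBoost with decision stumps as weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    stumps, alphas = [], []
    for _ in range(M):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = w[pred != y].sum()                 # weighted training error
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # classifier weight
        w = w * np.exp(-alpha * y * pred)        # up-weight the mistakes
        w = w / w.sum()                          # renormalize
        stumps.append(h)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    # weighted vote of the stumps: sign(sum_m alpha_m h^m(x))
    F = sum(a * h.predict(X) for h, a in zip(stumps, alphas))
    return np.sign(F)
```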
import numpy as np
import matplotlib.pyplot as plt

# toy dataset: the label depends only on |x1| (+1 inside, -1 outside)
X = np.array([[-2, -1], [-2, 1], [2, -1], [2, 1],
              [-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
plt.scatter(X[:, 0], X[:, 1], c=y, s=500)
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

ada = AdaBoostClassifier(n_estimators=3, algorithm='SAMME',
                         base_estimator=DecisionTreeClassifier(max_depth=1))
ada.fit(X, y)
plot_decision(ada)
ada.estimator_weights_
array([1.09861229, 1.60943791, 1.38629436])
These are the SAMME classifier weights $\alpha_{m}=\ln\frac{1-\varepsilon_{m}}{\varepsilon_{m}}$: $\ln 3$, $\ln 5$ and $\ln 4$, i.e. the three stumps had weighted errors $1/4$, $1/6$ and $1/5$.
from sklearn.datasets import make_moons

X, y = make_moons(noise=0.1)
plt.figure(figsize=(17,15))
plt.scatter(X[:, 0], X[:, 1], c=y)
from ipywidgets import interact, IntSlider, FloatSlider

interact(ada_demo, n_est=IntSlider(min=1, max=150, value=1, step=1))
Gradient descent algorithm:
Input: learning rate $\eta$, controlling the speed of convergence; number of iterations $M$; differentiable function $f(x)$ to minimize.
ALGORITHM:
initialize $x^{0}$
for $m=1,2,...M$:
set $x^{m}=x^{m-1}-\eta\nabla f(x^{m-1})$
Output: approximate minimizer $x^{M}$
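Gradient descent fits in a few lines; here is a minimal sketch on a simple quadratic (the function and $\eta$ below are illustrative, not from the notebook):

```python
import numpy as np

def grad_descent(grad, x0, eta=0.1, M=100):
    """Plain gradient descent: x_m = x_{m-1} - eta * grad(x_{m-1})."""
    x = np.asarray(x0, dtype=float)
    for _ in range(M):
        x = x - eta * grad(x)
    return x

# minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the iterates
# contract towards the minimizer x* = 3 by a factor (1 - 2*eta) per step
x_star = grad_descent(lambda x: 2 * (x - 3), x0=[0.0], eta=0.1, M=100)
```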
interact(grad_small_demo,
n_est=IntSlider(min=0, max=50, value=0, step=1),
learning_rate=FloatSlider(min=0.1, max=1., value=0.1, step=0.05),
max_depth=IntSlider(min=1, max=5, value=1, step=1))
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$; learning rate $\nu$; the number $M$ of successive additive approximations; initial constant approximation $f_{0}(x)=\arg\min_{c}\sum_{i=1}^{N}\mathcal{L}(c,y_{i})$.
For each step $m=1,2,...M$:
compute pseudo-residuals $r_{i}=-\left.\frac{\partial\mathcal{L}(f,y_{i})}{\partial f}\right|_{f=f_{m-1}(x_{i})},\,i=1,2,...N$
fit weak learner $h_{m}(x)$ to the dataset $(x_{i},r_{i}),\,i=1,2,...N$
set $f_{m}(x)=f_{m-1}(x)+\nu h_{m}(x)$
Output: approximation function $f_{M}(x)=f_{0}(x)+\sum_{m=1}^{M}\nu h_{m}(x)$
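A minimal from-scratch version of this loop for the squared loss, with shallow regression trees as the $h_{m}$ (function names are mine; for squared loss the pseudo-residuals are simply $y_{i}-f_{m-1}(x_{i})$):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gboost_fit(X, y, nu=0.1, M=100, max_depth=1):
    f0 = y.mean()                      # constant initial approximation f_0
    F = np.full(len(y), f0)            # current predictions f_m(x_i)
    trees = []
    for _ in range(M):
        resid = y - F                  # negative gradient of squared loss
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        F = F + nu * h.predict(X)      # f_m = f_{m-1} + nu * h_m
        trees.append(h)
    return f0, trees

def gboost_predict(f0, trees, X, nu=0.1):
    return f0 + nu * sum(h.predict(X) for h in trees)
```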
y = np.array([0, 0, 1, 1, 0])
X = np.array([
[1,1],
[2,1],
[1,2],
[-1,1],
[-1,0],
])
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='flag')
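For binary classification with labels in $\{0,1\}$, the same loop can be run on the log-loss: the model accumulates log-odds scores, and the pseudo-residuals become $y_{i}-\sigma(f_{m-1}(x_{i}))$. A hedged sketch (helper names are mine):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gboost_logloss_fit(X, y, nu=0.5, M=50, max_depth=1):
    # F holds log-odds scores; start from the prior log-odds of class 1
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    f0 = np.log(p / (1 - p))
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):
        resid = y - sigmoid(F)         # negative gradient of log-loss wrt F
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        F = F + nu * h.predict(X)
        trees.append(h)
    return f0, trees

def gboost_logloss_predict(f0, trees, X, nu=0.5):
    F = f0 + nu * sum(h.predict(X) for h in trees)
    return (sigmoid(F) > 0.5).astype(int)
```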
interact(grad_small_demo_class,
n_est=IntSlider(min=1, max=50, value=1, step=1),
learning_rate=FloatSlider(min=0.1, max=1., value=0.1, step=0.05),
max_depth=IntSlider(min=1, max=5, value=1, step=1))
interact(grad_demo, n_est=IntSlider(min=1, max=150, value=1, step=1))
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$ and the number $M$ of successive additive approximations.
For each step $m=1,2,...M$:
compute pseudo-residuals $r_{i}=-\left.\frac{\partial\mathcal{L}(f,y_{i})}{\partial f}\right|_{f=f_{m-1}(x_{i})},\,i=1,2,...N$
fit weak learner $h_{m}(x)$ to the dataset $(x_{i},r_{i}),\,i=1,2,...N$
solve univariate optimization problem: $$ \sum_{i=1}^{N}\mathcal{L}\left(f_{m-1}(x_{i})+c_{m}h_{m}(x_{i}),y_{i}\right)\to\min_{c_{m}\in\mathbb{R}_{+}} $$
set $f_{m}(x)=f_{m-1}(x)+c_m h_{m}(x)$
Output: approximation function $f_{M}(x)=f_{0}(x)+\sum_{m=1}^{M}c_m h_{m}(x)$
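The line-search variant can be sketched with scipy's scalar minimizer for the univariate problem in $c_{m}$ (an illustrative sketch; for squared loss with a tree fit to the residuals, the optimal step is close to 1, and production libraries use per-leaf closed-form steps instead):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.tree import DecisionTreeRegressor

def gboost_linesearch_fit(X, y, M=20, max_depth=1):
    loss = lambda f: (f - y) ** 2           # squared loss L(f, y)
    F = np.full(len(y), y.mean())           # constant initial approximation
    trees, steps = [], []
    for _ in range(M):
        resid = y - F                       # pseudo-residuals
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        hx = h.predict(X)
        # univariate problem: min_{c >= 0} sum_i L(F_i + c * h(x_i), y_i)
        c = minimize_scalar(lambda c: loss(F + c * hx).sum(),
                            bounds=(0.0, 10.0), method='bounded').x
        F = F + c * hx                      # f_m = f_{m-1} + c_m * h_m
        trees.append(h)
        steps.append(c)
    return trees, np.array(steps), F
```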
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$ and the number $M$ of successive additive approximations.
Output: approximation function $f_{M}(x)$
Comments:
Subsampling: at each step $m$ the weak learner $h_{m}$ may be fit on a random subsample of the training set (stochastic gradient boosting); this speeds up training and, like bagging, adds randomness that acts as a regularizer.
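In sklearn, subsampling is controlled by the `subsample` parameter of the gradient boosting estimators (the dataset and parameter values below are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
# subsample=0.5 => each tree is fit on a random half of the training set,
# which trades a little bias for lower variance of the ensemble
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                subsample=0.5, random_state=0)
gb.fit(X, y)
```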