1. Some materials are taken from the machine learning course of Victor Kitov.
$$ F(x)=f_{0}(x)+\alpha_{1}h_{1}(x)+...+\alpha_{M}h_{M}(x) $$
Regression: $\widehat{y}(x)=F(x)$
Binary classification: $score(y|x)=F(x),\,\widehat{y}(x)= sign(F(x))$
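For concreteness, a tiny numeric sketch of how the ensemble score and label are formed (the numbers below are made up purely for illustration):

import numpy as np

f0 = 0.0
alphas = np.array([1.1, 1.6, 1.4])    # alpha_1, ..., alpha_M
h_x = np.array([1, -1, 1])            # h_1(x), ..., h_M(x), each in {-1, +1}
F_x = f0 + alphas @ h_x               # ensemble score F(x)
y_hat = np.sign(F_x)                  # predicted label for binary classification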
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$; the general form of the ``base learner'' $h(x|\gamma)$ (depending on the parameter $\gamma$); and the number $M$ of successive additive approximations.
For $m=1,2,...M$:
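The loop body is not reproduced above; in the standard forward stagewise formulation, each step fits the next base learner and its weight by minimizing the training loss of the partial sum:

$$ (\alpha_{m},\gamma_{m})=\arg\min_{\alpha,\gamma}\sum_{i=1}^{N}\mathcal{L}\bigl(f_{m-1}(x_{i})+\alpha\,h(x_{i}|\gamma),\,y_{i}\bigr),\qquad f_{m}(x)=f_{m-1}(x)+\alpha_{m}h(x|\gamma_{m}) $$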
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; number of additive weak classifiers $M$; a family of weak classifiers $h(x)\in\{+1,-1\}$, trainable on weighted datasets.
for $m=1,2,...M$:
Output: composite classifier $f(x)=sign\left(\sum_{m=1}^{M}\alpha_{m}h_{m}(x)\right)$
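A minimal from-scratch sketch of this scheme (discrete AdaBoost with decision stumps; the helper names are illustrative, and the 1/2 factor in the classifier weight is one common convention, whereas sklearn's SAMME uses $\alpha_m=\ln\frac{1-err_m}{err_m}$ for two classes). The official AdaBoostClassifier is used in the demo below.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=3):
    # Discrete AdaBoost; labels y must be in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform initial sample weights
    stumps, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # weak learner trained on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted training error
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
        w = w * np.exp(-alpha * y * pred)          # up-weight misclassified points
        w = w / np.sum(w)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    score = sum(a * s.predict(X) for a, s in zip(stumps, alphas))
    return np.sign(score)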
import numpy as np
import matplotlib.pyplot as plt
# Toy dataset: points with x1 = ±2 are class -1, points with x1 = ±1 are class +1
X = np.array([[-2, -1], [-2, 1], [2, -1], [2, 1], [-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
plt.scatter(X[:, 0], X[:, 1], c=y, s=500)
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
ada = AdaBoostClassifier(n_estimators=3, algorithm='SAMME',
                         base_estimator=DecisionTreeClassifier(max_depth=1))
ada.fit(X, y)
plot_decision(ada)
ada.estimator_weights_
array([ 1.09861229, 1.60943791, 1.38629436])
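These values are consistent with the SAMME weight formula $\alpha_m=\ln\frac{1-err_m}{err_m}$ for two classes; for example, assuming the first stump misclassifies 2 of the 8 equally weighted points ($err_1=0.25$):

import numpy as np
np.log((1 - 0.25) / 0.25)   # = ln(3) ≈ 1.0986, the first value of estimator_weights_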
from sklearn.datasets import make_moons
X, y = make_moons(noise=0.1)
plt.figure(figsize=(7, 5))
plt.scatter(X[:, 0], X[:, 1], c=y)
from ipywidgets import interact, IntSlider
interact(ada_demo, n_est=IntSlider(min=1, max=150, value=1, step=1))
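The body of ada_demo is defined elsewhere in the notebook; a plausible sketch of such a callback (hypothetical: it is assumed to fit an AdaBoost model with n_est stumps on the moons data and plot the decision regions) could be:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def ada_demo(n_est):
    model = AdaBoostClassifier(n_estimators=n_est, algorithm='SAMME',
                               base_estimator=DecisionTreeClassifier(max_depth=1))
    model.fit(X, y)
    # evaluate the model on a grid covering the data and draw the decision regions
    xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                         np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    plt.show()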
$$ F(w)\to\min_{w},\quad w\in\mathbb{R}^{N} $$
Gradient descent algorithm:
Input: $\eta$, a parameter controlling the speed of convergence; $M$, the number of iterations.
ALGORITHM:
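The iteration steps are not reproduced here; the standard gradient descent update is

$$ w_{m}=w_{m-1}-\eta\,\nabla F(w_{m-1}),\qquad m=1,2,...M $$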
Input: $M$, the number of iterations.
ALGORITHM:
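The steps are again omitted; since no learning rate appears in the input, this is presumably steepest descent with a line search for the step size (an assumption here), which mirrors the $c_m$ coefficients used in boosting below:

$$ c_{m}=\arg\min_{c>0}F\bigl(w_{m-1}-c\,\nabla F(w_{m-1})\bigr),\qquad w_{m}=w_{m-1}-c_{m}\nabla F(w_{m-1}) $$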
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$ and the number $M$ of successive additive approximations.
For each step $m=1,2,...M$:
Output: approximation function $f_{M}(x)=f_{0}(x)+\sum_{m=1}^{M}c_{m}h_{m}(x)$
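A minimal from-scratch sketch of this algorithm for squared loss with regression-tree base learners (the function names and parameters are illustrative, not taken from the lecture code):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_fit(X, y, M=100, nu=0.1, max_depth=1):
    # Gradient boosting for squared loss: each tree is fitted to the current residuals
    f0 = np.mean(y)                       # initial constant approximation f_0(x)
    f = np.full(len(y), f0)               # current predictions f_{m-1}(x_i)
    trees = []
    for m in range(M):
        residuals = y - f                 # negative gradient of (1/2)(y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)            # h_m approximates the negative gradient
        f = f + nu * tree.predict(X)      # f_m = f_{m-1} + nu * h_m
        trees.append(tree)
    return f0, trees

def gradient_boosting_predict(X, f0, trees, nu=0.1):
    return f0 + nu * sum(t.predict(X) for t in trees)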
Input: training dataset $(x_{i},y_{i}),\,i=1,2,...N$; loss function $\mathcal{L}(f,y)$ and the number $M$ of successive additive approximations.
Output: approximation function $f_{M}(x)$
interact(grad_demo, n_est=IntSlider(min=1, max=150, value=1, step=1))
$$ f_{m}(x)=f_{m-1}(x)+\nu\sum_{j=1}^{J_{m}}\gamma_{jm}\mathbb{I}[x\in R_{jm}] $$
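In scikit-learn's gradient boosting implementations the shrinkage $\nu$ corresponds to the learning_rate parameter; a minimal example (the parameter values are illustrative):

from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gb.fit(X, y)   # e.g. on the moons data from above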
Comments:
Subsampling
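Subsampling (stochastic gradient boosting) fits each tree on a random fraction of the training data, which adds randomization and usually reduces overfitting; in scikit-learn this is controlled by the subsample parameter (the value below is illustrative):

from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, subsample=0.5)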
$$ \phi_j(x) = \frac{1}{N}\sum_{k=1}^N F(x^k_1, x^k_2,\dots, x^k_{j-1}, x, x^k_{j+1} \dots,x^k_p) $$
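A direct (naive) implementation of this partial dependence formula, which averages the model's prediction over the training sample with feature $j$ clamped to $x$ (the names here are illustrative):

import numpy as np

def partial_dependence(model, X_train, j, grid):
    """phi_j(x) for each x in grid: mean prediction with feature j fixed at x."""
    values = []
    for x in grid:
        X_mod = X_train.copy()
        X_mod[:, j] = x                      # clamp feature j to the value x
        values.append(model.predict(X_mod).mean())
    return np.array(values)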