Some materials are taken from the machine learning course of Victor Kitov.
The human brain consists of multiple interconnected neurons.
This is the structure of a multilayer perceptron: a directed acyclic graph.
Output of neuron $j$: $O_{j}=f(I_{j})$.
$$ I_{j}=\sum_{k\in inc(j)}w_{kj}O_{k} $$
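The computation of a single neuron above can be sketched in NumPy (the weights, inputs, and sigmoid activation below are made-up illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(weights, inputs, f=sigmoid):
    """Output O_j = f(I_j), where I_j = sum over incoming k of w_kj * O_k."""
    I_j = np.dot(weights, inputs)
    return f(I_j)

O_in = np.array([0.5, -1.0, 2.0])  # outputs O_k of incoming neurons (hypothetical)
w = np.array([0.1, 0.2, 0.3])      # weights w_kj (hypothetical)
print(neuron_output(w, O_in))      # sigmoid(0.45)
```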
The number of layers usually counts all layers except the input layer (hidden layers + output layer).
Classification with squared-error loss (a single output, or $K$ outputs with one-hot targets):
$$ \frac{1}{N}\sum_{n=1}^{N}(\widehat{y}_{n}(x_{n})-y_{n})^{2}\to\min_{w} $$
$$ \frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}(\widehat{y}_{nk}(x_{n})-y_{nk})^{2}\to\min_{w} $$
Two classes ($y\in\{0,1\}$, $p=P(y=1)$): $$ \prod_{n=1}^{N}p(y_{n}=1|x_{n})^{y_{n}}[1-p(y_{n}=1|x_{n})]{}^{1-y_{n}}\to\max_{w} $$
$C$ classes ($y_{nc}=\mathbb{I}\{y_{n}=c\}$):
$$ \prod_{n=1}^{N}\prod_{c=1}^{C}p(y_{n}=c|x_{n})^{y_{nc}}\to\max_{w} $$
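In practice the likelihood product is maximized by minimizing its negative logarithm (the cross-entropy). A minimal sketch for the two-class case, with made-up labels and predicted probabilities:

```python
import numpy as np

def binary_cross_entropy(p, y):
    """Per-sample average of the negative log of the two-class likelihood."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1])        # true labels y_n (hypothetical)
p = np.array([0.9, 0.2, 0.7])  # predicted P(y=1 | x_n) (hypothetical)
print(binary_cross_entropy(p, y))  # ~0.2284
```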
We may optimize a neural network using gradient descent:
k = 0
initialize w_0 randomly  # small values for sigmoid and tanh
while stop criterion not met:
    w_{k+1} := w_k - alpha * grad(L(w_k))
    k := k + 1
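The loop above can be made concrete on a toy loss. Here $L(w)=\|w\|^2/2$ stands in for the network loss (so $\nabla L(w)=w$), with a small random initialization and a fixed iteration budget as the stopping criterion:

```python
import numpy as np

def grad(w):
    return w  # gradient of the toy loss L(w) = ||w||^2 / 2

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=5)  # small random init, as for sigmoid/tanh
alpha = 0.1
for k in range(200):               # fixed budget as the stop criterion
    w = w - alpha * grad(w)

print(np.linalg.norm(w))           # near 0: the minimum of L
```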
Standardization of features makes gradient descent converge faster.
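Standardization rescales each feature to zero mean and unit variance; a quick NumPy sketch with made-up data on very different scales:

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])  # two features on very different scales

# Standardize each feature (column) to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]
```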
Direct $\nabla L(w)$ calculation, using $$ \frac{\partial L}{\partial w_{i}}=\frac{L(w+\varepsilon_{i})-L(w)}{\varepsilon}+O(\varepsilon)\label{eq:deriv1} $$ or better $$ \frac{\partial L}{\partial w_{i}}=\frac{L(w+\varepsilon_{i})-L(w-\varepsilon_{i})}{2\varepsilon}+O(\varepsilon^{2})\label{eq:deriv2} $$ (where $\varepsilon_{i}$ is the vector with $\varepsilon$ in coordinate $i$ and zeros elsewhere), has complexity $O(W^{2})$: each of the $W$ partial derivatives needs extra evaluations of $L$, each costing $O(W)$.
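The accuracy difference between the two finite-difference formulas can be checked on a toy loss whose exact gradient is known (the loss and step size below are arbitrary choices):

```python
import numpy as np

def L(w):
    return np.sum(w ** 3)  # toy loss; exact gradient is 3 * w^2

w = np.array([1.0, 2.0])
eps = 1e-5
exact = 3 * w ** 2

fwd = np.empty_like(w)
ctr = np.empty_like(w)
for i in range(len(w)):
    e = np.zeros_like(w); e[i] = eps
    fwd[i] = (L(w + e) - L(w)) / eps            # O(eps) error
    ctr[i] = (L(w + e) - L(w - e)) / (2 * eps)  # O(eps^2) error

print(np.abs(fwd - exact).max())  # noticeably larger error
print(np.abs(ctr - exact).max())  # much smaller error
```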
The backpropagation algorithm needs only $O(W)$ operations to evaluate all derivatives.
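A minimal backpropagation sketch for a one-hidden-layer network with squared-error loss (all sizes, data, and the sigmoid/linear architecture are made-up choices), verified against a central finite difference on one weight:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # one input sample
y = 1.0                  # its target
W1 = rng.normal(scale=0.1, size=(4, 3))  # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(1, 4))  # hidden -> output weights

# Forward pass
h = sigmoid(W1 @ x)      # hidden outputs O_j
yhat = (W2 @ h)[0]       # linear output
loss = 0.5 * (yhat - y) ** 2

# Backward pass: propagate the error back through the layers
delta_out = yhat - y                         # dL/d(yhat)
gW2 = delta_out * h[None, :]                 # dL/dW2
delta_h = (W2[0] * delta_out) * h * (1 - h)  # dL/dI_j for hidden units
gW1 = delta_h[:, None] * x[None, :]          # dL/dW1

# Check one weight against a central finite difference
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
num = (0.5 * ((W2 @ sigmoid(W1p @ x))[0] - y) ** 2
       - 0.5 * ((W2 @ sigmoid(W1m @ x))[0] - y) ** 2) / (2 * eps)
print(abs(gW1[0, 0] - num))  # tiny: the two gradients agree
```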
Since the loss surface is non-convex, different random initializations of the weights will lead to different local optima. So we may solve the task many times from different initial conditions and then select the best solution (or average the predictions). And/or use more advanced optimization methods.
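The restart strategy can be sketched on a non-convex one-dimensional toy loss (the loss, learning rate, and restart count are arbitrary choices):

```python
import numpy as np

def L(w):
    return np.sin(3 * w) + w ** 2       # non-convex toy loss

def dL(w):
    return 3 * np.cos(3 * w) + 2 * w    # its derivative

rng = np.random.default_rng(0)
best_w, best_loss = None, np.inf
for restart in range(10):
    w = rng.uniform(-2, 2)              # new random initial condition
    for _ in range(500):                # plain gradient descent
        w -= 0.01 * dL(w)
    if L(w) < best_loss:                # keep the best optimum found
        best_w, best_loss = w, L(w)

print(best_w, best_loss)
```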
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 1000)
gr_sigm = sigmoid(x) * (1 - sigmoid(x))  # sigmoid derivative: vanishes for large |x|
plt.plot(x, gr_sigm)
plt.show()
Net1: no hidden layer
Net2: 1 hidden layer, 12 hidden units fully connected
Net3: 2 hidden layers, locally connected
Net4: 2 hidden layers, locally connected with weight sharing
Net5: 2 hidden layers, locally connected, 2 levels of weight sharing
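Weight sharing (as in Net4 and Net5) drastically reduces the number of free parameters: a shared kernel sliding over the input (a 1-D convolution) replaces a full weight matrix. A sketch with made-up toy sizes:

```python
import numpy as np

n_in, k = 16, 3              # toy input size and kernel size (hypothetical)
n_out = n_in - k + 1         # 14 output units

# Fully connected layer: every output unit has its own weights
dense_params = n_in * n_out  # 224 weights

# Locally connected with weight sharing: one kernel of size k
shared = np.array([0.2, 0.5, 0.3])  # the shared kernel (hypothetical values)
x = np.arange(n_in, dtype=float)
out = np.array([shared @ x[i:i + k] for i in range(n_out)])

print(dense_params, shared.size)    # 224 vs 3 parameters
print(out.shape)                    # (14,)
```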
Advantages of neural networks:
Disadvantages of neural networks: