# Variance-Based Gradient Descent

## Introduction to Gradient Descent

(It can be shown that, at any point, a function increases fastest along the direction of its gradient and decreases fastest along the opposite direction. For more detail on gradients, see the section on derivatives and gradients in a calculus textbook.)
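For a one-variable function the gradient is simply the derivative, and it can be checked numerically with a central difference. The helper below is my own illustration, not from the text:

```python
def numerical_gradient(f, x, eps=1e-6):
    # Central-difference approximation of f'(x).
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# For f(x) = x^2 the analytic derivative is 2x; the numerical
# estimate agrees closely at any point.
print(numerical_gradient(lambda t: t * t, 2.0))  # close to 4.0
```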

## Steps of Gradient Descent

### 1. General Steps

1. Compute the current error.
2. Check whether the error is within tolerance, or whether the iteration limit has been reached; if so, stop.
3. Compute the gradient at the current point.
4. Update the current point by moving against the gradient, recompute the error, and go back to step 2.
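The four steps above can be sketched as one generic routine; the function and parameter names here are my own, not from the text:

```python
def gradient_descent(f, grad_f, x0, target=0.0, rate=0.1,
                     precision=0.01, max_iters=100):
    """Minimize f starting from x0 by following the negative gradient."""
    x = x0
    for _ in range(max_iters):
        # Steps 1-2: stop once the error is within tolerance.
        if abs(f(x) - target) <= precision:
            break
        # Step 3: gradient at the current point.
        g = grad_f(x)
        # Step 4: move against the gradient.
        x -= rate * g
    return x

x_min = gradient_descent(lambda x: x * x, lambda x: 2 * x, x0=2.0)
print(x_min)  # close to the minimum of x^2 at 0
```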

### 2. Example

Minimizing $f(x) = x^2$ starting from $x = 2$ with learning rate $0.1$. Each row shows the iteration number, the current $x$, $f(x)$, the gradient $f'(x) = 2x$, and the update $\Delta x = -0.1 \cdot f'(x)$:

| Iteration | $x$ | $f(x)$ | $f'(x)$ | $\Delta x$ |
|---|---|---|---|---|
| 1 | 2 | 4 | 4 | -0.4 |
| 2 | 1.6 | 2.56 | 3.2 | -0.32 |
| 3 | 1.28 | 1.6384 | 2.56 | -0.256 |
| 4 | 1.024 | 1.048576 | 2.048 | -0.2048 |
| 5 | 0.8192 | 0.6710 | 1.6384 | -0.16384 |
| 6 | 0.65536 | 0.42949673 | 1.31072 | -0.131072 |
| 7 | 0.524288 | 0.274877907 | 1.048576 | -0.1048576 |
| 8 | 0.4194304 | 0.17592186 | 0.8388608 | -0.08388608 |
| 9 | 0.335544... | 0.112589... | 0.671088... | -0.067108... |
| 10 | 0.268435... | 0.072057... | 0.536870... | -0.053687... |
| 11 | 0.214748... | 0.046116... | 0.429496... | -0.042949... |
| 12 | 0.171798... | 0.029514... | 0.343597... | -0.034359... |
| 13 | 0.137438... | 0.018889... | 0.274877... | -0.027487... |
| 14 | 0.109951... | 0.012089... | 0.219902... | -0.021990... |
| 15 | 0.087960... | 0.007737... | 0.175921... | -0.017592... |
| 16 | 0.070368... | 0.004951... | 0.140737... | -0.014073... |
| 17 | 0.056294... | 0.003169... | 0.112589... | -0.011258... |
| 18 | 0.045035... | 0.002028... | 0.090071... | -0.009007... |
| 19 | 0.036028... | 0.001298... | 0.072057... | -0.007205... |
| 20 | 0.028823... | 0.000830... | 0.057646... | -0.005764... |
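The rows above can be regenerated with a short loop; this is a sketch using the same starting point 2 and learning rate 0.1:

```python
x, rate = 2.0, 0.1
for i in range(1, 21):
    fx = x * x          # f(x) = x^2
    grad = 2 * x        # f'(x) = 2x
    dx = -rate * grad   # update step
    print(f"{i:2d}  {x:.6f}  {fx:.6f}  {grad:.6f}  {dx:.6f}")
    x += dx
```

Each update multiplies $x$ by $0.8$, which is why the columns shrink geometrically.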

### 3. Code (Python)

```python
target = 0          # target value of f(x)
x = 2               # starting point
rate = 0.1          # learning rate
precision = 0.01    # acceptable error
max_iters = 20      # maximum number of iterations
iters = 0           # iteration counter

def fun(x):
    return x * x

def derivative_fun(x):
    return 2 * x

while (abs(fun(x) - target) > precision) and (iters < max_iters):
    dx = -rate * derivative_fun(x)
    x += dx
    iters += 1

print("The local minimum occurs at", str(x)[:8])
# Output: The local minimum occurs at 0.087960
```

## Variance-Based Gradient Descent

### 1. Derivation (Iteratively Updating Weights in a Neural Network)

$$E = (y - \hat y)^2$$

Here $y$ is the target output and $\hat y$ the network's prediction. Our goal is to choose the weights $w_i$ so that $E$ is minimized, i.e., to find the minimum of $E$. Writing $h = \sum_i^n w_i x_i$ and $\hat y = f(h)$, $E$ as a function of the $w_i$ is:

$$E = (y - \hat y)^2 = (y - f(h))^2 = \Big(y - f\big(\sum_i^n w_i x_i\big)\Big)^2$$

For simplicity (the factor $\frac{1}{2}$ cancels the 2 produced by differentiating the square), we use:

$$E = \frac{1}{2}(y - \hat y)^2$$

$$\begin{aligned}
\frac{\partial E}{\partial w_i} &= \frac{1}{2}\,\frac{\partial (y - f(h))^2}{\partial w_i} \\
&= (y - f(h)) \cdot \frac{\partial (y - f(h))}{\partial w_i} \\
&= -(y - f(h)) \cdot \frac{\partial f(h)}{\partial w_i} \\
&= -(y - f(h)) \cdot f'(h) \cdot \frac{\partial h}{\partial w_i} \\
&= -(y - f(h)) \cdot f'(h) \cdot \frac{\partial \sum_i^n w_i x_i}{\partial w_i} \\
&= -(y - f(h)) \cdot f'(h) \cdot \frac{\partial (w_1 x_1 + w_2 x_2 + \dots + w_i x_i + \dots + w_n x_n)}{\partial w_i} \\
&= -(y - f(h)) \cdot f'(h) \cdot \left(\frac{\partial w_1 x_1}{\partial w_i} + \frac{\partial w_2 x_2}{\partial w_i} + \dots + \frac{\partial w_i x_i}{\partial w_i} + \dots + \frac{\partial w_n x_n}{\partial w_i}\right) \\
&= -(y - f(h)) \cdot f'(h) \cdot (0 + 0 + \dots + x_i + \dots + 0) \\
&= -(y - f(h)) \cdot f'(h) \cdot x_i \\
&= -\delta \cdot x_i
\end{aligned}$$

where $\delta = (y - f(h)) \cdot f'(h)$. Gradient descent therefore updates each weight as $w_i \leftarrow w_i + \eta \, \delta \, x_i$ for a learning rate $\eta$.
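A minimal sketch of this update rule for a single neuron, assuming a sigmoid activation $f$; the data values and names below are illustrative, not from the text:

```python
import math

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def sigmoid_prime(h):
    s = sigmoid(h)
    return s * (1.0 - s)

# Illustrative data: one sample with two inputs and a scalar target.
x = [0.5, -0.2]
y = 1.0
w = [0.1, 0.3]   # initial weights
rate = 0.5       # learning rate

for _ in range(100):
    h = sum(wi * xi for wi, xi in zip(w, x))      # h = sum_i w_i * x_i
    delta = (y - sigmoid(h)) * sigmoid_prime(h)   # delta = (y - f(h)) * f'(h)
    # dE/dw_i = -delta * x_i, so descending the gradient adds rate * delta * x_i.
    w = [wi + rate * delta * xi for wi, xi in zip(w, x)]

print(w)  # weights have moved so that f(h) is closer to y
```

Repeating this update drives the prediction $f(h)$ toward the target $y$, shrinking the squared error at each step.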