Header img   

楚虽三户,亡秦必楚



机器学习中的牛顿法和拟牛顿法

1. 训练神经网络的5大算法

在神经网络中, 处理学习进程的程序称为训练算法. 有许多这样的训练算法, 并有不同的特点和效用.

在训练神经网络中, 算法的目的就是求出极值点, 在此之中, 有5大算法.

1. 梯度下降法

梯度下降法式最简单的训练算法, 它需要计算出梯度向量, 根据梯度向量以及当前的点和学习率计算出当前步长, 以此来向极值点更近一步, 经过多次迭代, 最终到达极值点.

2. 牛顿法

与梯度下降法不同, 牛顿法的理论依据是极值点的导函数为0, 所以就是求函数的驻点. 所以需要求二阶导数, 是二阶算法. 拥有更少的迭代次数, 但因为要计算海森矩阵, 计算量比较大.

3. 共轭梯度法

共轭梯度法是中和梯度下降法和牛顿法的一种算法, 该算法的目的是加快梯

Read More
神经网络中的正向传播与反向传播

1. 正向传播

1. 概述

我们知道, 一个简单的神经网络可以是这样的, 分为多个层, 输入层、隐藏层和输出层. 在层与层之间, 有权重 $w_{ij}^k$ , 表示第k层的第i个节点到下一层的第j个节点的权重. 再每一层, 都会有个激活函数, 用于归一化, 也可以说是计算该层节点的输出.

所谓正向传播, 就是依据这一过程由输入层一直延续到输出层.

2. 例

以下图所示为例. neural-network 在Input Layer,输出项就是4个元素 $x_1, x_2, x_3, x_4$

在Hid

Read More
基于方差的梯度下降算法

梯度下降算法简介

梯度下降算法是寻找函数极小值点的迭代优化算法. 必须基于当前点的梯度(或近似梯度)方向的反方向进行规定步长距离进行迭代搜索, 以达到接近极小值的算法. 假如按梯度相同方向迭代搜索, 就会接近极大值, 称为梯度上升法.

该简介译自: Gradient descent from wikipedia

(可以证明, 沿着梯度方向或反方向, 可以获得最大、最快的进度. 对于梯度的详细了解,可参照高数课本导数部分关于梯度的介绍).

梯度下降算法步骤

1. 通用步骤

  1. 计算当前误差
  2. 查看当前误差是否符合要求, 或是否已达到迭代上限, 若是,结束
Read More
PAT(A) 1135. Is It A Red-Black Tree (30)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1135

1135. Is It A Red-Black Tree (30)


There is a kind of balanced binary search tree named red-black tree in the data structure. It has the following 5 properties:

(1) Every node is either red or black. (2) The root is black. (3) Every leaf (NULL) is black. (4) If

Read More
PAT(A) 1134. Vertex Cover (25)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1134

1134. Vertex Cover (25)


A vertex cover of a graph is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. Now given a graph with several vertex sets, you are supposed to tell if each of them is a vertex cover or not.

Read More
PAT(A) 1133. Splitting A Linked List (25)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1133

1133. Splitting A Linked List (25)


Given a singly linked list, you are supposed to rearrange its elements so that all the negative values appear before all of the non-negatives, and all the values in [0, K] appear before all those greater than K. The order of th

Read More
PAT(A) 1132. Cut Integer (20)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1132

1132. Cut Integer (20)


Cutting an integer means to cut a K digits long integer Z into two integers of (K/2) digits long integers A and B. For example, after cutting Z = 167334, we have A = 167 and B = 334. It is interesting to see that Z can be devided by the pr

Read More
PAT(A) 1127. ZigZagging on a Tree (30)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1127

1127. ZigZagging on a Tree (30)


Suppose that all the keys in a binary tree are distinct positive integers. A unique binary tree can be determined by a given pair of postorder and inorder traversal sequences. And it is a simple standard routine to print the numbe

Read More
PAT(A) 1126. Eulerian Path (25)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1126

1126. Eulerian Path (25)


In graph theory, an Eulerian path is a path in a graph which visits every edge exactly once. Similarly, an Eulerian circuit is an Eulerian path which starts and ends on the same vertex. They were first discussed by Leonhard Euler while s

Read More
PAT(A) 1125. Chain the Ropes (25)

原题目:

原题链接:https://www.patest.cn/contests/pat-a-practise/1125

1125. Chain the Ropes (25)


Given some segments of rope, you are supposed to chain them into one rope. Each time you may only fold two segments into loops and chain them into one piece, as shown by the figure. The resulting chain will be treated as another segmen

Read More