Overview
1. Fixing the data matrix dimensions
X = (number of features, number of examples m)
Y = (1, number of examples m)
W[l] = (n[l], n[l-1])
b[l] = (n[l], 1)
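To make the convention concrete, here is a tiny shape check in numpy (the layer sizes below are my own toy numbers, not taken from the assignments):

```python
import numpy as np

# Toy sizes: n_x features, n_1 hidden units, m examples.
n_x, n_1, m = 4, 3, 10
X = np.random.randn(n_x, m)      # X: (n_x, m), one example per column
Y = np.random.randn(1, m)        # Y: (1, m)
W1 = np.random.randn(n_1, n_x)   # W[1]: (n[1], n[0])
b1 = np.zeros((n_1, 1))          # b[1]: (n[1], 1), broadcast across the m columns

Z1 = np.dot(W1, X) + b1
assert Z1.shape == (n_1, m)
```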
2. How to prevent vanishing or exploding gradients
Assignment 2 of course 1-4 initializes the deep network with the method taught in course 2-1: to guard against vanishing or exploding gradients, shrink the randomly initialized weights according to the number of units n[l-1] feeding into the layer (so that each weight has variance on the order of 1/n[l-1]); the resulting z then neither grows nor shrinks too much. For different activation functions, researchers have worked out the corresponding optimal scaling factors:
The last of these variants is also known as Xavier initialization.
The programming exercise of course 2-1 compares zero initialization, (large) random initialization, and He initialization (scaling by sqrt(2/n[l-1]), suited to ReLU), and reaches the following conclusions:
Model | Train accuracy | Problem/Comment |
---|---|---|
3-layer NN with zeros initialization | 50% | fails to break symmetry |
3-layer NN with large random initialization | 83% | too large weights |
3-layer NN with He initialization | 99% | recommended method |
He initialization is the recommended method.
However, to support this conclusion, the notebook multiplies the randomly initialized W by 10. If you remove that factor of 10, the randomly initialized network starts with a higher cost but still converges quickly and reaches high accuracy. He initialization is still the recommendation, because it effectively guards against vanishing and exploding gradients.
He initialization:
Random initialization:
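For reference, here is a minimal sketch of the three schemes compared above; this is my own code, not the notebook's, and `layer_dims` is a hypothetical list of layer sizes such as [n_x, n_1, ..., n_L]:

```python
import numpy as np

def initialize(layer_dims, method="he"):
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in, fan_out = layer_dims[l - 1], layer_dims[l]
        if method == "zeros":
            W = np.zeros((fan_out, fan_in))                        # fails to break symmetry
        elif method == "random_large":
            W = np.random.randn(fan_out, fan_in) * 10              # the "* 10" discussed above
        else:  # "he": scale by sqrt(2 / n[l-1]), suited to ReLU
            W = np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / fan_in)
        params["W" + str(l)] = W
        params["b" + str(l)] = np.zeros((fan_out, 1))
    return params
```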
3. Regularization and dropout
With L2 regularization, the learned parameters end up smaller than without it. Smaller weights are taken to mean a simpler model, which is why this helps prevent overfitting.
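As a reminder of what L2 regularization adds to the cost, here is a sketch following the assignment's 3-layer setup (cross_entropy_cost, the parameters dict and lambd are assumed to exist as in that notebook):

```python
import numpy as np

def compute_cost_with_L2(cross_entropy_cost, parameters, lambd, m):
    W1, W2, W3 = parameters["W1"], parameters["W2"], parameters["W3"]
    # L2 penalty: (lambda / 2m) * sum of squared weights over all layers
    l2_penalty = (lambd / (2 * m)) * (np.sum(np.square(W1))
                                      + np.sum(np.square(W2))
                                      + np.sum(np.square(W3)))
    return cross_entropy_cost + l2_penalty
```

Gradient descent then sees an extra (lambd / m) * W term in each dW, which is what drives the weights toward smaller values.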
Notes on dropout:
1. Dropout is a regularization technique.
2. You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
3. Apply dropout both during forward and backward propagation.
4. During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled by 0.5 since only the remaining half are contributing to the solution. Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected value. You can check that this works even when keep_prob is other values than 0.5. In short, this point keeps the expected value of each layer's activations unchanged (see the sketch after this list).
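Points 3 and 4 together amount to the usual inverted-dropout step; a minimal sketch for one ReLU layer (the names are illustrative, not the notebook's):

```python
import numpy as np

def dropout_forward(A_prev, W, b, keep_prob):
    Z = np.dot(W, A_prev) + b
    A = np.maximum(0, Z)                          # ReLU activation
    D = np.random.rand(*A.shape) < keep_prob      # dropout mask
    A = A * D                                     # shut down (1 - keep_prob) of the neurons
    A = A / keep_prob                             # rescale so E[A] stays the same (point 4)
    return A, D                                   # D is cached and reused in backprop (point 3)
```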
Model | Train accuracy | Test accuracy |
---|---|---|
3-layer NN without regularization | 95% | 91.5% |
3-layer NN with L2-regularization | 94% | 93% |
3-layer NN with dropout | 93% | 95% |
As the table shows, regularization lowers the training accuracy but raises the test accuracy. This is because regularization simplifies the model; since test accuracy is what we actually care about, overall performance improves with regularization.
4. Optimization algorithms
1. Gradient descent with momentum
How do you choose β?
The larger the momentum β is, the smoother the update because the more we take the past gradients into account. But if β is too big, it could also smooth out the updates too much.
Common values for β range from 0.8 to 0.999. If you don’t feel inclined to tune this, β=0.9 is often a reasonable default.
Tuning the optimal β for your model might need trying several values to see what works best in terms of reducing the value of the cost function J.
The exact momentum formula is:
but for ease of computation the text adopts a simplified form:
Drawback of gradient descent with momentum:
The velocity is initialized to 0, which deviates considerably from the true momentum, so the first few estimates, especially v1, are far smaller than their actual values. The first several iterations are therefore mostly spent building up momentum before it really takes effect.
Solution:
Bias correction, i.e. divide the computed momentum by 1 − β^t:
Here t is the number of steps the algorithm has run, i.e. the iteration count. Since β < 1, the denominator is less than 1 when t is small, which corrects V upward; as t grows, the denominator approaches 1, so the correction gradually fades.
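Putting the pieces together, a sketch of one momentum update with bias correction for a single parameter W (dW is the current gradient; t counts the updates from 1):

```python
import numpy as np

def momentum_step(W, dW, v, t, learning_rate=0.01, beta=0.9):
    v = beta * v + (1 - beta) * dW        # exponentially weighted average of past gradients
    v_corrected = v / (1 - beta ** t)     # bias correction: strong for small t, fades as t grows
    W = W - learning_rate * v_corrected
    return W, v
```

As far as I recall, the assignment's momentum code stops at the first line and omits the correction; it is included here only to illustrate the fix described above.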
Adam
Adam combines momentum and RMSprop, and adds bias correction.
v is an exponentially weighted average of the past gradients;
s is an exponentially weighted average of the past squared gradients.
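In code, one Adam step for a single parameter looks roughly like this (beta1, beta2 and epsilon follow the common defaults; t counts the updates from 1):

```python
import numpy as np

def adam_step(W, dW, v, s, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    v = beta1 * v + (1 - beta1) * dW            # momentum: average of past gradients
    s = beta2 * s + (1 - beta2) * (dW ** 2)     # RMSprop: average of past squared gradients
    v_corrected = v / (1 - beta1 ** t)          # bias correction, as above
    s_corrected = s / (1 - beta2 ** t)
    W = W - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return W, v, s
```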
Optimization method | Accuracy | Cost shape |
---|---|---|
Gradient descent | 79.7% | oscillations |
Momentum | 79.7% | oscillations |
Adam | 94% | smoother |
mini-batch with GD:
mini-batch with Momentum:
mini-batch with Adam:
5. YOLO
Autonomous driving - Car detection
tf.boolean_mask(a,b)
A TensorFlow function that comes up frequently in object detection (YOLO). b is typically a boolean tensor; if a.shape = [3, 3, 3] and b.shape = [3, 3], then tf.boolean_mask(a, b) keeps only those entries of a whose leading indices line up with the True elements of b.
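A small usage sketch in the spirit of YOLO's score filtering (the scores, boxes and 0.5 threshold below are made-up values):

```python
import tensorflow as tf

scores = tf.constant([0.3, 0.8, 0.6, 0.1])       # per-box confidence scores
boxes = tf.constant([[0., 0., 1., 1.],
                     [0., 0., 2., 2.],
                     [1., 1., 3., 3.],
                     [2., 2., 4., 4.]])          # one box per score
mask = scores >= 0.5                             # boolean mask, shape (4,)
kept = tf.boolean_mask(boxes, mask)              # keeps only rows where mask is True -> shape (2, 4)
```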
6. FaceRecNet
model.count_params() returns the total number of parameters in the model.
img_to_encoding(image_path, model) basically runs forward propagation of the model on the specified image and returns its encoding.
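A sketch of how img_to_encoding is typically used for verification (the verify signature, the database dict and the 0.7 threshold are assumptions on my part; img_to_encoding and model come from the assignment):

```python
import numpy as np

def verify(image_path, identity, database, model):
    encoding = img_to_encoding(image_path, model)          # embedding of the new photo
    dist = np.linalg.norm(encoding - database[identity])   # L2 distance to the stored embedding
    return dist < 0.7                                      # small distance => same person
```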
7. Neural style transfer
tf.transpose(a, perm=None, name='transpose')
Transposes a. Permutes the dimensions according to perm.
The returned tensor's dimension i will correspond to the input dimension perm[i]. If perm is not given, it is set to (n-1, ..., 0), where n is the rank of the input tensor. Hence by default, this operation performs a regular matrix transpose on 2-D input tensors.
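In the style-transfer notebook this shows up when building the Gram (style) matrix of an unrolled activation; a toy example with made-up shapes:

```python
import tensorflow as tf

A = tf.random.normal((3, 16))          # unrolled activation: 3 channels, 16 spatial positions
GA = tf.matmul(A, tf.transpose(A))     # Gram matrix, shape (3, 3): channel-to-channel correlations
```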
8. RNN
Building a Recurrent Neural Network-Step by Step-v3
- basic unit (a forward-step sketch follows this list)
- LSTM
- Backpropagation
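As referenced above, a sketch of one forward step of the basic unit, using the Wax/Waa/Wya naming convention from the assignment:

```python
import numpy as np

def rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by):
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)   # new hidden state
    zt = np.dot(Wya, a_next) + by
    yt_pred = np.exp(zt) / np.sum(np.exp(zt), axis=0)              # softmax over the output
    return a_next, yt_pred
```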
Dinosaurus Island–Character-level language model
- Gradient clipping: to avoid exploding gradients (see the sketch after this list)
- Sampling: a technique used to generate characters
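A sketch of the clipping step (gradients is assumed to be a dict of numpy arrays such as dWax, dWaa, dWya, ...):

```python
import numpy as np

def clip(gradients, maxValue):
    # Clip every gradient elementwise into [-maxValue, maxValue], in place.
    for grad in gradients.values():
        np.clip(grad, -maxValue, maxValue, out=grad)
    return gradients
```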