Overview
1. Fixing the data matrix dimensions
X = (number of features, number of examples m)
Y = (1, number of examples m)
W[l] = (n[l], n[l-1])
b[l] = (n[l], 1)
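To make the convention concrete, here is a tiny shape check in numpy (the layer sizes below are my own toy numbers, not taken from the assignments):

```python
import numpy as np

# Toy sizes: n_x features, n_1 hidden units, m examples.
n_x, n_1, m = 4, 3, 10
X = np.random.randn(n_x, m)      # X: (n_x, m), one example per column
Y = np.random.randn(1, m)        # Y: (1, m)
W1 = np.random.randn(n_1, n_x)   # W[1]: (n[1], n[0])
b1 = np.zeros((n_1, 1))          # b[1]: (n[1], 1), broadcast across the m columns

Z1 = np.dot(W1, X) + b1
assert Z1.shape == (n_1, m)
```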
2. How to prevent vanishing or exploding gradients
Assignment 2 of course 1-4 initializes the deep network with the method taught in course 2-1: to guard against vanishing or exploding gradients, shrink the randomly initialized weights according to the number of units n[l-1] feeding into the layer (so that each weight has variance on the order of 1/n[l-1]); the resulting z then neither grows nor shrinks too much. For different activation functions, researchers have worked out the corresponding optimal scaling factors:
The last of these variants is also known as Xavier initialization.
The programming exercise of course 2-1 compares zero initialization, (large) random initialization, and He initialization (scaling by sqrt(2/n[l-1]), suited to ReLU), and reaches the following conclusions:
Model | Train accuracy | Problem/Comment |
---|---|---|
3-layer NN with zeros initialization | 50% | fails to break symmetry |
3-layer NN with large random initialization | 83% | too large weights |
3-layer NN with He initialization | 99% | recommended method |
He initialization is the recommended method.
However, to support this conclusion, the notebook multiplies the randomly initialized W by 10. If you remove that factor of 10, the randomly initialized network starts with a higher cost but still converges quickly and reaches high accuracy. He initialization is still the recommendation, because it effectively guards against vanishing and exploding gradients.
He initialization:
Random initialization:
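For reference, here is a minimal sketch of the three schemes compared above; this is my own code, not the notebook's, and `layer_dims` is a hypothetical list of layer sizes such as [n_x, n_1, ..., n_L]:

```python
import numpy as np

def initialize(layer_dims, method="he"):
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in, fan_out = layer_dims[l - 1], layer_dims[l]
        if method == "zeros":
            W = np.zeros((fan_out, fan_in))                        # fails to break symmetry
        elif method == "random_large":
            W = np.random.randn(fan_out, fan_in) * 10              # the "* 10" discussed above
        else:  # "he": scale by sqrt(2 / n[l-1]), suited to ReLU
            W = np.random.randn(fan_out, fan_in) * np.sqrt(2.0 / fan_in)
        params["W" + str(l)] = W
        params["b" + str(l)] = np.zeros((fan_out, 1))
    return params
```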
3. Regularization and dropout
With L2 regularization, the learned parameters end up smaller than without it. Smaller weights are taken to mean a simpler model, which is why this helps prevent overfitting.
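As a reminder of what L2 regularization adds to the cost, here is a sketch following the assignment's 3-layer setup (cross_entropy_cost, the parameters dict and lambd are assumed to exist as in that notebook):

```python
import numpy as np

def compute_cost_with_L2(cross_entropy_cost, parameters, lambd, m):
    W1, W2, W3 = parameters["W1"], parameters["W2"], parameters["W3"]
    # L2 penalty: (lambda / 2m) * sum of squared weights over all layers
    l2_penalty = (lambd / (2 * m)) * (np.sum(np.square(W1))
                                      + np.sum(np.square(W2))
                                      + np.sum(np.square(W3)))
    return cross_entropy_cost + l2_penalty
```

Gradient descent then sees an extra (lambd / m) * W term in each dW, which is what drives the weights toward smaller values.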
Notes on dropout:
1. Dropout is a regularization technique.
2. You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
3. Apply dropout both during forward and backward propagation.
4. During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled by 0.5 since only the remaining half are contributing to the solution. Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected value. You can check that this works even when keep_prob is other values than 0.5. In short, this point keeps the expected value of each layer's activations unchanged (see the sketch after this list).
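Points 3 and 4 together amount to the usual inverted-dropout step; a minimal sketch for one ReLU layer (the names are illustrative, not the notebook's):

```python
import numpy as np

def dropout_forward(A_prev, W, b, keep_prob):
    Z = np.dot(W, A_prev) + b
    A = np.maximum(0, Z)                          # ReLU activation
    D = np.random.rand(*A.shape) < keep_prob      # dropout mask
    A = A * D                                     # shut down (1 - keep_prob) of the neurons
    A = A / keep_prob                             # rescale so E[A] stays the same (point 4)
    return A, D                                   # D is cached and reused in backprop (point 3)
```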
Model | Train accuracy | Test accuracy |
---|---|---|
3-layer NN without regularization | 95% | 91.5% |
3-layer NN with L2-regularization | 94% | 93% |
3-layer NN with dropout | 93% | 95% |
As the table shows, regularization lowers the training accuracy but raises the test accuracy. This is because regularization simplifies the model; since test accuracy is what we actually care about, overall performance improves with regularization.
4. Optimization algorithms
1. Gradient descent with momentum
How do you choose β?
The larger the momentum β is, the smoother the update because the more we take the past gradients into account. But if β is too big, it could also smooth out the updates too much.
Common values for β range from 0.8 to 0.999. If you don’t feel inclined to tune this, β=0.9 is often a reasonable default.
Tuning the optimal β for your model might need trying several values to see what works best in terms of reducing the value of the cost function J.
The exact momentum formula is:
but for ease of computation the text adopts a simplified form:
Drawback of gradient descent with momentum:
The velocity is initialized to 0, which deviates considerably from the true momentum, so the first few estimates, especially v1, are far smaller than their actual values. The first several iterations are therefore mostly spent building up momentum before it really takes effect.
Solution:
Bias correction, i.e. divide the computed momentum by 1 − β^t:
Here t is the number of steps the algorithm has run, i.e. the iteration count. Since β < 1, the denominator is less than 1 when t is small, which corrects V upward; as t grows, the denominator approaches 1, so the correction gradually fades.
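Putting the pieces together, a sketch of one momentum update with bias correction for a single parameter W (dW is the current gradient; t counts the updates from 1):

```python
import numpy as np

def momentum_step(W, dW, v, t, learning_rate=0.01, beta=0.9):
    v = beta * v + (1 - beta) * dW        # exponentially weighted average of past gradients
    v_corrected = v / (1 - beta ** t)     # bias correction: strong for small t, fades as t grows
    W = W - learning_rate * v_corrected
    return W, v
```

As far as I recall, the assignment's momentum code stops at the first line and omits the correction; it is included here only to illustrate the fix described above.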
Adam
Adam combines momentum and RMSprop, and adds bias correction.
v is an exponentially weighted average of the past gradients;
s is an exponentially weighted average of the past squared gradients.
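In code, one Adam step for a single parameter looks roughly like this (beta1, beta2 and epsilon follow the common defaults; t counts the updates from 1):

```python
import numpy as np

def adam_step(W, dW, v, s, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    v = beta1 * v + (1 - beta1) * dW            # momentum: average of past gradients
    s = beta2 * s + (1 - beta2) * (dW ** 2)     # RMSprop: average of past squared gradients
    v_corrected = v / (1 - beta1 ** t)          # bias correction, as above
    s_corrected = s / (1 - beta2 ** t)
    W = W - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return W, v, s
```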
Optimization method | Accuracy | Cost shape |
---|---|---|
Gradient descent | 79.7% | oscillations |
Momentum | 79.7% | oscillations |
Adam | 94% | smoother |
mini-batch with GD:
mini-batch with Momentum:
mini-batch with Adam:
5. YOLO
Autonomous driving - Car detection
tf.boolean_mask(a,b)
A TensorFlow function that comes up frequently in object detection (YOLO). b is typically a boolean tensor; if a.shape = [3, 3, 3] and b.shape = [3, 3], then tf.boolean_mask(a, b) keeps only those entries of a whose leading indices line up with the True elements of b.
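A small usage sketch in the spirit of YOLO's score filtering (the scores, boxes and 0.5 threshold below are made-up values):

```python
import tensorflow as tf

scores = tf.constant([0.3, 0.8, 0.6, 0.1])       # per-box confidence scores
boxes = tf.constant([[0., 0., 1., 1.],
                     [0., 0., 2., 2.],
                     [1., 1., 3., 3.],
                     [2., 2., 4., 4.]])          # one box per score
mask = scores >= 0.5                             # boolean mask, shape (4,)
kept = tf.boolean_mask(boxes, mask)              # keeps only rows where mask is True -> shape (2, 4)
```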
6. FaceRecNet
model.count_params() returns the total number of parameters in the model.
img_to_encoding(image_path, model) basically runs forward propagation of the model on the specified image and returns its encoding.
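A sketch of how img_to_encoding is typically used for verification (the verify signature, the database dict and the 0.7 threshold are assumptions on my part; img_to_encoding and model come from the assignment):

```python
import numpy as np

def verify(image_path, identity, database, model):
    encoding = img_to_encoding(image_path, model)          # embedding of the new photo
    dist = np.linalg.norm(encoding - database[identity])   # L2 distance to the stored embedding
    return dist < 0.7                                      # small distance => same person
```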
7. Neural style transfer
tf.transpose(a, perm=None, name='transpose')
Transposes a. Permutes the dimensions according to perm.
The returned tensor's dimension i will correspond to the input dimension perm[i]. If perm is not given, it is set to (n-1, ..., 0), where n is the rank of the input tensor. Hence by default, this operation performs a regular matrix transpose on 2-D input tensors.
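In the style-transfer notebook this shows up when building the Gram (style) matrix of an unrolled activation; a toy example with made-up shapes:

```python
import tensorflow as tf

A = tf.random.normal((3, 16))          # unrolled activation: 3 channels, 16 spatial positions
GA = tf.matmul(A, tf.transpose(A))     # Gram matrix, shape (3, 3): channel-to-channel correlations
```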
8. RNN
Building a Recurrent Neural Network-Step by Step-v3
- basic unit (a forward-step sketch follows this list)
- LSTM
- Backpropagation
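As referenced above, a sketch of one forward step of the basic unit, using the Wax/Waa/Wya naming convention from the assignment:

```python
import numpy as np

def rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by):
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)   # new hidden state
    zt = np.dot(Wya, a_next) + by
    yt_pred = np.exp(zt) / np.sum(np.exp(zt), axis=0)              # softmax over the output
    return a_next, yt_pred
```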
Dinosaurus Island–Character-level language model
- Gradient clipping: to avoid exploding gradients (see the sketch after this list)
- Sampling: a technique used to generate characters
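A sketch of the clipping step (gradients is assumed to be a dict of numpy arrays such as dWax, dWaa, dWya, ...):

```python
import numpy as np

def clip(gradients, maxValue):
    # Clip every gradient elementwise into [-maxValue, maxValue], in place.
    for grad in gradients.values():
        np.clip(grad, -maxValue, maxValue, out=grad)
    return gradients
```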