Overview
My personal blog: https://huaxuan0720.github.io/ (you are welcome to visit)
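Preface
In the previous articles, we covered 2D convolution and its backpropagation algorithm. However, strides of 1 and 2 are both fairly small numbers. If we switch to a much larger stride, does the same backpropagation method still apply? To find out, we work through the following rather extreme example and use it to verify that the backpropagation algorithm is still valid.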
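1. Parameter Settings
In the earlier parameter settings, the input matrices were all 5x5. In this article we use a 10x10 input matrix. For the kernel we still use a 3x3 kernel; for the stride we use a very large number, 7; and the padding mode is still VALID.
Our parameters are therefore summarized as follows: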
Parameter | Setting |
---|---|
Input matrix $x$ | a 2D matrix of size 10x10 |
Kernel $k$ | a 2D matrix of size 3x3 |
Stride $stride$ | 7 |
padding | VALID |
Bias $b$ | a floating-point number |
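As a quick check on the shapes: with VALID padding, the number of output positions along each axis is $\lfloor (10 - 3) / 7 \rfloor + 1 = 2$, so the output $u$ is a 2x2 matrix. A minimal Python sketch of that arithmetic (the helper name `valid_output_size` is our own, not from the original):

```python
def valid_output_size(in_size: int, kernel_size: int, stride: int) -> int:
    """Number of positions a kernel can occupy along one axis with VALID padding."""
    return (in_size - kernel_size) // stride + 1


out = valid_output_size(10, 3, 7)
print(out)  # 2, so u is 2x2

# 0-based start index of each window along one axis.
# Starts 0 and 7 correspond to 1-based columns 1..3 and 8..10 of x.
starts = [i * 7 for i in range(out)]
print(starts)  # [0, 7]
```

Only the first 3 and the last 3 rows/columns of $x$ are ever covered by a window; rows and columns 4 to 7 are skipped entirely.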
As before, we denote the convolution operation by $conv$, so the convolution can be written as follows (note that the stride here is 7):

$$
x \; conv \; k + b = u
$$
Expanding it, we get:
$$
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} & x_{1,6} & x_{1,7} & x_{1,8} & x_{1,9} & x_{1,10} \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} & x_{2,6} & x_{2,7} & x_{2,8} & x_{2,9} & x_{2,10} \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} & x_{3,6} & x_{3,7} & x_{3,8} & x_{3,9} & x_{3,10} \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} & x_{4,6} & x_{4,7} & x_{4,8} & x_{4,9} & x_{4,10} \\
x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} & x_{5,6} & x_{5,7} & x_{5,8} & x_{5,9} & x_{5,10} \\
x_{6,1} & x_{6,2} & x_{6,3} & x_{6,4} & x_{6,5} & x_{6,6} & x_{6,7} & x_{6,8} & x_{6,9} & x_{6,10} \\
x_{7,1} & x_{7,2} & x_{7,3} & x_{7,4} & x_{7,5} & x_{7,6} & x_{7,7} & x_{7,8} & x_{7,9} & x_{7,10} \\
x_{8,1} & x_{8,2} & x_{8,3} & x_{8,4} & x_{8,5} & x_{8,6} & x_{8,7} & x_{8,8} & x_{8,9} & x_{8,10} \\
x_{9,1} & x_{9,2} & x_{9,3} & x_{9,4} & x_{9,5} & x_{9,6} & x_{9,7} & x_{9,8} & x_{9,9} & x_{9,10} \\
x_{10,1} & x_{10,2} & x_{10,3} & x_{10,4} & x_{10,5} & x_{10,6} & x_{10,7} & x_{10,8} & x_{10,9} & x_{10,10}
\end{bmatrix}
\; conv \;
\begin{bmatrix}
k_{1,1} & k_{1,2} & k_{1,3} \\
k_{2,1} & k_{2,2} & k_{2,3} \\
k_{3,1} & k_{3,2} & k_{3,3}
\end{bmatrix}
+ b =
\begin{bmatrix}
u_{1,1} & u_{1,2} \\
u_{2,1} & u_{2,2}
\end{bmatrix}
$$
Expanding the matrix $u$ further, we have:
$$
\begin{bmatrix}
u_{1,1} & u_{1,2} \\
u_{2,1} & u_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
\begin{matrix}
x_{1,1}k_{1,1} + x_{1,2}k_{1,2} + x_{1,3}k_{1,3} + \\
x_{2,1}k_{2,1} + x_{2,2}k_{2,2} + x_{2,3}k_{2,3} + \\
x_{3,1}k_{3,1} + x_{3,2}k_{3,2} + x_{3,3}k_{3,3} + b
\end{matrix}
&
\begin{matrix}
x_{1,8}k_{1,1} + x_{1,9}k_{1,2} + x_{1,10}k_{1,3} + \\
x_{2,8}k_{2,1} + x_{2,9}k_{2,2} + x_{2,10}k_{2,3} + \\
x_{3,8}k_{3,1} + x_{3,9}k_{3,2} + x_{3,10}k_{3,3} + b
\end{matrix}
\\[2ex]
\begin{matrix}
x_{8,1}k_{1,1} + x_{8,2}k_{1,2} + x_{8,3}k_{1,3} + \\
x_{9,1}k_{2,1} + x_{9,2}k_{2,2} + x_{9,3}k_{2,3} + \\
x_{10,1}k_{3,1} + x_{10,2}k_{3,2} + x_{10,3}k_{3,3} + b
\end{matrix}
&
\begin{matrix}
x_{8,8}k_{1,1} + x_{8,9}k_{1,2} + x_{8,10}k_{1,3} + \\
x_{9,8}k_{2,1} + x_{9,9}k_{2,2} + x_{9,10}k_{2,3} + \\
x_{10,8}k_{3,1} + x_{10,9}k_{3,2} + x_{10,10}k_{3,3} + b
\end{matrix}
\end{bmatrix}
$$
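The forward pass above can be sketched in NumPy. This assumes, as the expansion shows, that $conv$ is a cross-correlation (the kernel is not flipped); `conv2d_valid` is our own helper name and the inputs are random test data, not values from the original:

```python
import numpy as np


def conv2d_valid(x, k, b=0.0, stride=1):
    """VALID 2D cross-correlation (no kernel flip) with a stride, plus a scalar bias."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    u = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            u[i, j] = np.sum(window * k) + b
    return u


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10))
k = rng.standard_normal((3, 3))
b = 0.5
u = conv2d_valid(x, k, b, stride=7)
print(u.shape)  # (2, 2)

# u_{1,2} in the text (0-based u[0, 1]) uses rows 1..3 and columns 8..10 of x.
expected_u12 = np.sum(x[0:3, 7:10] * k) + b
print(np.isclose(u[0, 1], expected_u12))  # True
```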
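2. Error Propagation
As before, to make the calculation easier to carry out and to inspect, we build the following table. Each column corresponds to a particular output $\partial u_{i,j}$ and each row to a particular input $\partial x_{p,k}$; the entry where a row and a column meet is the partial derivative of that output with respect to that input, $\frac{\partial u_{i,j}}{\partial x_{p,k}}$. The last column gives the resulting partial derivative of the loss that must be propagated backward. It is computed in the same way as before, so we will not repeat the details here: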
| | $\partial u_{1,1}$ | $\partial u_{1,2}$ | $\partial u_{2,1}$ | $\partial u_{2,2}$ | $\frac{\partial L}{\partial x_{i,j}}$ |
|---|---|---|---|---|---|
| $\partial x_{1,1}$ | $k_{1,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,1}} = \delta_{1,1} k_{1,1}$ |
| $\partial x_{1,2}$ | $k_{1,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,2}} = \delta_{1,1} k_{1,2}$ |
| $\partial x_{1,3}$ | $k_{1,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,3}} = \delta_{1,1} k_{1,3}$ |
| $\partial x_{1,8}$ | 0 | $k_{1,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,8}} = \delta_{1,2} k_{1,1}$ |
| $\partial x_{1,9}$ | 0 | $k_{1,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,9}} = \delta_{1,2} k_{1,2}$ |
| $\partial x_{1,10}$ | 0 | $k_{1,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,10}} = \delta_{1,2} k_{1,3}$ |
| $\partial x_{2,1}$ | $k_{2,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,1}} = \delta_{1,1} k_{2,1}$ |
| $\partial x_{2,2}$ | $k_{2,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,2}} = \delta_{1,1} k_{2,2}$ |
| $\partial x_{2,3}$ | $k_{2,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,3}} = \delta_{1,1} k_{2,3}$ |
| $\partial x_{2,8}$ | 0 | $k_{2,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,8}} = \delta_{1,2} k_{2,1}$ |
| $\partial x_{2,9}$ | 0 | $k_{2,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,9}} = \delta_{1,2} k_{2,2}$ |
| $\partial x_{2,10}$ | 0 | $k_{2,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,10}} = \delta_{1,2} k_{2,3}$ |
| $\partial x_{3,1}$ | $k_{3,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,1}} = \delta_{1,1} k_{3,1}$ |
| $\partial x_{3,2}$ | $k_{3,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,2}} = \delta_{1,1} k_{3,2}$ |
| $\partial x_{3,3}$ | $k_{3,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,3}} = \delta_{1,1} k_{3,3}$ |
| $\partial x_{3,8}$ | 0 | $k_{3,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,8}} = \delta_{1,2} k_{3,1}$ |
| $\partial x_{3,9}$ | 0 | $k_{3,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,9}} = \delta_{1,2} k_{3,2}$ |
| $\partial x_{3,10}$ | 0 | $k_{3,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,10}} = \delta_{1,2} k_{3,3}$ |
| $\partial x_{8,1}$ | 0 | 0 | $k_{1,1}$ | 0 | $\frac{\partial L}{\partial x_{8,1}} = \delta_{2,1} k_{1,1}$ |
| $\partial x_{8,2}$ | 0 | 0 | $k_{1,2}$ | 0 | $\frac{\partial L}{\partial x_{8,2}} = \delta_{2,1} k_{1,2}$ |
| $\partial x_{8,3}$ | 0 | 0 | $k_{1,3}$ | 0 | $\frac{\partial L}{\partial x_{8,3}} = \delta_{2,1} k_{1,3}$ |
| $\partial x_{8,8}$ | 0 | 0 | 0 | $k_{1,1}$ | $\frac{\partial L}{\partial x_{8,8}} = \delta_{2,2} k_{1,1}$ |
| $\partial x_{8,9}$ | 0 | 0 | 0 | $k_{1,2}$ | $\frac{\partial L}{\partial x_{8,9}} = \delta_{2,2} k_{1,2}$ |
| $\partial x_{8,10}$ | 0 | 0 | 0 | $k_{1,3}$ | $\frac{\partial L}{\partial x_{8,10}} = \delta_{2,2} k_{1,3}$ |
| $\partial x_{9,1}$ | 0 | 0 | $k_{2,1}$ | 0 | $\frac{\partial L}{\partial x_{9,1}} = \delta_{2,1} k_{2,1}$ |
| $\partial x_{9,2}$ | 0 | 0 | $k_{2,2}$ | 0 | $\frac{\partial L}{\partial x_{9,2}} = \delta_{2,1} k_{2,2}$ |
| $\partial x_{9,3}$ | 0 | 0 | $k_{2,3}$ | 0 | $\frac{\partial L}{\partial x_{9,3}} = \delta_{2,1} k_{2,3}$ |
| $\partial x_{9,8}$ | 0 | 0 | 0 | $k_{2,1}$ | $\frac{\partial L}{\partial x_{9,8}} = \delta_{2,2} k_{2,1}$ |
| $\partial x_{9,9}$ | 0 | 0 | 0 | $k_{2,2}$ | $\frac{\partial L}{\partial x_{9,9}} = \delta_{2,2} k_{2,2}$ |
| $\partial x_{9,10}$ | 0 | 0 | 0 | $k_{2,3}$ | $\frac{\partial L}{\partial x_{9,10}} = \delta_{2,2} k_{2,3}$ |
| $\partial x_{10,1}$ | 0 | 0 | $k_{3,1}$ | 0 | $\frac{\partial L}{\partial x_{10,1}} = \delta_{2,1} k_{3,1}$ |
| $\partial x_{10,2}$ | 0 | 0 | $k_{3,2}$ | 0 | $\frac{\partial L}{\partial x_{10,2}} = \delta_{2,1} k_{3,2}$ |
| $\partial x_{10,3}$ | 0 | 0 | $k_{3,3}$ | 0 | $\frac{\partial L}{\partial x_{10,3}} = \delta_{2,1} k_{3,3}$ |
| $\partial x_{10,8}$ | 0 | 0 | 0 | $k_{3,1}$ | $\frac{\partial L}{\partial x_{10,8}} = \delta_{2,2} k_{3,1}$ |
| $\partial x_{10,9}$ | 0 | 0 | 0 | $k_{3,2}$ | $\frac{\partial L}{\partial x_{10,9}} = \delta_{2,2} k_{3,2}$ |
| $\partial x_{10,10}$ | 0 | 0 | 0 | $k_{3,3}$ | $\frac{\partial L}{\partial x_{10,10}} = \delta_{2,2} k_{3,3}$ |
| all other $\partial x_{p,k}$ | 0 | 0 | 0 | 0 | 0 |
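It can be seen that, just as with the earlier convolution settings, the entries are distributed in a very regular pattern. Because the stride (7) is larger than the kernel size (3), the four 3x3 windows do not overlap: each input element contributes to at most one output, and every element in rows 4 to 7 or columns 4 to 7 of $x$ contributes to no output at all, so its gradient is 0.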
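The rule behind the table (each output $u_{i,j}$ sends $\delta_{i,j}$ times the kernel back onto the 3x3 window it was computed from) can be written as a direct scatter-add. In this sketch `input_grad_direct` is a hypothetical helper name, and the kernel and error values are arbitrary test numbers chosen so the entries are easy to read:

```python
import numpy as np


def input_grad_direct(delta, k, x_shape, stride):
    """dL/dx by the chain rule: each output u[i, j] sends delta[i, j] * k
    back onto the input window it was computed from."""
    kh, kw = k.shape
    dx = np.zeros(x_shape)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = i * stride, j * stride
            dx[r:r + kh, c:c + kw] += delta[i, j] * k
    return dx


k = np.arange(1.0, 10.0).reshape(3, 3)             # k_{1,1}..k_{3,3} = 1..9
delta = np.array([[1.0, 10.0], [100.0, 1000.0]])   # arbitrary test values
dx = input_grad_direct(delta, k, (10, 10), stride=7)

print(dx[0, 7])              # delta_{1,2} * k_{1,1} = 10.0
print(dx[9, 9])              # delta_{2,2} * k_{3,3} = 9000.0
print(np.count_nonzero(dx))  # 36: 4 non-overlapping windows of 9 entries each
```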
Suppose the error passed back from the following layer is $\delta$, that is:
$$
\delta =
\begin{bmatrix}
\delta_{1,1} & \delta_{1,2} \\
\delta_{2,1} & \delta_{2,2}
\end{bmatrix}
$$
where $\delta_{i,j} = \frac{\partial L}{\partial u_{i,j}}$ is the error corresponding to each output entry. Here $L$ denotes the final loss, and our goal is to make this loss as small as possible.
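Following the method from before, we first compute the error that should be passed back to the previous layer. In the first step, we insert the appropriate number of zeros into the received error matrix. Since the forward convolution uses a stride of 7, we insert (7 - 1 = 6) zeros between every two adjacent elements of the received error matrix, i.e.: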
$$
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
$$
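The zero-insertion step can be sketched with a strided slice assignment. `dilate` is our own helper name, and the numbers 1 to 4 merely stand in for $\delta_{1,1}, \ldots, \delta_{2,2}$:

```python
import numpy as np


def dilate(delta, stride):
    """Insert (stride - 1) zeros between adjacent elements along both axes."""
    oh, ow = delta.shape
    out = np.zeros(((oh - 1) * stride + 1, (ow - 1) * stride + 1))
    out[::stride, ::stride] = delta  # original entries land every `stride` cells
    return out


delta = np.array([[1.0, 2.0], [3.0, 4.0]])  # stands in for delta_{1,1} .. delta_{2,2}
d = dilate(delta, stride=7)
print(d.shape)                        # (8, 8)
print(np.argwhere(d != 0).tolist())   # [[0, 0], [0, 7], [7, 0], [7, 7]]
```

The size follows from $(2 - 1) \times 7 + 1 = 8$ along each axis, matching the 8x8 matrix above.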
Next, since the kernel is 3x3, we still need to pad the matrix above with (3 - 1 = 2) layers of zeros around its border, i.e.:
$$
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
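The padding step maps directly onto `numpy.pad`, whose default mode fills with zeros. A minimal sketch, again using 1 to 4 as placeholder error values:

```python
import numpy as np

delta = np.array([[1.0, 2.0], [3.0, 4.0]])  # placeholder error values
stride, kh, kw = 7, 3, 3

# Step 1: dilate (insert stride - 1 zeros between neighbours).
dilated = np.zeros(((delta.shape[0] - 1) * stride + 1,
                    (delta.shape[1] - 1) * stride + 1))
dilated[::stride, ::stride] = delta

# Step 2: pad (kernel_size - 1) = 2 layers of zeros on every side.
padded = np.pad(dilated, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
print(padded.shape)                       # (12, 12)
print(np.argwhere(padded != 0).tolist())  # [[2, 2], [2, 9], [9, 2], [9, 9]]
```

The 0-based positions 2 and 9 are the 1-based rows/columns 3 and 10 in the matrix above.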
The next step is to rotate the kernel used in the forward convolution by 180°, i.e.:
$$
\begin{bmatrix}
k_{3,3} & k_{3,2} & k_{3,1} \\
k_{2,3} & k_{2,2} & k_{2,1} \\
k_{1,3} & k_{1,2} & k_{1,1}
\end{bmatrix}
$$
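In NumPy, a 180° rotation is two 90° rotations (`np.rot90(k, 2)`), which is the same as reversing both axes. A small sketch with the integers 1 to 9 standing in for $k_{1,1}, \ldots, k_{3,3}$:

```python
import numpy as np

k = np.arange(1, 10).reshape(3, 3)  # k_{1,1} = 1, ..., k_{3,3} = 9
k_rot = np.rot90(k, 2)              # rotate by 180 degrees; same as k[::-1, ::-1]
print(k_rot)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
```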
The last step is to convolve the error matrix above with the rotated kernel using a stride of 1, i.e.:
$$
\begin{gathered}
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\; conv(stride = 1) \;
\begin{bmatrix}
k_{3,3} & k_{3,2} & k_{3,1} \\
k_{2,3} & k_{2,2} & k_{2,1} \\
k_{1,3} & k_{1,2} & k_{1,1}
\end{bmatrix}
\\
=
\begin{bmatrix}
\delta_{1,1}k_{1,1} & \delta_{1,1}k_{1,2} & \delta_{1,1}k_{1,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{1,1} & \delta_{1,2}k_{1,2} & \delta_{1,2}k_{1,3} \\
\delta_{1,1}k_{2,1} & \delta_{1,1}k_{2,2} & \delta_{1,1}k_{2,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{2,1} & \delta_{1,2}k_{2,2} & \delta_{1,2}k_{2,3} \\
\delta_{1,1}k_{3,1} & \delta_{1,1}k_{3,2} & \delta_{1,1}k_{3,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{3,1} & \delta_{1,2}k_{3,2} & \delta_{1,2}k_{3,3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1}k_{1,1} & \delta_{2,1}k_{1,2} & \delta_{2,1}k_{1,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{1,1} & \delta_{2,2}k_{1,2} & \delta_{2,2}k_{1,3} \\
\delta_{2,1}k_{2,1} & \delta_{2,1}k_{2,2} & \delta_{2,1}k_{2,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{2,1} & \delta_{2,2}k_{2,2} & \delta_{2,2}k_{2,3} \\
\delta_{2,1}k_{3,1} & \delta_{2,1}k_{3,2} & \delta_{2,1}k_{3,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{3,1} & \delta_{2,2}k_{3,2} & \delta_{2,2}k_{3,3}
\end{bmatrix}
\\
=
\begin{bmatrix}
\frac{\partial L}{\partial x_{1,1}} & \frac{\partial L}{\partial x_{1,2}} & \cdots & \frac{\partial L}{\partial x_{1,10}} \\
\frac{\partial L}{\partial x_{2,1}} & \frac{\partial L}{\partial x_{2,2}} & \cdots & \frac{\partial L}{\partial x_{2,10}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial L}{\partial x_{10,1}} & \frac{\partial L}{\partial x_{10,2}} & \cdots & \frac{\partial L}{\partial x_{10,10}}
\end{bmatrix}
\end{gathered}
$$
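The three steps (dilate, pad, convolve with the rotated kernel) can be chained and checked numerically against the direct chain-rule result from the table. This is a minimal NumPy sketch with random test data; `conv2d_valid` and `input_grad` are our own names. The final zero-fill for inputs that no window reaches is our addition for the general case where $(H - 3)$ is not a multiple of the stride; in the 10x10, stride-7 example it changes nothing:

```python
import numpy as np


def conv2d_valid(x, k, stride=1):
    """VALID cross-correlation (no kernel flip) with a stride, no bias."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return out


def input_grad(delta, k, x_shape, stride):
    """dL/dx via: dilate delta, pad (k - 1) zeros, conv (stride 1) with rot180(k)."""
    kh, kw = k.shape
    dil = np.zeros(((delta.shape[0]-1)*stride+1, (delta.shape[1]-1)*stride+1))
    dil[::stride, ::stride] = delta
    padded = np.pad(dil, ((kh-1, kh-1), (kw-1, kw-1)))
    dx = conv2d_valid(padded, np.rot90(k, 2), stride=1)
    # Rows/columns of x never read in the forward pass get zero gradient.
    full = np.zeros(x_shape)
    full[:dx.shape[0], :dx.shape[1]] = dx
    return full


rng = np.random.default_rng(1)
k = rng.standard_normal((3, 3))
delta = rng.standard_normal((2, 2))
dx = input_grad(delta, k, (10, 10), stride=7)

# Direct chain-rule result (the table): scatter delta[i, j] * k onto each window.
dx_direct = np.zeros((10, 10))
for i in range(2):
    for j in range(2):
        dx_direct[i*7:i*7+3, j*7:j*7+3] += delta[i, j] * k
print(np.allclose(dx, dx_direct))  # True
```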
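The result of this convolution agrees entry by entry with the last column of the table above. So, for error propagation, our algorithm still works correctly even with a stride as large as 7, which supports the claim that it works for an arbitrary stride. Next, we verify the computation of the parameter gradients.
3. Gradient Update
As before, suppose the error received from the following layer at this stage is $\delta$, that is: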
$$
\delta =
\begin{bmatrix}
\delta_{1,1} & \delta_{1,2} \\
\delta_{2,1} & \delta_{2,2}
\end{bmatrix}
$$
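Then, by the chain rule for partial derivatives, we can compute every partial derivative we need. The computation is the same as before, so we will not repeat it; the results are summarized as follows: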
$$
\begin{aligned}
\frac{\partial L}{\partial k_{1,1}} &= x_{1,1}\delta_{1,1} + x_{1,8}\delta_{1,2} + x_{8,1}\delta_{2,1} + x_{8,8}\delta_{2,2} \\
\frac{\partial L}{\partial k_{1,2}} &= x_{1,2}\delta_{1,1} + x_{1,9}\delta_{1,2} + x_{8,2}\delta_{2,1} + x_{8,9}\delta_{2,2} \\
\frac{\partial L}{\partial k_{1,3}} &= x_{1,3}\delta_{1,1} + x_{1,10}\delta_{1,2} + x_{8,3}\delta_{2,1} + x_{8,10}\delta_{2,2} \\
\frac{\partial L}{\partial k_{2,1}} &= x_{2,1}\delta_{1,1} + x_{2,8}\delta_{1,2} + x_{9,1}\delta_{2,1} + x_{9,8}\delta_{2,2} \\
\frac{\partial L}{\partial k_{2,2}} &= x_{2,2}\delta_{1,1} + x_{2,9}\delta_{1,2} + x_{9,2}\delta_{2,1} + x_{9,9}\delta_{2,2} \\
\frac{\partial L}{\partial k_{2,3}} &= x_{2,3}\delta_{1,1} + x_{2,10}\delta_{1,2} + x_{9,3}\delta_{2,1} + x_{9,10}\delta_{2,2} \\
\frac{\partial L}{\partial k_{3,1}} &= x_{3,1}\delta_{1,1} + x_{3,8}\delta_{1,2} + x_{10,1}\delta_{2,1} + x_{10,8}\delta_{2,2} \\
\frac{\partial L}{\partial k_{3,2}} &= x_{3,2}\delta_{1,1} + x_{3,9}\delta_{1,2} + x_{10,2}\delta_{2,1} + x_{10,9}\delta_{2,2} \\
\frac{\partial L}{\partial k_{3,3}} &= x_{3,3}\delta_{1,1} + x_{3,10}\delta_{1,2} + x_{10,3}\delta_{2,1} + x_{10,10}\delta_{2,2} \\
\frac{\partial L}{\partial b} &= \delta_{1,1} + \delta_{1,2} + \delta_{2,1} + \delta_{2,2}
\end{aligned}
$$
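These formulas all follow one pattern: $\frac{\partial L}{\partial k_{m,n}} = \sum_{i,j} \delta_{i,j}\, x_{7(i-1)+m,\, 7(j-1)+n}$, i.e. each error value weights the input window that produced its output, and every $u_{i,j}$ contains $+b$ exactly once. A minimal NumPy sketch of that pattern on random test data (0-based indices), with a spot check against the written formula for $\frac{\partial L}{\partial k_{1,1}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((10, 10))
delta = rng.standard_normal((2, 2))
stride, kh, kw = 7, 3, 3

# dL/dk[m, n] = sum_{i,j} delta[i, j] * x[i*stride + m, j*stride + n]
dk = np.zeros((kh, kw))
for i in range(delta.shape[0]):
    for j in range(delta.shape[1]):
        dk += delta[i, j] * x[i*stride:i*stride+kh, j*stride:j*stride+kw]
db = delta.sum()  # every u[i, j] contains +b exactly once

# Spot check: x_{1,1} d_{1,1} + x_{1,8} d_{1,2} + x_{8,1} d_{2,1} + x_{8,8} d_{2,2}
formula_k11 = (x[0, 0]*delta[0, 0] + x[0, 7]*delta[0, 1]
               + x[7, 0]*delta[1, 0] + x[7, 7]*delta[1, 1])
print(np.isclose(dk[0, 0], formula_k11))  # True
```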
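Following the earlier algorithm, since the stride of the forward convolution is 7, when computing the parameter gradients we again need to insert (7 - 1 = 6) zeros between every two adjacent elements of the received error matrix, i.e.: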
$$
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
$$
Next, we convolve the input matrix $x$ with the matrix above using a stride of 1, which gives the gradients of the kernel parameters, i.e.:
$$
\begin{gathered}
\begin{bmatrix}
\frac{\partial L}{\partial k_{1,1}} & \frac{\partial L}{\partial k_{1,2}} & \frac{\partial L}{\partial k_{1,3}} \\
\frac{\partial L}{\partial k_{2,1}} & \frac{\partial L}{\partial k_{2,2}} & \frac{\partial L}{\partial k_{2,3}} \\
\frac{\partial L}{\partial k_{3,1}} & \frac{\partial L}{\partial k_{3,2}} & \frac{\partial L}{\partial k_{3,3}}
\end{bmatrix}
\\
=
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} & x_{1,6} & x_{1,7} & x_{1,8} & x_{1,9} & x_{1,10} \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} & x_{2,6} & x_{2,7} & x_{2,8} & x_{2,9} & x_{2,10} \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} & x_{3,6} & x_{3,7} & x_{3,8} & x_{3,9} & x_{3,10} \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} & x_{4,6} & x_{4,7} & x_{4,8} & x_{4,9} & x_{4,10} \\
x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} & x_{5,6} & x_{5,7} & x_{5,8} & x_{5,9} & x_{5,10} \\
x_{6,1} & x_{6,2} & x_{6,3} & x_{6,4} & x_{6,5} & x_{6,6} & x_{6,7} & x_{6,8} & x_{6,9} & x_{6,10} \\
x_{7,1} & x_{7,2} & x_{7,3} & x_{7,4} & x_{7,5} & x_{7,6} & x_{7,7} & x_{7,8} & x_{7,9} & x_{7,10} \\
x_{8,1} & x_{8,2} & x_{8,3} & x_{8,4} & x_{8,5} & x_{8,6} & x_{8,7} & x_{8,8} & x_{8,9} & x_{8,10} \\
x_{9,1} & x_{9,2} & x_{9,3} & x_{9,4} & x_{9,5} & x_{9,6} & x_{9,7} & x_{9,8} & x_{9,9} & x_{9,10} \\
x_{10,1} & x_{10,2} & x_{10,3} & x_{10,4} & x_{10,5} & x_{10,6} & x_{10,7} & x_{10,8} & x_{10,9} & x_{10,10}
\end{bmatrix}
\; conv(stride = 1) \;
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
\end{gathered}
$$
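The kernel-gradient step (dilate $\delta$, then convolve $x$ with it at stride 1) can be sketched and compared with the direct chain-rule sum. `conv2d_valid` and `kernel_grad` are our own names, and the data are random. The final crop is our addition for the general case: when $(H - 3)$ is not a multiple of the stride, the stride-1 convolution yields more than 3x3 values, and the extra ones correspond to kernel offsets that do not exist; for 10x10 with stride 7 the result is already 3x3:

```python
import numpy as np


def conv2d_valid(x, k, stride=1):
    """VALID cross-correlation (no kernel flip) with a stride, no bias."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return out


def kernel_grad(x, delta, k_shape, stride):
    """dL/dk via: dilate delta with (stride - 1) zeros, then conv x with it at stride 1."""
    dil = np.zeros(((delta.shape[0]-1)*stride+1, (delta.shape[1]-1)*stride+1))
    dil[::stride, ::stride] = delta
    dk = conv2d_valid(x, dil, stride=1)
    # Keep only the offsets that exist in the kernel.
    return dk[:k_shape[0], :k_shape[1]]


rng = np.random.default_rng(4)
x = rng.standard_normal((10, 10))
delta = rng.standard_normal((2, 2))
dk = kernel_grad(x, delta, (3, 3), stride=7)

# Direct chain-rule result: sum of delta[i, j] times the window of x under u[i, j].
dk_direct = sum(delta[i, j] * x[i*7:i*7+3, j*7:j*7+3] for i in range(2) for j in range(2))
print(dk.shape, np.allclose(dk, dk_direct))  # (3, 3) True
```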
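Carrying out this convolution gives exactly the gradients listed above, which verifies that our algorithm is also correct in this fairly extreme case.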
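As an independent cross-check of that conclusion, the analytic gradients can be compared with central finite differences. This sketch assumes a toy loss $L = \sum_{i,j} \delta_{i,j} u_{i,j}$ (our choice, not from the original) so that $\frac{\partial L}{\partial u} = \delta$ exactly; `forward` and `numerical_grad` are hypothetical helper names, and all data are random:

```python
import numpy as np


def forward(x, k, b, stride=7):
    """VALID strided cross-correlation plus bias, as in the text."""
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    u = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            u[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k) + b
    return u


def numerical_grad(f, a, eps=1e-6):
    """Central-difference gradient of the scalar function f() w.r.t. array a.
    a is perturbed in place and restored after each evaluation."""
    g = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        old = a[idx]
        a[idx] = old + eps
        f_plus = f()
        a[idx] = old - eps
        f_minus = f()
        a[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g


rng = np.random.default_rng(5)
x = rng.standard_normal((10, 10))
k = rng.standard_normal((3, 3))
b = np.array(0.1)                    # 0-d array so it can be perturbed in place
delta = rng.standard_normal((2, 2))


def loss():
    # Assumed toy loss L = sum(delta * u), so that dL/du = delta exactly.
    return np.sum(delta * forward(x, k, b))


# Analytic gradients from the formulas in this article.
dx = np.zeros_like(x)
dk = np.zeros_like(k)
for i in range(2):
    for j in range(2):
        dx[i*7:i*7+3, j*7:j*7+3] += delta[i, j] * k
        dk += delta[i, j] * x[i*7:i*7+3, j*7:j*7+3]
db = delta.sum()

print(np.allclose(dx, numerical_grad(loss, x), atol=1e-6))  # True
print(np.allclose(dk, numerical_grad(loss, k), atol=1e-6))  # True
print(np.isclose(db, numerical_grad(loss, b), atol=1e-6))   # True
```

Because $L$ is linear in each of $x$, $k$ and $b$ separately, central differences are exact here up to floating-point rounding, so a tight tolerance is safe.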
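4. Summary
Working through this fairly extreme convolution example, we have verified that our algorithm is correct. The next step is to implement 2D convolution and its backpropagation algorithm in code.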
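Finally
That is the full content of "Machine Learning Review: Backpropagation in Convolution, Part 3: The Backpropagation Algorithm for 2D Convolution with Stride s, a Very Extreme Example", collected and compiled by 拉长百褶裙. We hope the article helps with the related programming problems you run into.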