Machine Learning Review: Backpropagation Through Convolution, Part 3: Backpropagation for 2D Convolution with Stride s, an Extreme Example

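Overview

My personal blog: https://huaxuan0720.github.io/ (visitors welcome)

Preface

The previous articles introduced convolution on a 2D plane and its backpropagation algorithm. However, strides of 1 and 2 are both small numbers. Does the same backpropagation procedure still apply when the stride is much larger? To find out, we work through the rather extreme example below and use it to check that the algorithm remains valid.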

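1. Parameter Settings

The earlier articles always used a 5x5 input matrix. In this article the input is a 10x10 matrix. The kernel is still 3x3, the stride is a large value, 7, and the padding mode is still VALID.

The parameters are summarized below: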

| Parameter | Setting |
|---|---|
| Input matrix $x$ | a 2D matrix of size 10x10 |
| Kernel $k$ | a 2D matrix of size 3x3 |
| Stride $stride$ | 7 |
| padding | VALID |
| Bias $b$ | a floating-point number |
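As a quick check on these settings, the output size can be worked out from the standard VALID-padding formula. The symbols $n$ (input side length), $k_s$ (kernel side length) and $s$ (stride) are introduced here only for illustration:

```latex
\left\lfloor \frac{n - k_s}{s} \right\rfloor + 1
= \left\lfloor \frac{10 - 3}{7} \right\rfloor + 1
= 2
```

So the output $u$ should be a 2x2 matrix, and along each axis the two kernel windows start at index 1 and index 8.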

As before, we write the convolution operation as $conv$, so the convolution can be expressed as follows (note that the stride here is 7):

$$x \;\mathrm{conv}\; k + b = u$$
Written out in full, this becomes:
$$
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} & x_{1,6} & x_{1,7} & x_{1,8} & x_{1,9} & x_{1,10} \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} & x_{2,6} & x_{2,7} & x_{2,8} & x_{2,9} & x_{2,10} \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} & x_{3,6} & x_{3,7} & x_{3,8} & x_{3,9} & x_{3,10} \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} & x_{4,6} & x_{4,7} & x_{4,8} & x_{4,9} & x_{4,10} \\
x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} & x_{5,6} & x_{5,7} & x_{5,8} & x_{5,9} & x_{5,10} \\
x_{6,1} & x_{6,2} & x_{6,3} & x_{6,4} & x_{6,5} & x_{6,6} & x_{6,7} & x_{6,8} & x_{6,9} & x_{6,10} \\
x_{7,1} & x_{7,2} & x_{7,3} & x_{7,4} & x_{7,5} & x_{7,6} & x_{7,7} & x_{7,8} & x_{7,9} & x_{7,10} \\
x_{8,1} & x_{8,2} & x_{8,3} & x_{8,4} & x_{8,5} & x_{8,6} & x_{8,7} & x_{8,8} & x_{8,9} & x_{8,10} \\
x_{9,1} & x_{9,2} & x_{9,3} & x_{9,4} & x_{9,5} & x_{9,6} & x_{9,7} & x_{9,8} & x_{9,9} & x_{9,10} \\
x_{10,1} & x_{10,2} & x_{10,3} & x_{10,4} & x_{10,5} & x_{10,6} & x_{10,7} & x_{10,8} & x_{10,9} & x_{10,10}
\end{bmatrix}
\;\mathrm{conv}\;
\begin{bmatrix}
k_{1,1} & k_{1,2} & k_{1,3} \\
k_{2,1} & k_{2,2} & k_{2,3} \\
k_{3,1} & k_{3,2} & k_{3,3}
\end{bmatrix}
+ b =
\begin{bmatrix}
u_{1,1} & u_{1,2} \\
u_{2,1} & u_{2,2}
\end{bmatrix}
$$
Expanding the matrix $u$ further, we have:
$$
\begin{bmatrix}
u_{1,1} & u_{1,2} \\
u_{2,1} & u_{2,2}
\end{bmatrix}
=
\begin{bmatrix}
\begin{matrix}
x_{1,1}k_{1,1} + x_{1,2}k_{1,2} + x_{1,3}k_{1,3} + \\
x_{2,1}k_{2,1} + x_{2,2}k_{2,2} + x_{2,3}k_{2,3} + \\
x_{3,1}k_{3,1} + x_{3,2}k_{3,2} + x_{3,3}k_{3,3} + b
\end{matrix}
&
\begin{matrix}
x_{1,8}k_{1,1} + x_{1,9}k_{1,2} + x_{1,10}k_{1,3} + \\
x_{2,8}k_{2,1} + x_{2,9}k_{2,2} + x_{2,10}k_{2,3} + \\
x_{3,8}k_{3,1} + x_{3,9}k_{3,2} + x_{3,10}k_{3,3} + b
\end{matrix}
\\ \\
\begin{matrix}
x_{8,1}k_{1,1} + x_{8,2}k_{1,2} + x_{8,3}k_{1,3} + \\
x_{9,1}k_{2,1} + x_{9,2}k_{2,2} + x_{9,3}k_{2,3} + \\
x_{10,1}k_{3,1} + x_{10,2}k_{3,2} + x_{10,3}k_{3,3} + b
\end{matrix}
&
\begin{matrix}
x_{8,8}k_{1,1} + x_{8,9}k_{1,2} + x_{8,10}k_{1,3} + \\
x_{9,8}k_{2,1} + x_{9,9}k_{2,2} + x_{9,10}k_{2,3} + \\
x_{10,8}k_{3,1} + x_{10,9}k_{3,2} + x_{10,10}k_{3,3} + b
\end{matrix}
\end{bmatrix}
$$
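The forward pass above can be reproduced with a few lines of NumPy. This is only a minimal sketch under the article's settings; the function name `conv2d_valid` and the random test data are our own assumptions, not code from the original series. As in the expansion above, the kernel is applied without flipping.

```python
import numpy as np

def conv2d_valid(x, k, b=0.0, stride=1):
    """Sliding-window sum of products (no kernel flip), VALID padding."""
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    u = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # top-left corner of the window
            u[i, j] = np.sum(x[r:r + kh, c:c + kw] * k) + b
    return u

# Hypothetical test data with the shapes used in this article.
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10))
k = rng.standard_normal((3, 3))
b = 0.5

u = conv2d_valid(x, k, b, stride=7)
print(u.shape)  # (2, 2)
```

With zero-based indexing, `u[0, 1]` is built from `x[0:3, 7:10]`, which corresponds to $u_{1,2}$ using rows 1 to 3 and columns 8 to 10 in the notation of the text.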

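2. Error Propagation

As before, to make the computation easier to carry out and to inspect, we build the table below. Each column corresponds to one particular output $\partial u_{i,j}$, and each row to one particular input $\partial x_{p,q}$. The cell where a row and a column meet is the quotient of the two, that is, the partial derivative of that output with respect to that input, $\frac{\partial u_{i,j}}{\partial x_{p,q}}$. The last column gives the partial derivative of the error that has to be passed back. It is computed exactly as in the earlier articles (each entry in the row is multiplied by the matching $\delta_{i,j} = \frac{\partial L}{\partial u_{i,j}}$ and the products are summed), so the details are not repeated here: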

| Input | $\partial u_{1,1}$ | $\partial u_{1,2}$ | $\partial u_{2,1}$ | $\partial u_{2,2}$ | $\frac{\partial L}{\partial x_{i,j}}$ |
|---|---|---|---|---|---|
| $\partial x_{1,1}$ | $k_{1,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,1}} = \delta_{1,1} k_{1,1}$ |
| $\partial x_{1,2}$ | $k_{1,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,2}} = \delta_{1,1} k_{1,2}$ |
| $\partial x_{1,3}$ | $k_{1,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{1,3}} = \delta_{1,1} k_{1,3}$ |
| $\partial x_{1,8}$ | 0 | $k_{1,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,8}} = \delta_{1,2} k_{1,1}$ |
| $\partial x_{1,9}$ | 0 | $k_{1,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,9}} = \delta_{1,2} k_{1,2}$ |
| $\partial x_{1,10}$ | 0 | $k_{1,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{1,10}} = \delta_{1,2} k_{1,3}$ |
| $\partial x_{2,1}$ | $k_{2,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,1}} = \delta_{1,1} k_{2,1}$ |
| $\partial x_{2,2}$ | $k_{2,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,2}} = \delta_{1,1} k_{2,2}$ |
| $\partial x_{2,3}$ | $k_{2,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{2,3}} = \delta_{1,1} k_{2,3}$ |
| $\partial x_{2,8}$ | 0 | $k_{2,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,8}} = \delta_{1,2} k_{2,1}$ |
| $\partial x_{2,9}$ | 0 | $k_{2,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,9}} = \delta_{1,2} k_{2,2}$ |
| $\partial x_{2,10}$ | 0 | $k_{2,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{2,10}} = \delta_{1,2} k_{2,3}$ |
| $\partial x_{3,1}$ | $k_{3,1}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,1}} = \delta_{1,1} k_{3,1}$ |
| $\partial x_{3,2}$ | $k_{3,2}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,2}} = \delta_{1,1} k_{3,2}$ |
| $\partial x_{3,3}$ | $k_{3,3}$ | 0 | 0 | 0 | $\frac{\partial L}{\partial x_{3,3}} = \delta_{1,1} k_{3,3}$ |
| $\partial x_{3,8}$ | 0 | $k_{3,1}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,8}} = \delta_{1,2} k_{3,1}$ |
| $\partial x_{3,9}$ | 0 | $k_{3,2}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,9}} = \delta_{1,2} k_{3,2}$ |
| $\partial x_{3,10}$ | 0 | $k_{3,3}$ | 0 | 0 | $\frac{\partial L}{\partial x_{3,10}} = \delta_{1,2} k_{3,3}$ |
| $\partial x_{8,1}$ | 0 | 0 | $k_{1,1}$ | 0 | $\frac{\partial L}{\partial x_{8,1}} = \delta_{2,1} k_{1,1}$ |
| $\partial x_{8,2}$ | 0 | 0 | $k_{1,2}$ | 0 | $\frac{\partial L}{\partial x_{8,2}} = \delta_{2,1} k_{1,2}$ |
| $\partial x_{8,3}$ | 0 | 0 | $k_{1,3}$ | 0 | $\frac{\partial L}{\partial x_{8,3}} = \delta_{2,1} k_{1,3}$ |
| $\partial x_{8,8}$ | 0 | 0 | 0 | $k_{1,1}$ | $\frac{\partial L}{\partial x_{8,8}} = \delta_{2,2} k_{1,1}$ |
| $\partial x_{8,9}$ | 0 | 0 | 0 | $k_{1,2}$ | $\frac{\partial L}{\partial x_{8,9}} = \delta_{2,2} k_{1,2}$ |
| $\partial x_{8,10}$ | 0 | 0 | 0 | $k_{1,3}$ | $\frac{\partial L}{\partial x_{8,10}} = \delta_{2,2} k_{1,3}$ |
| $\partial x_{9,1}$ | 0 | 0 | $k_{2,1}$ | 0 | $\frac{\partial L}{\partial x_{9,1}} = \delta_{2,1} k_{2,1}$ |
| $\partial x_{9,2}$ | 0 | 0 | $k_{2,2}$ | 0 | $\frac{\partial L}{\partial x_{9,2}} = \delta_{2,1} k_{2,2}$ |
| $\partial x_{9,3}$ | 0 | 0 | $k_{2,3}$ | 0 | $\frac{\partial L}{\partial x_{9,3}} = \delta_{2,1} k_{2,3}$ |
| $\partial x_{9,8}$ | 0 | 0 | 0 | $k_{2,1}$ | $\frac{\partial L}{\partial x_{9,8}} = \delta_{2,2} k_{2,1}$ |
| $\partial x_{9,9}$ | 0 | 0 | 0 | $k_{2,2}$ | $\frac{\partial L}{\partial x_{9,9}} = \delta_{2,2} k_{2,2}$ |
| $\partial x_{9,10}$ | 0 | 0 | 0 | $k_{2,3}$ | $\frac{\partial L}{\partial x_{9,10}} = \delta_{2,2} k_{2,3}$ |
| $\partial x_{10,1}$ | 0 | 0 | $k_{3,1}$ | 0 | $\frac{\partial L}{\partial x_{10,1}} = \delta_{2,1} k_{3,1}$ |
| $\partial x_{10,2}$ | 0 | 0 | $k_{3,2}$ | 0 | $\frac{\partial L}{\partial x_{10,2}} = \delta_{2,1} k_{3,2}$ |
| $\partial x_{10,3}$ | 0 | 0 | $k_{3,3}$ | 0 | $\frac{\partial L}{\partial x_{10,3}} = \delta_{2,1} k_{3,3}$ |
| $\partial x_{10,8}$ | 0 | 0 | 0 | $k_{3,1}$ | $\frac{\partial L}{\partial x_{10,8}} = \delta_{2,2} k_{3,1}$ |
| $\partial x_{10,9}$ | 0 | 0 | 0 | $k_{3,2}$ | $\frac{\partial L}{\partial x_{10,9}} = \delta_{2,2} k_{3,2}$ |
| $\partial x_{10,10}$ | 0 | 0 | 0 | $k_{3,3}$ | $\frac{\partial L}{\partial x_{10,10}} = \delta_{2,2} k_{3,3}$ |
| all other inputs | 0 | 0 | 0 | 0 | 0 |

As the table shows, whatever the convolution settings are, the non-zero entries are laid out in a very regular pattern.
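The regular pattern in the table can also be generated programmatically. The sketch below is our own illustration (the helper name `input_grad_by_table` and the random values are assumptions): it applies the chain rule directly, adding $\delta_{i,j}$ times the kernel to the input window that produced $u_{i,j}$, where $\delta_{i,j} = \frac{\partial L}{\partial u_{i,j}}$ is the error received for that output.

```python
import numpy as np

def input_grad_by_table(delta, k, x_shape, stride):
    """Accumulate dL/dx by the chain rule: delta[i, j] contributes
    delta[i, j] * k to the input window that produced u[i, j]."""
    kh, kw = k.shape
    dx = np.zeros(x_shape)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = i * stride, j * stride  # top-left corner of the window
            dx[r:r + kh, c:c + kw] += delta[i, j] * k
    return dx

# Hypothetical values; only the shapes follow the article.
rng = np.random.default_rng(1)
k = rng.standard_normal((3, 3))
delta = rng.standard_normal((2, 2))

dx = input_grad_by_table(delta, k, (10, 10), stride=7)
print(np.count_nonzero(dx))  # number of inputs with a non-zero gradient
```

Because the stride (7) is larger than the kernel size (3), the four windows never overlap, so every input receives a contribution from at most one output. That is why each table row has a single non-zero cell and why rows and columns 4 to 7 of the input get a zero gradient.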

Suppose the error passed back from the following layer is $\delta$, that is:

$$
\delta =
\begin{bmatrix}
\delta_{1,1} & \delta_{1,2} \\
\delta_{2,1} & \delta_{2,2}
\end{bmatrix}
$$

where $\delta_{i,j} = \frac{\partial L}{\partial u_{i,j}}$, so each error term corresponds to one output element. Here $L$ denotes the final loss, which we want to make as small as possible.

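Following the earlier method, we first compute the error that should be passed on to the next layer in the backward pass. The first step is to insert the appropriate number of zeros into the received error matrix. Because the forward convolution used a stride of 7, (7 - 1 = 6) zeros go between every two adjacent elements of the received error matrix, that is: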
$$
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
$$
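The zero-insertion step is easy to express with strided slicing. The following is a small sketch; the function name `insert_zeros` and the numeric stand-ins for $\delta$ are our own assumptions.

```python
import numpy as np

def insert_zeros(delta, stride):
    """Place (stride - 1) zeros between adjacent elements of delta."""
    h, w = delta.shape
    out = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1), dtype=delta.dtype)
    out[::stride, ::stride] = delta  # original entries land on every stride-th cell
    return out

# Stand-in values 1..4 for delta_{1,1}, delta_{1,2}, delta_{2,1}, delta_{2,2}.
delta = np.array([[1.0, 2.0], [3.0, 4.0]])
dilated = insert_zeros(delta, stride=7)
print(dilated.shape)  # (8, 8)
```

A 2x2 error matrix with 6 zeros between neighbours has side length (2 - 1) * 7 + 1 = 8, which matches the 8x8 matrix above.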
Next, because the kernel is 3x3, we again need to surround the matrix above with (3 - 1 = 2) layers of zeros, that is:
$$
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
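In NumPy the border of zeros can be added with `np.pad`, whose default mode is constant zero padding. The snippet below is a sketch with stand-in numbers for the four $\delta$ entries; the variable names are our own.

```python
import numpy as np

kernel_size = 3

# 8x8 matrix from the zero-insertion step, with stand-in values 1..4
# for delta_{1,1}, delta_{1,2}, delta_{2,1}, delta_{2,2}.
dilated = np.zeros((8, 8))
dilated[::7, ::7] = [[1.0, 2.0], [3.0, 4.0]]

# Add (kernel_size - 1) layers of zeros on every side.
padded = np.pad(dilated, kernel_size - 1)
print(padded.shape)  # (12, 12)
```

With zero-based indexing the four error values end up at positions (2, 2), (2, 9), (9, 2) and (9, 9), that is, rows and columns 3 and 10 of the 12x12 matrix above.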
The next step is to rotate the kernel of the forward convolution by 180°, that is:
$$
\begin{bmatrix}
k_{3,3} & k_{3,2} & k_{3,1} \\
k_{2,3} & k_{2,2} & k_{2,1} \\
k_{1,3} & k_{1,2} & k_{1,1}
\end{bmatrix}
$$
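A 180° rotation is the same as reversing both axes. As a small sketch (the example kernel holding the numbers 1 to 9 is our own choice), either `np.rot90(k, 2)` or `k[::-1, ::-1]` can be used:

```python
import numpy as np

# Example kernel whose entries 1..9 stand for k_{1,1} .. k_{3,3} in row order.
k = np.arange(1, 10).reshape(3, 3)

k_rot = np.rot90(k, 2)  # two 90-degree rotations = 180 degrees
print(k_rot)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
```

The top-left entry of the rotated kernel is $k_{3,3}$ and the bottom-right entry is $k_{1,1}$, as in the matrix above.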
The final step is to convolve the padded error matrix above with the rotated kernel using a stride of 1, that is:
$$
\begin{aligned}
&
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\;\mathrm{conv}(stride = 1)\;
\begin{bmatrix}
k_{3,3} & k_{3,2} & k_{3,1} \\
k_{2,3} & k_{2,2} & k_{2,1} \\
k_{1,3} & k_{1,2} & k_{1,1}
\end{bmatrix}
\\[1ex]
=\; &
\begin{bmatrix}
\delta_{1,1}k_{1,1} & \delta_{1,1}k_{1,2} & \delta_{1,1}k_{1,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{1,1} & \delta_{1,2}k_{1,2} & \delta_{1,2}k_{1,3} \\
\delta_{1,1}k_{2,1} & \delta_{1,1}k_{2,2} & \delta_{1,1}k_{2,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{2,1} & \delta_{1,2}k_{2,2} & \delta_{1,2}k_{2,3} \\
\delta_{1,1}k_{3,1} & \delta_{1,1}k_{3,2} & \delta_{1,1}k_{3,3} & 0 & 0 & 0 & 0 & \delta_{1,2}k_{3,1} & \delta_{1,2}k_{3,2} & \delta_{1,2}k_{3,3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1}k_{1,1} & \delta_{2,1}k_{1,2} & \delta_{2,1}k_{1,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{1,1} & \delta_{2,2}k_{1,2} & \delta_{2,2}k_{1,3} \\
\delta_{2,1}k_{2,1} & \delta_{2,1}k_{2,2} & \delta_{2,1}k_{2,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{2,1} & \delta_{2,2}k_{2,2} & \delta_{2,2}k_{2,3} \\
\delta_{2,1}k_{3,1} & \delta_{2,1}k_{3,2} & \delta_{2,1}k_{3,3} & 0 & 0 & 0 & 0 & \delta_{2,2}k_{3,1} & \delta_{2,2}k_{3,2} & \delta_{2,2}k_{3,3}
\end{bmatrix}
\\[1ex]
=\; &
\begin{bmatrix}
\frac{\partial L}{\partial x_{1,1}} & \frac{\partial L}{\partial x_{1,2}} & \frac{\partial L}{\partial x_{1,3}} & \frac{\partial L}{\partial x_{1,4}} & \frac{\partial L}{\partial x_{1,5}} & \frac{\partial L}{\partial x_{1,6}} & \frac{\partial L}{\partial x_{1,7}} & \frac{\partial L}{\partial x_{1,8}} & \frac{\partial L}{\partial x_{1,9}} & \frac{\partial L}{\partial x_{1,10}} \\
\frac{\partial L}{\partial x_{2,1}} & \frac{\partial L}{\partial x_{2,2}} & \frac{\partial L}{\partial x_{2,3}} & \frac{\partial L}{\partial x_{2,4}} & \frac{\partial L}{\partial x_{2,5}} & \frac{\partial L}{\partial x_{2,6}} & \frac{\partial L}{\partial x_{2,7}} & \frac{\partial L}{\partial x_{2,8}} & \frac{\partial L}{\partial x_{2,9}} & \frac{\partial L}{\partial x_{2,10}} \\
\frac{\partial L}{\partial x_{3,1}} & \frac{\partial L}{\partial x_{3,2}} & \frac{\partial L}{\partial x_{3,3}} & \frac{\partial L}{\partial x_{3,4}} & \frac{\partial L}{\partial x_{3,5}} & \frac{\partial L}{\partial x_{3,6}} & \frac{\partial L}{\partial x_{3,7}} & \frac{\partial L}{\partial x_{3,8}} & \frac{\partial L}{\partial x_{3,9}} & \frac{\partial L}{\partial x_{3,10}} \\
\frac{\partial L}{\partial x_{4,1}} & \frac{\partial L}{\partial x_{4,2}} & \frac{\partial L}{\partial x_{4,3}} & \frac{\partial L}{\partial x_{4,4}} & \frac{\partial L}{\partial x_{4,5}} & \frac{\partial L}{\partial x_{4,6}} & \frac{\partial L}{\partial x_{4,7}} & \frac{\partial L}{\partial x_{4,8}} & \frac{\partial L}{\partial x_{4,9}} & \frac{\partial L}{\partial x_{4,10}} \\
\frac{\partial L}{\partial x_{5,1}} & \frac{\partial L}{\partial x_{5,2}} & \frac{\partial L}{\partial x_{5,3}} & \frac{\partial L}{\partial x_{5,4}} & \frac{\partial L}{\partial x_{5,5}} & \frac{\partial L}{\partial x_{5,6}} & \frac{\partial L}{\partial x_{5,7}} & \frac{\partial L}{\partial x_{5,8}} & \frac{\partial L}{\partial x_{5,9}} & \frac{\partial L}{\partial x_{5,10}} \\
\frac{\partial L}{\partial x_{6,1}} & \frac{\partial L}{\partial x_{6,2}} & \frac{\partial L}{\partial x_{6,3}} & \frac{\partial L}{\partial x_{6,4}} & \frac{\partial L}{\partial x_{6,5}} & \frac{\partial L}{\partial x_{6,6}} & \frac{\partial L}{\partial x_{6,7}} & \frac{\partial L}{\partial x_{6,8}} & \frac{\partial L}{\partial x_{6,9}} & \frac{\partial L}{\partial x_{6,10}} \\
\frac{\partial L}{\partial x_{7,1}} & \frac{\partial L}{\partial x_{7,2}} & \frac{\partial L}{\partial x_{7,3}} & \frac{\partial L}{\partial x_{7,4}} & \frac{\partial L}{\partial x_{7,5}} & \frac{\partial L}{\partial x_{7,6}} & \frac{\partial L}{\partial x_{7,7}} & \frac{\partial L}{\partial x_{7,8}} & \frac{\partial L}{\partial x_{7,9}} & \frac{\partial L}{\partial x_{7,10}} \\
\frac{\partial L}{\partial x_{8,1}} & \frac{\partial L}{\partial x_{8,2}} & \frac{\partial L}{\partial x_{8,3}} & \frac{\partial L}{\partial x_{8,4}} & \frac{\partial L}{\partial x_{8,5}} & \frac{\partial L}{\partial x_{8,6}} & \frac{\partial L}{\partial x_{8,7}} & \frac{\partial L}{\partial x_{8,8}} & \frac{\partial L}{\partial x_{8,9}} & \frac{\partial L}{\partial x_{8,10}} \\
\frac{\partial L}{\partial x_{9,1}} & \frac{\partial L}{\partial x_{9,2}} & \frac{\partial L}{\partial x_{9,3}} & \frac{\partial L}{\partial x_{9,4}} & \frac{\partial L}{\partial x_{9,5}} & \frac{\partial L}{\partial x_{9,6}} & \frac{\partial L}{\partial x_{9,7}} & \frac{\partial L}{\partial x_{9,8}} & \frac{\partial L}{\partial x_{9,9}} & \frac{\partial L}{\partial x_{9,10}} \\
\frac{\partial L}{\partial x_{10,1}} & \frac{\partial L}{\partial x_{10,2}} & \frac{\partial L}{\partial x_{10,3}} & \frac{\partial L}{\partial x_{10,4}} & \frac{\partial L}{\partial x_{10,5}} & \frac{\partial L}{\partial x_{10,6}} & \frac{\partial L}{\partial x_{10,7}} & \frac{\partial L}{\partial x_{10,8}} & \frac{\partial L}{\partial x_{10,9}} & \frac{\partial L}{\partial x_{10,10}}
\end{bmatrix}
\end{aligned}
$$
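The whole error-propagation procedure (insert zeros, pad, rotate, stride-1 convolution) can be checked numerically. The sketch below is our own illustration rather than the author's implementation; the helper names `correlate_valid` and `input_grad` and the random values are assumptions. It compares the result with the block structure of the result matrix above.

```python
import numpy as np

def correlate_valid(a, w):
    """Stride-1 sliding-window sum of products (no flip), VALID padding."""
    wh, ww = w.shape
    out = np.empty((a.shape[0] - wh + 1, a.shape[1] - ww + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + wh, j:j + ww] * w)
    return out

def input_grad(delta, k, stride):
    """dL/dx via: insert zeros, pad, rotate the kernel, stride-1 convolution."""
    kh, kw = k.shape
    h, w = delta.shape
    dilated = np.zeros(((h - 1) * stride + 1, (w - 1) * stride + 1))
    dilated[::stride, ::stride] = delta
    padded = np.pad(dilated, ((kh - 1, kh - 1), (kw - 1, kw - 1)))
    return correlate_valid(padded, np.rot90(k, 2))

# Hypothetical values; only the shapes follow the article.
rng = np.random.default_rng(2)
k = rng.standard_normal((3, 3))
delta = rng.standard_normal((2, 2))

dx = input_grad(delta, k, stride=7)

# Reference: each delta[i, j] * k block placed at the window it came from.
expected = np.zeros((10, 10))
for i in range(2):
    for j in range(2):
        expected[7 * i:7 * i + 3, 7 * j:7 * j + 3] = delta[i, j] * k

print(dx.shape, np.allclose(dx, expected))  # (10, 10) True
```

One caveat worth stating: this sketch returns a matrix of the input's size only because (10 - 3) is exactly divisible by the stride 7. When the forward windows do not reach the last rows or columns of the input, extra zero rows and columns would have to be appended, which this example does not cover.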

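The result agrees entry by entry with the last column of the table, so for error propagation the procedure works correctly in this example even though the stride is much larger than before. Next we check the computation of the update gradients.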

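3. Update Gradients

As defined before, suppose the error received at this stage from the following layer is $\delta$, that is:

$$
\delta =
\begin{bmatrix}
\delta_{1,1} & \delta_{1,2} \\
\delta_{2,1} & \delta_{2,2}
\end{bmatrix}
$$

Then, by the chain rule for partial derivatives, we can compute all the partial derivatives we need. The derivation is the same as in the earlier articles and is not repeated here. The results are: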
$$\frac{\partial L}{\partial k_{1,1}} = x_{1,1}\delta_{1,1} + x_{1,8}\delta_{1,2} + x_{8,1}\delta_{2,1} + x_{8,8}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{1,2}} = x_{1,2}\delta_{1,1} + x_{1,9}\delta_{1,2} + x_{8,2}\delta_{2,1} + x_{8,9}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{1,3}} = x_{1,3}\delta_{1,1} + x_{1,10}\delta_{1,2} + x_{8,3}\delta_{2,1} + x_{8,10}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{2,1}} = x_{2,1}\delta_{1,1} + x_{2,8}\delta_{1,2} + x_{9,1}\delta_{2,1} + x_{9,8}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{2,2}} = x_{2,2}\delta_{1,1} + x_{2,9}\delta_{1,2} + x_{9,2}\delta_{2,1} + x_{9,9}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{2,3}} = x_{2,3}\delta_{1,1} + x_{2,10}\delta_{1,2} + x_{9,3}\delta_{2,1} + x_{9,10}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{3,1}} = x_{3,1}\delta_{1,1} + x_{3,8}\delta_{1,2} + x_{10,1}\delta_{2,1} + x_{10,8}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{3,2}} = x_{3,2}\delta_{1,1} + x_{3,9}\delta_{1,2} + x_{10,2}\delta_{2,1} + x_{10,9}\delta_{2,2}$$

$$\frac{\partial L}{\partial k_{3,3}} = x_{3,3}\delta_{1,1} + x_{3,10}\delta_{1,2} + x_{10,3}\delta_{2,1} + x_{10,10}\delta_{2,2}$$

$$\frac{\partial L}{\partial b} = \delta_{1,1} + \delta_{1,2} + \delta_{2,1} + \delta_{2,2}$$
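These ten expressions all have the same shape: each kernel gradient sums, over the four outputs, the error times the input element that the kernel entry touched in that window. A minimal sketch of that sum, with a function name (`kernel_grad_chain_rule`) and random data that are our own assumptions:

```python
import numpy as np

def kernel_grad_chain_rule(x, delta, k_shape, stride):
    """dL/dk and dL/db from the chain rule, one forward window at a time."""
    kh, kw = k_shape
    dk = np.zeros(k_shape)
    for i in range(delta.shape[0]):
        for j in range(delta.shape[1]):
            r, c = i * stride, j * stride  # top-left corner of the window
            dk += delta[i, j] * x[r:r + kh, c:c + kw]
    db = delta.sum()  # the bias is added once to every output
    return dk, db

# Hypothetical values; only the shapes follow the article.
rng = np.random.default_rng(3)
x = rng.standard_normal((10, 10))
delta = rng.standard_normal((2, 2))

dk, db = kernel_grad_chain_rule(x, delta, (3, 3), stride=7)
print(dk.shape)  # (3, 3)
```

With zero-based indexing, `dk[0, 0]` combines `x[0, 0]`, `x[0, 7]`, `x[7, 0]` and `x[7, 7]`, which are $x_{1,1}$, $x_{1,8}$, $x_{8,1}$ and $x_{8,8}$ in the first expression above.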

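As in the earlier procedure, because the forward convolution used a stride of 7, computing the update gradients again requires inserting (7 - 1 = 6) zeros between every two adjacent elements of the received error matrix, that is: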
$$
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
$$

Then we convolve the input matrix $x$ with the matrix above using a stride of 1, which gives the update gradients of the kernel parameters. That is:
$$
\begin{aligned}
&
\begin{bmatrix}
\frac{\partial L}{\partial k_{1,1}} & \frac{\partial L}{\partial k_{1,2}} & \frac{\partial L}{\partial k_{1,3}} \\
\frac{\partial L}{\partial k_{2,1}} & \frac{\partial L}{\partial k_{2,2}} & \frac{\partial L}{\partial k_{2,3}} \\
\frac{\partial L}{\partial k_{3,1}} & \frac{\partial L}{\partial k_{3,2}} & \frac{\partial L}{\partial k_{3,3}}
\end{bmatrix}
\\[1ex]
=\; &
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & x_{1,4} & x_{1,5} & x_{1,6} & x_{1,7} & x_{1,8} & x_{1,9} & x_{1,10} \\
x_{2,1} & x_{2,2} & x_{2,3} & x_{2,4} & x_{2,5} & x_{2,6} & x_{2,7} & x_{2,8} & x_{2,9} & x_{2,10} \\
x_{3,1} & x_{3,2} & x_{3,3} & x_{3,4} & x_{3,5} & x_{3,6} & x_{3,7} & x_{3,8} & x_{3,9} & x_{3,10} \\
x_{4,1} & x_{4,2} & x_{4,3} & x_{4,4} & x_{4,5} & x_{4,6} & x_{4,7} & x_{4,8} & x_{4,9} & x_{4,10} \\
x_{5,1} & x_{5,2} & x_{5,3} & x_{5,4} & x_{5,5} & x_{5,6} & x_{5,7} & x_{5,8} & x_{5,9} & x_{5,10} \\
x_{6,1} & x_{6,2} & x_{6,3} & x_{6,4} & x_{6,5} & x_{6,6} & x_{6,7} & x_{6,8} & x_{6,9} & x_{6,10} \\
x_{7,1} & x_{7,2} & x_{7,3} & x_{7,4} & x_{7,5} & x_{7,6} & x_{7,7} & x_{7,8} & x_{7,9} & x_{7,10} \\
x_{8,1} & x_{8,2} & x_{8,3} & x_{8,4} & x_{8,5} & x_{8,6} & x_{8,7} & x_{8,8} & x_{8,9} & x_{8,10} \\
x_{9,1} & x_{9,2} & x_{9,3} & x_{9,4} & x_{9,5} & x_{9,6} & x_{9,7} & x_{9,8} & x_{9,9} & x_{9,10} \\
x_{10,1} & x_{10,2} & x_{10,3} & x_{10,4} & x_{10,5} & x_{10,6} & x_{10,7} & x_{10,8} & x_{10,9} & x_{10,10}
\end{bmatrix}
\;\mathrm{conv}(stride = 1)\;
\begin{bmatrix}
\delta_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{1,2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\delta_{2,1} & 0 & 0 & 0 & 0 & 0 & 0 & \delta_{2,2}
\end{bmatrix}
\end{aligned}
$$

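Carrying out this convolution gives the same result as the chain-rule expressions listed above, which confirms that the procedure is also correct in this rather extreme case.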
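The agreement between the two computations can also be confirmed numerically. The sketch below is our own check (the helper `correlate_valid` and the random data are assumptions): it slides the 8x8 zero-inserted error matrix over the 10x10 input with a stride of 1 and compares the resulting 3x3 matrix with the chain-rule sums.

```python
import numpy as np

def correlate_valid(a, w):
    """Stride-1 sliding-window sum of products (no flip), VALID padding."""
    wh, ww = w.shape
    out = np.empty((a.shape[0] - wh + 1, a.shape[1] - ww + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + wh, j:j + ww] * w)
    return out

# Hypothetical values; only the shapes follow the article.
rng = np.random.default_rng(4)
x = rng.standard_normal((10, 10))
delta = rng.standard_normal((2, 2))
stride = 7

# Zero-inserted error matrix: 6 zeros between adjacent elements -> 8x8.
dilated = np.zeros((8, 8))
dilated[::stride, ::stride] = delta

# Kernel gradient as a stride-1 convolution of x with the 8x8 matrix.
dk_conv = correlate_valid(x, dilated)

# Kernel gradient from the chain-rule sums, one forward window at a time.
dk_sum = np.zeros((3, 3))
for i in range(2):
    for j in range(2):
        dk_sum += delta[i, j] * x[7 * i:7 * i + 3, 7 * j:7 * j + 3]

print(dk_conv.shape, np.allclose(dk_conv, dk_sum))  # (3, 3) True
```

Sliding an 8x8 window over a 10x10 input leaves exactly 10 - 8 + 1 = 3 positions per axis, so the output has the same 3x3 shape as the kernel.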

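4. Summary

By working through a rather extreme convolution example, we have verified that the algorithm is correct. The next step is to implement 2D convolution and its backpropagation algorithm in code.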
