Detailed Derivation of the Normal Equation Solution for Multiple Linear Regression

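I am 落后手链, a blogger on 靠谱客. This article presents the detailed derivation of the normal equation solution for multiple linear regression. I am sharing it here in the hope that it serves as a useful reference.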

Multiple Linear Regression Formulas

Define the loss function of multiple linear regression as:
$J(\theta) = \frac{1}{2m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})^2 \qquad (1)$

where $\hat{y}^{(i)}$ is:
$\hat{y}^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} \qquad (2)$

Here $m$ is the number of samples and $n$ is the number of features.
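As a concrete reading of (1) and (2), the sketch below evaluates the loss with plain loops that mirror the two sums. The function names `predict` and `loss` and the sample numbers are illustrative assumptions, not part of the original article.

```python
def predict(theta, x):
    """Equation (2): theta_0 + theta_1*x_1 + ... + theta_n*x_n for one sample."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


def loss(theta, samples, targets):
    """Equation (1): (1 / 2m) times the sum of squared prediction errors."""
    m = len(samples)
    total = 0.0
    for x, y in zip(samples, targets):
        total += (predict(theta, x) - y) ** 2
    return total / (2 * m)


# Hypothetical data: m = 3 samples, n = 2 features, generated by y = 1 + 2*x1 + 3*x2.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [3.0, 4.0, 6.0]

exact_loss = loss([1.0, 2.0, 3.0], samples, targets)  # parameters that fit exactly
zero_loss = loss([0.0, 0.0, 0.0], samples, targets)   # all-zero parameters
print(exact_loss, zero_loss)
```

The parameters that generated the data give a loss of zero, while any other choice gives a strictly positive value on this data set.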

Define the vectors:
$\begin{aligned}
\theta &= (\theta_0, \theta_1, \cdots, \theta_n)^T \\
X^{(i)} &= (X_0^{(i)}, X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)})^T, \quad i = 1, 2, \cdots, m, \quad X_0^{(i)} \equiv 1 \\
X_j &= (X_j^{(1)}, X_j^{(2)}, \cdots, X_j^{(m)})^T, \quad j = 0, 1, 2, \cdots, n \\
y &= (y^{(1)}, y^{(2)}, \cdots, y^{(m)})^T
\end{aligned}$

Define the matrix:

$\begin{aligned}
X = (X^{(1)}, X^{(2)}, \cdots, X^{(m)})^T = (X_0, X_1, X_2, \cdots, X_n) =
\begin{pmatrix}
X_0^{(1)} & X_1^{(1)} & X_2^{(1)} & \cdots & X_n^{(1)} \\
X_0^{(2)} & X_1^{(2)} & X_2^{(2)} & \cdots & X_n^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_0^{(m)} & X_1^{(m)} & X_2^{(m)} & \cdots & X_n^{(m)}
\end{pmatrix}_{m \times (n+1)}
\end{aligned}$
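To make the layout of this matrix concrete, here is a minimal NumPy sketch that builds it from raw features by prepending the constant column $X_0 \equiv 1$. The array `features` and its values are hypothetical.

```python
import numpy as np

# Hypothetical raw features: m = 4 samples (rows), n = 2 features (columns).
features = np.array([[2.0, 5.0],
                     [3.0, 1.0],
                     [4.0, 7.0],
                     [6.0, 2.0]])
m, n = features.shape

# Prepend the constant column X_0 = 1 so that theta_0 acts as the intercept.
X = np.hstack([np.ones((m, 1)), features])

print(X.shape)   # (m, n + 1)
print(X[1])      # a row is one sample vector X^(i), here i = 2
print(X[:, 1])   # a column is one feature vector X_j, here j = 1
```

Rows index samples and columns index features, which matches the two ways of writing the matrix above.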

The loss function is minimized when $\theta$ takes the following value:
$\theta = (X^T X)^{-1} X^T y$
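The closed form can be tried numerically. The following is a minimal sketch, assuming $X^T X$ is invertible; the data, the seed, and the name `true_theta` are made up for illustration. It uses `np.linalg.solve` on the linear system instead of forming the inverse explicitly, which gives the same $\theta$ and is usually the numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noiseless data generated from known parameters.
m, n = 50, 3
true_theta = np.array([1.5, -2.0, 0.5, 3.0])   # (theta_0, ..., theta_n)
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = X @ true_theta

# Normal equation: solve (X^T X) theta = X^T y for theta.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)
print(theta_lstsq)
```

Because the data contain no noise, both results should agree with `true_theta` up to floating-point rounding.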

Derivation

Note: the derivation is not difficult, but it is long-winded, so please be patient…

Substituting (2) into (1) gives:

$\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^m (\theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} - y^{(i)})^2 \\
&= \frac{1}{2m}\sum_{i=1}^m (\theta_0 X_0^{(i)} + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} - y^{(i)})^2 \qquad \text{(inserting } X_0^{(i)} \equiv 1 \text{)} \\
&= \frac{1}{2m}\Big[ (\theta_0 X_0^{(1)} + \theta_1 X_1^{(1)} + \theta_2 X_2^{(1)} + \cdots + \theta_n X_n^{(1)} - y^{(1)})^2 \\
&\qquad + (\theta_0 X_0^{(2)} + \theta_1 X_1^{(2)} + \theta_2 X_2^{(2)} + \cdots + \theta_n X_n^{(2)} - y^{(2)})^2 \\
&\qquad + \cdots \\
&\qquad + (\theta_0 X_0^{(m)} + \theta_1 X_1^{(m)} + \theta_2 X_2^{(m)} + \cdots + \theta_n X_n^{(m)} - y^{(m)})^2 \Big]
\end{aligned}$

Now take the partial derivatives with respect to the components of $\theta$. Only the partial derivative with respect to $\theta_1$ is worked out below; the others follow in the same way:

$\begin{aligned}
\frac{\partial}{\partial\theta_1} J(\theta) &= \frac{1}{m}\Big[ X_1^{(1)} (\theta_0 X_0^{(1)} + \theta_1 X_1^{(1)} + \theta_2 X_2^{(1)} + \cdots + \theta_n X_n^{(1)} - y^{(1)}) \\
&\qquad + X_1^{(2)} (\theta_0 X_0^{(2)} + \theta_1 X_1^{(2)} + \theta_2 X_2^{(2)} + \cdots + \theta_n X_n^{(2)} - y^{(2)}) \\
&\qquad + \cdots \\
&\qquad + X_1^{(m)} (\theta_0 X_0^{(m)} + \theta_1 X_1^{(m)} + \theta_2 X_2^{(m)} + \cdots + \theta_n X_n^{(m)} - y^{(m)}) \Big] \\
&= \frac{1}{m}\Big[ (\theta_0 X_0^{(1)} X_1^{(1)} + \theta_1 X_1^{(1)} X_1^{(1)} + \theta_2 X_2^{(1)} X_1^{(1)} + \cdots + \theta_n X_n^{(1)} X_1^{(1)} - y^{(1)} X_1^{(1)}) \\
&\qquad + (\theta_0 X_0^{(2)} X_1^{(2)} + \theta_1 X_1^{(2)} X_1^{(2)} + \theta_2 X_2^{(2)} X_1^{(2)} + \cdots + \theta_n X_n^{(2)} X_1^{(2)} - y^{(2)} X_1^{(2)}) \\
&\qquad + \cdots \\
&\qquad + (\theta_0 X_0^{(m)} X_1^{(m)} + \theta_1 X_1^{(m)} X_1^{(m)} + \theta_2 X_2^{(m)} X_1^{(m)} + \cdots + \theta_n X_n^{(m)} X_1^{(m)} - y^{(m)} X_1^{(m)}) \Big] \qquad \text{(multiplying } X_1^{(i)} \text{ through)} \\
&= \frac{1}{m}\Big[ (X_0^{(1)} X_1^{(1)} + X_0^{(2)} X_1^{(2)} + \cdots + X_0^{(m)} X_1^{(m)}) \cdot \theta_0 \\
&\qquad + (X_1^{(1)} X_1^{(1)} + X_1^{(2)} X_1^{(2)} + \cdots + X_1^{(m)} X_1^{(m)}) \cdot \theta_1 \\
&\qquad + (X_2^{(1)} X_1^{(1)} + X_2^{(2)} X_1^{(2)} + \cdots + X_2^{(m)} X_1^{(m)}) \cdot \theta_2 \\
&\qquad + \cdots \\
&\qquad + (X_n^{(1)} X_1^{(1)} + X_n^{(2)} X_1^{(2)} + \cdots + X_n^{(m)} X_1^{(m)}) \cdot \theta_n \\
&\qquad - (y^{(1)} X_1^{(1)} + y^{(2)} X_1^{(2)} + \cdots + y^{(m)} X_1^{(m)}) \Big] \qquad \text{(factoring out each } \theta_j \text{)} \\
&= \frac{1}{m}\Big[ \theta_0 \cdot \sum_{i=1}^m X_0^{(i)} X_1^{(i)} + \theta_1 \cdot \sum_{i=1}^m X_1^{(i)} X_1^{(i)} + \theta_2 \cdot \sum_{i=1}^m X_2^{(i)} X_1^{(i)} + \cdots + \theta_n \cdot \sum_{i=1}^m X_n^{(i)} X_1^{(i)} - \sum_{i=1}^m y^{(i)} X_1^{(i)} \Big]
\end{aligned}$
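A quick way to gain confidence in this partial derivative is to compare it with a numerical finite difference. The sketch below does that on random data; the helper names `J` and `dJ_dtheta1`, the seed, and the step size `eps` are illustrative assumptions.

```python
import numpy as np


def J(theta, X, y):
    """Loss (1), with the constant column X_0 = 1 already included in X."""
    m = X.shape[0]
    return np.sum((X @ theta - y) ** 2) / (2 * m)


def dJ_dtheta1(theta, X, y):
    """Summation form of the partial derivative of J with respect to theta_1."""
    m, cols = X.shape
    total = 0.0
    for k in range(cols):
        # theta_k times the sum over samples of X_k^(i) * X_1^(i)
        total += theta[k] * np.sum(X[:, k] * X[:, 1])
    total -= np.sum(y * X[:, 1])
    return total / m


rng = np.random.default_rng(1)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = rng.normal(size=20)
theta = rng.normal(size=3)

# Central finite difference along the theta_1 direction.
eps = 1e-6
step = np.array([0.0, eps, 0.0])
numeric = (J(theta + step, X, y) - J(theta - step, X, y)) / (2 * eps)
analytic = dJ_dtheta1(theta, X, y)

print(numeric, analytic)
```

Since $J$ is quadratic in $\theta$, the central difference should match the analytic value up to rounding error.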

First, take a closer look at $\sum_{i=1}^m X_a^{(i)} X_b^{(i)}$:
$\sum_{i=1}^m X_a^{(i)} X_b^{(i)} = (X_a^{(1)}, X_a^{(2)}, \cdots, X_a^{(m)}) \cdot \begin{pmatrix} X_b^{(1)} \\ X_b^{(2)} \\ \vdots \\ X_b^{(m)} \end{pmatrix} = X_a^T \cdot X_b = X_b^T \cdot X_a \qquad (3)$
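Identity (3) says that a sum of elementwise products over the samples is just an inner product of two feature columns, in either order. A small check, with hypothetical random columns `X_a` and `X_b`:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
X_a = rng.normal(size=m)   # hypothetical feature column a
X_b = rng.normal(size=m)   # hypothetical feature column b

# Left side of (3): the explicit sum over the m samples.
explicit_sum = sum(X_a[i] * X_b[i] for i in range(m))

# Right side of (3): the inner product, written in both orders.
dot_ab = X_a @ X_b
dot_ba = X_b @ X_a

print(explicit_sum, dot_ab, dot_ba)
```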

Substituting (3) into $\frac{\partial}{\partial\theta_1} J(\theta)$ gives:

$\begin{aligned}
\frac{\partial}{\partial\theta_1} J(\theta) &= \frac{1}{m}\Big[ \theta_0 \cdot \sum_{i=1}^m X_0^{(i)} X_1^{(i)} + \theta_1 \cdot \sum_{i=1}^m X_1^{(i)} X_1^{(i)} + \theta_2 \cdot \sum_{i=1}^m X_2^{(i)} X_1^{(i)} + \cdots + \theta_n \cdot \sum_{i=1}^m X_n^{(i)} X_1^{(i)} - \sum_{i=1}^m y^{(i)} X_1^{(i)} \Big] \\
&= \frac{1}{m} (X_0^T \cdot X_1 \cdot \theta_0 + X_1^T \cdot X_1 \cdot \theta_1 + X_2^T \cdot X_1 \cdot \theta_2 + \cdots + X_n^T \cdot X_1 \cdot \theta_n - X_1^T \cdot y)
\end{aligned}$
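The inner-product form can be collapsed one step further: by linearity it equals $\frac{1}{m} X_1^T (X\theta - y)$. That compact expression is not written in the original text and is introduced here only as a cross-check. A minimal sketch on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 30, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

X_1 = X[:, 1]

# Term-by-term form from the text: sum over k of (X_k^T X_1) * theta_k, minus X_1^T y.
term_by_term = (sum((X[:, k] @ X_1) * theta[k] for k in range(n + 1)) - X_1 @ y) / m

# Collapsed form (an assumption of this sketch): X_1^T (X theta - y) / m.
collapsed = X_1 @ (X @ theta - y) / m

print(term_by_term, collapsed)
```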

Setting $\frac{\partial}{\partial\theta_1} J(\theta) = 0$ yields the following equation:

$(X_0^T X_1, X_1^T X_1, X_2^T X_1, \cdots, X_n^T X_1) \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} = X_1^T \cdot y$

In the same way as for $\frac{\partial}{\partial\theta_1} J(\theta) = 0$, setting the partial derivatives with respect to $\theta_0, \theta_2, \theta_3, \cdots, \theta_n$ to zero finally gives:

$\begin{pmatrix}
X_0^T X_0 & X_1^T X_0 & X_2^T X_0 & \cdots & X_n^T X_0 \\
X_0^T X_1 & X_1^T X_1 & X_2^T X_1 & \cdots & X_n^T X_1 \\
X_0^T X_2 & X_1^T X_2 & X_2^T X_2 & \cdots & X_n^T X_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_0^T X_n & X_1^T X_n & X_2^T X_n & \cdots & X_n^T X_n
\end{pmatrix} \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix} = \begin{pmatrix} X_0^T \cdot y \\ X_1^T \cdot y \\ X_2^T \cdot y \\ \vdots \\ X_n^T \cdot y \end{pmatrix}$
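The system above can be assembled entry by entry from the feature columns and then solved. The sketch below does this on hypothetical random data; the names `A` and `b` for the coefficient matrix and the right-hand side are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 40, 2
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
y = rng.normal(size=m)

# Row j comes from setting the partial derivative w.r.t. theta_j to zero:
# entry (j, k) is X_k^T X_j, and entry j of the right-hand side is X_j^T y.
A = np.empty((n + 1, n + 1))
b = np.empty(n + 1)
for j in range(n + 1):
    for k in range(n + 1):
        A[j, k] = X[:, k] @ X[:, j]
    b[j] = X[:, j] @ y

theta = np.linalg.solve(A, b)

# At this theta, every partial derivative of J should vanish.
gradient = X.T @ (X @ theta - y) / m
print(theta)
print(gradient)
```

The coefficient matrix is symmetric because of (3), and the printed gradient should be zero up to rounding error.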

Rewriting the system as a product of matrices (by (3), $X_a^T \cdot X_b = X_b^T \cdot X_a$, so the two factors in each entry have been swapped) gives:
$\begin{pmatrix} X_0^T \\ X_1^T \\ X_2^T \\ \vdots \\ X_n^T \end{pmatrix} \cdot (X_0, X_1, X_2, \cdots, X_n) \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix} = \begin{pmatrix} X_0^T \cdot y \\ X_1^T \cdot y \\ X_2^T \cdot y \\ \vdots \\ X_n^T \cdot y \end{pmatrix}$
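For completeness, the remaining step to the closed form stated at the start of the article can be sketched as follows. It assumes that $X^T X$ is invertible (for example, when the columns of $X$ are linearly independent), a condition the text above does not state explicitly.

```latex
\begin{aligned}
X^T X \theta &= X^T y
  && \text{since } (X_0, X_1, \cdots, X_n) = X \text{ and the stacked rows } X_j^T \text{ form } X^T \\
\theta &= (X^T X)^{-1} X^T y
  && \text{left-multiplying both sides by } (X^T X)^{-1}
\end{aligned}
```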