Normal Equation Solution for Multiple Linear Regression: A Detailed Derivation

Overview

Multiple linear regression formulas

Define the loss function for multiple linear regression as:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^m \left(\hat{y}^{(i)} - y^{(i)}\right)^2 \qquad (1)$$

where $\hat{y}^{(i)}$ is:
$$\hat{y}^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} \qquad (2)$$

Here $m$ is the number of samples and $n$ is the number of features.
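To make equations (1) and (2) concrete, here is a minimal Python sketch that evaluates the prediction and the loss on a toy dataset. The function names `predict` and `cost` and the data are illustrative assumptions, not part of the original article.

```python
def predict(theta, features):
    """Equation (2): theta_0 + theta_1 * x_1 + ... + theta_n * x_n for one sample."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], features))


def cost(theta, samples, targets):
    """Equation (1): sum of squared residuals divided by 2m."""
    m = len(samples)
    squared = [(predict(theta, x) - y) ** 2 for x, y in zip(samples, targets)]
    return sum(squared) / (2 * m)


# Hypothetical toy data: m = 3 samples, n = 2 features, generated by y = 1 + 2*x1 + 3*x2.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [3.0, 4.0, 6.0]

print(cost([1.0, 2.0, 3.0], samples, targets))  # parameters that fit the data exactly
print(cost([0.0, 0.0, 0.0], samples, targets))  # all-zero parameters give a larger loss
```

The loss is zero only when every prediction matches its target, which is why minimizing $J(\theta)$ is a sensible way to choose $\theta$.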

Define the vectors:
$$\begin{aligned}
\theta &= (\theta_0, \theta_1, \cdots, \theta_n)^T \\
X^{(i)} &= (X_0^{(i)}, X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)})^T, \quad i = 1, 2, \cdots, m, \quad X_0^{(i)} \equiv 1 \\
X_j &= (X_j^{(1)}, X_j^{(2)}, \cdots, X_j^{(m)})^T, \quad j = 0, 1, 2, \cdots, n \\
y &= (y^{(1)}, y^{(2)}, \cdots, y^{(m)})^T
\end{aligned}$$

Define the matrix:

$$X = (X^{(1)}, X^{(2)}, \cdots, X^{(m)})^T = (X_0, X_1, X_2, \cdots, X_n) =
\begin{pmatrix}
X_0^{(1)} & X_1^{(1)} & X_2^{(1)} & \cdots & X_n^{(1)} \\
X_0^{(2)} & X_1^{(2)} & X_2^{(2)} & \cdots & X_n^{(2)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_0^{(m)} & X_1^{(m)} & X_2^{(m)} & \cdots & X_n^{(m)}
\end{pmatrix}_{m \times (n+1)}$$
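As an illustration of the two views of $X$ (rows are samples $X^{(i)}$, columns are features $X_j$), the following sketch builds the matrix from raw feature rows by prepending the constant $X_0 = 1$. The helper names `design_matrix` and `column` and the numbers are assumptions made for this example.

```python
def design_matrix(samples):
    """Prepend the constant feature X_0 = 1 to every sample (one row per sample)."""
    return [[1.0] + list(row) for row in samples]


def column(matrix, j):
    """Return the column vector X_j: feature j across all m samples."""
    return [row[j] for row in matrix]


# Hypothetical data with m = 3 samples and n = 2 features.
samples = [[2.0, 5.0], [3.0, 7.0], [4.0, 1.0]]
X = design_matrix(samples)

print(len(X), len(X[0]))  # m rows and n + 1 columns
print(column(X, 0))       # X_0, the all-ones column
print(column(X, 2))       # X_2, the second feature over all samples
```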

The loss function is minimized when $\theta$ takes the following value:
$$\theta = (X^T X)^{-1} X^T y$$

Derivation

A friendly note: the derivation is not difficult, but it is long-winded, so please be patient.

Substituting (2) into (1) gives:

$$\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^m \left(\theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} - y^{(i)}\right)^2 \\
&= \frac{1}{2m}\sum_{i=1}^m \left(\theta_0 X_0^{(i)} + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)} - y^{(i)}\right)^2 \qquad (\text{insert } X_0^{(i)} = 1) \\
&= \frac{1}{2m}\Big[ (\theta_0 X_0^{(1)} + \theta_1 X_1^{(1)} + \theta_2 X_2^{(1)} + \cdots + \theta_n X_n^{(1)} - y^{(1)})^2 \\
&\qquad + (\theta_0 X_0^{(2)} + \theta_1 X_1^{(2)} + \theta_2 X_2^{(2)} + \cdots + \theta_n X_n^{(2)} - y^{(2)})^2 \\
&\qquad + \cdots \\
&\qquad + (\theta_0 X_0^{(m)} + \theta_1 X_1^{(m)} + \theta_2 X_2^{(m)} + \cdots + \theta_n X_n^{(m)} - y^{(m)})^2 \Big]
\end{aligned}$$

Now take the partial derivatives with respect to $\theta$. Below we differentiate only with respect to $\theta_1$; the other components follow in the same way. By the chain rule, each squared term contributes twice its residual times $X_1^{(i)}$, and the factor 2 cancels the $\frac{1}{2}$:

$$\begin{aligned}
\frac{\partial}{\partial\theta_1} J(\theta) &= \frac{1}{m}\Big[ X_1^{(1)} (\theta_0 X_0^{(1)} + \theta_1 X_1^{(1)} + \theta_2 X_2^{(1)} + \cdots + \theta_n X_n^{(1)} - y^{(1)}) \\
&\qquad + X_1^{(2)} (\theta_0 X_0^{(2)} + \theta_1 X_1^{(2)} + \theta_2 X_2^{(2)} + \cdots + \theta_n X_n^{(2)} - y^{(2)}) \\
&\qquad + \cdots \\
&\qquad + X_1^{(m)} (\theta_0 X_0^{(m)} + \theta_1 X_1^{(m)} + \theta_2 X_2^{(m)} + \cdots + \theta_n X_n^{(m)} - y^{(m)}) \Big] \\
&= \frac{1}{m}\Big[ (\theta_0 X_0^{(1)} X_1^{(1)} + \theta_1 X_1^{(1)} X_1^{(1)} + \theta_2 X_2^{(1)} X_1^{(1)} + \cdots + \theta_n X_n^{(1)} X_1^{(1)} - y^{(1)} X_1^{(1)}) \\
&\qquad + (\theta_0 X_0^{(2)} X_1^{(2)} + \theta_1 X_1^{(2)} X_1^{(2)} + \theta_2 X_2^{(2)} X_1^{(2)} + \cdots + \theta_n X_n^{(2)} X_1^{(2)} - y^{(2)} X_1^{(2)}) \\
&\qquad + \cdots \\
&\qquad + (\theta_0 X_0^{(m)} X_1^{(m)} + \theta_1 X_1^{(m)} X_1^{(m)} + \theta_2 X_2^{(m)} X_1^{(m)} + \cdots + \theta_n X_n^{(m)} X_1^{(m)} - y^{(m)} X_1^{(m)}) \Big] \qquad (\text{distribute } X_1^{(i)}) \\
&= \frac{1}{m}\Big[ (X_0^{(1)} X_1^{(1)} + X_0^{(2)} X_1^{(2)} + \cdots + X_0^{(m)} X_1^{(m)}) \cdot \theta_0 \\
&\qquad + (X_1^{(1)} X_1^{(1)} + X_1^{(2)} X_1^{(2)} + \cdots + X_1^{(m)} X_1^{(m)}) \cdot \theta_1 \\
&\qquad + (X_2^{(1)} X_1^{(1)} + X_2^{(2)} X_1^{(2)} + \cdots + X_2^{(m)} X_1^{(m)}) \cdot \theta_2 \\
&\qquad + \cdots \\
&\qquad + (X_n^{(1)} X_1^{(1)} + X_n^{(2)} X_1^{(2)} + \cdots + X_n^{(m)} X_1^{(m)}) \cdot \theta_n \\
&\qquad - (y^{(1)} X_1^{(1)} + y^{(2)} X_1^{(2)} + \cdots + y^{(m)} X_1^{(m)}) \Big] \qquad (\text{factor out each } \theta_j) \\
&= \frac{1}{m}\Big[ \theta_0 \cdot \sum_{i=1}^m X_0^{(i)} X_1^{(i)} + \theta_1 \cdot \sum_{i=1}^m X_1^{(i)} X_1^{(i)} + \theta_2 \cdot \sum_{i=1}^m X_2^{(i)} X_1^{(i)} + \cdots + \theta_n \cdot \sum_{i=1}^m X_n^{(i)} X_1^{(i)} - \sum_{i=1}^m y^{(i)} X_1^{(i)} \Big]
\end{aligned}$$

First, let us take a closer look at $\sum_{i=1}^m X_a^{(i)} X_b^{(i)}$:
$$\sum_{i=1}^m X_a^{(i)} X_b^{(i)} = (X_a^{(1)}, X_a^{(2)}, \cdots, X_a^{(m)}) \cdot \begin{pmatrix} X_b^{(1)} \\ X_b^{(2)} \\ \vdots \\ X_b^{(m)} \end{pmatrix} = X_a^T \cdot X_b = X_b^T \cdot X_a \qquad (3)$$

Substituting (3) into $\frac{\partial}{\partial\theta_1} J(\theta)$:

$$\begin{aligned}
\frac{\partial}{\partial\theta_1} J(\theta) &= \frac{1}{m}\Big[ \theta_0 \cdot \sum_{i=1}^m X_0^{(i)} X_1^{(i)} + \theta_1 \cdot \sum_{i=1}^m X_1^{(i)} X_1^{(i)} + \theta_2 \cdot \sum_{i=1}^m X_2^{(i)} X_1^{(i)} + \cdots + \theta_n \cdot \sum_{i=1}^m X_n^{(i)} X_1^{(i)} - \sum_{i=1}^m y^{(i)} X_1^{(i)} \Big] \\
&= \frac{1}{m}\left( X_0^T \cdot X_1 \cdot \theta_0 + X_1^T \cdot X_1 \cdot \theta_1 + X_2^T \cdot X_1 \cdot \theta_2 + \cdots + X_n^T \cdot X_1 \cdot \theta_n - X_1^T \cdot y \right)
\end{aligned}$$
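A quick numerical sanity check of this partial-derivative formula is sketched below: it compares the closed-form expression with a central finite difference of the loss. The function names, the step size `h`, and the data are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column(matrix, j):
    return [row[j] for row in matrix]


def cost(theta, X, y):
    """Loss (1) with X already containing the all-ones column X_0."""
    m = len(X)
    return sum((dot(row, theta) - yi) ** 2 for row, yi in zip(X, y)) / (2 * m)


def partial(theta, X, y, k):
    """(1/m) * (sum_j theta_j * X_j^T X_k - X_k^T y), the formula derived above."""
    m = len(X)
    X_k = column(X, k)
    weighted = sum(theta[j] * dot(column(X, j), X_k) for j in range(len(theta)))
    return (weighted - dot(X_k, y)) / m


def numeric_partial(theta, X, y, k, h=1e-5):
    """Central finite difference of the loss in coordinate k."""
    up, down = list(theta), list(theta)
    up[k] += h
    down[k] -= h
    return (cost(up, X, y) - cost(down, X, y)) / (2 * h)


# Hypothetical data: 4 samples, 2 features, first column is X_0 = 1.
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 3.0]]
y = [3.0, 4.0, 9.0, 8.0]
theta = [0.5, -1.0, 2.0]

for k in range(len(theta)):
    print(k, partial(theta, X, y, k), numeric_partial(theta, X, y, k))
```

The two columns printed for each `k` should agree up to floating-point rounding, because the loss is quadratic in $\theta$.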

Setting $\frac{\partial}{\partial\theta_1} J(\theta) = 0$ gives the following equation:

$$(X_0^T X_1, X_1^T X_1, X_2^T X_1, \cdots, X_n^T X_1) \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} = X_1^T \cdot y$$

In the same way as for $\frac{\partial}{\partial\theta_1} J(\theta) = 0$, setting the partial derivatives with respect to $\theta_0, \theta_2, \theta_3, \cdots, \theta_n$ to zero finally gives:

$$\begin{pmatrix}
X_0^T X_0 & X_1^T X_0 & X_2^T X_0 & \cdots & X_n^T X_0 \\
X_0^T X_1 & X_1^T X_1 & X_2^T X_1 & \cdots & X_n^T X_1 \\
X_0^T X_2 & X_1^T X_2 & X_2^T X_2 & \cdots & X_n^T X_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
X_0^T X_n & X_1^T X_n & X_2^T X_n & \cdots & X_n^T X_n
\end{pmatrix} \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix} = \begin{pmatrix} X_0^T \cdot y \\ X_1^T \cdot y \\ X_2^T \cdot y \\ \vdots \\ X_n^T \cdot y \end{pmatrix}$$

Factoring the coefficient matrix on the left gives the following (by (3), $X_a^T \cdot X_b = X_b^T \cdot X_a$, so the two factors in each entry have been swapped here):
$$\begin{pmatrix} X_0^T \\ X_1^T \\ X_2^T \\ \vdots \\ X_n^T \end{pmatrix} \cdot (X_0, X_1, X_2, \cdots, X_n) \cdot \begin{pmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{pmatrix} = \begin{pmatrix} X_0^T \cdot y \\ X_1^T \cdot y \\ X_2^T \cdot y \\ \vdots \\ X_n^T \cdot y \end{pmatrix}$$
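The factorization can be checked on a small example: the matrix whose entries are the column dot products $X_a^T X_b$ should equal the product $X^T X$, and by (3) it should be symmetric. This is a sketch with hypothetical helper names and made-up integer data.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column(matrix, j):
    return [row[j] for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]


# Hypothetical matrix X with m = 3 rows; the first column is X_0 = 1.
X = [[1, 2, 5], [1, 3, 7], [1, 4, 1]]
width = len(X[0])

# Entry (a, b) built directly from the column vectors X_a and X_b.
gram_from_columns = [
    [dot(column(X, a), column(X, b)) for b in range(width)] for a in range(width)
]
# The same matrix obtained as the product X^T X.
gram_from_product = matmul(transpose(X), X)

print(gram_from_columns == gram_from_product)             # the two constructions agree
print(gram_from_product == transpose(gram_from_product))  # symmetry from equation (3)
```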

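The stacked rows $X_j^T$ form $X^T$, the row of columns $X_j$ is $X$ itself, and the right-hand side is $X^T y$, so the system can be written as:
$$X^T \cdot X \cdot \theta = X^T \cdot y$$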
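As a cross-check, here is a sketch in standard matrix-calculus notation, which the component-wise derivation above does not use. The same system can be read off from the vectorized loss, with $X\theta - y$ as the residual vector:

$$J(\theta) = \frac{1}{2m}(X\theta - y)^T (X\theta - y), \qquad \nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$$

Setting the gradient to zero gives $X^T X \theta = X^T y$. The $k$-th component of this gradient, $\frac{1}{m} X_k^T (X\theta - y)$, is the partial derivative with respect to $\theta_k$ that the component-wise derivation produces.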

Assuming $X^T X$ is invertible, left-multiplying both sides by $(X^T \cdot X)^{-1}$ yields the final result:
$$\theta = (X^T \cdot X)^{-1} \cdot X^T \cdot y$$
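The closed form can be tried out with a short, dependency-free Python sketch. Rather than forming the inverse explicitly, it solves the linear system $X^T X \theta = X^T y$ by Gaussian elimination, which is mathematically equivalent when $X^T X$ is invertible. The function names, the singularity threshold, and the data are assumptions for illustration; with NumPy one would more typically call `numpy.linalg.solve` or `numpy.linalg.lstsq`.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # illustrative threshold
            raise ValueError("X^T X is singular (or nearly so); no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def normal_equation(X, y):
    """Return theta satisfying (X^T X) theta = X^T y."""
    Xt = transpose(X)
    gram = matmul(Xt, X)
    rhs = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(gram, rhs)


# Hypothetical data generated by y = 1 + 2*x1 + 3*x2; the first column is X_0 = 1.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [3, 4, 6, 8]
theta = normal_equation(X, y)
print([round(t, 6) for t in theta])  # recovers the generating coefficients 1, 2, 3
```

If two feature columns are linearly dependent, $X^T X$ is singular and the sketch raises an error; that is the case the invertibility assumption rules out.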





