
Overview

Table of Contents

  • 5. Linear Regression Model and Kernel-based Linear Regression Model
    • 5.1 Linear Regression Model
    • 5.2 Kernel-based Linear Regression

5. Linear Regression Model and Kernel-based Linear Regression Model

5.1 Linear Regression Model

Assumption: the random variables $\varepsilon_n$ are i.i.d. Gaussian with mean $0$ and variance $\sigma^2$, and the response is the random variable $y_n = f(\boldsymbol x_n) + \varepsilon_n$. Then

$$E(y_n \mid \boldsymbol x_n) = f(\boldsymbol x_n) + E(\varepsilon_n) = f(\boldsymbol x_n)$$
For every $n$, assume the conditional expectation is linear in the features:

$$\begin{aligned} E(y_n \mid \boldsymbol x_n) &= w_0 + w_1 x_{n1} + \cdots + w_d x_{nd}\\ &= \begin{bmatrix}1 & x_{n1} & \cdots & x_{nd}\end{bmatrix} \begin{bmatrix}w_0 \\ \vdots \\ w_d\end{bmatrix}\\ &= [1, \boldsymbol x_n^T]\,\boldsymbol w = \bar{\boldsymbol x}_n^T \boldsymbol w \end{aligned}$$

where $\boldsymbol x_n = [x_{n1}, \cdots, x_{nd}]^T$, $\bar{\boldsymbol x}_n = [1, \boldsymbol x_n^T]^T$, $\boldsymbol w = [w_0, \cdots, w_d]^T$, and $w_0$ is the intercept.
$$y_n = f(\boldsymbol x_n) + \varepsilon_n = E(y_n \mid \boldsymbol x_n) + \varepsilon_n = \bar{\boldsymbol x}_n^T \boldsymbol w + \varepsilon_n$$
Suppose there are $N$ samples $(\boldsymbol x_1, y_1), \cdots, (\boldsymbol x_N, y_N)$, collected as

$$X = \begin{bmatrix}\boldsymbol x_1^T \\ \vdots \\ \boldsymbol x_N^T\end{bmatrix},\quad \bar X = \begin{bmatrix}\bar{\boldsymbol x}_1^T \\ \vdots \\ \bar{\boldsymbol x}_N^T\end{bmatrix} = \begin{bmatrix}1 & \boldsymbol x_1^T \\ \vdots & \vdots \\ 1 & \boldsymbol x_N^T\end{bmatrix},\quad \boldsymbol y = \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix}$$
The linear regression model describes the relationship between $X$ and $\boldsymbol y$:

$$\begin{cases} y_1 = \bar{\boldsymbol x}_1^T \boldsymbol w + \varepsilon_1\\ \quad\vdots\\ y_N = \bar{\boldsymbol x}_N^T \boldsymbol w + \varepsilon_N \end{cases} \;\Rightarrow\; \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix} = \begin{bmatrix}\bar{\boldsymbol x}_1^T \\ \vdots \\ \bar{\boldsymbol x}_N^T\end{bmatrix}\boldsymbol w + \begin{bmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_N\end{bmatrix} \;\Rightarrow\; \boldsymbol y = \bar X \boldsymbol w + \boldsymbol\varepsilon$$
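To make the model concrete, here is a minimal NumPy sketch that draws data from this generative model. The sample size, dimension, noise level, and true weights are made-up illustration values, not anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 200, 3, 0.5                    # assumed illustration values
w_true = np.array([1.0, 2.0, -1.0, 0.5])     # [w0, w1, ..., wd], assumed

X = rng.normal(size=(N, d))                  # rows are x_n^T
X_bar = np.hstack([np.ones((N, 1)), X])      # prepend the intercept column
eps = rng.normal(scale=sigma, size=N)        # i.i.d. N(0, sigma^2) noise
y = X_bar @ w_true + eps                     # y = X_bar w + eps
```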
The optimal $\boldsymbol w$ is estimated by minimizing the residual sum of squares (RSS):

$$\hat{\boldsymbol w} = \mathop{\arg\min}_{\boldsymbol w} \|\boldsymbol y - \bar X\boldsymbol w\|^2$$

$$\|\boldsymbol y - \bar X\boldsymbol w\|^2 = (\boldsymbol y - \bar X\boldsymbol w)^T(\boldsymbol y - \bar X\boldsymbol w) = \boldsymbol y^T\boldsymbol y - 2\boldsymbol w^T\bar X^T\boldsymbol y + \boldsymbol w^T\bar X^T\bar X\boldsymbol w$$

$$\frac{\partial}{\partial\boldsymbol w}\|\boldsymbol y - \bar X\boldsymbol w\|^2 = -2\bar X^T\boldsymbol y + 2\bar X^T\bar X\boldsymbol w = 0$$

The normal equation is

$$\bar X^T\bar X\boldsymbol w = \bar X^T\boldsymbol y \;\Rightarrow\; \hat{\boldsymbol w} = (\bar X^T\bar X)^{-1}\bar X^T\boldsymbol y \;\Rightarrow\; \hat{\boldsymbol y} = \bar X\hat{\boldsymbol w} = \bar X(\bar X^T\bar X)^{-1}\bar X^T\boldsymbol y$$

For a new sample $\boldsymbol x$, the prediction is $f(\boldsymbol x) = [1,\ \boldsymbol x^T]\,\hat{\boldsymbol w}$.
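Continuing the simulated data above, a minimal sketch of the closed-form fit. Solving the normal equation with `np.linalg.solve` rather than forming $(\bar X^T\bar X)^{-1}$ explicitly is a standard numerical choice, not something the derivation requires.

```python
# Solve the normal equation  X_bar^T X_bar w = X_bar^T y.
w_hat = np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)
y_hat = X_bar @ w_hat                        # fitted values

# Prediction for a new sample x: f(x) = [1, x^T] w_hat
x_new = rng.normal(size=d)
f_x = np.concatenate([[1.0], x_new]) @ w_hat
```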


$$\begin{aligned} \boldsymbol y \approx \bar X\boldsymbol w &= \begin{bmatrix}1 & x_{11} & \cdots & x_{1d}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{N1} & \cdots & x_{Nd}\end{bmatrix} \begin{bmatrix}w_0 \\ w_1 \\ \vdots \\ w_d\end{bmatrix}\\ &= w_0\begin{bmatrix}1 \\ \vdots \\ 1\end{bmatrix} + w_1\begin{bmatrix}x_{11} \\ \vdots \\ x_{N1}\end{bmatrix} + \cdots + w_d\begin{bmatrix}x_{1d} \\ \vdots \\ x_{Nd}\end{bmatrix} \end{aligned}$$

This shows that the fitted value of $\boldsymbol y$ is a linear combination of the columns of $\bar X$. The residual $\boldsymbol e = \boldsymbol y - \hat{\boldsymbol y}$ is the difference between these two vectors, and the approximation is optimal when the residual is orthogonal to the linear space spanned by the columns of $\bar X$. Two consequences follow:

  1. Since $\boldsymbol e$ is orthogonal to the all-ones column of $\bar X$,
     $$0 = [1, \cdots, 1]\,\boldsymbol e = [1, \cdots, 1](\boldsymbol y - \hat{\boldsymbol y}) = \sum_{n=1}^N y_n - \sum_{n=1}^N \hat y_n \;\Rightarrow\; \sum_{n=1}^N y_n = \sum_{n=1}^N \hat y_n \;\Rightarrow\; \bar y = \bar{\hat y}$$

  2. Decompose the total variation:
     $$\begin{aligned} \|\boldsymbol y - \bar y\,\boldsymbol 1_{N\times 1}\|^2 &= \|\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1} + \boldsymbol y - \hat{\boldsymbol y}\|^2\\ &= \|\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1}\|^2 + \|\boldsymbol y - \hat{\boldsymbol y}\|^2 - 2(\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1})^T(\boldsymbol y - \hat{\boldsymbol y})\\ &= \|\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1}\|^2 + \|\boldsymbol y - \hat{\boldsymbol y}\|^2 - 2(\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1})^T\boldsymbol e\\ &= \|\hat{\boldsymbol y} - \bar{\hat y}\,\boldsymbol 1_{N\times 1}\|^2 + \|\boldsymbol y - \hat{\boldsymbol y}\|^2 \end{aligned}$$
     The cross term vanishes because $\hat{\boldsymbol y} - \bar y\,\boldsymbol 1_{N\times 1}$ lies in the column space of $\bar X$ while $\boldsymbol e$ is orthogonal to it. Hence the total variation differs from the explained variation by exactly the residual sum of squares.

$$\sum_{n=1}^N (y_n - \bar y)^2 = \sum_{n=1}^N (\hat y_n - \bar{\hat y})^2 + \sum_{n=1}^N (y_n - \hat y_n)^2,\qquad TSS = ESS + RSS$$

  • TSS: Total sum of squares
  • ESS: Explained sum of squares
  • RSS: Residual (unexplained) sum of squares

The ratio of ESS to TSS is used to evaluate the goodness of fit:
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{n=1}^N(\hat y_n - \bar{\hat y})^2}{\sum_{n=1}^N(y_n - \bar y)^2} = \frac{\frac{1}{N}\sum_{n=1}^N(\hat y_n - \bar{\hat y})^2}{\frac{1}{N}\sum_{n=1}^N(y_n - \bar y)^2} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$$
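On the simulated fit above, the decomposition and $R^2$ can be checked numerically; this is a sketch whose names continue the earlier snippets.

```python
# Verify TSS = ESS + RSS and compute R^2 for the fitted model above.
TSS = np.sum((y - y.mean()) ** 2)            # total sum of squares
ESS = np.sum((y_hat - y_hat.mean()) ** 2)    # explained sum of squares
RSS = np.sum((y - y_hat) ** 2)               # residual sum of squares

assert np.isclose(y.mean(), y_hat.mean())    # consequence 1: equal means
assert np.isclose(TSS, ESS + RSS)            # consequence 2: decomposition
R2 = 1.0 - RSS / TSS
```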

5.2 Kernel-based Linear Regression

In kernel-based regression, each sample is first mapped into a feature space by $\phi$, so the rows of $X$ become $\phi(\boldsymbol x_n)^T$:

$$X = \begin{bmatrix}\phi(\boldsymbol x_1)^T \\ \vdots \\ \phi(\boldsymbol x_N)^T\end{bmatrix},\quad \bar X = \begin{bmatrix}1 & \phi(\boldsymbol x_1)^T \\ \vdots & \vdots \\ 1 & \phi(\boldsymbol x_N)^T\end{bmatrix} = [\boldsymbol 1_{N\times 1}, X],\quad \boldsymbol y = \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix}$$

The estimate $\hat{\boldsymbol w}$ can be rewritten as a linear combination of the rows of $\bar X$:

$$\hat{\boldsymbol w} = (\bar X^T\bar X)^{-1}\bar X^T\boldsymbol y = (\bar X^T\bar X)(\bar X^T\bar X)^{-2}\bar X^T\boldsymbol y = \bar X^T\hat{\boldsymbol\alpha},\quad\text{where } \hat{\boldsymbol\alpha} = \bar X(\bar X^T\bar X)^{-2}\bar X^T\boldsymbol y$$

Substituting $\boldsymbol w = \bar X^T\hat{\boldsymbol\alpha}$ into the normal equation and left-multiplying both sides by $\bar X$:

$$\begin{aligned} & \bar X^T\bar X\boldsymbol w = \bar X^T\boldsymbol y\\ \Rightarrow\;& \bar X\bar X^T\bar X\bar X^T\hat{\boldsymbol\alpha} = \bar X\bar X^T\boldsymbol y\\ \Rightarrow\;& \bar K^2\hat{\boldsymbol\alpha} = \bar K\boldsymbol y,\ \text{where}\ \bar K = \bar X\bar X^T\\ \Rightarrow\;& \bar K\hat{\boldsymbol\alpha} = \boldsymbol y,\ \text{if}\ \det(\bar K)\neq 0\\ \Rightarrow\;& \hat{\boldsymbol\alpha} = \bar K^{-1}\boldsymbol y \end{aligned}$$

$$\bar K = \bar X\bar X^T = [\boldsymbol 1_{N\times 1}, X]\begin{bmatrix}\boldsymbol 1_{1\times N} \\ X^T\end{bmatrix} = \boldsymbol 1_{N\times N} + XX^T = \boldsymbol 1_{N\times N} + K$$

where $\boldsymbol 1_{N\times N}$ is the $N\times N$ all-ones matrix and $K = XX^T$ is the kernel (Gram) matrix with $K_{ij} = \kappa(\boldsymbol x_i, \boldsymbol x_j)$.

Therefore,

$$\begin{aligned} \hat{\boldsymbol\alpha} &= (\boldsymbol 1_{N\times N} + K)^{-1}\boldsymbol y\\ \hat{\boldsymbol y} &= \bar X\hat{\boldsymbol w} = \bar X\bar X^T\hat{\boldsymbol\alpha} = (\boldsymbol 1_{N\times N} + K)\hat{\boldsymbol\alpha} \end{aligned}$$
For a new sample $\boldsymbol x$,

$$\begin{aligned} f(\boldsymbol x) &= [1, \phi(\boldsymbol x)^T]\,\hat{\boldsymbol w}\\ &= [1, \phi(\boldsymbol x)^T]\,\bar X^T\hat{\boldsymbol\alpha}\\ &= [1, \phi(\boldsymbol x)^T]\begin{bmatrix}\boldsymbol 1_{1\times N} \\ X^T\end{bmatrix}\hat{\boldsymbol\alpha}\\ &= \big(\boldsymbol 1_{1\times N} + (X\phi(\boldsymbol x))^T\big)\hat{\boldsymbol\alpha}\\ &= [1 + \kappa(\boldsymbol x_1, \boldsymbol x), \cdots, 1 + \kappa(\boldsymbol x_N, \boldsymbol x)]\,\hat{\boldsymbol\alpha}\\ &= (\hat\alpha_1 + \cdots + \hat\alpha_N) + \hat\alpha_1\kappa(\boldsymbol x_1, \boldsymbol x) + \cdots + \hat\alpha_N\kappa(\boldsymbol x_N, \boldsymbol x) \end{aligned}$$

so the prediction depends on the training data only through kernel evaluations $\kappa(\boldsymbol x_n, \boldsymbol x)$.
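A minimal sketch of the kernelized fit, assuming an RBF kernel $\kappa(\boldsymbol x, \boldsymbol x') = \exp(-\gamma\|\boldsymbol x - \boldsymbol x'\|^2)$ purely for illustration (the derivation leaves $\kappa$ generic). A small jitter term is added before solving, since $\det(\bar K)\neq 0$ is assumed above but not guaranteed in practice; note that with an invertible $\bar K$ and no jitter, the fit interpolates the training targets exactly, as $\hat{\boldsymbol y} = \bar K\hat{\boldsymbol\alpha} = \boldsymbol y$.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix G[i, j] = exp(-gamma * ||A_i - B_j||^2); an assumed example kernel."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def fit_kernel_regression(X_train, y, gamma=1.0, jitter=1e-8):
    # K_bar = 1_{NxN} + K, then alpha_hat = K_bar^{-1} y
    N = len(y)
    K_bar = np.ones((N, N)) + rbf_kernel(X_train, X_train, gamma)
    return np.linalg.solve(K_bar + jitter * np.eye(N), y)

def predict(X_train, alpha_hat, X_new, gamma=1.0):
    # f(x) = sum_n alpha_hat_n * (1 + kappa(x_n, x))
    k = rbf_kernel(X_new, X_train, gamma)        # shape (M, N)
    return (1.0 + k) @ alpha_hat

# Usage on the simulated data from the sketches above:
alpha_hat = fit_kernel_regression(X, y)
y_fit = predict(X, alpha_hat, X)                 # ~interpolates y for tiny jitter
```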
