Overview

Contents
- 5. Linear Regression Model and Kernel-based Linear Regression Model
- 5.1 Linear Regression Model
- 5.2 Kernel-based Linear Regression
5. Linear Regression Model and Kernel-based Linear Regression Model
5.1 Linear Regression Model
Assumption: the random variables $\varepsilon_n$ are i.i.d. Gaussian with mean $0$ and variance $\sigma^2$, and the response is $y_n = f(\boldsymbol{x}_n) + \varepsilon_n$. Taking the conditional expectation,
$$E(y_n \mid \boldsymbol{x}_n) = f(\boldsymbol{x}_n) + E(\varepsilon_n) = f(\boldsymbol{x}_n)$$
For every $n$, assume the conditional expectation is linear in $\boldsymbol{x}_n$:

$$\begin{aligned} E(y_n \mid \boldsymbol{x}_n) &= w_0 + w_1 x_{n1} + \cdots + w_d x_{nd}\\ &= \begin{bmatrix}1 & x_{n1} & \cdots & x_{nd}\end{bmatrix} \begin{bmatrix}w_0 \\ \vdots \\ w_d\end{bmatrix}\\ &= [1, \boldsymbol{x}_n^T]\boldsymbol{w} = \bar{\boldsymbol{x}}_n^T \boldsymbol{w} \end{aligned}$$
where $\boldsymbol{x}_n = [x_{n1}, \cdots, x_{nd}]^T$, $\bar{\boldsymbol{x}}_n = [1, \boldsymbol{x}_n^T]^T$, $\boldsymbol{w} = [w_0, \cdots, w_d]^T$, and $w_0$ is the intercept. Hence
$$y_n = f(\boldsymbol{x}_n) + \varepsilon_n = E(y_n \mid \boldsymbol{x}_n) + \varepsilon_n = \bar{\boldsymbol{x}}_n^T \boldsymbol{w} + \varepsilon_n$$
Suppose we have $N$ samples $\{(\boldsymbol{x}_1, y_1), \cdots, (\boldsymbol{x}_N, y_N)\}$ and write

$$X = \begin{bmatrix}\boldsymbol{x}_1^T \\ \vdots \\ \boldsymbol{x}_N^T\end{bmatrix}, \quad \bar{X} = \begin{bmatrix}\bar{\boldsymbol{x}}_1^T \\ \vdots \\ \bar{\boldsymbol{x}}_N^T\end{bmatrix} = \begin{bmatrix}1 & \boldsymbol{x}_1^T \\ \vdots & \vdots \\ 1 & \boldsymbol{x}_N^T\end{bmatrix}, \quad \boldsymbol{y} = \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix}$$
The linear regression model describes the relationship between $X$ and $\boldsymbol{y}$:

$$\begin{cases} y_1 \approx \bar{\boldsymbol{x}}_1^T \boldsymbol{w} + \varepsilon_1\\ \vdots\\ y_N \approx \bar{\boldsymbol{x}}_N^T \boldsymbol{w} + \varepsilon_N \end{cases} \Rightarrow \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix} \approx \begin{bmatrix}\bar{\boldsymbol{x}}_1^T \\ \vdots \\ \bar{\boldsymbol{x}}_N^T\end{bmatrix}\boldsymbol{w} + \begin{bmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_N\end{bmatrix} \Rightarrow \boldsymbol{y} \approx \bar{X}\boldsymbol{w} + \boldsymbol{\varepsilon}$$
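To make the data model concrete, here is a minimal NumPy sketch (names such as `w_true` are hypothetical, chosen for illustration) that simulates $\boldsymbol{y} = \bar{X}\boldsymbol{w} + \boldsymbol{\varepsilon}$ with i.i.d. Gaussian noise, matching the assumptions above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3                                 # number of samples, feature dimension

X = rng.normal(size=(N, d))                   # rows are x_n^T
Xbar = np.hstack([np.ones((N, 1)), X])        # prepend the all-ones column -> bar X

w_true = np.array([0.5, 1.0, -2.0, 3.0])      # [w_0, w_1, ..., w_d], w_0 is the intercept
sigma = 0.1
eps = rng.normal(scale=sigma, size=N)         # i.i.d. N(0, sigma^2) noise

y = Xbar @ w_true + eps                       # y = bar X w + epsilon
```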
The optimal $\boldsymbol{w}$ is estimated by minimizing the residual sum of squares (RSS):

$$\hat{\boldsymbol{w}} = \mathop{\arg\min}_{\boldsymbol{w}} \|\boldsymbol{y} - \bar{X}\boldsymbol{w}\|^2$$

Expanding the objective and setting its gradient to zero:
$$\|\boldsymbol{y}-\bar{X}\boldsymbol{w}\|^2 = (\boldsymbol{y}-\bar{X}\boldsymbol{w})^T(\boldsymbol{y}-\bar{X}\boldsymbol{w}) = \boldsymbol{y}^T\boldsymbol{y} - 2\boldsymbol{w}^T\bar{X}^T\boldsymbol{y} + \boldsymbol{w}^T\bar{X}^T\bar{X}\boldsymbol{w}$$

$$\frac{\partial}{\partial\boldsymbol{w}}\|\boldsymbol{y}-\bar{X}\boldsymbol{w}\|^2 = -2\bar{X}^T\boldsymbol{y} + 2\bar{X}^T\bar{X}\boldsymbol{w} = 0$$
The normal equation is

$$\begin{aligned} & \bar{X}^T\bar{X}\boldsymbol{w} = \bar{X}^T\boldsymbol{y}\\ \Rightarrow\ & \hat{\boldsymbol{w}} = (\bar{X}^T\bar{X})^{-1}\bar{X}^T\boldsymbol{y}\\ \Rightarrow\ & \hat{\boldsymbol{y}} = \bar{X}\hat{\boldsymbol{w}} = \bar{X}(\bar{X}^T\bar{X})^{-1}\bar{X}^T\boldsymbol{y} \end{aligned}$$
For an unseen sample $\boldsymbol{x}$, the prediction is $f(\boldsymbol{x}) = [1, \boldsymbol{x}^T]\hat{\boldsymbol{w}}$.
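A minimal sketch of the closed-form fit on the hypothetical synthetic data above; it solves the normal equations with `np.linalg.solve` rather than forming $(\bar{X}^T\bar{X})^{-1}$ explicitly, which is numerically preferable:

```python
import numpy as np

# Same hypothetical synthetic data as in the previous sketch.
rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))
Xbar = np.hstack([np.ones((N, 1)), X])
w_true = np.array([0.5, 1.0, -2.0, 3.0])
y = Xbar @ w_true + rng.normal(scale=0.1, size=N)

# Solve the normal equations  bar X^T bar X w = bar X^T y  for w_hat.
w_hat = np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ y)
y_hat = Xbar @ w_hat                          # fitted values

# Prediction for an unseen sample x: f(x) = [1, x^T] w_hat
x_new = rng.normal(size=d)
f_new = np.concatenate(([1.0], x_new)) @ w_hat
print(w_hat)                                  # should be close to w_true
```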
Geometrically,

$$\begin{aligned} \boldsymbol{y} \approx \bar{X}\boldsymbol{w} &= \begin{bmatrix}1 & x_{11} & \cdots & x_{1d}\\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & \cdots & x_{Nd}\end{bmatrix} \begin{bmatrix}w_0 \\ w_1 \\ \vdots \\ w_d\end{bmatrix}\\ &= w_0\begin{bmatrix}1 \\ \vdots \\ 1\end{bmatrix} + w_1\begin{bmatrix}x_{11} \\ \vdots \\ x_{N1}\end{bmatrix} + \cdots + w_d\begin{bmatrix}x_{1d} \\ \vdots \\ x_{Nd}\end{bmatrix} \end{aligned}$$

So the approximation of $\boldsymbol{y}$ is a linear combination of the columns of $\bar{X}$. The residual is the difference between the two vectors, and the approximation is optimal exactly when the residual is orthogonal to the linear space spanned by the columns of $\bar{X}$. Two consequences follow:
- Since the residual $\boldsymbol{e} = \boldsymbol{y} - \hat{\boldsymbol{y}}$ is orthogonal to the all-ones column,
  $$0 = [1,\cdots,1]\boldsymbol{e} = [1,\cdots,1](\boldsymbol{y}-\hat{\boldsymbol{y}}) = \sum_{n=1}^N y_n - \sum_{n=1}^N \hat{y}_n \Rightarrow \sum_{n=1}^N y_n = \sum_{n=1}^N \hat{y}_n \Rightarrow \bar{y} = \bar{\hat{y}}$$
- Decomposing the total variance (the cross term vanishes because $\boldsymbol{e}$ is orthogonal to both $\hat{\boldsymbol{y}}$ and $\boldsymbol{1}_{N\times 1}$):
  $$\begin{aligned} \|\boldsymbol{y} - \bar{y}\boldsymbol{1}_{N\times 1}\|^2 &= \|\hat{\boldsymbol{y}} - \bar{y}\boldsymbol{1}_{N\times 1} + \boldsymbol{y} - \hat{\boldsymbol{y}}\|^2\\ &= \|\hat{\boldsymbol{y}} - \bar{y}\boldsymbol{1}_{N\times 1}\|^2 + \|\boldsymbol{y} - \hat{\boldsymbol{y}}\|^2 + 2(\hat{\boldsymbol{y}} - \bar{y}\boldsymbol{1}_{N\times 1})^T(\boldsymbol{y} - \hat{\boldsymbol{y}})\\ &= \|\hat{\boldsymbol{y}} - \bar{y}\boldsymbol{1}_{N\times 1}\|^2 + \|\boldsymbol{y} - \hat{\boldsymbol{y}}\|^2 + 2(\hat{\boldsymbol{y}} - \bar{y}\boldsymbol{1}_{N\times 1})^T\boldsymbol{e}\\ &= \|\hat{\boldsymbol{y}} - \bar{\hat{y}}\boldsymbol{1}_{N\times 1}\|^2 + \|\boldsymbol{y} - \hat{\boldsymbol{y}}\|^2 \end{aligned}$$
The total variance thus exceeds the variance of the fitted values by exactly the residual sum of squares:

$$\sum_{n=1}^N(y_n-\bar{y})^2 = \sum_{n=1}^N(\hat{y}_n-\bar{\hat{y}})^2 + \sum_{n=1}^N(y_n-\hat{y}_n)^2, \qquad TSS = ESS + RSS$$
- TSS: Total sum of squares
- ESS: Explained sum of squares
- RSS: Residual (unexplained) sum of squares
The ratio of ESS to TSS is used to evaluate the goodness of fit:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{n=1}^N(\hat{y}_n-\bar{\hat{y}})^2}{\sum_{n=1}^N(y_n-\bar{y})^2} = \frac{\frac{1}{N}\sum_{n=1}^N(\hat{y}_n-\bar{\hat{y}})^2}{\frac{1}{N}\sum_{n=1}^N(y_n-\bar{y})^2} = \frac{TSS-RSS}{TSS} = 1-\frac{RSS}{TSS}$$
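Continuing the same hypothetical fit, a short sketch that checks the decomposition $TSS = ESS + RSS$ numerically and computes $R^2$:

```python
import numpy as np

# Same hypothetical synthetic data and fit as above.
rng = np.random.default_rng(0)
N, d = 100, 3
X = rng.normal(size=(N, d))
Xbar = np.hstack([np.ones((N, 1)), X])
y = Xbar @ np.array([0.5, 1.0, -2.0, 3.0]) + rng.normal(scale=0.1, size=N)
y_hat = Xbar @ np.linalg.solve(Xbar.T @ Xbar, Xbar.T @ y)

TSS = np.sum((y - y.mean()) ** 2)             # total sum of squares
ESS = np.sum((y_hat - y_hat.mean()) ** 2)     # explained sum of squares
RSS = np.sum((y - y_hat) ** 2)                # residual sum of squares

# The decomposition holds because the model contains an intercept,
# so y.mean() == y_hat.mean() and the residual is orthogonal to y_hat.
assert np.isclose(TSS, ESS + RSS)
R2 = 1.0 - RSS / TSS
print(R2)
```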
5.2 Kernel-based Linear Regression
Replacing each $\boldsymbol{x}_n$ with a feature map $\phi(\boldsymbol{x}_n)$:

$$X = \begin{bmatrix}\phi(\boldsymbol{x}_1)^T \\ \vdots \\ \phi(\boldsymbol{x}_N)^T\end{bmatrix}, \quad \bar{X} = \begin{bmatrix}1 & \phi(\boldsymbol{x}_1)^T \\ \vdots & \vdots \\ 1 & \phi(\boldsymbol{x}_N)^T\end{bmatrix} = [\boldsymbol{1}_{N\times 1}, X], \quad \boldsymbol{y} = \begin{bmatrix}y_1 \\ \vdots \\ y_N\end{bmatrix}$$
The estimate $\hat{\boldsymbol{w}}$ can be rewritten in terms of the rows of $\bar{X}$:

$$\hat{\boldsymbol{w}} = (\bar{X}^T\bar{X})^{-1}\bar{X}^T\boldsymbol{y} = (\bar{X}^T\bar{X})(\bar{X}^T\bar{X})^{-2}\bar{X}^T\boldsymbol{y} = \bar{X}^T\hat{\boldsymbol{\alpha}}, \quad \text{where } \hat{\boldsymbol{\alpha}} = \bar{X}(\bar{X}^T\bar{X})^{-2}\bar{X}^T\boldsymbol{y}$$
Substituting into the normal equation:

$$\begin{aligned} & \bar{X}^T\bar{X}\boldsymbol{w} = \bar{X}^T\boldsymbol{y}\\ \Rightarrow\ & \bar{X}\bar{X}^T\bar{X}\bar{X}^T\hat{\boldsymbol{\alpha}} = \bar{X}\bar{X}^T\boldsymbol{y}\\ \Rightarrow\ & \bar{K}^2\hat{\boldsymbol{\alpha}} = \bar{K}\boldsymbol{y}, \quad \text{where } \bar{K} = \bar{X}\bar{X}^T\\ \Rightarrow\ & \bar{K}\hat{\boldsymbol{\alpha}} = \boldsymbol{y}, \quad \text{if } \det(\bar{K}) \ne 0\\ \Rightarrow\ & \hat{\boldsymbol{\alpha}} = \bar{K}^{-1}\boldsymbol{y} \end{aligned}$$
$$\bar{K} = \bar{X}\bar{X}^T = [\boldsymbol{1}_{N\times 1}, X]\begin{bmatrix}\boldsymbol{1}_{1\times N}\\ X^T\end{bmatrix} = \boldsymbol{1}_{N\times N} + XX^T = \boldsymbol{1}_{N\times N} + K$$

where $\boldsymbol{1}_{N\times N}$ is the all-ones matrix and $K = XX^T$ is the Gram (kernel) matrix with entries $K_{ij} = \kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j)$.
Therefore,

$$\begin{aligned} \hat{\boldsymbol{\alpha}} &= (\boldsymbol{1}_{N\times N}+K)^{-1}\boldsymbol{y}\\ \hat{\boldsymbol{y}} &= \bar{X}\hat{\boldsymbol{w}} = \bar{X}\bar{X}^T\hat{\boldsymbol{\alpha}} = (\boldsymbol{1}_{N\times N}+K)\hat{\boldsymbol{\alpha}} \end{aligned}$$
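A sketch of the kernelized solution under these formulas, using an RBF kernel as an illustrative choice (the section itself leaves $\kappa$ generic). $\bar{K}$ is assumed invertible here; in practice a small ridge term is often added to the diagonal:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Pairwise kappa(a, b) = exp(-gamma * ||a - b||^2) -- an illustrative kernel choice."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
N, d = 50, 2
X = rng.normal(size=(N, d))                   # hypothetical training inputs
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=N)

K = rbf_kernel(X, X)                          # Gram matrix, K_ij = kappa(x_i, x_j)
Kbar = np.ones((N, N)) + K                    # bar K = 1_{NxN} + K
alpha_hat = np.linalg.solve(Kbar, y)          # alpha_hat = bar K^{-1} y
y_hat = Kbar @ alpha_hat                      # in-sample fit; equals y when bar K is invertible
```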
For an unseen $\boldsymbol{x}$:

$$\begin{aligned} f(\boldsymbol{x}) &= [1, \phi(\boldsymbol{x})^T]\hat{\boldsymbol{w}}\\ &= [1, \phi(\boldsymbol{x})^T]\bar{X}^T\hat{\boldsymbol{\alpha}}\\ &= [1, \phi(\boldsymbol{x})^T]\begin{bmatrix}\boldsymbol{1}_{1\times N} \\ X^T\end{bmatrix}\hat{\boldsymbol{\alpha}}\\ &= (\boldsymbol{1}_{1\times N} + (X\phi(\boldsymbol{x}))^T)\hat{\boldsymbol{\alpha}}\\ &= [1+\kappa(\boldsymbol{x}_1, \boldsymbol{x}), \cdots, 1+\kappa(\boldsymbol{x}_N, \boldsymbol{x})]\hat{\boldsymbol{\alpha}}\\ &= (\hat{\alpha}_1+\cdots+\hat{\alpha}_N) + \hat{\alpha}_1\kappa(\boldsymbol{x}_1, \boldsymbol{x}) + \cdots + \hat{\alpha}_N\kappa(\boldsymbol{x}_N, \boldsymbol{x}) \end{aligned}$$

so the prediction is expressed entirely through kernel evaluations $\kappa(\boldsymbol{x}_n, \boldsymbol{x})$, without ever computing $\phi$ explicitly.
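The prediction formula above translates directly into code; a small helper, assuming `X`, `alpha_hat`, and `rbf_kernel` from the previous sketch:

```python
import numpy as np

def kernel_predict(x, X_train, alpha_hat, kernel):
    """f(x) = sum_n alpha_hat_n * (1 + kappa(x_n, x)), per the derivation above."""
    k = kernel(X_train, x[None, :])[:, 0]     # [kappa(x_1, x), ..., kappa(x_N, x)]
    return alpha_hat @ (1.0 + k)

# Usage with the hypothetical objects from the previous sketch:
# f_new = kernel_predict(x_new, X, alpha_hat, rbf_kernel)
```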