Overview
In linear regression problems, we can use a method called the normal equation to fit the parameters.
Suppose we have a training set like this:
$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$
where:
$$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix}$$
and the label vector:
$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
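As a concrete illustration, here is a minimal NumPy sketch (with made-up data) of how $X$ and $y$ above could be assembled, assuming the usual convention that $x_0^{(i)} = 1$ so that $\theta_0$ acts as the intercept:

```python
import numpy as np

# Made-up data: m training examples, each with n raw features.
m, n = 5, 2
rng = np.random.default_rng(0)
features = rng.normal(size=(m, n))            # x_1^{(i)}, ..., x_n^{(i)}

# Each row of X is (x^{(i)})^T; prepend the column x_0 = 1 for the intercept.
X = np.column_stack([np.ones(m), features])   # shape (m, n + 1)
y = rng.normal(size=m)                        # labels y^{(1)}, ..., y^{(m)}
```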
We want to fit the parameters
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$
so that the cost function
$$J = \|X\theta - y\|^2$$
attains its global minimum. That is:
$$\theta = \mathop{\arg\min}_{\theta} \|X\theta - y\|^2 = (X^TX)^{-1}X^Ty$$
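Before proving this, we can sanity-check the closed-form expression numerically. The following sketch (synthetic data assumed) compares it against NumPy's built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=m)

# Closed-form solution theta = (X^T X)^{-1} X^T y.
theta_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference: NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(theta_closed, theta_lstsq))   # True (up to numerical error)
```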
Let’s prove it.
We take the partial derivative of $J$ with respect to each parameter and set it to zero. For $\theta_j$, we find that:
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i=1}^{m} \left( (x^{(i)})^T\theta - y^{(i)} \right) x_j^{(i)} = 0$$
Dividing by 2 and rewriting this equation in matrix form, we find that:
$$\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} X\theta = \begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} y$$
Note that $\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix}$ is simply the $j$-th row of $X^T$, so stacking all $n+1$ equations (for $j = 0, 1, \dots, n$) gives:
$$X^TX\theta = X^Ty$$
$$\theta = (X^TX)^{-1}X^Ty$$
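In code, a minimal sketch of this result (synthetic data assumed) solves the linear system $X^TX\theta = X^Ty$ directly, which is generally preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
true_theta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_theta + 0.05 * rng.normal(size=m)                # noisy labels

# Solve X^T X theta = X^T y instead of inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # close to true_theta when X^T X is invertible
```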
Next we introduce regularization, which means we change the cost function $J$ to:
$$J = \|X\theta - y\|^2 + \lambda \sum_{j=1}^{n} \theta_j^2$$
where $\lambda$ is a constant called the regularization parameter.
As before, we compute the partial derivative with respect to each $\theta_j$. Note that the partial derivative with respect to $\theta_0$ is unchanged, since $\theta_0$ is not regularized.
$$\frac{\partial J}{\partial \theta_j} = 2\sum_{i=1}^{m} \left( (x^{(i)})^T\theta - y^{(i)} \right) x_j^{(i)} + 2\lambda\theta_j = 0 \quad (\text{for } j > 0)$$

Dividing by 2 and rewriting in matrix form as before:
$$\begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} X\theta + \lambda\theta_j = \begin{bmatrix} x_j^{(1)} & x_j^{(2)} & \cdots & x_j^{(m)} \end{bmatrix} y$$
$$\lambda\theta_j = \lambda e_j^T\theta$$
where $e_j$ is the unit vector whose $j$-th element is 1 and all other elements are 0.
Substituting this and stacking all $n+1$ equations (for $j = 0, 1, \dots, n$), we find:
$$(X^TX + \lambda L)\theta = X^Ty$$
$$\theta = (X^TX + \lambda L)^{-1}X^Ty$$
where $L = \mathrm{diag}(0, 1, 1, \dots, 1)$.
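A minimal sketch of the regularized solution (synthetic data and a hypothetical value of $\lambda$ assumed), where the matrix $L$ leaves $\theta_0$ unpenalized:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])   # x_0 = 1 column
y = rng.normal(size=m)
lam = 0.1                                   # regularization parameter lambda

# L = diag(0, 1, 1, ..., 1): the intercept theta_0 is not regularized.
L = np.eye(n + 1)
L[0, 0] = 0.0

# Solve (X^T X + lambda * L) theta = X^T y.
theta = np.linalg.solve(X.T @ X + lam * L, X.T @ y)
print(theta)
```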