Overview
We work through the following derivation for the binary-classification logistic regression model:
1. Definition
Logistic regression applies a sigmoid function on top of linear regression:
$$g(z) = \frac{1}{1 + e^{-z}}$$
$$y = h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n)}}$$
Note: $x_0$ is a column added artificially for convenience of computation; its value is always 1.
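To make the definition concrete, here is a minimal NumPy sketch (the function names `sigmoid` and `hypothesis` are our own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x) for each row of the design matrix X.

    X is assumed to already include the all-ones column x_0.
    """
    return sigmoid(X @ theta)

# m = 3 samples; each row is [x_0 = 1, x_1, x_2].
X = np.array([[1.0,  0.5, -1.2],
              [1.0, -0.3,  0.8],
              [1.0,  2.0,  0.1]])
theta = np.zeros(3)
print(hypothesis(theta, X))  # [0.5 0.5 0.5] when theta = 0
```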
Here we differentiate $g(z)$; the result will be needed later in the derivation.
$$
\begin{aligned}
g'(z) &= \left(\frac{1}{1 + e^{-z}}\right)' \\
&= \frac{e^{-z} + 1 - 1}{(1 + e^{-z})^2} \\
&= \frac{1}{1 + e^{-z}} - \frac{1}{(1 + e^{-z})^2} \\
&= g(z)\,(1 - g(z))
\end{aligned}
$$
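As a sanity check, the identity $g'(z) = g(z)(1 - g(z))$ can be verified numerically with a central difference (an illustrative sketch only; the test points and tolerance are arbitrary choices):

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))  # the sigmoid from above

eps = 1e-6
for z in (-2.0, 0.0, 3.0):
    numeric = (g(z + eps) - g(z - eps)) / (2 * eps)  # central difference
    analytic = g(z) * (1 - g(z))                     # g(z)(1 - g(z))
    assert abs(numeric - analytic) < 1e-8
```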
2. Computing the probabilities
Assume:
- $p(y=1 \mid x;\theta) = h_\theta(x)$
- $p(y=0 \mid x;\theta) = 1 - h_\theta(x)$
Combining the two expressions:
$$p(y \mid x;\theta) = h_\theta(x)^{y}\,(1 - h_\theta(x))^{1-y}$$
$y$ is the label: the positive class is labeled 1 and the negative class 0.
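A small sketch of this combined form (the name `label_probability` is ours; `sigmoid` is as defined earlier):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_probability(theta, x, y):
    """p(y | x; theta) = h(x)^y * (1 - h(x))^(1 - y), for y in {0, 1}."""
    h = sigmoid(x @ theta)
    return h**y * (1 - h)**(1 - y)

x = np.array([1.0, 0.5, -1.2])       # x_0 = 1 plus two feature values
theta = np.array([0.1, -0.4, 0.7])   # arbitrary parameters for illustration
p1 = label_probability(theta, x, 1)
p0 = label_probability(theta, x, 0)
print(p1, p0, p1 + p0)  # the two probabilities sum to 1
```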
3. Maximum likelihood estimation
$$L(\theta) = \prod_{i=1}^{m} h_\theta(x_i)^{y_i}\,(1 - h_\theta(x_i))^{1-y_i}$$
Taking the logarithm turns the product into a sum:
$$
\begin{aligned}
l(\theta) &= \ln L(\theta) \\
&= \sum_{i=1}^{m} \ln\!\left( h_\theta(x_i)^{y_i}\,(1 - h_\theta(x_i))^{1-y_i} \right) \\
&= \sum_{i=1}^{m} \left[ y_i \ln h_\theta(x_i) + (1 - y_i)\ln(1 - h_\theta(x_i)) \right]
\end{aligned}
$$
Notes:
- When $y=1$, we want $p(y=1 \mid x;\theta)$ to be as large as possible: the larger the predicted probability of the positive class, the smaller the error.
- When $y=0$, we want $p(y=0 \mid x;\theta)$ to be as large as possible: the larger the predicted probability of the negative class, the smaller the error.
Our goal is therefore to maximize the log-likelihood $l(\theta)$.
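A minimal vectorized sketch of $l(\theta)$, under the same assumptions as before (`sigmoid` as defined above; `X` carries the all-ones column $x_0$):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """l(theta) = sum_i [y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i))]."""
    h = sigmoid(X @ theta)
    # In practice h is often clipped away from exactly 0 and 1 to keep log finite.
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```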
4. Loss function
Maximizing the likelihood directly would call for gradient ascent; instead we introduce $J(\theta) = -l(\theta)$, turning the problem into minimizing a loss function with gradient descent.
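In code this is just a sign flip around the `log_likelihood` sketch above:

```python
def loss(theta, X, y):
    """J(theta) = -l(theta); minimizing J maximizes the likelihood."""
    return -log_likelihood(theta, X, y)
```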
5. Gradient descent
$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta) &= -\frac{\partial}{\partial \theta_j}\sum_{i=1}^{m}\left[ y_i \ln h_\theta(x_i) + (1 - y_i)\ln(1 - h_\theta(x_i)) \right] \\
&= -\sum_{i=1}^{m}\left[ y_i \frac{1}{h_\theta(x_i)} \frac{\partial}{\partial \theta_j} h_\theta(x_i) - (1 - y_i)\frac{1}{1 - h_\theta(x_i)} \frac{\partial}{\partial \theta_j} h_\theta(x_i) \right] \\
&= -\sum_{i=1}^{m}\left[ y_i \frac{1}{h_\theta(x_i)} - (1 - y_i)\frac{1}{1 - h_\theta(x_i)} \right] \frac{\partial}{\partial \theta_j} h_\theta(x_i) \\
&= -\sum_{i=1}^{m}\left[ y_i \frac{1}{g(\theta^T x_i)} - (1 - y_i)\frac{1}{1 - g(\theta^T x_i)} \right] \frac{\partial}{\partial \theta_j} g(\theta^T x_i) \\
&= -\sum_{i=1}^{m}\left[ y_i \frac{1}{g(\theta^T x_i)} - (1 - y_i)\frac{1}{1 - g(\theta^T x_i)} \right] g(\theta^T x_i)\,(1 - g(\theta^T x_i))\,\frac{\partial}{\partial \theta_j}\theta^T x_i \\
&= -\sum_{i=1}^{m}\left[ y_i\,(1 - g(\theta^T x_i)) - (1 - y_i)\,g(\theta^T x_i) \right] x_i^{(j)} \\
&= -\sum_{i=1}^{m}\left[ y_i - g(\theta^T x_i) \right] x_i^{(j)} \\
&= \sum_{i=1}^{m}\left( h_\theta(x_i) - y_i \right) x_i^{(j)}
\end{aligned}
$$
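The final line vectorizes neatly: with the samples stacked as rows of the design matrix $X$, the whole gradient is $X^T(h_\theta(X) - y)$. A minimal sketch (our own naming, not a library API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(theta, X, y):
    """Gradient of J: the j-th entry is sum_i (h(x_i) - y_i) * x_i^(j)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y)
```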
Update the parameters:
$$\theta_j := \theta_j - \alpha \sum_{i=1}^{m}\left( h_\theta(x_i) - y_i \right) x_i^{(j)}$$

where $\alpha$ is the learning rate.
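Putting the pieces together, here is a toy end-to-end run on synthetic data (all hyperparameters — the learning rate, iteration count, and data sizes — are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n = 200, 2
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])  # prepend x_0 = 1
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.random(m) < sigmoid(X @ true_theta)).astype(float)  # Bernoulli labels

theta = np.zeros(n + 1)
alpha = 0.01  # learning rate
for _ in range(5000):
    grad = X.T @ (sigmoid(X @ theta) - y)  # sum_i (h(x_i) - y_i) x_i
    theta -= alpha * grad                  # theta_j := theta_j - alpha * dJ/dtheta_j

print(theta)  # should land reasonably close to true_theta
```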