Overview
Contents
- 1. Information Entropy
- 2. Conditional Entropy
- 3. References
Table 1. The 14 training examples with target value PlayTennis

Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
---|---|---|---|---|---|
$D_1$ | Sunny | Hot | High | Weak | No |
$D_2$ | Sunny | Hot | High | Strong | No |
$D_3$ | Overcast | Hot | High | Weak | Yes |
$D_4$ | Rain | Mild | High | Weak | Yes |
$D_5$ | Rain | Cool | Normal | Weak | Yes |
$D_6$ | Rain | Cool | Normal | Strong | No |
$D_7$ | Overcast | Cool | Normal | Strong | Yes |
$D_8$ | Sunny | Mild | High | Weak | No |
$D_9$ | Sunny | Cool | Normal | Weak | Yes |
$D_{10}$ | Rain | Mild | Normal | Weak | Yes |
$D_{11}$ | Sunny | Mild | Normal | Strong | Yes |
$D_{12}$ | Overcast | Mild | High | Strong | Yes |
$D_{13}$ | Overcast | Hot | Normal | Weak | Yes |
$D_{14}$ | Rain | Mild | High | Strong | No |

As shown in Table 1, the target attribute is PlayTennis, i.e., whether tennis is played.
Table 1 has four features: Outlook, Temperature, Humidity, and Wind.
1. Information Entropy
The formula for information entropy is:
$$H(X) = -\sum_{x \in X} p(x) \log p(x)$$
Incidentally, $0 \le H(X) \le \log n$, where $n$ is the number of possible values of $X$.
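As a quick illustration, here is a minimal Python sketch of this formula. It is not part of the original article; the function name `entropy` and the choice of base-2 logarithms (entropy in bits) are assumptions made for this example.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    # 0 * log 0 is treated as 0, so zero counts are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy([1, 1]))  # 1.0: a fair coin attains the upper bound log2(2) = 1
```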
Taking Table 1 as an example, let $Y$ be the random variable for whether tennis is played. Then:

$$p(Y=\text{Yes}) = \frac{9}{14}$$

$$p(Y=\text{No}) = \frac{5}{14}$$
Therefore,

$$
\begin{aligned}
H(Y) &= -\sum_{y \in Y} p(y) \log p(y) \\
&= -\big( p(Y=\text{Yes}) \log_2 p(Y=\text{Yes}) + p(Y=\text{No}) \log_2 p(Y=\text{No}) \big) \\
&= -\left( \tfrac{9}{14} \log_2 \tfrac{9}{14} + \tfrac{5}{14} \log_2 \tfrac{5}{14} \right) \\
&= 0.9403
\end{aligned}
$$
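This number can also be checked numerically; the snippet below is a small sketch, assuming base-2 logarithms as above.

```python
import math

# 9 "Yes" and 5 "No" out of 14 samples (Table 1).
h_y = -(9/14 * math.log2(9/14) + 5/14 * math.log2(5/14))
print(round(h_y, 4))  # 0.9403
```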
2. Conditional Entropy
Conditional entropy $H(Y|X)$ is the entropy of $Y$ given the random variable $X$.
The formula is:

$$H(Y|X) = \sum_{x \in X} p(x)\, H(Y|X=x)$$
In the example of Table 1, let Humidity be the random variable $X$. Then:

$$p(X=\text{High}) = \frac{7}{14} = \frac{1}{2}$$

$$p(X=\text{Normal}) = \frac{7}{14} = \frac{1}{2}$$
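Counting the Humidity values can be done directly with `collections.Counter`; this is a small sketch, and the variable name `humidity` is introduced only for the example.

```python
from collections import Counter

# Humidity column of Table 1, rows D1 through D14.
humidity = ["High", "High", "High", "High", "Normal", "Normal", "Normal",
            "High", "Normal", "Normal", "Normal", "High", "Normal", "High"]
print(Counter(humidity))  # Counter({'High': 7, 'Normal': 7}) -> p = 7/14 each
```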
Therefore,

$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x)\, H(Y|X=x) \\
&= p(X=\text{High})\, H(Y|X=\text{High}) + p(X=\text{Normal})\, H(Y|X=\text{Normal})
\end{aligned}
$$
Next, compute $H(Y|X=\text{High})$ and $H(Y|X=\text{Normal})$. Applying the entropy formula to each subset of samples gives:
$$
\begin{aligned}
H(Y|X=\text{High}) &= -\sum_{y \in Y} p(y|X=\text{High}) \log p(y|X=\text{High}) \\
&= -\big( p(Y=\text{Yes}|X=\text{High}) \log_2 p(Y=\text{Yes}|X=\text{High}) \\
&\qquad + p(Y=\text{No}|X=\text{High}) \log_2 p(Y=\text{No}|X=\text{High}) \big) \\
&= -\left( \tfrac{3}{7} \log_2 \tfrac{3}{7} + \tfrac{4}{7} \log_2 \tfrac{4}{7} \right) \\
&= 0.9852
\end{aligned}
$$
$$
\begin{aligned}
H(Y|X=\text{Normal}) &= -\sum_{y \in Y} p(y|X=\text{Normal}) \log p(y|X=\text{Normal}) \\
&= -\big( p(Y=\text{Yes}|X=\text{Normal}) \log_2 p(Y=\text{Yes}|X=\text{Normal}) \\
&\qquad + p(Y=\text{No}|X=\text{Normal}) \log_2 p(Y=\text{No}|X=\text{Normal}) \big) \\
&= -\left( \tfrac{6}{7} \log_2 \tfrac{6}{7} + \tfrac{1}{7} \log_2 \tfrac{1}{7} \right) \\
&= 0.5917
\end{aligned}
$$
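Both per-group entropies can be checked with a few lines of Python. This is a sketch only; the helper `entropy` is the same assumed function as earlier, and the raw counts come from the worked example above.

```python
import math

def entropy(counts):
    # Shannon entropy (base 2) from raw counts; zero counts are skipped.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([3, 4]), 4))  # 0.9852 -> H(Y | X=High): 3 Yes, 4 No
print(round(entropy([6, 1]), 4))  # 0.5917 -> H(Y | X=Normal): 6 Yes, 1 No
```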
Therefore,

$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x)\, H(Y|X=x) \\
&= p(X=\text{High})\, H(Y|X=\text{High}) + p(X=\text{Normal})\, H(Y|X=\text{Normal}) \\
&= \tfrac{1}{2} \times 0.9852 + \tfrac{1}{2} \times 0.5917 \\
&= 0.7884
\end{aligned}
$$
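To close the loop, here is a self-contained Python sketch that computes $H(Y|X)$ directly from the (Humidity, PlayTennis) pairs of Table 1. The function names `entropy` and `conditional_entropy` are assumptions introduced for this illustration, not part of the original article.

```python
import math
from collections import Counter, defaultdict

# (Humidity, PlayTennis) pairs for D1..D14 from Table 1.
samples = [
    ("High", "No"), ("High", "No"), ("High", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("Normal", "No"), ("Normal", "Yes"), ("High", "No"),
    ("Normal", "Yes"), ("Normal", "Yes"), ("Normal", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("High", "No"),
]

def entropy(counts):
    # Shannon entropy (base 2) from raw counts.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(pairs):
    """H(Y|X) = sum over x of p(x) * H(Y | X=x), for (x, y) pairs."""
    groups = defaultdict(Counter)
    for x, y in pairs:
        groups[x][y] += 1
    n = len(pairs)
    return sum(sum(g.values()) / n * entropy(list(g.values())) for g in groups.values())

print(conditional_entropy(samples))  # 0.78845..., matching the hand calculation above
```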
3. References
- 什么是信息熵、条件熵和信息增益 (What are information entropy, conditional entropy, and information gain)