Table 1. Fourteen training examples with target value PlayTennis
| Day | Outlook | Temperature | Humidity | Wind | PlayTennis |
|---|---|---|---|---|---|
| $D_1$ | Sunny | Hot | High | Weak | No |
| $D_2$ | Sunny | Hot | High | Strong | No |
| $D_3$ | Overcast | Hot | High | Weak | Yes |
| $D_4$ | Rain | Mild | High | Weak | Yes |
| $D_5$ | Rain | Cool | Normal | Weak | Yes |
| $D_6$ | Rain | Cool | Normal | Strong | No |
| $D_7$ | Overcast | Cool | Normal | Strong | Yes |
| $D_8$ | Sunny | Mild | High | Weak | No |
| $D_9$ | Sunny | Cool | Normal | Weak | Yes |
| $D_{10}$ | Rain | Mild | Normal | Weak | Yes |
| $D_{11}$ | Sunny | Mild | Normal | Strong | Yes |
| $D_{12}$ | Overcast | Mild | High | Strong | Yes |
| $D_{13}$ | Overcast | Hot | Normal | Weak | Yes |
| $D_{14}$ | Rain | Mild | High | Strong | No |
As Table 1 shows, the target value is PlayTennis, that is, whether or not to play tennis.
Table 1 has four features: Outlook, Temperature, Humidity, and Wind.
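1. Information entropy
The information entropy of a discrete random variable $X$ is defined as follows (the worked examples take $\log$ to base 2, so entropy is measured in bits):
$$H(X) = - \sum_{x \in X} p(x) \log p(x)$$
Incidentally, entropy is bounded, where $n$ is the number of distinct values $X$ can take:
$$0 \leq H(X) \leq \log n$$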
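To make the definition and its bounds concrete, here is a minimal Python sketch. The function name `entropy` and the three example distributions are illustrative assumptions, not part of the original text; the logarithm is taken to base 2.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    # Terms with p == 0 are skipped, following the convention 0 * log 0 = 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

n = 4
certain = [1.0, 0.0, 0.0, 0.0]  # one outcome is sure: the lower bound, H = 0
uniform = [1.0 / n] * n         # all outcomes equally likely: the upper bound, H = log2(n)
skewed = [0.7, 0.1, 0.1, 0.1]   # hypothetical distribution strictly between the bounds

print(entropy(certain))
print(entropy(uniform), math.log2(n))
print(entropy(skewed))
```

The lower bound is reached when one value has probability 1, and the upper bound when all $n$ values are equally likely.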
Taking Table 1 as an example, let $Y$ be the random variable for whether to play tennis. Then
$$p(Y = \text{Yes}) = \frac{9}{14}$$
$$p(Y = \text{No}) = \frac{5}{14}$$
Therefore,
$$
\begin{aligned}
H(Y) &= - \sum_{y \in Y} p(y) \log_2 p(y) \\
&= - \left( p(Y=\text{Yes}) \ast \log_2 p(Y=\text{Yes}) + p(Y=\text{No}) \ast \log_2 p(Y=\text{No}) \right) \\
&= - \left( \frac{9}{14} \ast \log_2 \frac{9}{14} + \frac{5}{14} \ast \log_2 \frac{5}{14} \right) \\
&\approx 0.9403
\end{aligned}
$$
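The same number can be reproduced directly from the PlayTennis column of Table 1. This is a small sketch; the names `play_tennis` and `label_entropy` are chosen here for illustration only.

```python
import math
from collections import Counter

# PlayTennis column of Table 1, in order D1..D14.
play_tennis = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
               "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

def label_entropy(labels):
    """Entropy in bits of the empirical distribution of a list of labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

counts = Counter(play_tennis)
h_y = label_entropy(play_tennis)

print(counts["Yes"], counts["No"])  # 9 5
print(round(h_y, 4))                # 0.9403
```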
2. Conditional entropy
The conditional entropy $H(Y|X)$ is the information entropy of $Y$ given $X$: the entropy of $Y$ under each value of $X$, weighted by the probability of that value.
The formula is:
$$H(Y|X) = \sum_{x \in X} p(x) H(Y|X=x)$$
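For reference, the inner term $H(Y|X=x)$ is the ordinary entropy formula applied to the conditional distribution $p(y|x)$. Written out as a standard expansion, in the same notation as above:

$$
\begin{aligned}
H(Y|X=x) &= - \sum_{y \in Y} p(y|x) \log p(y|x) \\
H(Y|X) &= - \sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log p(y|x)
\end{aligned}
$$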
In the Table 1 example, let $X$ be the random variable for Humidity. Then:
$$p(X=\text{High}) = \frac{7}{14} = \frac{1}{2}$$
$$p(X=\text{Normal}) = \frac{7}{14} = \frac{1}{2}$$
Therefore,
$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x) H(Y|X=x) \\
&= p(X=\text{High}) \ast H(Y|X=\text{High}) + p(X=\text{Normal}) \ast H(Y|X=\text{Normal})
\end{aligned}
$$
Next, compute $H(Y|X=\text{High})$ and $H(Y|X=\text{Normal})$.
Of the 7 days with high humidity, 3 are Yes and 4 are No; of the 7 days with normal humidity, 6 are Yes and 1 is No. Applying the entropy formula to each conditional distribution gives:
$$
\begin{aligned}
H(Y|X=\text{High}) &= - \sum_{y \in Y} p(y|X=\text{High}) \log_2 p(y|X=\text{High}) \\
&= - \big( p(Y=\text{Yes}|X=\text{High}) \ast \log_2 p(Y=\text{Yes}|X=\text{High}) \\
&\qquad + p(Y=\text{No}|X=\text{High}) \ast \log_2 p(Y=\text{No}|X=\text{High}) \big) \\
&= - \left( \frac{3}{7} \ast \log_2 \frac{3}{7} + \frac{4}{7} \ast \log_2 \frac{4}{7} \right) \\
&\approx 0.9852
\end{aligned}
$$
$$
\begin{aligned}
H(Y|X=\text{Normal}) &= - \sum_{y \in Y} p(y|X=\text{Normal}) \log_2 p(y|X=\text{Normal}) \\
&= - \big( p(Y=\text{Yes}|X=\text{Normal}) \ast \log_2 p(Y=\text{Yes}|X=\text{Normal}) \\
&\qquad + p(Y=\text{No}|X=\text{Normal}) \ast \log_2 p(Y=\text{No}|X=\text{Normal}) \big) \\
&= - \left( \frac{6}{7} \ast \log_2 \frac{6}{7} + \frac{1}{7} \ast \log_2 \frac{1}{7} \right) \\
&\approx 0.5917
\end{aligned}
$$
Therefore,
$$
\begin{aligned}
H(Y|X) &= \sum_{x \in X} p(x) H(Y|X=x) \\
&= p(X=\text{High}) \ast H(Y|X=\text{High}) + p(X=\text{Normal}) \ast H(Y|X=\text{Normal}) \\
&= \frac{1}{2} \ast 0.9852 + \frac{1}{2} \ast 0.5917 \\
&\approx 0.7885
\end{aligned}
$$
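As a cross-check on the hand calculation, the conditional entropy can be computed from the Humidity and PlayTennis columns of Table 1. The sketch below is one straightforward way to do it; the variable and function names are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# (Humidity, PlayTennis) pairs from Table 1, in order D1..D14.
rows = [
    ("High", "No"), ("High", "No"), ("High", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("Normal", "No"), ("Normal", "Yes"), ("High", "No"),
    ("Normal", "Yes"), ("Normal", "Yes"), ("Normal", "Yes"), ("High", "Yes"),
    ("Normal", "Yes"), ("High", "No"),
]

def entropy(labels):
    """Entropy in bits of the empirical distribution of a list of labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

# Group the PlayTennis labels by Humidity value.
groups = defaultdict(list)
for humidity, play in rows:
    groups[humidity].append(play)

# H(Y | X = x) for each humidity value x.
h_by_value = {x: entropy(ys) for x, ys in groups.items()}

# H(Y | X) = sum over x of p(x) * H(Y | X = x).
h_y_given_x = sum(len(ys) / len(rows) * h_by_value[x]
                  for x, ys in groups.items())

for x in ("High", "Normal"):
    print(x, len(groups[x]), round(h_by_value[x], 4))
print(h_y_given_x)
```

The per-branch values should match 0.9852 and 0.5917, and the weighted average should agree with the hand calculation to about three decimal places. Small differences in the last digit come from rounding the intermediate values by hand.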