Preface
In the book, Eq. (8.28) defines the "ambiguity" of the ensemble as:
$$\overline A(h|x) = \sum_{i=1}^{T} w_i\left(h_i(x)-H(x)\right)^2$$
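Then, in Eq. (8.31), it suddenly turns into the following expression, which ties the ambiguity to the errors. This left me completely baffled: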
$$\overline A(h|x) = \sum_{i=1}^{T} w_i E(h_i|x) - E(H|x)$$
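So this post explains where the first line of Eq. (8.31) on page 185 of the Watermelon Book (Zhou Zhihua's Machine Learning) comes from.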
Derivation
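First, rewrite the second line of Eq. (8.31) in a slightly different form, which we call Eq. (a). If Eq. (a) can be proved, the first line of Eq. (8.31) follows as well: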
$$\overline E(h|x) - \overline A(h|x) = E(H|x) \tag{a}$$
We know that:
$$\overline E(h|x) = \sum_{i=1}^{T} w_i\left(f(x)-h_i(x)\right)^2$$
$$\overline A(h|x) = \sum_{i=1}^{T} w_i\left(h_i(x)-H(x)\right)^2$$
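For readers who like to see the definitions in code, here is a minimal Python sketch for a single input $x$. It assumes $f(x)$, the $h_i(x)$ and the $w_i$ are plain floats, that the weights are non-negative and sum to 1, and that $H(x)$ is the weighted average of Eq. (8.23); the function names `ensemble_output`, `avg_error` and `ambiguity` are hypothetical.

```python
def ensemble_output(h, w):
    """H(x) = sum_i w_i * h_i(x): weighted averaging, Eq. (8.23)."""
    return sum(wi * hi for wi, hi in zip(w, h))


def avg_error(f, h, w):
    """E-bar(h|x) = sum_i w_i * (f(x) - h_i(x))**2."""
    return sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))


def ambiguity(h, w):
    """A-bar(h|x) = sum_i w_i * (h_i(x) - H(x))**2."""
    H = ensemble_output(h, w)
    return sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))


# Hypothetical ensemble of T = 3 learners evaluated at one input x
h = [1.0, 2.0, 4.0]    # h_i(x)
w = [0.5, 0.25, 0.25]  # weights, non-negative and summing to 1
f = 3.0                # true value f(x)

print(ensemble_output(h, w))  # 2.0
print(avg_error(f, h, w))     # 2.5
print(ambiguity(h, w))        # 1.5
```

The values are made up. Note that $\overline E - \overline A = 2.5 - 1.5 = 1.0$, which equals $(f(x)-H(x))^2 = (3-2)^2$, anticipating the result derived next.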
Therefore:
$$\overline E(h|x) - \overline A(h|x)$$
$$= \sum_{i=1}^{T} w_i\left(f(x)-h_i(x)\right)^2 - \sum_{i=1}^{T} w_i\left(h_i(x)-H(x)\right)^2$$
Factoring out the common summation $\sum_{i=1}^{T}$ and weight $w_i$ gives:
$$= \sum_{i=1}^{T} w_i\left[\left(f(x)-h_i(x)\right)^2 - \left(h_i(x)-H(x)\right)^2\right]$$
Expanding the squares gives:
$$= \sum_{i=1}^{T} w_i\left[f(x)^2 + h_i(x)^2 - 2f(x)h_i(x) - h_i(x)^2 - H(x)^2 + 2H(x)h_i(x)\right]$$
The $h_i(x)^2$ terms cancel:
$$= \sum_{i=1}^{T} w_i\left[f(x)^2 - 2f(x)h_i(x) - H(x)^2 + 2H(x)h_i(x)\right]$$
Grouping the two terms that contain $h_i(x)$:
$$= \sum_{i=1}^{T} w_i\left[f(x)^2 + 2h_i(x)\left[H(x)-f(x)\right] - H(x)^2\right]$$
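The expansion above is pure algebra inside the bracket, so it can be spot-checked numerically: for any numbers $f$, $h_i$, $H$, the bracket before expanding equals $f^2 + 2h_i(H-f) - H^2$. The sketch below (hypothetical function names, random test values) checks this; note that nothing at this stage requires $H(x)$ to be the weighted average.

```python
import math
import random


def bracket_before(f, hi, H):
    """(f(x) - h_i(x))^2 - (h_i(x) - H(x))^2, the bracket before expanding."""
    return (f - hi) ** 2 - (hi - H) ** 2


def bracket_after(f, hi, H):
    """f(x)^2 + 2 h_i(x) [H(x) - f(x)] - H(x)^2, the bracket after grouping."""
    return f ** 2 + 2 * hi * (H - f) - H ** 2


rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    # H is drawn independently here: the identity holds for any H
    f, hi, H = (rng.uniform(-10, 10) for _ in range(3))
    assert math.isclose(bracket_before(f, hi, H), bracket_after(f, hi, H),
                        rel_tol=1e-9, abs_tol=1e-9)
print("per-learner identity holds on 1000 random triples")
```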
Distributing $\sum_{i=1}^{T} w_i$ over each term gives:
$$= \sum_{i=1}^{T} w_i f(x)^2 + 2\sum_{i=1}^{T} w_i h_i(x)\left[H(x)-f(x)\right] - \sum_{i=1}^{T} w_i H(x)^2$$
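Because $f(x)^2$ and $H(x)^2$ do not depend on $i$, they can be pulled out of their sums; and since the weights satisfy $\sum_{i=1}^{T} w_i = 1$, those two sums reduce to $f(x)^2$ and $H(x)^2$. This gives the following expression, which we call Eq. (b):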
$$= f(x)^2 + 2\sum_{i=1}^{T} w_i h_i(x)\left[H(x)-f(x)\right] - H(x)^2 \tag{b}$$
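The step to Eq. (b) is where the constraint $\sum_i w_i = 1$ is used. A hypothetical numerical check: with normalized weights, Eq. (b) equals $\overline E - \overline A$; with weights that sum to 2 (and $H$ still computed as $\sum_i w_i h_i$), the two no longer agree. The function names `e_minus_a` and `eq_b` and all values are illustrative assumptions.

```python
def e_minus_a(f, h, w):
    """E-bar(h|x) - A-bar(h|x), with H(x) = sum_i w_i h_i(x)."""
    H = sum(wi * hi for wi, hi in zip(w, h))
    e_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))
    a_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))
    return e_bar - a_bar


def eq_b(f, h, w):
    """Eq. (b) as written; deriving it from E-bar - A-bar used sum_i w_i = 1."""
    H = sum(wi * hi for wi, hi in zip(w, h))
    return f ** 2 + 2 * sum(wi * hi * (H - f) for wi, hi in zip(w, h)) - H ** 2


f, h = 3.0, [1.0, 2.0, 4.0]
w_ok = [0.5, 0.25, 0.25]  # sums to 1
w_bad = [1.0, 0.5, 0.5]   # sums to 2: violates the constraint

print(e_minus_a(f, h, w_ok), eq_b(f, h, w_ok))    # 1.0 1.0
print(e_minus_a(f, h, w_bad), eq_b(f, h, w_bad))  # -6.0 1.0
```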
For regression, Eq. (8.23) on page 182 of the Watermelon Book (weighted averaging) states:
$$H(x) = \sum_{i=1}^{T} w_i h_i(x)$$
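Since $H(x)-f(x)$ does not depend on $i$ either, it can be pulled out of the remaining sum, leaving $\sum_{i=1}^{T} w_i h_i(x)$. Substituting Eq. (8.23) into Eq. (b) then gives: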
$$= f(x)^2 + 2H(x)\left[H(x)-f(x)\right] - H(x)^2$$
$$= f(x)^2 + 2H(x)^2 - 2H(x)f(x) - H(x)^2$$
$$= f(x)^2 - 2H(x)f(x) + H(x)^2$$
$$= \left(f(x)-H(x)\right)^2$$
$$= E(H|x)$$
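The whole chain can also be spot-checked end to end. This is a minimal sketch, assuming random ensembles of size $T$ between 1 and 10, non-negative weights normalized to sum to 1, and $H(x)$ computed by weighted averaging; the function name `decomposition_sides` is hypothetical. It compares the two sides of Eq. (a) on 1000 random cases.

```python
import math
import random


def decomposition_sides(f, h, w):
    """Return (E-bar - A-bar, E(H|x)) at one input x, with H the weighted average."""
    H = sum(wi * hi for wi, hi in zip(w, h))                 # Eq. (8.23)
    e_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))  # weighted avg. error
    a_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))  # ambiguity
    return e_bar - a_bar, (f - H) ** 2


rng = random.Random(42)  # fixed seed for reproducibility
for _ in range(1000):
    T = rng.randint(1, 10)
    raw = [rng.random() for _ in range(T)]
    w = [r / sum(raw) for r in raw]          # non-negative, sums to 1
    h = [rng.gauss(0, 3) for _ in range(T)]  # individual predictions h_i(x)
    f = rng.gauss(0, 3)                      # true value f(x)
    lhs, rhs = decomposition_sides(f, h, w)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)
print("E-bar - A-bar == E(H|x) on 1000 random ensembles")
```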
Hence the following holds:
$$\overline E(h|x) - \overline A(h|x) = E(H|x)$$
Rearranging, the following also holds:
$$\overline A(h|x) = \overline E(h|x) - E(H|x)$$
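Finally, since $\overline E(h|x) = \sum_{i=1}^{T} w_i E(h_i|x)$ with $E(h_i|x) = \left(f(x)-h_i(x)\right)^2$, this is exactly the first line of Eq. (8.31):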
$$\overline A(h|x) = \sum_{i=1}^{T} w_i E(h_i|x) - E(H|x)$$
Q.E.D.
Takeaways
(1) When you are stuck, try a special case to get a feel for the problem. For example, set $T=1$; then $w_1 = 1$, so the summation $\sum_{i=1}^{T}$ and the weight $w_i$ can both be dropped.
The expression
$$\sum_{i=1}^{T} w_i\left[f(x)^2 + 2h_i(x)\left[H(x)-f(x)\right] - H(x)^2\right]$$
becomes:
$$f(x)^2 + 2h(x)\left[H(x)-f(x)\right] - H(x)^2$$
Since $T=1$, we have $H(x)=h(x)$, so:
$$f(x)^2 + 2H(x)\left[H(x)-f(x)\right] - H(x)^2$$
$$= f(x)^2 - 2H(x)f(x) + H(x)^2$$
$$= \left(f(x)-H(x)\right)^2$$
$$= E(H|x)$$
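The $T=1$ case is easy to confirm numerically: with a single learner the ambiguity is zero, so the average error and the ensemble error coincide. A tiny sketch with made-up values:

```python
f = 3.0    # true value f(x) at one input x (hypothetical)
h = [2.5]  # T = 1: a single learner's prediction h_1(x)
w = [1.0]  # with T = 1 the only weight must be 1

H = sum(wi * hi for wi, hi in zip(w, h))                 # H(x) = h_1(x)
e_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))  # E-bar(h|x)
a_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))  # A-bar(h|x), zero here
e_H = (f - H) ** 2                                       # E(H|x)
print(H, e_bar, a_bar, e_H)  # 2.5 0.25 0.0 0.25
```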
At this point it suddenly becomes clear that the crux of the original problem is how to turn
$$f(x)^2 + 2h(x)\left[H(x)-f(x)\right] - H(x)^2$$
into
$$f(x)^2 + 2H(x)\left[H(x)-f(x)\right] - H(x)^2$$
And that in turn hinges on $H(x)=h(x)$. In the general case, the summation $\sum_{i=1}^{T}$ and the weights $w_i$ obscure this step: even if you know that $H(x) = \sum_{i=1}^{T} w_i h_i(x)$, you will not see how to continue unless you distribute $\sum_{i=1}^{T} w_i$ over the terms. Taking an extreme case removes that distraction, and the step becomes obvious.
(2) Another approach is to work from both ends.
Our goal is $\overline E(h|x) - \overline A(h|x) = E(H|x)$, and the right-hand side expands to
$$E(H|x) = \left(f(x)-H(x)\right)^2 = f(x)^2 - 2H(x)f(x) + H(x)^2$$
Meanwhile, starting from $\overline E(h|x) - \overline A(h|x)$, we have already obtained
$$\overline E(h|x) - \overline A(h|x) = \sum_{i=1}^{T} w_i\left[f(x)^2 + 2h_i(x)\left[H(x)-f(x)\right] - H(x)^2\right]$$
Comparing the two expressions shows that the key is to eliminate $h_i(x)$, so we look for
- the relationship between $h_i(x)$ and $H(x)$;
- the relationship between $h_i(x)$ and $f(x)$.
Clearly there is no fixed relationship between $h_i(x)$ and $f(x)$, but we do have
$$H(x) = \sum_{i=1}^{T} w_i h_i(x)$$
so we can try substituting this expression to eliminate $h_i(x)$, and the result follows.
Remarks
As the derivation shows, it relies on the weighted average $H(x) = \sum_{i=1}^{T} w_i h_i(x)$ (with weights summing to 1), so this analysis applies only to regression (i.e., numeric outputs).
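To see why the weighted average matters, one can compute both sides with some other combiner, for instance the median of the $h_i(x)$, chosen here purely for illustration (it is not something the book uses in this section). Writing $m(x) = \sum_i w_i h_i(x)$ and repeating the algebra above with an arbitrary $H(x)$ gives $\overline E - \overline A - E(H|x) = 2\left(H(x)-f(x)\right)\left(m(x)-H(x)\right)$, which vanishes for every $f(x)$ only when $H(x) = m(x)$. The function name `gap` and the values are hypothetical.

```python
import statistics


def gap(f, h, w, H):
    """E-bar - A-bar - E(H|x) for an arbitrary combined output H (weights sum to 1)."""
    e_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))
    a_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))
    return e_bar - a_bar - (f - H) ** 2


f = 3.0                # true value f(x)
h = [1.0, 2.0, 6.0]    # individual predictions h_i(x)
w = [0.5, 0.25, 0.25]  # weights summing to 1

m = sum(wi * hi for wi, hi in zip(w, h))  # weighted average, Eq. (8.23): 2.5
med = statistics.median(h)                # an alternative combiner: 2.0

print(gap(f, h, w, m))              # 0.0: the decomposition holds
print(gap(f, h, w, med))            # -1.0: it fails for the median
print(2 * (med - f) * (m - med))    # -1.0: matches 2 (H - f)(m - H)
```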