[Machine Learning] Deriving the Error-Ambiguity Decomposition for Ensemble Learning in the Watermelon Book

Preface

In the book, Eq. (8.28) defines the "ambiguity" of the ensemble as:
$$\overline{A}(h \mid x) = \sum_{i=1}^{T} w_i \left(h_i(x) - H(x)\right)^2$$

Then, in Eq. (8.31), it suddenly turns into the following, tying ambiguity to error, which left me thoroughly confused:
$$\overline{A}(h \mid x) = \sum_{i=1}^{T} w_i E(h_i \mid x) - E(H \mid x)$$

So this post explains where the first line of Eq. (8.31) on page 185 of the Watermelon Book comes from.

Formula

First, rearrange the second line of Eq. (8.31) and call the result Eq. (a). If Eq. (a) can be shown to hold, then the first line of Eq. (8.31) holds as well:
$$\overline{E}(h \mid x) - \overline{A}(h \mid x) = E(H \mid x)$$

We are given:
$$\overline{E}(h \mid x) = \sum_{i=1}^{T} w_i \left(f(x) - h_i(x)\right)^2$$
$$\overline{A}(h \mid x) = \sum_{i=1}^{T} w_i \left(h_i(x) - H(x)\right)^2$$
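To make the two definitions concrete, here is a minimal Python sketch that evaluates them at a single point $x$. The function names and the numbers are made up for illustration, and $H(x)$ is assumed to be the weighted average of the individual predictions, as in the book's Eq. (8.23).

```python
def ensemble_output(w, h):
    """Weighted-average ensemble output H(x) = sum_i w_i * h_i(x) (assumed combiner)."""
    return sum(wi * hi for wi, hi in zip(w, h))

def avg_error(w, h, f):
    """Weighted individual error: sum_i w_i * (f(x) - h_i(x))^2."""
    return sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))

def avg_ambiguity(w, h):
    """Ensemble ambiguity: sum_i w_i * (h_i(x) - H(x))^2."""
    H = ensemble_output(w, h)
    return sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))

# Hypothetical values at one point x, chosen only for illustration.
w = [0.5, 0.25, 0.25]  # weights: non-negative, summing to 1
h = [1.0, 2.0, 4.0]    # individual predictions h_i(x)
f = 3.0                # true value f(x)

H = ensemble_output(w, h)    # 2.0
E_bar = avg_error(w, h, f)   # 2.5
A_bar = avg_ambiguity(w, h)  # 1.5
print(H, E_bar, A_bar)
```

For these numbers, `E_bar - A_bar` is 1.0, which equals `(f - H) ** 2`.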

Therefore:

$$\overline{E}(h \mid x) - \overline{A}(h \mid x)$$

$$= \sum_{i=1}^{T} w_i \left(f(x) - h_i(x)\right)^2 - \sum_{i=1}^{T} w_i \left(h_i(x) - H(x)\right)^2$$

Factoring out the summation $\sum_{i=1}^{T}$ and the weights $w_i$ gives:

$$= \sum_{i=1}^{T} w_i \left[\left(f(x) - h_i(x)\right)^2 - \left(h_i(x) - H(x)\right)^2\right]$$

Expanding the squares gives:

$$= \sum_{i=1}^{T} w_i \left[f(x)^2 + h_i(x)^2 - 2f(x)h_i(x) - h_i(x)^2 - H(x)^2 + 2H(x)h_i(x)\right]$$

$$= \sum_{i=1}^{T} w_i \left[f(x)^2 - 2f(x)h_i(x) - H(x)^2 + 2H(x)h_i(x)\right]$$

$$= \sum_{i=1}^{T} w_i \left[f(x)^2 + 2h_i(x)\left[H(x) - f(x)\right] - H(x)^2\right]$$

Distributing $\sum_{i=1}^{T} w_i$ over each term gives:

$$= \sum_{i=1}^{T} w_i f(x)^2 + 2\sum_{i=1}^{T} w_i h_i(x)\left[H(x) - f(x)\right] - \sum_{i=1}^{T} w_i H(x)^2$$

Since $f(x)^2$ and $H(x)^2$ do not depend on $i$, and the weights satisfy $\sum_{i=1}^{T} w_i = 1$, we obtain the following, which we call Eq. (b):

$$= f(x)^2 + 2\sum_{i=1}^{T} w_i h_i(x)\left[H(x) - f(x)\right] - H(x)^2$$
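The step to Eq. (b) depends on the weights summing to 1. The sketch below evaluates the expression before and after that step. The function names and numbers are hypothetical, and $H$ is passed in as a plain number because this particular step does not use how $H$ is defined.

```python
import math

def distributed_form(w, h, f, H):
    """Expression with the sum distributed over each term, before using sum(w) == 1."""
    return (sum(wi * f ** 2 for wi in w)
            + 2 * sum(wi * hi * (H - f) for wi, hi in zip(w, h))
            - sum(wi * H ** 2 for wi in w))

def formula_b(w, h, f, H):
    """Eq. (b): f^2 + 2 * sum_i w_i h_i (H - f) - H^2."""
    return f ** 2 + 2 * sum(wi * hi * (H - f) for wi, hi in zip(w, h)) - H ** 2

# Hypothetical values, chosen only for illustration.
h = [1.0, 2.0, 4.0]
f, H = 3.0, 2.0

w_ok = [0.5, 0.25, 0.25]  # sums to 1
w_bad = [1.0, 0.5, 0.5]   # sums to 2, so the simplification is not valid

same_when_normalised = math.isclose(
    distributed_form(w_ok, h, f, H), formula_b(w_ok, h, f, H))
differs_when_not = not math.isclose(
    distributed_form(w_bad, h, f, H), formula_b(w_bad, h, f, H))
print(same_when_normalised, differs_when_not)
```

With normalised weights the two expressions agree; with weights that sum to 2 they do not, which is why the constraint $\sum_{i=1}^{T} w_i = 1$ matters here.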

For regression problems, Eq. (8.23) on page 182 of the Watermelon Book gives:

$$H(x) = \sum_{i=1}^{T} w_i h_i(x)$$

Because $H(x) - f(x)$ does not depend on $i$, it can be pulled out of the sum, and substituting Eq. (8.23) into Eq. (b) gives:

$$= f(x)^2 + 2H(x)\left[H(x) - f(x)\right] - H(x)^2$$

$$= f(x)^2 + 2H(x)^2 - 2H(x)f(x) - H(x)^2$$

$$= f(x)^2 - 2H(x)f(x) + H(x)^2$$

$$= \left(f(x) - H(x)\right)^2$$

$$= E(H \mid x)$$

Therefore, the following holds:

$$\overline{E}(h \mid x) - \overline{A}(h \mid x) = E(H \mid x)$$

It follows that:
$$\overline{A}(h \mid x) = \overline{E}(h \mid x) - E(H \mid x)$$

And since $\overline{E}(h \mid x) = \sum_{i=1}^{T} w_i E(h_i \mid x)$ by definition, it also follows that:
$$\overline{A}(h \mid x) = \sum_{i=1}^{T} w_i E(h_i \mid x) - E(H \mid x)$$
Q.E.D.
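As a sanity check on the algebra, the identity can also be tested numerically. The following is a minimal sketch using only the Python standard library. The helper name, the random ranges, and the tolerance are arbitrary choices for illustration, and $H(x)$ is assumed to be the weighted average with weights normalised to sum to 1.

```python
import math
import random

def both_sides(T, rng):
    """Return (E_bar - A_bar, E(H|x)) for one random instance with T learners."""
    raw = [rng.random() + 0.01 for _ in range(T)]
    total = sum(raw)
    w = [r / total for r in raw]                # weights normalised to sum to 1
    h = [rng.uniform(-5, 5) for _ in range(T)]  # individual predictions h_i(x)
    f = rng.uniform(-5, 5)                      # true value f(x)

    H = sum(wi * hi for wi, hi in zip(w, h))    # weighted-average output
    E_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))
    A_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))
    return E_bar - A_bar, (f - H) ** 2

rng = random.Random(0)  # fixed seed so the check is repeatable
results = [both_sides(rng.randint(1, 10), rng) for _ in range(1000)]
all_match = all(
    math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9) for lhs, rhs in results
)
print(all_match)
```

A numerical check like this does not replace the proof, but it is a quick way to catch a sign error in the expansion.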

Takeaways

(1) When you are stuck, try a special case to find a way in. For example, set $T=1$ (so $w_1 = 1$), which lets you drop both the summation $\sum_{i=1}^{T}$ and the weights $w_i$:

$$\sum_{i=1}^{T} w_i \left[f(x)^2 + 2h_i(x)\left[H(x) - f(x)\right] - H(x)^2\right]$$

becomes:

$$f(x)^2 + 2h(x)\left[H(x) - f(x)\right] - H(x)^2$$

Because $T=1$, we have $H(x) = h(x)$, which gives:

$$f(x)^2 + 2H(x)\left[H(x) - f(x)\right] - H(x)^2$$

$$= f(x)^2 - 2H(x)f(x) + H(x)^2$$

$$= \left(f(x) - H(x)\right)^2$$

$$= E(H \mid x)$$

At this point it suddenly becomes clear that the crux of the original problem is how to turn:

$$f(x)^2 + 2h(x)\left[H(x) - f(x)\right] - H(x)^2$$

into:

$$f(x)^2 + 2H(x)\left[H(x) - f(x)\right] - H(x)^2$$

The key, in turn, is $H(x) = h(x)$. In the general case the summation $\sum_{i=1}^{T}$ and the weights $w_i$ get in the way: even if you know that $H(x) = \sum_{i=1}^{T} w_i h_i(x)$, you will not see how to continue unless you distribute $\sum_{i=1}^{T} w_i$ over the terms. Taking an extreme example removes that interference and makes the step obvious.
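The $T=1$ special case is easy to check by hand or in a few lines of Python. The numbers below are made up for illustration. With a single learner the ensemble output equals that learner's prediction, so the ambiguity vanishes and the weighted error equals the ensemble error.

```python
# Hypothetical single-learner ensemble at one point x.
w = [1.0]  # with T = 1 the only weight must be 1
h = [2.5]  # the single prediction h(x)
f = 4.0    # true value f(x)

H = sum(wi * hi for wi, hi in zip(w, h))                 # H(x) = h(x)
E_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))  # weighted individual error
A_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))  # ambiguity
E_H = (f - H) ** 2                                       # ensemble error

print(H, E_bar, A_bar, E_H)
```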



(2) Another approach is to work from both ends.

Our goal is to reach $\overline{E}(h \mid x) - \overline{A}(h \mid x) = E(H \mid x)$, where:

$$E(H \mid x)$$

$$= \left(f(x) - H(x)\right)^2$$

$$= f(x)^2 - 2H(x)f(x) + H(x)^2$$

Meanwhile, starting from $\overline{E}(h \mid x) - \overline{A}(h \mid x)$, we have already obtained:

$$= \sum_{i=1}^{T} w_i \left[f(x)^2 + 2h_i(x)\left[H(x) - f(x)\right] - H(x)^2\right]$$

Comparing the two expressions shows that the key is to eliminate $h_i(x)$, so we look for:

  1. the relationship between $h_i(x)$ and $H(x)$
  2. the relationship between $h_i(x)$ and $f(x)$

Clearly there is no given relationship between $h_i(x)$ and $f(x)$, but we do have:
$$H(x) = \sum_{i=1}^{T} w_i h_i(x)$$

So we can try substituting this in to eliminate $h_i(x)$, and the result follows.

Notes

As the derivation above shows, it relies on weighted averaging, $H(x) = \sum_{i=1}^{T} w_i h_i(x)$, so this analysis applies only to regression (that is, numeric outputs).
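To illustrate how much the result depends on the weighted-average combiner, the sketch below compares it with a different combiner. The median is not discussed in the book's derivation; it is used here purely as a hypothetical alternative, and all numbers are made up.

```python
import math
import statistics

def error_minus_ambiguity(w, h, f, H):
    """Compute E_bar - A_bar for a given ensemble output H."""
    E_bar = sum(wi * (f - hi) ** 2 for wi, hi in zip(w, h))
    A_bar = sum(wi * (hi - H) ** 2 for wi, hi in zip(w, h))
    return E_bar - A_bar

# Hypothetical values at one point x.
w = [0.25, 0.25, 0.5]
h = [1.0, 2.0, 6.0]
f = 3.0

H_mean = sum(wi * hi for wi, hi in zip(w, h))  # weighted average, as in Eq. (8.23)
H_median = statistics.median(h)                # alternative combiner (assumption)

holds_for_mean = math.isclose(
    error_minus_ambiguity(w, h, f, H_mean), (f - H_mean) ** 2)
holds_for_median = math.isclose(
    error_minus_ambiguity(w, h, f, H_median), (f - H_median) ** 2)
print(holds_for_mean, holds_for_median)
```

For these numbers the identity holds with the weighted average and fails with the median, because the substitution $\sum_{i=1}^{T} w_i h_i(x) = H(x)$ is no longer available.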

Reposted from blog.csdn.net/weixin_38705903/article/details/103671482