Two Proofs of the AdaBoost Training Error Bound for Binary Classification

1. The training error bound theorem

Quoted from Theorem 8.2 on p. 161 of Li Hang's 统计学习方法 (Statistical Learning Methods):

$$
\begin{aligned}
\prod_{m=1}^{M} Z_m &= \prod_{m=1}^{M}\left[2\sqrt{e_m(1-e_m)}\right] \\
&= \prod_{m=1}^{M}\sqrt{1-4\gamma_m^2} \\
&\le \exp\left(-2\sum_{m=1}^{M}\gamma_m^2\right)
\end{aligned}
$$

where $\gamma_m=\cfrac{1}{2}-e_m$.

Only the inequality in the last step is proved here.
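Before going through the proofs, the statement can be sanity-checked numerically. The sketch below is only an illustration: the error rates in `errors` are made-up values, and the function names `z_product` and `exp_bound` are hypothetical helpers, not part of any library.

```python
import math

def z_product(errors):
    """Product of Z_m = 2*sqrt(e_m*(1-e_m)) over all rounds."""
    return math.prod(2.0 * math.sqrt(e * (1.0 - e)) for e in errors)

def exp_bound(errors):
    """Right-hand side exp(-2 * sum(gamma_m^2)), with gamma_m = 1/2 - e_m."""
    return math.exp(-2.0 * sum((0.5 - e) ** 2 for e in errors))

# Hypothetical weighted error rates e_m of M = 4 weak classifiers (assumed values).
errors = [0.30, 0.25, 0.40, 0.10]

lhs = z_product(errors)
rhs = exp_bound(errors)
print(lhs, rhs, lhs <= rhs)
```

A numerical check like this does not replace the proof; it only confirms that the bound behaves as stated on sample inputs.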

2. Two proofs of the inequality

2.1 The proof given in 统计学习方法

By Taylor expansion,

$$
e^x=1+x+\cfrac{x^2}{2!}+\cdots+\cfrac{x^n}{n!}+\cdots
$$

$$
\sqrt{1-x}=1+\cfrac{\cfrac{1}{2}(-x)}{1!}+\cfrac{\cfrac{1}{2}\left(\cfrac{1}{2}-1\right)(-x)^2}{2!}+\cdots
$$

For convenience, let $t=4\gamma_m^2$.
Since $0\le e_m\le\cfrac{1}{2}$ and $\gamma_m=\cfrac{1}{2}-e_m$, we have $0\le t\le 1$.
Hence:
$$
\begin{aligned}
\exp(-2\gamma_m^2)=\exp\left(-\cfrac{t}{2}\right)
&=1+\left(-\cfrac{t}{2}\right)+\cfrac{\left(-\cfrac{t}{2}\right)^2}{2!}+\cfrac{\left(-\cfrac{t}{2}\right)^3}{3!}+\cfrac{\left(-\cfrac{t}{2}\right)^4}{4!}+\cdots \\
&=1-\cfrac{t}{2}+\cfrac{t^2}{8}-\cfrac{t^3}{48}+\cfrac{t^4}{384}-\cdots
\end{aligned}
\tag{①}
$$

$$
\begin{aligned}
\sqrt{1-4\gamma_m^2}=\sqrt{1-t}
&=1+\cfrac{\cfrac{1}{2}(-t)}{1!}+\cfrac{\cfrac{1}{2}\left(\cfrac{1}{2}-1\right)(-t)^2}{2!}+\cfrac{\cfrac{1}{2}\left(\cfrac{1}{2}-1\right)\left(\cfrac{1}{2}-2\right)(-t)^3}{3!} \\
&\quad+\cfrac{\cfrac{1}{2}\left(\cfrac{1}{2}-1\right)\left(\cfrac{1}{2}-2\right)\left(\cfrac{1}{2}-3\right)(-t)^4}{4!}+\cdots \\
&=1-\cfrac{t}{2}-\cfrac{t^2}{8}-\cfrac{3t^3}{48}-\cfrac{15t^4}{384}-\cdots
\end{aligned}
\tag{②}
$$

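Compare the expansion ① of $\exp(-t/2)$ with the expansion ② of $\sqrt{1-t}$ term by term. The constant and linear terms coincide, and for $n\ge 2$ the coefficient of $t^n$ in ① − ② is $\cfrac{(2n-3)!!+(-1)^n}{2^n\,n!}\ge 0$ (for $n=2,3,4$ the terms are $\cfrac{t^2}{4}$, $\cfrac{t^3}{24}$, $\cfrac{t^4}{24}$). Since $t\ge 0$, it follows that ① − ② $\ge 0$, that is, $\exp(-2\gamma_m^2)\ge\sqrt{1-4\gamma_m^2}$.
Therefore:

$$
\prod_{m=1}^{M}\sqrt{1-4\gamma_m^2}\le\prod_{m=1}^{M}\exp(-2\gamma_m^2)=\exp\left(-2\sum_{m=1}^{M}\gamma_m^2\right)
$$

This completes the proof.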
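The term-by-term comparison of the two series can be reproduced with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the helper names `exp_coeff` and `sqrt_coeff` are hypothetical, and checking the first 12 coefficients is an arbitrary cutoff for illustration, not a proof for all orders.

```python
from fractions import Fraction
from math import factorial

def exp_coeff(n):
    """Coefficient of t^n in exp(-t/2): (-1/2)^n / n!."""
    return Fraction(-1, 2) ** n / factorial(n)

def sqrt_coeff(n):
    """Coefficient of t^n in sqrt(1 - t), from the binomial series."""
    c = Fraction(1)
    for k in range(n):
        c *= Fraction(1, 2) - k  # builds (1/2)(1/2 - 1)...(1/2 - n + 1)
    return c * (-1) ** n / factorial(n)  # (-1)^n comes from (-t)^n

# Coefficients of the difference exp(-t/2) - sqrt(1 - t), orders 0..11.
diffs = [exp_coeff(n) - sqrt_coeff(n) for n in range(12)]
print(diffs[:5])
print(all(d >= 0 for d in diffs))
```

The first five differences are 0, 0, 1/4, 1/24 and 1/24, matching the series written out above, and none of the checked coefficients is negative.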

2.2 The proof in Freund and Schapire's paper

Freund and Schapire's paper "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting" also proves the error bound, this time using the KL divergence:

$$
\prod_{m=1}^{M}\sqrt{1-4\gamma_m^2}=\exp\left(-\sum_{m=1}^{M}KL\left(\cfrac{1}{2}\,\Big\|\,\cfrac{1}{2}-\gamma_m\right)\right)
$$

Here $KL(a\|b)=a\ln\cfrac{a}{b}+(1-a)\ln\cfrac{1-a}{1-b}$ with $a=\cfrac{1}{2}$ and $b=\cfrac{1}{2}-\gamma_m$, so:

$$
\begin{aligned}
-KL\left(\cfrac{1}{2}\,\Big\|\,\cfrac{1}{2}-\gamma_m\right)
&=\cfrac{1}{2}\ln\cfrac{\cfrac{1}{2}-\gamma_m}{\cfrac{1}{2}}+\cfrac{1}{2}\ln\cfrac{\cfrac{1}{2}+\gamma_m}{\cfrac{1}{2}} \\
&=\cfrac{1}{2}\ln(1-2\gamma_m)+\cfrac{1}{2}\ln(1+2\gamma_m) \\
&=\cfrac{1}{2}\ln(1-4\gamma_m^2)
\end{aligned}
$$
A Taylor expansion is needed here as well:

$$
\ln(1-x)=-x-\cfrac{x^2}{2}-\cfrac{x^3}{3}-\cdots-\cfrac{x^n}{n}-\cdots
$$

For $0\le x=4\gamma_m^2<1$, every term after $-x$ in this series is non-positive, so $\ln(1-4\gamma_m^2)\le-4\gamma_m^2$. (When $x=1$ the factor $\sqrt{1-4\gamma_m^2}$ is $0$ and the bound holds trivially.) Hence:

$$
\prod_{m=1}^{M}\sqrt{1-4\gamma_m^2}
=\exp\left(\sum_{m=1}^{M}-KL\left(\cfrac{1}{2}\,\Big\|\,\cfrac{1}{2}-\gamma_m\right)\right)
\le\exp\left(\sum_{m=1}^{M}\cfrac{1}{2}\cdot(-4\gamma_m^2)\right)
=\exp\left(-2\sum_{m=1}^{M}\gamma_m^2\right)
$$

This completes the proof.
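The KL-divergence identity and the resulting bound can also be checked numerically. This is a sketch under stated assumptions: `kl_bernoulli` and `check` are hypothetical helper names, the sample values of $\gamma_m$ are arbitrary, and $\gamma_m$ is kept strictly inside $(-\frac{1}{2},\frac{1}{2})$ so that the logarithms stay finite.

```python
import math

def kl_bernoulli(a, b):
    """KL(a || b) for two Bernoulli parameters a, b strictly between 0 and 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def check(gamma):
    """Return (sqrt(1-4g^2), exp(-KL(1/2 || 1/2-g)), exp(-2g^2)) for g = gamma."""
    lhs = math.sqrt(1 - 4 * gamma ** 2)
    mid = math.exp(-kl_bernoulli(0.5, 0.5 - gamma))
    rhs = math.exp(-2 * gamma ** 2)
    return lhs, mid, rhs

# Arbitrary sample values of gamma (assumed for illustration).
for gamma in [0.0, 0.1, 0.25, 0.4, 0.49]:
    lhs, mid, rhs = check(gamma)
    print(gamma, math.isclose(lhs, mid, abs_tol=1e-12), mid <= rhs + 1e-15)
```

For each sample the first two quantities agree up to floating-point rounding, and both stay below $\exp(-2\gamma_m^2)$.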

References

[1] Li Hang, 统计学习方法 (Statistical Learning Methods), Section 8.2.
[2] Y. Freund and R. E. Schapire, "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting".

Reposted from blog.csdn.net/WANGWUSHAN/article/details/108625621