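Two Proofs of the AdaBoost Training Error Bound for Binary Classification
1. The training error bound theorem
Quoted from Theorem 8.2 on p. 161 of Li Hang's Statistical Learning Methods (《统计学习方法》):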
$$
\begin{aligned}
\prod_{m=1}^M Z_m &= \prod_{m=1}^M \left[2\sqrt{e_m(1-e_m)}\right] \\
&= \prod_{m=1}^M \sqrt{1-4\gamma_m^2} \\
&\le \exp\left(-2\sum_{m=1}^M \gamma_m^2\right)
\end{aligned}
$$
where $\gamma_m = \frac{1}{2} - e_m$.
Only the inequality is proved here.
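Before turning to the proofs, the statement can be checked numerically. The sketch below is only an illustration: the function names and the sample error rates are made up for this example and are not taken from the book. It checks that $2\sqrt{e_m(1-e_m)}$ and $\sqrt{1-4\gamma_m^2}$ agree and that neither exceeds $\exp(-2\gamma_m^2)$.

```python
import math


def z_factor(e_m: float) -> float:
    """Z_m = 2 * sqrt(e_m * (1 - e_m)) for a weighted error rate e_m."""
    return 2.0 * math.sqrt(e_m * (1.0 - e_m))


def sqrt_form(e_m: float) -> float:
    """sqrt(1 - 4 * gamma_m^2) with gamma_m = 1/2 - e_m."""
    gamma_m = 0.5 - e_m
    return math.sqrt(1.0 - 4.0 * gamma_m ** 2)


def exp_bound(e_m: float) -> float:
    """exp(-2 * gamma_m^2) with gamma_m = 1/2 - e_m."""
    gamma_m = 0.5 - e_m
    return math.exp(-2.0 * gamma_m ** 2)


def product_and_bound(errors):
    """Return (prod of Z_m, exp(-2 * sum of gamma_m^2)) for a list of error rates."""
    prod = math.prod(z_factor(e) for e in errors)
    bound = math.exp(-2.0 * sum((0.5 - e) ** 2 for e in errors))
    return prod, bound


# Hypothetical per-round error rates, all in [0, 1/2].
sample_errors = [0.05, 0.2, 0.35, 0.45, 0.5]

for e in sample_errors:
    # The equality part of the theorem, up to floating-point rounding.
    assert math.isclose(z_factor(e), sqrt_form(e), rel_tol=1e-9)
    # The inequality part, checked for a single factor.
    assert sqrt_form(e) <= exp_bound(e)

prod, bound = product_and_bound(sample_errors)
assert prod <= bound
print(prod, bound)
```

A check like this does not replace the proofs that follow; it only shows the bound holding on a few sample values.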
2. Two proofs of the inequality
2.1 The proof given in Statistical Learning Methods
Start from the Taylor expansions:
$$
e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots
$$
$$
\sqrt{1-x} = 1 + \frac{\frac{1}{2}(-x)}{1!} + \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)x^2}{2!} + \cdots
$$
For convenience, let $t = 4\gamma_m^2$.
Since $0 \le e_m \le \frac{1}{2}$ and $\gamma_m = \frac{1}{2} - e_m$, we have $0 \le t \le 1$.
Therefore:
$$
\begin{aligned}
\exp(-2\gamma_m^2) = \exp\left(-\frac{t}{2}\right) &= 1 + \left(-\frac{t}{2}\right) + \frac{\left(-\frac{t}{2}\right)^2}{2!} + \frac{\left(-\frac{t}{2}\right)^3}{3!} + \frac{\left(-\frac{t}{2}\right)^4}{4!} + \cdots \\
&= 1 - \frac{t}{2} + \frac{t^2}{8} - \frac{t^3}{48} + \frac{t^4}{384} - \cdots \qquad \text{①}
\end{aligned}
$$
$$
\begin{aligned}
\sqrt{1-4\gamma_m^2} = \sqrt{1-t} &= 1 + \frac{\frac{1}{2}(-t)}{1!} + \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)(-t)^2}{2!} + \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)(-t)^3}{3!} \\
&\quad + \frac{\frac{1}{2}\left(\frac{1}{2}-1\right)\left(\frac{1}{2}-2\right)\left(\frac{1}{2}-3\right)(-t)^4}{4!} + \cdots \\
&= 1 - \frac{t}{2} - \frac{t^2}{8} - \frac{3t^3}{48} - \frac{15t^4}{384} - \cdots \qquad \text{②}
\end{aligned}
$$
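Compare ① and ② term by term. The constant and linear terms coincide. For every $n \ge 2$, the coefficient of $t^n$ in ② is negative with absolute value $\frac{(2n-3)!!}{2^n\, n!}$, which is at least the absolute value $\frac{1}{2^n\, n!}$ of the corresponding coefficient in ①. Every coefficient of ① − ② is therefore nonnegative, so ① − ② $\ge 0$ for $0 \le t \le 1$, that is: $\exp(-2\gamma_m^2) \ge \sqrt{1-4\gamma_m^2}$
Therefore:
$$
\prod_{m=1}^M \sqrt{1-4\gamma_m^2} \le \prod_{m=1}^M \exp(-2\gamma_m^2) = \exp\left(-2\sum_{m=1}^M \gamma_m^2\right)
$$
Q.E.D.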
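The term-by-term comparison of the two series can be reproduced with exact rational arithmetic. This is a minimal sketch, not part of the book's proof; `exp_coeff` and `sqrt_coeff` are helper names introduced only for this illustration. It computes the coefficient of $t^n$ in $\exp(-t/2)$ and in $\sqrt{1-t}$ and checks that their difference is nonnegative for the first few $n$.

```python
from fractions import Fraction
from math import factorial


def exp_coeff(n: int) -> Fraction:
    """Coefficient of t^n in exp(-t/2), i.e. (-1/2)^n / n!."""
    return Fraction(-1, 2) ** n / factorial(n)


def sqrt_coeff(n: int) -> Fraction:
    """Coefficient of t^n in sqrt(1 - t), from the binomial series."""
    falling = Fraction(1)
    for k in range(n):
        # Builds (1/2)(1/2 - 1)...(1/2 - n + 1); empty product is 1 for n = 0.
        falling *= Fraction(1, 2) - k
    # The factor (-1)^n comes from expanding in powers of (-t).
    return falling * (-1) ** n / factorial(n)


for n in range(9):
    diff = exp_coeff(n) - sqrt_coeff(n)
    # Every coefficient of (series 1) - (series 2) should be nonnegative.
    assert diff >= 0
    print(n, exp_coeff(n), sqrt_coeff(n), diff)
```

Because `Fraction` is exact, the printed values can be compared directly with the coefficients $\frac{1}{8}, -\frac{1}{48}, \frac{1}{384}$ and $-\frac{1}{8}, -\frac{3}{48}, -\frac{15}{384}$ written out above, without rounding concerns.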
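2.2 The proof from Freund and Schapire's paper
Freund and Schapire, the two masters of boosting, also prove the error bound in their paper "A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting". Their proof uses the KL divergence.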
$$
\prod_{m=1}^M \sqrt{1-4\gamma_m^2} = \exp\left(-\sum_{m=1}^M KL\left(\frac{1}{2} \,\Big\|\, \frac{1}{2}-\gamma_m\right)\right),
$$
where $KL(a\|b) = a\ln\frac{a}{b} + (1-a)\ln\frac{1-a}{1-b}$ with $a = \frac{1}{2}$ and $b = \frac{1}{2}-\gamma_m$. Hence:
$$
\begin{aligned}
-KL\left(\frac{1}{2} \,\Big\|\, \frac{1}{2}-\gamma_m\right) &= \frac{1}{2}\ln\frac{\frac{1}{2}-\gamma_m}{\frac{1}{2}} + \frac{1}{2}\ln\frac{\frac{1}{2}+\gamma_m}{\frac{1}{2}} \\
&= \frac{1}{2}\ln(1-2\gamma_m) + \frac{1}{2}\ln(1+2\gamma_m) \\
&= \frac{1}{2}\ln(1-4\gamma_m^2)
\end{aligned}
$$
A Taylor expansion is needed here as well:
$$
\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots - \frac{x^n}{n} - \cdots
$$
Every term after $-x$ is nonpositive when $x \ge 0$. Taking $0 \le x = 4\gamma_m^2 \le 1$ gives $\ln(1-4\gamma_m^2) \le -4\gamma_m^2$ (for $x = 1$ the left-hand side is $-\infty$ and the inequality holds trivially), so:
$$
\prod_{m=1}^M \sqrt{1-4\gamma_m^2} = \exp\left(\sum_{m=1}^M -KL\left(\frac{1}{2} \,\Big\|\, \frac{1}{2}-\gamma_m\right)\right) \le \exp\left(\sum_{m=1}^M \frac{1}{2}\cdot(-4\gamma_m^2)\right) = \exp\left(-2\sum_{m=1}^M \gamma_m^2\right)
$$
Q.E.D.
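The KL identity and the logarithm bound used in this second proof can also be checked numerically. The sketch below is an illustration under stated assumptions: `kl_bernoulli` is a helper name made up for this example, and the sample margins are arbitrary values with $\gamma_m < \frac{1}{2}$ so that no logarithm of zero is taken.

```python
import math


def kl_bernoulli(a: float, b: float) -> float:
    """KL(a || b) = a ln(a/b) + (1 - a) ln((1 - a)/(1 - b)), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))


# Hypothetical margins gamma_m in [0, 1/2); gamma_m = 1/2 would make b = 0.
sample_gammas = [0.0, 0.1, 0.25, 0.4, 0.49]

for gamma_m in sample_gammas:
    kl = kl_bernoulli(0.5, 0.5 - gamma_m)
    # Identity: exp(-KL(1/2 || 1/2 - gamma_m)) == sqrt(1 - 4 gamma_m^2).
    assert math.isclose(math.exp(-kl), math.sqrt(1.0 - 4.0 * gamma_m ** 2), rel_tol=1e-9)
    # Bound: -KL = (1/2) ln(1 - 4 gamma_m^2) <= -2 gamma_m^2 (small slack for rounding).
    assert -kl <= -2.0 * gamma_m ** 2 + 1e-15
    print(gamma_m, -kl, -2.0 * gamma_m ** 2)
```

The printed columns show $-KL$ sitting at or below $-2\gamma_m^2$ for each sample, which is the per-factor form of the bound before the product over $m$ is taken.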
References
[1] Li Hang, Statistical Learning Methods (《统计学习方法》), Section 8.2.
[2] Freund and Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.