Exploring MAE as an optimization objective in XGBoost

1 Introduction to MAE/MAD and MSE

MSE: mean squared error.

It is differentiable everywhere and is commonly used as a loss function.

$MSE(y,\widehat{y}) = \frac{1}{n_{samples}} \sum_{i=0}^{n_{samples}-1}(y_i-\widehat{y}_i)^{2}$

MAE: mean absolute error.

It is not differentiable where the residual is zero.

$MAE(y,\widehat{y}) = \frac{1}{n_{samples}} \sum_{i=0}^{n_{samples}-1}\left|y_i-\widehat{y}_i\right|$
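
To make the difference between the two metrics concrete, here is a minimal pure-Python sketch of both definitions. The sample values are made up for illustration; the only point is that a single outlier inflates MSE far more than MAE.

```python
def mse(y_true, y_pred):
    """Mean squared error of two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: the last label is an outlier.
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mse(y_true, y_pred))  # dominated by the squared outlier residual (96 ** 2)
print(mae(y_true, y_pred))  # grows only linearly with the outlier residual
```

This robustness to outliers is the usual motivation for wanting MAE, or something close to it, as a training objective.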

2 How to approximate MAE in XGBoost

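XGBoost lets us define custom objective functions, but its implementation relies on a second-order Taylor expansion of the objective, so we have to supply both the first and the second derivative (gradient and Hessian). MAE is not continuously differentiable: its derivative is undefined at 0 and its second derivative is zero everywhere else, so it cannot be used directly as a custom XGBoost objective. The most practical approach at present is to find a smooth function that approximates it.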
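
Why the second derivative matters can be seen from the standard second-order approximation used in gradient boosting. The notation below is an assumption for this sketch: $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the current prediction $\widehat{y}_i$, the sums run over the samples falling into one leaf, $w$ is that leaf's weight and $\lambda$ is the L2 regularization strength.

$\sum_{i} l(y_i, \widehat{y}_i + w) + \frac{1}{2}\lambda w^{2} \approx \sum_{i} \left[ l(y_i,\widehat{y}_i) + g_i w + \frac{1}{2} h_i w^{2} \right] + \frac{1}{2}\lambda w^{2}$

$w^{*} = -\frac{\sum_{i} g_i}{\sum_{i} h_i + \lambda}$

For MAE, $g_i = \mathrm{sign}(\widehat{y}_i - y_i)$ and $h_i = 0$ wherever the derivatives exist, so the curvature term vanishes and the step size is determined by $\lambda$ alone. A smooth surrogate with $h_i > 0$ avoids this.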

2.1 Huber loss

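In statistics, the Huber loss is a loss function used in robust regression that is less sensitive to outliers than the squared error loss. It is sometimes used for classification tasks as well. (Wikipedia)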

$L_{\delta}(a)= \begin{cases} \frac{1}{2}a^{2} & \text{for } \left|a\right| \le \delta\\ \delta\left(\left|a\right|-\frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$

This function is quadratic for small values of $a$ and linear for large values. $a$ usually denotes the residual, $a = y - f(x)$.

$L_{\delta}(y, f(x))= \begin{cases} \frac{1}{2}(y-f(x))^{2} & \text{for } \left|y-f(x)\right| \le \delta\\ \delta\left(\left|y-f(x)\right|-\frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$
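
The piecewise definition can be transcribed directly, as a sketch for scalar residuals. The function names and the default `delta=1.0` are illustrative choices, not part of any library. The gradient is the residual itself inside $[-\delta, \delta]$ and is clipped to $\pm\delta$ outside, which is what limits the influence of outliers.

```python
def huber_loss(residual, delta=1.0):
    """Piecewise Huber loss for a single residual a = y - f(x)."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2           # quadratic branch
    return delta * (a - 0.5 * delta)  # linear branch


def huber_grad(residual, delta=1.0):
    """Derivative of the Huber loss with respect to the residual."""
    if abs(residual) <= delta:
        return residual                        # slope of the quadratic branch
    return delta if residual > 0 else -delta   # constant slope of the linear branch


for a in (0.5, 1.0, 3.0):
    print(a, huber_loss(a), huber_grad(a))
```

The second derivative of this function is 1 inside the interval and 0 outside it, which is one reason a smooth approximation is more convenient when a Hessian has to be supplied.
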
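In XGBoost's Python API, the objective below uses the smooth Pseudo-Huber approximation $\delta^{2}\left(\sqrt{1+(a/\delta)^{2}}-1\right)$, whose first and second derivatives exist everywhere: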

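import numpy as np

def huber_approx_obj(preds, dtrain):
    """Pseudo-Huber objective: gradient and Hessian with respect to preds."""
    # Derivatives are taken w.r.t. the prediction, so the residual is preds - labels.
    # With the sklearn wrapper the labels arrive as a plain array (no .get_label()).
    d = preds - dtrain.get_label()
    h = 1  # h is delta
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess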
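
A quick way to gain confidence in hand-derived gradients is to compare them with finite differences. The sketch below does this for the Pseudo-Huber loss on scalar residuals, using only the standard library; the helper names are hypothetical and the step size `eps` is an arbitrary choice.

```python
import math


def pseudo_huber(d, h=1.0):
    """Pseudo-Huber loss h^2 * (sqrt(1 + (d/h)^2) - 1) for a residual d."""
    return h ** 2 * (math.sqrt(1.0 + (d / h) ** 2) - 1.0)


def pseudo_huber_grad_hess(d, h=1.0):
    """Closed-form first and second derivatives, same formulas as the objective."""
    scale = 1.0 + (d / h) ** 2
    scale_sqrt = math.sqrt(scale)
    return d / scale_sqrt, 1.0 / scale / scale_sqrt


def numeric_grad_hess(f, d, eps=1e-4):
    """Central finite differences for the first and second derivative of f at d."""
    grad = (f(d + eps) - f(d - eps)) / (2 * eps)
    hess = (f(d + eps) - 2 * f(d) + f(d - eps)) / eps ** 2
    return grad, hess


for d in (-3.0, -0.5, 0.0, 0.5, 3.0):
    g, hs = pseudo_huber_grad_hess(d)
    ng, nh = numeric_grad_hess(pseudo_huber, d)
    print(f"d={d:+.1f}  grad={g:+.4f} (numeric {ng:+.4f})  hess={hs:.4f} (numeric {nh:.4f})")
```

Once the derivatives check out, a function with the `(preds, dtrain)` signature such as `huber_approx_obj` can be handed to `xgb.train` through its `obj` argument.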

2.2 Fair loss

$L_{c}(x) = c\left|x\right|-c^{2}\ln\left(\left|x\right|+c\right)$

where $x$ is the residual and $c > 0$ is a scale parameter.

A Python implementation for XGBoost:

def fair_obj(preds, dtrain):
    """Fair loss: y = c * abs(x) - c**2 * np.log(abs(x) + c)"""
    x = preds - dtrain.get_label()  # residual w.r.t. the prediction
    c = 1
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess
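
The gradient and Hessian returned by `fair_obj` can be checked by differentiating the loss directly. This is a sketch assuming the form $L_c(x) = c\left|x\right| - c^{2}\ln\left(\left|x\right|+c\right)$, with $x$ the difference between prediction and label:

$\frac{\partial L_c}{\partial x} = c\,\mathrm{sign}(x) - \frac{c^{2}\,\mathrm{sign}(x)}{\left|x\right|+c} = \frac{c\,x}{\left|x\right|+c}$

$\frac{\partial^{2} L_c}{\partial x^{2}} = \frac{c\left(\left|x\right|+c\right) - c\,x\,\mathrm{sign}(x)}{\left(\left|x\right|+c\right)^{2}} = \frac{c^{2}}{\left(\left|x\right|+c\right)^{2}}$

The Hessian is strictly positive for every finite $x$, and the gradient is bounded by $c$ in absolute value.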

2.3 Log-Cosh loss

$L = \ln(\cosh(x))$

A Python implementation for XGBoost:

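import numpy as np

def log_cosh_obj(preds, dtrain):
    """Log-cosh objective: gradient and Hessian with respect to preds."""
    x = preds - dtrain.get_label()  # residual w.r.t. the prediction
    grad = np.tanh(x)
    # 1 / cosh(x)**2 equals 1 - tanh(x)**2; this form avoids overflow in cosh for large |x|
    hess = 1 - grad ** 2
    return grad, hess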
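
The objective only needs derivatives, but the loss value itself is handy for monitoring or plotting. Evaluating `cosh` directly overflows for large inputs, so the sketch below uses the identity $\ln\cosh(x) = \left|x\right| + \ln\left(1 + e^{-2\left|x\right|}\right) - \ln 2$ instead. The function name is hypothetical. The printed columns compare the loss with $x^{2}/2$ (its behavior near zero) and with $\left|x\right| - \ln 2$ (its behavior for large residuals).

```python
import math


def log_cosh(x):
    """ln(cosh(x)) computed as |x| + ln(1 + exp(-2|x|)) - ln(2), which cannot overflow."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


for x in (0.01, 0.1, 5.0, 1000.0):
    print(x, log_cosh(x), 0.5 * x * x, abs(x) - math.log(2.0))
```
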

2.4 Comparison

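The animated plot compares the curves of MAE, MSE, Fair loss and Log-Cosh loss. The MSE curve diverges substantially from MAE once $\left|x\right|$ is large. Log-Cosh resembles MSE over part of the range but then departs from it abruptly. Only Fair loss stays close to the MAE curve throughout. Huber loss is not yet shown in the plot because of its piecewise definition.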
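
The comparison can also be reproduced numerically. The sketch below tabulates each loss on a few residual values using only the standard library; the parameter choices `c=1.0` and `delta=1.0` and the sample points are assumptions, and the Fair loss is shifted by the constant $c^{2}\ln c$ so that it is 0 at $x = 0$. Huber loss is included as well, since a table has no difficulty with piecewise functions.

```python
import math


def mae_loss(x):
    return abs(x)


def mse_loss(x):
    return x * x


def fair_loss(x, c=1.0):
    # c|x| - c^2 ln(|x| + c), shifted by c^2 ln(c) so that the loss is 0 at x = 0.
    return c * abs(x) - c * c * math.log(abs(x) / c + 1.0)


def log_cosh_loss(x):
    # Overflow-safe form of ln(cosh(x)).
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def huber_loss(x, delta=1.0):
    a = abs(x)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


losses = [("mae", mae_loss), ("mse", mse_loss), ("fair", fair_loss),
          ("log-cosh", log_cosh_loss), ("huber", huber_loss)]

# One row per residual value, one column per loss.
print("x".rjust(6) + "".join(name.rjust(10) for name, _ in losses))
for x in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"{x:6.1f}" + "".join(f"{fn(x):10.3f}" for _, fn in losses))
```

Changing `c` or `delta` shows how the scale parameter moves each surrogate between MSE-like and MAE-like behavior.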

3 Theoretical analysis

To do

4 References

1. Xgboost-How to use “mae” as objective function?