XGBoost has stood the test of time in machine learning: for both classification and regression tasks it makes a solid baseline and often a usable final model. That same versatility means it exposes a large number of tunable parameters, and for the same task different settings can produce different, sometimes wildly different, results, so finding effective parameters for the task at hand is an indispensable step. The previous article in this series, "XGB系列-XGB参数指南" (https://blog.csdn.net/wwlsm_zql/article/details/126192959), introduced all of XGBoost's parameters as a guide to choosing them, covering the three types that must be set before running XGBoost: general parameters, booster parameters, and task parameters. With that many parameters, exhaustive trial-and-error is an enormous amount of work, so this article shows how to use hyperopt to search for parameters automatically and find the best settings for your own task.
Install the required packages
!pip install xgboost scikit-learn hyperopt
Import the basic libraries
# Import the basic packages
import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.metrics import accuracy_score
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.model_selection import train_test_split
Load and split the data
df = pd.read_csv("drive/MyDrive/data_daily/Wholesalecustomersdata.csv")
x = df.drop('Channel', axis=1)
y = df['Channel'].copy()   # copy to avoid modifying df through a view
# Binarize the target: map Channel 2 -> 0, keep Channel 1 -> 1
y[y == 2] = 0
y[y == 1] = 1
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
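A quick sanity check on the split can be worthwhile before tuning (optional; the exact shapes and class counts depend on the CSV, which the path above suggests is the UCI Wholesale customers dataset):

print(X_train.shape, X_test.shape)   # feature matrices for the 70/30 split
print(y_train.value_counts())        # class balance of the binarized target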
Hyperparameter search with the optimizer
- Define the search space, listing the candidate values for every parameter
- Define the training procedure and the evaluation objective (the loss function)
- Run the optimization
- Retrieve the best parameter combination
Initialize the parameter space
hyperopt provides the following expressions for defining the search space:
- hp.choice(label, options): returns one of the options, which should be a list or tuple.
- hp.randint(label, upper): returns a random integer in the range [0, upper).
- hp.uniform(label, low, high): returns a value drawn uniformly between low and high.
- hp.quniform(label, low, high, q): returns round(uniform(low, high) / q) * q, i.e. a uniform value quantized to multiples of q (note the result is a float, e.g. 7.0).
- hp.normal(label, mu, sigma): returns a real value drawn from a normal distribution with mean mu and standard deviation sigma.
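To get a feel for what these expressions produce, you can draw random samples from a space directly; the sketch below uses hyperopt's pyll.stochastic.sample helper on a small demo space (the labels depth, lr and booster are purely illustrative):

from hyperopt.pyll import stochastic

demo_space = {
    'depth': hp.quniform('depth', 3, 18, 1),            # quantized float, e.g. 7.0
    'lr': hp.uniform('lr', 0.01, 0.3),                  # float in [0.01, 0.3]
    'booster': hp.choice('booster', ['gbtree', 'dart'])
}
for _ in range(3):
    print(stochastic.sample(demo_space))   # one random draw per call

With these building blocks, the search space for our XGBoost model is defined as follows.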
space = {
    'max_depth': hp.quniform('max_depth', 3, 18, 1),    # quniform yields floats; cast to int before use
    'gamma': hp.uniform('gamma', 1, 9),
    'reg_alpha': hp.quniform('reg_alpha', 40, 180, 1),
    'reg_lambda': hp.uniform('reg_lambda', 0, 1),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 1),
    'min_child_weight': hp.quniform('min_child_weight', 0, 10, 1),
    'n_estimators': 180,                                 # fixed, not searched
    'seed': 0
}
Define the optimization objective
def objective(space):
    # Note: xgboost >= 1.6 expects eval_metric and early_stopping_rounds in the
    # constructor; older versions accepted them as fit() arguments instead.
    clf = xgb.XGBClassifier(
        n_estimators=space['n_estimators'],
        max_depth=int(space['max_depth']),
        gamma=space['gamma'],
        reg_alpha=int(space['reg_alpha']),
        reg_lambda=space['reg_lambda'],
        min_child_weight=int(space['min_child_weight']),
        colsample_bytree=space['colsample_bytree'],
        eval_metric="auc",
        early_stopping_rounds=10)

    evaluation = [(X_train, y_train), (X_test, y_test)]
    clf.fit(X_train, y_train, eval_set=evaluation, verbose=False)

    # predict() already returns class labels, so no extra thresholding is needed
    pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print("SCORE:", accuracy)
    # hyperopt minimizes the loss, so return the negative accuracy
    return {'loss': -accuracy, 'status': STATUS_OK}
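Before launching the full search, it can help to call the objective once on a hand-picked point to confirm it runs end to end (a quick sketch; the values below are arbitrary but lie inside the space defined above):

sample_point = {'max_depth': 6, 'gamma': 2.0, 'reg_alpha': 50,
                'reg_lambda': 0.5, 'colsample_bytree': 0.8,
                'min_child_weight': 3, 'n_estimators': 180, 'seed': 0}
print(objective(sample_point))   # expect a SCORE line, then {'loss': ..., 'status': 'ok'}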
Run the search
trials = Trials()

best_hyperparams = fmin(fn=objective,
                        space=space,
                        algo=tpe.suggest,
                        max_evals=100,
                        trials=trials)
Print the results
- best_hyperparams holds the optimal parameters, i.e. the combination that achieved the best (lowest) loss.
- trials is an object that stores all the relevant information, such as the hyperparameters and loss value for every parameter set the model was trained with.
- fmin is the optimization function that minimizes the loss; it takes four inputs: fn, space, algo and max_evals.
- The search algorithm used here is tpe.suggest (the Tree-structured Parzen Estimator).
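The stored history can be inspected after the run; for example, the Trials object exposes best_trial and losses() (a short sketch):

print(trials.best_trial['result'])   # loss and status of the best evaluation
print(min(trials.losses()))          # lowest loss across all evaluations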
print("The best hyperparameters are : ","\n")
print(best_hyperparams)
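Finally, the winning parameters can be resolved with hyperopt's space_eval and plugged into a classifier refit on the training data (a sketch; remember that quniform entries come back as floats and need casting to int):

from hyperopt import space_eval

best = space_eval(space, best_hyperparams)   # map fmin's raw output back onto the space
final_clf = xgb.XGBClassifier(
    n_estimators=best['n_estimators'],
    max_depth=int(best['max_depth']),
    gamma=best['gamma'],
    reg_alpha=int(best['reg_alpha']),
    reg_lambda=best['reg_lambda'],
    colsample_bytree=best['colsample_bytree'],
    min_child_weight=int(best['min_child_weight']))
final_clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, final_clf.predict(X_test)))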