Basic XGBoost usage (feature selection, importance plots, classification, prediction)

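XGBoost shows up regularly in Kaggle and similar competitions. Tianqi Chen (with co-author Carlos Guestrin) formally presented the algorithm in the 2016 paper "XGBoost: A Scalable Tree Boosting System". Its core idea is the same as GBDT, with several optimizations: a second-order expansion of the loss (using the second derivative) approximates it more accurately, a regularization term keeps the trees from overfitting, and a block storage layout allows parallel computation. XGBoost is efficient, flexible, and portable, and it is widely used in data mining, recommender systems, and similar areas. This post collects the code I use most often.
The examples assume that xgboost is already installed and that train_x, train_y, test_x, and test_y have been prepared.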
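For readers without a dataset at hand, here is a minimal sketch of one way to produce the four arrays the examples expect. The synthetic data from scikit-learn and the 80/20 split are assumptions made for illustration, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (a hypothetical stand-in for real data):
# 500 rows, 10 features, 5 of them informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out 20% of the rows as the test set.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(train_x.shape, test_x.shape)  # (400, 10) (100, 10)
```

Any real dataset works the same way, as long as the features and labels end up in these four variables.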

Classification

import xgboost as xgb
from xgboost import XGBClassifier, plot_importance
from matplotlib import pyplot as plt

model = XGBClassifier()
model.fit(train_x, train_y)

# Feature importances (one score per feature)
print(model.feature_importances_)

'''
plot_importance and feature_importances_ can rank features differently.
For tree models, model.feature_importances_ ranks by gain by default,
while xgb.plot_importance defaults to weight.
Pass the same importance type and the rankings agree:
plot_importance(model, importance_type='gain')
'''

# Plot feature importance
plot_importance(model)
plt.show()

# Predict on the test set
y_pred = model.predict(test_x)

Prediction (regression)

import xgboost as xgb

# Tune these parameters to improve the model
model = xgb.XGBRegressor(
    max_depth=6,
    learning_rate=0.12,
    n_estimators=90,
    min_child_weight=6,
    objective="reg:gamma",  # gamma regression; expects positive targets
)
model.fit(train_x, train_y)

Adjusting the size of the feature importance plot

import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))  # figure size
plot_importance(model,
                height=0.6,             # bar height
                ax=ax,
                max_num_features=64)    # maximum number of features shown
plt.show()

Showing Chinese feature names in the importance plot

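import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt

# model = xgb.XGBRegressor()  # scikit-learn interface
# model.fit(xgb_trainx, xgb_trainy)

# Use the Chinese column names as labels in the plot
# newdf is the author's DataFrame (not shown); columns from index 3 on are the features
feature_names = list(newdf.columns[3:])

# Native interface
dtrain = xgb.DMatrix(xgb_trainx, label=xgb_trainy, feature_names=feature_names)
param = {}  # empty dict: train with the default parameters
model = xgb.train(param, dtrain)

fig, ax = plt.subplots(figsize=(10, 6))
plot_importance(model,
                height=0.6,             # bar height
                # ylabel=ttylab,
                ax=ax,
                max_num_features=10,    # maximum number of features shown
                importance_type='gain')
plt.show()  # read the top features off the plot (the author picked the top 3)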
