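LightGBM's dump_model converts a trained model into a JSON-style dictionary, which makes the model easy to inspect and manipulate. Occasionally, however, the call fails with: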
JSONDecodeError: Expecting ',' delimiter: line 7 column 95 (char 209)
The example below reproduces the error and presents a fix.
from tsfresh import extract_features
from tsfresh.feature_extraction import ComprehensiveFCParameters
import lightgbm as lgb
import pandas as pd
import numpy as np
# 500 random series, each 50 points long, stored one series per column
samples_timeserires = np.random.random((50, 500))
y = np.random.randint(2, size=500)
ts_fc_comprehensive_settings = ComprehensiveFCParameters()
df = pd.DataFrame(samples_timeserires)
df.loc[:, 'col_id'] = 0

# Extract one row of tsfresh features per series and stack the rows into X
X = pd.DataFrame()
for i in range(500):
    timeseries_container = df.loc[:, [i, 'col_id']]
    timeseries_container.columns = [0, 'col_id']
    statics_feats_position = extract_features(
        timeseries_container=timeseries_container,
        column_id='col_id',
        column_value=0,
        default_fc_parameters=ts_fc_comprehensive_settings,
        n_jobs=1,
        disable_progressbar=True,
    )
    X = pd.concat([X, statics_feats_position], axis=0)
X.sample()
Now train a model and try to call dump_model:
dtrain = lgb.Dataset(X, y, free_raw_data=False, feature_name='auto',
                     categorical_feature='auto')
gbm = lgb.train(params={}, num_boost_round=5, train_set=dtrain)
gbm.dump_model()
This raises the JSONDecodeError quoted at the top of this post.
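The message only gives a line, a column and a character offset, so it helps to look at the text around that offset. Below is a minimal sketch that uses the `doc` and `pos` attributes of `json.JSONDecodeError` to do this. The helper name `show_json_error_context` and the string `broken_doc` are made up for illustration; `broken_doc` is a hand-written stand-in, not real LightGBM output.

```python
import json

def show_json_error_context(exc, width=40):
    """Return the text surrounding the position where JSON parsing failed."""
    start = max(exc.pos - width, 0)
    end = min(exc.pos + width, len(exc.doc))
    return exc.doc[start:end]

# Hypothetical stand-in for a dumped model: a feature name containing
# double quotes has been written into the JSON text without escaping.
broken_doc = '{"feature_names": ["0__fft_coefficient__attr_"real"__coeff_0"]}'

message = None
context = None
try:
    json.loads(broken_doc)
except json.JSONDecodeError as exc:
    message = exc.msg                        # what the parser expected
    context = show_json_error_context(exc)   # text around the failure

print(message)
print(context)
```

Assuming the exception raised by gbm.dump_model() is the unmodified `json.JSONDecodeError`, wrapping that call in the same try/except should show which part of the dumped text the parser could not read.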
Testing shows that the error is caused by double quotes (") in the column names of the DataFrame X. Renaming the columns works around it:
X_rename = X.copy()
X_rename.columns = ['col_%03d' % i for i in range(len(X_rename.columns))]
dtrain = lgb.Dataset(X_rename, y, free_raw_data=False, feature_name='auto',
                     categorical_feature='auto')
gbm = lgb.train(params={}, num_boost_round=5, train_set=dtrain)
gbm.dump_model()
This time dump_model returns normally.
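Renaming every column to col_000, col_001 and so on throws away the information carried by the tsfresh feature names. One possible alternative, sketched below, replaces only the unsafe characters and keeps a mapping back to the original names. The function `sanitize_feature_names` is a hypothetical helper, not part of LightGBM or tsfresh, and the sample names are illustrative.

```python
import re

def sanitize_feature_names(names):
    """Return (safe_names, mapping from each safe name to its original)."""
    safe_names = []
    original_by_safe = {}
    for name in names:
        # Keep letters, digits and underscores; replace everything else.
        safe = re.sub(r"[^0-9A-Za-z_]", "_", str(name))
        # Add a numeric suffix if two names collapse to the same string.
        candidate, suffix = safe, 1
        while candidate in original_by_safe:
            candidate = f"{safe}_{suffix}"
            suffix += 1
        safe_names.append(candidate)
        original_by_safe[candidate] = name
    return safe_names, original_by_safe

# Illustrative feature names; the first one contains double quotes.
names = ['0__fft_coefficient__attr_"real"__coeff_0', "0__mean"]
safe_names, original_by_safe = sanitize_feature_names(names)
print(safe_names)
```

With the DataFrame from this post, the result could be assigned with X_rename.columns = safe_names, and original_by_safe can later translate the feature names in the dumped model back to the tsfresh names.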
To verify the cause, train on each column separately and print the names of the columns that trigger the exception:
normal_name = []
anomaly_name = []
for column in X.columns:
    feats = X.loc[:, column]
    # Skip near-constant features
    if len(feats.unique()) < 4:
        continue
    dtrain = lgb.Dataset(pd.DataFrame(feats), y, free_raw_data=False,
                         feature_name='auto', categorical_feature='auto')
    gbm = lgb.train(params={}, num_boost_round=5, train_set=dtrain)
    # A column is anomalous if dump_model fails on a model trained on it alone
    try:
        gbm.dump_model()
        normal_name.append(column)
    except Exception:
        anomaly_name.append(column)
print('normal:')
print(normal_name)
print('\nanomaly:')
print(anomaly_name)
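Training one model per column is slow. If the problem really is limited to characters that would need escaping inside a JSON string, as the experiment suggests, the suspect names can be found without training anything. The sketch below rests on that assumption; `needs_json_escaping` is a hypothetical helper built on the standard json module, not a LightGBM function, and the column names are illustrative.

```python
import json

def needs_json_escaping(name):
    """True if the name cannot be placed between JSON quotes verbatim."""
    text = str(name)
    # json.dumps escapes quotes, backslashes and control characters;
    # if its output differs from the naive quoting, the name is unsafe.
    return json.dumps(text, ensure_ascii=False) != '"' + text + '"'

# Illustrative column names; in this post they would come from X.columns.
columns = ['0__fft_coefficient__attr_"real"__coeff_0', "0__mean", "0__maximum"]
suspect = [c for c in columns if needs_json_escaping(c)]
print(suspect)
```

Comparing this list with anomaly_name from the loop above is a quick way to check whether quoting alone explains every failure.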