An Example of Using Pipeline

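Pipelines are a way to organize data preprocessing and modeling: a pipeline bundles the preprocessing and modeling steps together. Using a Pipeline makes your code cleaner and easier to read, less prone to bugs, easier to deploy, and gives you more options for model validation.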

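Suppose you have training and validation data in X_train, X_valid, y_train, and y_valid, and that the data contains both variables with missing values and categorical variables.
The code below is an example of using a Pipeline for preprocessing and modeling.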
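
If you do not have such a split at hand, a minimal sketch of producing one might look like the following. The toy DataFrame, its column names (area, city), and the 75/25 split ratio are made up for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: one numerical and one categorical feature, both with gaps
X = pd.DataFrame({
    "area": [50.0, np.nan, 80.0, 120.0, 65.0, np.nan, 95.0, 70.0],
    "city": ["A", "B", None, "A", "C", "B", "A", None],
})
y = pd.Series([100.0, 150.0, 160.0, 240.0, 130.0, 155.0, 190.0, 140.0])

# Hold out a quarter of the rows for validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, train_size=0.75, test_size=0.25, random_state=0
)
print(X_train.shape, X_valid.shape)
```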

Define the preprocessing steps

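1. Numerical data: impute the missing values (with the mean, the most frequent value, the median, a constant such as zero, etc.). The code below uses strategy='constant', which fills with 0 by default.
2. Categorical data: impute the missing values first, then apply one-hot encoding.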
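
To make the two steps concrete, here is a small sketch of what the imputers and the encoder do on their own. The tiny arrays are invented for illustration; only the scikit-learn classes come from the example itself.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Numerical column with one missing value
nums = np.array([[1.0], [np.nan], [3.0]])
# 'mean' replaces the gap with the mean of the observed values
mean_filled = SimpleImputer(strategy="mean").fit_transform(nums).ravel()
# 'constant' uses fill_value, which defaults to 0 for numerical data
const_filled = SimpleImputer(strategy="constant").fit_transform(nums).ravel()
print(mean_filled, const_filled)

# Categorical column with one missing value
cats = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
# 'most_frequent' replaces the gap with the most common category ("red")
filled = SimpleImputer(strategy="most_frequent").fit_transform(cats)
# One-hot encode; the columns follow the sorted categories: blue, red
encoded = OneHotEncoder(handle_unknown="ignore").fit_transform(filled).toarray()
print(encoded)
```

Setting handle_unknown='ignore' means that a category seen only in the validation data is encoded as all zeros instead of raising an error.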

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Preprocessing for numerical data: fill missing values with a constant (0 by default)
numerical_transformer = SimpleImputer(strategy='constant')

# Preprocessing for categorical data
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Bundle preprocessing for numerical and categorical data
# (numerical_cols and categorical_cols are lists of column names that must be defined beforehand)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_cols),
        ('cat', categorical_transformer, categorical_cols)
    ])
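
The preprocessor above relies on numerical_cols and categorical_cols, which this post does not define. One possible way to build them, assuming X_train is a pandas DataFrame, is to split the columns by dtype. The small DataFrame here is a hypothetical stand-in for the real training data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train
X_train = pd.DataFrame({
    "area": [50.0, np.nan, 80.0],
    "rooms": [2, 3, 4],
    "city": ["A", None, "B"],
})

# Columns with a numeric dtype
numerical_cols = X_train.select_dtypes(include="number").columns.tolist()
# Everything else is treated as categorical in this sketch
categorical_cols = [c for c in X_train.columns if c not in numerical_cols]

print(numerical_cols)
print(categorical_cols)
```

In practice you may also want to drop categorical columns with very many distinct values before one-hot encoding, since each value becomes its own column.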

Define the model

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(n_estimators=100, random_state=0)

Create and evaluate the Pipeline

from sklearn.metrics import mean_absolute_error

# Bundle preprocessing and modeling code in a pipeline
my_pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                              ('model', model)
                             ])

# Preprocessing of training data, fit model 
my_pipeline.fit(X_train, y_train)

# Preprocessing of validation data, get predictions
preds = my_pipeline.predict(X_valid)

# Evaluate the model
score = mean_absolute_error(y_valid, preds)
print('MAE:', score)
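
One of the validation options a pipeline opens up is cross-validation: because preprocessing and the model are one object, the imputers and the encoder are refit on the training part of each fold. The sketch below is self-contained and runs on synthetic data; the column names, the data-generating formula, the median strategy, and the fold count are assumptions for illustration, not part of the example above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical synthetic data: the target depends roughly linearly on "area"
rng = np.random.default_rng(0)
n = 60
X = pd.DataFrame({
    "area": rng.uniform(40, 150, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
y = 2.0 * X["area"] + rng.normal(0, 5, size=n)
X.loc[::10, "area"] = np.nan  # knock out every tenth value to simulate missing data

preprocessor = ColumnTransformer(transformers=[
    ("num", SimpleImputer(strategy="median"), ["area"]),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["city"]),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# scikit-learn reports negated MAE so that higher is better; flip the sign back
scores = -cross_val_score(pipeline, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", scores)
print("Mean MAE:", scores.mean())
```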

Reposted from blog.csdn.net/supreme_1/article/details/104364350