1. Standardization
Rescale each feature (column) to zero mean and unit standard deviation.
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaler.fit_transform(X)
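A minimal sketch of the standardization step on toy data. The arrays `X_train` and `X_test` are made up for illustration. It fits the scaler on the training data only and reuses the learned statistics on new data, which is the usual pattern.

```python
import numpy as np
from sklearn import preprocessing

# Toy data (assumed for illustration): 3 samples, 2 features
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = preprocessing.StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from X_train
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
```

Note that StandardScaler uses the population standard deviation (ddof=0), so results can differ slightly from a sample-based estimate.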
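2. Normalization (min-max scaling)
Apply a linear transformation that maps each feature of the original data into the [0, 1] interval (any other fixed [min, max] interval also works).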
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(X)
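A small sketch of min-max scaling on made-up data, compared against the by-hand formula (x - min) / (max - min) per column. The second scaler shows a different target interval; the (-1, 1) choice is only an example.

```python
import numpy as np
from sklearn import preprocessing

# Toy data (assumed for illustration)
X = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

# The same transformation written out per column
X_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(X_scaled, X_manual))  # True

# A different fixed interval, here [-1, 1]
scaler_sym = preprocessing.MinMaxScaler(feature_range=(-1, 1))
X_sym = scaler_sym.fit_transform(X)
```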
3. Normalization to unit norm
Scale each sample (row) so that its L2 or L1 norm equals 1.
from sklearn import preprocessing
X_l2 = preprocessing.normalize(X, norm='l2')  # each row scaled to unit L2 norm
X_l1 = preprocessing.normalize(X, norm='l1')  # each row scaled to unit L1 norm
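A short sketch showing that `normalize` works row by row (per sample), not per column. The input matrix is made up for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Toy data (assumed for illustration): 2 samples, 2 features
X = np.array([[3.0, 4.0], [1.0, 1.0]])

X_l2 = preprocessing.normalize(X, norm='l2')
X_l1 = preprocessing.normalize(X, norm='l1')

print(X_l2[0])                       # [0.6 0.8], since sqrt(3^2 + 4^2) = 5
print(np.linalg.norm(X_l2, axis=1))  # every row has L2 norm 1
print(np.abs(X_l1).sum(axis=1))      # every row has L1 norm 1
```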
4. One-hot encoding
An encoding for discrete (categorical) feature values: each category becomes its own 0/1 column.
from sklearn import preprocessing
encoder = preprocessing.OneHotEncoder()
encoder.fit_transform(data).toarray()
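A minimal sketch with a single made-up categorical column. The color values are only an illustration. By default the encoder returns a sparse matrix, which is why `.toarray()` is called, and categories are ordered alphabetically.

```python
from sklearn import preprocessing

# Toy data (assumed for illustration): one categorical feature, 4 samples
data = [['red'], ['green'], ['blue'], ['green']]

encoder = preprocessing.OneHotEncoder()
encoded = encoder.fit_transform(data).toarray()

print(encoder.categories_[0])  # ['blue' 'green' 'red']
print(encoded)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```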
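5. Feature binarization
Given a threshold, map feature values to 0/1: values strictly above the threshold become 1, all others become 0.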
from sklearn import preprocessing
binarizer = preprocessing.Binarizer(threshold=1.1)
binarizer.transform(X)
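A small sketch on made-up data using the same threshold of 1.1. It uses `fit_transform` for safety, although Binarizer is stateless and its `fit` learns nothing. A value equal to the threshold maps to 0.

```python
import numpy as np
from sklearn import preprocessing

# Toy data (assumed for illustration)
X = np.array([[1.0, -1.0, 2.0],
              [2.0, 0.0, 0.0],
              [0.0, 1.1, -1.0]])

binarizer = preprocessing.Binarizer(threshold=1.1)
X_bin = binarizer.fit_transform(X)

print(X_bin)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```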
6. Label encoding
Encode target labels as integers from 0 to n_classes - 1.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit_transform(data)
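A minimal sketch with made-up string labels. Classes are sorted before integer codes are assigned, and `inverse_transform` maps the codes back to the original labels.

```python
from sklearn import preprocessing

# Toy labels (assumed for illustration)
labels = ['paris', 'tokyo', 'amsterdam', 'tokyo']

le = preprocessing.LabelEncoder()
encoded = le.fit_transform(labels)

print(le.classes_)                   # ['amsterdam' 'paris' 'tokyo']
print(encoded)                       # [1 2 0 2]
print(le.inverse_transform([0, 2]))  # ['amsterdam' 'tokyo']
```

LabelEncoder is meant for target labels (y). For input features, an encoder such as OneHotEncoder is usually the better fit.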