Day 1: Data Preprocessing

https://github.com/MLEveryday/100-Days-Of-ML-Code/blob/master/Code/Day%201_Data_Preprocessing.md

data set (Data.csv; empty fields are missing values):

Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
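
To see which values the imputation step has to fill, the ten rows listed above can be loaded from an in-memory string instead of a local file. This is a minimal sketch; the names `CSV_TEXT`, `missing_per_column`, `age_mean`, and `salary_mean` are illustrative and do not appear in the original post.

```python
import io

import pandas as pd

# The ten rows listed above, written as CSV; empty fields are the missing values.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""

dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Number of missing entries in each column.
missing_per_column = dataset.isna().sum()

# Column means over the known values; pandas skips NaN by default.
# These are the values a mean-based imputer would fill in.
age_mean = dataset["Age"].mean()
salary_mean = dataset["Salary"].mean()

print(missing_per_column)
print(age_mean, salary_mean)
```

Age and Salary each have one missing entry, so each mean is taken over the nine known values in that column.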

code:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

path = 'C:/Users/liky/Desktop/100-Days-Of-ML-Code-master/datasets/Data.csv'
dataset = pd.read_csv(path)
X = dataset.iloc[ : , :-1].values
Y = dataset.iloc[ : , 3].values

# Handle missing data: replace NaN in Age and Salary with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X[ : , 1:3])
X[ : , 1:3] = imputer.transform(X[ : , 1:3])


# Encode categorical data: map each country name to an integer label
labelencoder_X = LabelEncoder()
X[ : , 0] = labelencoder_X.fit_transform(X[ : , 0])


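# Create dummy variables
# OneHotEncoder's categorical_features argument was removed in scikit-learn 0.22;
# ColumnTransformer selects the column to encode instead.
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([('country', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X).astype(float)
labelencoder_Y = LabelEncoder()
Y = labelencoder_Y.fit_transform(Y)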


# Split the dataset into a training set and a test set
X_train, X_test, Y_train, Y_test = train_test_split( X , Y , test_size = 0.2, random_state = 0)


# Feature scaling: fit the scaler on the training set only, then apply it to the test set
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
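
For readers without the local Data.csv, the same steps can be run end to end on the rows listed at the top. This is a sketch that assumes scikit-learn 0.22 or later; `CSV_TEXT` and the step name `'country'` are illustrative names, and `ColumnTransformer` stands in for the older `categorical_features` argument of `OneHotEncoder`.

```python
import io

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Same ten rows as the data set above; empty fields are missing values.
CSV_TEXT = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
Spain,38,61000,No
Germany,40,,Yes
France,35,58000,Yes
Spain,,52000,No
France,48,79000,Yes
Germany,50,83000,No
France,37,67000,Yes
"""
dataset = pd.read_csv(io.StringIO(CSV_TEXT))
X = dataset.iloc[:, :-1].to_numpy(dtype=object, copy=True)
Y = dataset.iloc[:, 3].to_numpy(dtype=object)

# Fill missing Age and Salary values with the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

# One-hot encode Country (column 0); Age and Salary pass through unchanged.
ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)
X = ct.fit_transform(X).astype(float)

# Encode the Purchased labels as integers.
Y = LabelEncoder().fit_transform(Y)

# Hold out 20% of the rows as a test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Fit the scaler on the training rows only, then reuse it for the test rows.
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

print(X_train.shape, X_test.shape)
```

After encoding, X has five columns: three dummy columns for Country, followed by Age and Salary.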

Reposted from blog.csdn.net/li_k_y/article/details/86496450