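1. When a DF has many features, each feature variable needs its own handling during preprocessing. To avoid confusion, run DF.info(), copy the output into Sublime, and annotate each feature there with how it will be processed.
# Not Python code; this only shows what the annotations look like in Sublime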
action_type 30697 non-null object[get dummies]
combined_shot_type 30697 non-null object[get dummies]
game_event_id 30697 non-null int64[del]
game_id 30697 non-null int64[del]
lat 30697 non-null float64[remain]
loc_x 30697 non-null int64[del]
loc_y 30697 non-null int64[del]
lon 30697 non-null float64[remain]
minutes_remaining 30697 non-null int64[process then del]
period 30697 non-null int64[remain]
playoffs 30697 non-null int64[remain]
season 30697 non-null object[get dummies]
seconds_remaining 30697 non-null int64[process then del]
shot_distance 30697 non-null int64[remain]
shot_made_flag 25697 non-null float64[tag] # this column is the label column
shot_type 30697 non-null object[get dummies]
shot_zone_area 30697 non-null object[del]
shot_zone_basic 30697 non-null object[del]
shot_zone_range 30697 non-null object[del]
team_id 30697 non-null int64[del]
team_name 30697 non-null object[del]
game_date 30697 non-null object[del]
matchup 30697 non-null object[del]
opponent 30697 non-null object[get dummies]
shot_id 30697 non-null int64[del]
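2. Columns of object dtype can usually be converted to numeric data with get dummies (one-hot encoding).
3. Features that measure the same quantity in different units (e.g. minutes_remaining and seconds_remaining in the listing above) can be converted to a common unit and merged into one feature.
4. Name and ID features are usually dropped outright.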
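
Points 2 to 4 can be sketched in pandas roughly as follows. This is only a minimal illustration, not the original workflow: the tiny DataFrame borrows a few column names from the listing above, its values are made up, and `time_remaining` is a hypothetical name for the merged column.

```python
import pandas as pd

# Toy frame: column names come from the listing above, values are invented.
df = pd.DataFrame({
    "shot_type": ["2PT Field Goal", "3PT Field Goal", "2PT Field Goal"],
    "minutes_remaining": [10, 0, 5],
    "seconds_remaining": [27, 45, 0],
    "team_id": [1, 1, 1],
    "team_name": ["Team A", "Team A", "Team A"],
    "shot_made_flag": [1.0, 0.0, None],
})

# Point 3: same quantity, different units -> convert to seconds and merge.
# "time_remaining" is an assumed name, not one from the original data.
df["time_remaining"] = df["minutes_remaining"] * 60 + df["seconds_remaining"]

# Point 4, plus the "[process then del]" columns: drop ID/name features
# and the two source columns that were just merged.
df = df.drop(columns=["minutes_remaining", "seconds_remaining",
                      "team_id", "team_name"])

# Point 2: one-hot encode the object column with get_dummies.
df = pd.get_dummies(df, columns=["shot_type"])

print(df.columns.tolist())
```

Merging before dropping matters here: the `[process then del]` columns have to be read once more before they are removed.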