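When analyzing data, if the values in a column are widely scattered, we usually transform that column before drawing a bar chart or doing further analysis. This means processing a column of a DataFrame.
Consider the following scenario. Before processing: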
import matplotlib.pyplot as plt
import seaborn as sns
df_1 = df[df['country'] == 1]  # keep only the rows where country == 1
sns.barplot(x=df_1['hotel_score'], y=df_1['uv'])
plt.show()  # Figure 3
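In the resulting figure, the x-axis labels are hard to read, because barplot draws a separate bar for every distinct hotel_score value.
Next, we process the data: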
# Bin the hotel scores: round each score up to the next multiple of 0.5 (scores <= 1 go to 1, scores > 5 are kept as-is)
def dev_hotel_score(hotel_score):
    if hotel_score <= 1:
        result = 1
    elif hotel_score <= 1.5:
        result = 1.5
    elif hotel_score <= 2:
        result = 2
    elif hotel_score <= 2.5:
        result = 2.5
    elif hotel_score <= 3:
        result = 3
    elif hotel_score <= 3.5:
        result = 3.5
    elif hotel_score <= 4:
        result = 4
    elif hotel_score <= 4.5:
        result = 4.5
    elif hotel_score <= 5:
        result = 5
    else:
        result = hotel_score
    return result
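The if/elif chain in dev_hotel_score effectively rounds each score up to the next multiple of 0.5: everything at or below 1 goes to 1, and anything above 5 is returned unchanged. On a large column the same rule can be written without a per-element Python call. The sketch below is an alternative using NumPy, not the method this article uses, and the name bin_hotel_score_vectorized is hypothetical.

```python
import numpy as np
import pandas as pd

def bin_hotel_score_vectorized(scores: pd.Series) -> pd.Series:
    """Hypothetical vectorized equivalent of dev_hotel_score's if/elif chain."""
    values = scores.to_numpy(dtype=float)
    # Round up to the next multiple of 0.5; scaling by 2 is exact in binary floating point
    rounded_up = np.ceil(values * 2) / 2
    result = np.where(
        values <= 1,
        1.0,  # scores at or below 1 all go into bucket 1
        np.where(values <= 5, rounded_up, values),  # (1, 5] rounds up; > 5 or NaN kept as-is
    )
    return pd.Series(result, index=scores.index, name=scores.name)

scores = pd.Series([0.4, 1.0, 1.2, 2.3, 3.5, 4.01, 5.0, 6.2], name="hotel_score")
print(bin_hotel_score_vectorized(scores).tolist())
```

pd.cut is another common choice for fixed bins, but it returns NaN for values outside the given bins instead of passing them through, so it would not reproduce the `else` branch above.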
df_1 = df[df['country'] == 1]
sns.barplot(x=df_1['hotel_score'].map(lambda x: dev_hotel_score(x)), y=df_1['uv'])
plt.show()  # Figure 3
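As you can see, the chart drawn after binning is much easier to read.
To process the column above, I combined map with a lambda. This pattern is useful in many situations and is worth practicing.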
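To see what each bar represents after binning: by default sns.barplot treats x as categorical and draws the mean of y for each category (its estimator defaults to the mean), plus an error bar. The sketch below reproduces those bar heights with groupby on made-up data. The toy values and the score_bin column name are hypothetical, and the lambda reproduces dev_hotel_score's rule only for scores between 1 and 5.

```python
import math
import pandas as pd

# Made-up example data (hypothetical); the article's real df is not shown
toy = pd.DataFrame({
    "hotel_score": [1.2, 1.4, 2.3, 2.4, 4.8],
    "uv": [10, 20, 30, 50, 40],
})

# Round each score up to the next multiple of 0.5 (matches dev_hotel_score for 1 < score <= 5)
toy["score_bin"] = toy["hotel_score"].map(lambda s: math.ceil(s * 2) / 2)

# Mean uv per bucket: these are the bar heights barplot draws with its default estimator
bar_heights = toy.groupby("score_bin")["uv"].mean()
print(bar_heights)
```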
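A short illustration of when the lambda in map(lambda x: ...) is actually needed: if the function takes only the element, it can be passed to map directly, and the lambda mainly helps when you want to supply extra arguments. The helper round_up_to_step and its step parameter are hypothetical, introduced only for this sketch.

```python
import math
import pandas as pd

def round_up_to_step(value, step):
    """Hypothetical helper: round value up to the next multiple of step."""
    return math.ceil(value / step) * step

scores = pd.Series([1.2, 2.3, 3.9])

# A lambda is handy when the function needs arguments besides the element itself
half_steps = scores.map(lambda x: round_up_to_step(x, 0.5))
whole_steps = scores.map(lambda x: round_up_to_step(x, 1))

# When the function takes only the element, the lambda wrapper is optional:
# scores.map(f) and scores.map(lambda x: f(x)) give the same result
same = scores.map(math.floor).equals(scores.map(lambda x: math.floor(x)))
print(half_steps.tolist(), whole_steps.tolist(), same)
```

By the same reasoning, the plotting line in this article could also pass dev_hotel_score to map directly.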