Pivot Tables
In [1]:
import pandas as pd

# Monthly expenses in long format: one row per (Month, Category) pair
excelample = pd.DataFrame({
    'Month': ["January", "January", "January", "January",
              "February", "February", "February", "February",
              "March", "March", "March", "March"],
    'Category': ["Transportation", "Grocery", "Household", "Entertainment",
                 "Transportation", "Grocery", "Household", "Entertainment",
                 "Transportation", "Grocery", "Household", "Entertainment"],
    'Amount': [74., 235., 175., 100., 115., 240., 225., 125., 390., 260., 200., 120.],
})
In [2]:
excelample
Out[2]:
1. Metric: spending per category in each month, with pivot
In [3]:
example_pivot=excelample.pivot(index='Category',columns='Month',values='Amount')
example_pivot
Out[3]:
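pivot only reshapes the data: every (index, column) pair must appear at most once, otherwise it raises a ValueError. A minimal sketch on a small hypothetical frame (long_df and dup_df are illustrative names, not from the notebook) showing pivot succeeding on unique pairs, failing on a duplicate, and pivot_table resolving the duplicate by aggregating it:

```python
import pandas as pd

# Hypothetical long-format data: one row per (Month, Category) pair
long_df = pd.DataFrame({
    "Month": ["January", "January", "February", "February"],
    "Category": ["Grocery", "Household", "Grocery", "Household"],
    "Amount": [235.0, 175.0, 240.0, 225.0],
})

# pivot: Category values become row labels, Month values become columns
wide = long_df.pivot(index="Category", columns="Month", values="Amount")
print(wide)

# A second Grocery entry for January makes the (Category, Month) pair ambiguous
extra = pd.DataFrame({"Month": ["January"], "Category": ["Grocery"], "Amount": [15.0]})
dup_df = pd.concat([long_df, extra], ignore_index=True)

try:
    dup_df.pivot(index="Category", columns="Month", values="Amount")
except ValueError as err:
    print("pivot failed:", err)

# pivot_table aggregates duplicate pairs instead of failing (sum here)
summed = dup_df.pivot_table(index="Category", columns="Month",
                            values="Amount", aggfunc="sum")
print(summed)
```

In the excelample data each (Category, Month) pair occurs exactly once, which is why plain pivot is enough there.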
In [4]:
example_pivot.sum(axis=1)  # total per category (summed across the month columns)
Out[4]:
In [5]:
example_pivot.sum(axis=0)  # total per month (summed down the category rows)
Out[5]:
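pivot_table can also produce these totals in one step. A sketch that rebuilds the same expense data under the hypothetical name expenses: with aggfunc="sum" and margins=True, the extra "All" column should match example_pivot.sum(axis=1), and the "All" row should match example_pivot.sum(axis=0).

```python
import pandas as pd

# Same values as the excelample frame, rebuilt with list repetition
expenses = pd.DataFrame({
    "Month": ["January"] * 4 + ["February"] * 4 + ["March"] * 4,
    "Category": ["Transportation", "Grocery", "Household", "Entertainment"] * 3,
    "Amount": [74., 235., 175., 100., 115., 240., 225., 125., 390., 260., 200., 120.],
})

# margins=True appends an "All" row and an "All" column holding the totals
table = expenses.pivot_table(index="Category", columns="Month", values="Amount",
                             aggfunc="sum", margins=True)
print(table)
```

For example, Transportation totals 74 + 115 + 390 = 579, March totals 390 + 260 + 200 + 120 = 970, and the grand total in the corner is 2259.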
In [6]:
df = pd.read_csv('./Titanic_Data-master/Titanic_Data-master/train.csv')
df.head()  # preview the first 5 rows
Out[6]:
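2. Using sex as the index and cabin class (Pclass) as the columns, summarize the fare paid by each sex in each class: pivot_table (aggregates with the mean by default)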
In [8]:
df.pivot_table(index='Sex', columns='Pclass', values='Fare')  # default aggfunc is the mean
Out[8]:
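Under the hood, a pivot_table call like the one above amounts to a groupby followed by moving one grouping key into the columns. A sketch on a small hypothetical passenger table (made-up values, not the real train.csv) comparing the two routes:

```python
import pandas as pd

# Hypothetical mini passenger table (made-up values)
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [1, 1, 3, 3, 3, 1],
    "Fare": [50.0, 80.0, 8.0, 7.0, 9.0, 100.0],
})

# pivot_table with its default aggregation (the mean)
via_pivot = passengers.pivot_table(index="Sex", columns="Pclass", values="Fare")

# Same table by hand: group, average, then move Pclass into the columns
via_groupby = passengers.groupby(["Sex", "Pclass"])["Fare"].mean().unstack()

print(via_pivot)
print(via_groupby)
```

Both give 90.0 for female passengers in class 1, (80 + 100) / 2, and 8.0 for male passengers in class 3, (7 + 9) / 2.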
In [9]:
df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='max')  # maximum fare
Out[9]:
In [12]:
df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='count')  # number of fares
Out[12]:
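Rather than running one call per statistic, aggfunc also accepts a list, which yields one block of columns per function. A sketch on the same kind of hypothetical mini table (made-up values); the column labels come out as (function, Pclass) pairs:

```python
import pandas as pd

# Hypothetical mini passenger table (made-up values)
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [1, 1, 3, 3, 3, 1],
    "Fare": [50.0, 80.0, 8.0, 7.0, 9.0, 100.0],
})

# One column block per aggregation: mean, max and count side by side
summary = passengers.pivot_table(index="Sex", columns="Pclass", values="Fare",
                                 aggfunc=["mean", "max", "count"])
print(summary)
```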
In [13]:
pd.crosstab(index=df['Sex'], columns=df['Pclass'])  # same counts as pivot_table(..., aggfunc='count') as long as Fare has no missing values
Out[13]:
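The equivalence above holds only while the value column is complete: crosstab counts rows, whereas pivot_table with aggfunc="count" counts non-missing values of the chosen column. A sketch on a hypothetical table with one deliberately missing fare, which also shows crosstab's normalize="index" option for turning counts into row proportions:

```python
import pandas as pd

# Hypothetical passengers; one male class-3 fare is missing on purpose
passengers = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [1, 1, 3, 3, 3, 1],
    "Fare": [50.0, 80.0, 8.0, 7.0, float("nan"), 100.0],
})

# crosstab counts rows per (Sex, Pclass) combination
counts = pd.crosstab(index=passengers["Sex"], columns=passengers["Pclass"])

# pivot_table's "count" skips the NaN fare, so male/class 3 drops from 2 to 1
fare_counts = passengers.pivot_table(index="Sex", columns="Pclass",
                                     values="Fare", aggfunc="count")

# normalize="index" rescales each row so that it sums to 1
row_share = pd.crosstab(index=passengers["Sex"], columns=passengers["Pclass"],
                        normalize="index")
print(counts)
print(fare_counts)
print(row_share)
```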
3. Survival rate by cabin class and sex
In [14]:
df.pivot_table(index='Pclass', columns='Sex', values='Survived', aggfunc='mean')  # mean of the 0/1 Survived column = survival rate
Out[14]:
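Because Survived is coded 1 for survivors and 0 otherwise, its mean is exactly the share of survivors. A sketch on a hypothetical six-passenger table (made-up values); adding margins=True also reports the overall rate per class, per sex, and in total:

```python
import pandas as pd

# Hypothetical mini passenger table; Survived is 1 for survivors, 0 otherwise
passengers = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male", "female"],
    "Pclass": [1, 3, 1, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Mean of a 0/1 column = proportion of 1s; margins add "All" rates
rates = passengers.pivot_table(index="Pclass", columns="Sex", values="Survived",
                               aggfunc="mean", margins=True)
print(rates)
```

Here class 3 has one survivor among four passengers, so its "All" rate is 0.25, and the overall rate is 3 / 6 = 0.5.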
4. Add a new column and compute the survival rate of underage vs. other passengers, by sex
In [15]:
df['Underaged'] = df['Age'] <= 18  # new column; missing ages (NaN) compare as False
df.pivot_table(index='Underaged', columns='Sex', values='Survived', aggfunc='mean')  # survival rate per group
Out[15]:
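Note that the Underaged flag puts passengers with an unknown age into the False group together with the adults. One way to keep them out is pd.cut, which leaves NaN ages as NaN; pivot_table then drops those rows because it groups with dropna=True by default. A sketch on hypothetical data; the column name AgeGroup, the labels, and the upper bin edge of 120 are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical passengers; NaN marks an unknown age
people = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male"],
    "Age": [15.0, 40.0, np.nan, 30.0, 12.0],
    "Survived": [1, 0, 1, 1, 1],
})

# A plain comparison silently files the NaN age under False
people["Underaged"] = people["Age"] <= 18

# pd.cut bins (0, 18] and (18, 120]; the unknown age stays NaN
people["AgeGroup"] = pd.cut(people["Age"], bins=[0, 18, 120],
                            labels=["minor", "adult"])

# NaN groups are left out; observed=True keeps only categories present in the data
by_group = people.pivot_table(index="AgeGroup", columns="Sex", values="Survived",
                              aggfunc="mean", observed=True)
print(by_group)
```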