Copyright notice: all rights reserved by https://blog.csdn.net/thfyshz. Source: https://blog.csdn.net/thfyshz/article/details/83987651
Splitting a dataset into pieces (the session assumes import pandas as pd and import numpy as np):
In [122]: df = pd.DataFrame(data={'Case' : ['A','A','A','B','A','A','B','A','A'],
   .....:                         'Data' : np.random.randn(9)})
   .....:
In [123]: dfs = list(zip(*df.groupby((1*(df['Case']=='B')).cumsum().rolling(window=3,min_periods=1).median())))[-1]
In [124]: dfs[0]
Out[124]:
  Case      Data
0    A  0.174068
1    A -0.439461
2    A -0.741343
3    B -0.079673
In [125]: dfs[1]
Out[125]:
  Case      Data
4    A -0.922875
5    A  0.303638
6    B -0.917368
In [126]: dfs[2]
Out[126]:
  Case      Data
7    A -1.624062
8    A -0.758514
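The one-liner in In [123] packs several steps into its grouping key. The sketch below unpacks them on the same 'Case' column; the 'Data' values are replaced by 0..8 so the result is reproducible, and the names is_b, block, key, pieces and shifted_key are introduced here for illustration only.

```python
import pandas as pd

# Deterministic stand-in for the random 'Data' column (an assumption for illustration).
df = pd.DataFrame({'Case': ['A', 'A', 'A', 'B', 'A', 'A', 'B', 'A', 'A'],
                   'Data': range(9)})

is_b = (df['Case'] == 'B').astype(int)   # 1 on every 'B' row, 0 elsewhere
block = is_b.cumsum()                    # 0 0 0 1 1 1 2 2 2: the counter steps up ON each 'B' row

# The rolling median over the current row and the two rows before it delays
# each step by one row, so every 'B' stays with the rows that precede it.
key = block.rolling(window=3, min_periods=1).median()   # 0 0 0 0 1 1 1 2 2

# One DataFrame per distinct key value, in key order.
pieces = [piece for _, piece in df.groupby(key)]

# Hypothetical alternative (not from the original post): shifting the counter
# down by one row produces the same key for this data.
shifted_key = block.shift(fill_value=0)
```

For this data, pieces[0] holds rows 0 to 3, pieces[1] rows 4 to 6, and pieces[2] rows 7 and 8, matching dfs above. The shift variant may be easier to read, but it is a substitution, not the technique the post uses.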
Pivot tables:
In [127]: df = pd.DataFrame(data={'Province' : ['ON','QC','BC','AL','AL','MN','ON'],
   .....:                         'City' : ['Toronto','Montreal','Vancouver','Calgary','Edmonton','Winnipeg','Windsor'],
   .....:                         'Sales' : [13,6,16,8,4,3,1]})
   .....:
In [128]: table = pd.pivot_table(df, values=['Sales'], index=['Province'], columns=['City'], aggfunc='sum', margins=True)
In [129]: table.stack('City')
Out[129]:
                    Sales
Province City
AL       All         12.0
         Calgary      8.0
         Edmonton     4.0
BC       All         16.0
         Vancouver   16.0
MN       All          3.0
         Winnipeg     3.0
...                   ...
All      Calgary      8.0
         Edmonton     4.0
         Montreal     6.0
         Toronto     13.0
         Vancouver   16.0
         Windsor      1.0
         Winnipeg     3.0
[20 rows x 1 columns]
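The 'All' entries in the output come from margins=True, which appends a totals row and a totals column to the pivot table. A minimal sketch of how to read them, using the same sales data; it passes values='Sales' as a plain string (a simplification assumed here so the columns stay single-level), and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Province': ['ON', 'QC', 'BC', 'AL', 'AL', 'MN', 'ON'],
                   'City': ['Toronto', 'Montreal', 'Vancouver', 'Calgary',
                            'Edmonton', 'Winnipeg', 'Windsor'],
                   'Sales': [13, 6, 16, 8, 4, 3, 1]})

# Provinces as rows, cities as columns, plus an 'All' row and an 'All' column.
table = pd.pivot_table(df, values='Sales', index='Province', columns='City',
                       aggfunc='sum', margins=True)

al_total = table.loc['AL', 'All']            # sum over all cities in AL
calgary_total = table.loc['All', 'Calgary']  # sum over all provinces for Calgary
grand_total = table.loc['All', 'All']        # sum of every sale

# Province/city pairs that never occur in the data stay empty (NaN).
missing = pd.isna(table.loc['AL', 'Toronto'])
```

Stacking the 'City' level, as In [129] does, turns this wide table into the long form shown above, with one row per (Province, City) pair that has a value.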
In [130]: grades = [48,99,75,80,42,80,72,68,36,78]
In [131]: df = pd.DataFrame( {'ID': ["x%d" % r for r in range(10)],
   .....:                     'Gender' : ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'],
   .....:                     'ExamYear': ['2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'],
   .....:                     'Class': ['algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'],
   .....:                     'Participated': ['yes','yes','yes','yes','no','yes','yes','yes','yes','yes'],
   .....:                     'Passed': ['yes' if x > 50 else 'no' for x in grades],
   .....:                     'Employed': [True,True,True,False,False,False,False,True,True,False],
   .....:                     'Grade': grades})
   .....:
In [132]: df.groupby('ExamYear').agg({'Participated': lambda x: x.value_counts()['yes'],
   .....:                             'Passed': lambda x: sum(x == 'yes'),
   .....:                             'Employed' : lambda x : sum(x),
   .....:                             'Grade' : lambda x : sum(x) / len(x)})
   .....:
Out[132]:
          Participated  Passed  Employed      Grade
ExamYear
2007                 3       2         3  74.000000
2008                 3       3         0  68.500000
2009                 3       2         2  60.666667
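The same per-year summary can also be written with named aggregation, which is available in pandas 0.25 and later. This is an alternative spelling, not the form the post uses: it swaps the hand-written lambdas for the built-in 'sum' and 'mean' where possible, and the variable name summary is introduced here for illustration.

```python
import pandas as pd

grades = [48, 99, 75, 80, 42, 80, 72, 68, 36, 78]
df = pd.DataFrame({
    'ExamYear': ['2007', '2007', '2007', '2008', '2008',
                 '2008', '2008', '2009', '2009', '2009'],
    'Participated': ['yes', 'yes', 'yes', 'yes', 'no',
                     'yes', 'yes', 'yes', 'yes', 'yes'],
    'Passed': ['yes' if g > 50 else 'no' for g in grades],
    'Employed': [True, True, True, False, False,
                 False, False, True, True, False],
    'Grade': grades,
})

# Each keyword is an output column: (input column, aggregation).
summary = df.groupby('ExamYear').agg(
    Participated=('Participated', lambda s: (s == 'yes').sum()),  # count of 'yes'
    Passed=('Passed', lambda s: (s == 'yes').sum()),              # count of 'yes'
    Employed=('Employed', 'sum'),                                 # True counts as 1
    Grade=('Grade', 'mean'),                                      # average grade
)
```

Counting with (s == 'yes').sum() also returns 0 for a year with no 'yes' at all, whereas x.value_counts()['yes'] would raise a KeyError in that case.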
Laying out monthly data year by year:
In [133]: df = pd.DataFrame({'value': np.random.randn(36)},
   .....:                   index=pd.date_range('2011-01-01', freq='M', periods=36))
   .....:
In [134]: pd.pivot_table(df, index=df.index.month, columns=df.index.year,
   .....:                values='value', aggfunc='sum')
   .....:
Out[134]:
        2011      2012      2013
1  -0.560859  0.120930  0.516870
2  -0.589005 -0.210518  0.343125
3  -1.070678 -0.931184  2.137827
4  -1.681101  0.240647  0.452429
5   0.403776 -0.027462  0.483103
6   0.609862  0.033113  0.061495
7   0.387936 -0.658418  0.240767
8   1.815066  0.324102  0.782413
9   0.705200 -1.403048  0.628462
10 -0.668049 -0.581967 -0.880627
11  0.242501 -1.233862  0.777575
12  0.313421 -3.520876 -0.779367
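Because the session above uses random numbers, its layout is hard to check by eye. The sketch below repeats the pivot with the values 0 to 35, so each cell can be predicted, and compares it with a groupby/unstack formulation. Two things are assumptions made here: freq='MS' (month start) replaces the post's freq='M' so the snippet runs unchanged across pandas versions, and the names table and alt are illustrative.

```python
import numpy as np
import pandas as pd

# 36 consecutive months starting January 2011, holding the values 0..35.
df = pd.DataFrame({'value': np.arange(36)},
                  index=pd.date_range('2011-01-01', freq='MS', periods=36))

# Months become the rows, years become the columns.
table = pd.pivot_table(df, index=df.index.month, columns=df.index.year,
                       values='value', aggfunc='sum')

# Equivalent formulation: group by (month, year), then move the year level to columns.
alt = df['value'].groupby([df.index.month, df.index.year]).sum().unstack()
```

Here table.loc[1, 2011] is the January 2011 value and table.loc[12, 2013] the December 2013 value. With one observation per month, 'sum' simply passes each value through; it only matters when several rows fall in the same month.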