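import pandas as pd
import numpy as np
from pandas import Series, DataFrame
# data1/data2 are random, so your numbers will differ from the output below.
# pandas >= 0.23 on Python 3.6+ keeps dict insertion order (key1, key2, data1, data2);
# the alphabetical column order shown below is what older pandas versions printed.
df = DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                'key2': ['one', 'two', 'one', 'two', 'one'],
                'data1': np.random.randn(5),
                'data2': np.random.randn(5)})
print(df)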
      data1     data2 key1 key2
0  0.001573  1.348387    a  one
1 -0.423522  0.884686    a  two
2 -0.942151 -0.672910    b  one
3 -0.720580  0.724431    b  two
4  0.262283 -0.251035    a  one
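1. A GroupBy object supports iteration, producing a sequence of 2-tuples, each made up of the group name and the corresponding chunk of data.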
for name, group in df.groupby('key1'):
    print(name)
    print(group, end='\n-------------------------------\n')
a
      data1     data2 key1 key2
0  0.001573  1.348387    a  one
1 -0.423522  0.884686    a  two
4  0.262283 -0.251035    a  one
-------------------------------
b
      data1     data2 key1 key2
2 -0.942151 -0.672910    b  one
3 -0.720580  0.724431    b  two
-------------------------------
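To make the iteration behaviour concrete, the sketch below rebuilds a frame with the same keys (the seeded np.random.default_rng(0) generator is an assumption, added only so the values are reproducible) and compares groupby iteration with a hand-rolled loop that selects the rows for each sorted unique key. Because groupby sorts group keys by default, the two should match; this mirrors the observable result, not how pandas implements grouping internally.

```python
import numpy as np
import pandas as pd

# Same keys as the df above; the seed is an assumption for reproducible values
rng = np.random.default_rng(0)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

# Hand-rolled equivalent: for each sorted unique key, select the matching rows
manual = [(name, df[df['key1'] == name]) for name in sorted(df['key1'].unique())]

# Built-in iteration yields the same (name, group) pairs
builtin = list(df.groupby('key1'))

for (m_name, m_group), (b_name, b_group) in zip(manual, builtin):
    assert m_name == b_name
    pd.testing.assert_frame_equal(m_group, b_group)

print([name for name, _ in builtin])  # ['a', 'b']
```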
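2. Iterating over groups formed by multiple keys. With a list of keys, the group name in each tuple is itself a tuple of key values, which can be unpacked directly in the for statement.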
for (key1, key2), group in df.groupby(['key1', 'key2']):
    print(key1, key2)
    print(group, end='\n-------------------------------\n')
a one
      data1     data2 key1 key2
0  0.001573  1.348387    a  one
4  0.262283 -0.251035    a  one
-------------------------------
a two
      data1     data2 key1 key2
1 -0.423522  0.884686    a  two
-------------------------------
b one
      data1    data2 key1 key2
2 -0.942151 -0.67291    b  one
-------------------------------
b two
     data1     data2 key1 key2
3 -0.72058  0.724431    b  two
-------------------------------
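Since each group name is a tuple when grouping by several keys, the same tuple can be passed to get_group to pull out a single group. A short sketch on a rebuilt frame (the seeded values are an assumption; only the keys matter here):

```python
import numpy as np
import pandas as pd

# Rebuilt frame with the same keys; the seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

grouped = df.groupby(['key1', 'key2'])

# With a list of keys, each group name is a tuple of key values
names = [name for name, _ in grouped]
print(names)

# The same tuple selects one group directly
ab = grouped.get_group(('a', 'one'))
print(ab.index.tolist())  # [0, 4]
print(grouped.ngroups)    # 4
```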
3. Converting a GroupBy object to a dict
The group keys become the dictionary's keys, and the groups become its values.
pieces = dict(list(df.groupby('key1')))
print(pieces['b'])
      data1     data2 key1 key2
2 -0.942151 -0.672910    b  one
3 -0.720580  0.724431    b  two
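dict(list(...)) works because iterating a GroupBy yields (name, group) pairs, so a dict comprehension over the same iteration is equivalent. When only the row labels of each group are needed, the groups attribute already maps each key to its index labels without building the sub-DataFrames. A sketch on a rebuilt frame (the seed is an assumption):

```python
import numpy as np
import pandas as pd

# Rebuilt frame with the same keys; the seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

grouped = df.groupby('key1')

# Iteration yields (name, group) pairs, so dict() can consume it
pieces = dict(list(grouped))
same = {name: group for name, group in grouped}

# .groups maps each key to an Index of row labels (no sub-frames built)
labels = {k: list(v) for k, v in grouped.groups.items()}
print(labels)  # 'a' -> [0, 1, 4], 'b' -> [2, 3]
```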
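4. Grouping columns by dtype. groupby splits rows (axis=0) by default; with axis=1 it groups the columns instead, here keyed by each column's dtype.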
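# groupby(..., axis=1) is deprecated since pandas 2.1, so expect a FutureWarning
# or an error on recent versions.
grouped = df.groupby(df.dtypes, axis=1)
print(dict(list(grouped)))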
{dtype('float64'):       data1     data2
0  0.001573  1.348387
1 -0.423522  0.884686
2 -0.942151 -0.672910
3 -0.720580  0.724431
4  0.262283 -0.251035, dtype('O'):   key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one}
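Because axis=1 is deprecated in recent pandas, one way to get the same dtype-keyed pieces is to group the column names by dtype in plain Python and then select those columns. This is an alternative technique, not what groupby does internally. The seeded frame is an assumption, and depending on the pandas version the string columns may report a dedicated string dtype rather than dtype('O').

```python
import numpy as np
import pandas as pd

# Rebuilt frame with the same columns; the seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

# Collect column names per dtype, preserving column order
cols_by_dtype = {}
for col, dtype in df.dtypes.items():
    cols_by_dtype.setdefault(dtype, []).append(col)

# One sub-DataFrame per dtype, analogous to dict(list(df.groupby(df.dtypes, axis=1)))
pieces = {dtype: df[cols] for dtype, cols in cols_by_dtype.items()}

for dtype, frame in pieces.items():
    print(dtype, list(frame.columns))
```

If only one dtype family is needed, df.select_dtypes(include='number') is simpler.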