Copyright notice: all rights reserved by https://blog.csdn.net/thfyshz. Source: https://blog.csdn.net/thfyshz/article/details/83889291
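Apply several transformations separately with resample and apply (the sessions below assume pandas is imported as pd and numpy as np):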
In [103]: rng = pd.date_range(start="2014-10-07",periods=10,freq='2min')
In [104]: ts = pd.Series(data = list(range(10)), index = rng)
In [105]: def MyCust(x):
.....:     if len(x) > 2:
.....:         return x.iloc[1] * 1.234
.....:     # Otherwise return a null value
.....:     return pd.NaT
.....:
In [106]: mhc = {'Mean' : np.mean, 'Max' : np.max, 'Custom' : MyCust}
# Resample into 5-minute bins and apply each function to every bin
In [107]: ts.resample("5min").apply(mhc)
Out[107]:
Custom  2014-10-07 00:00:00    1.234
        2014-10-07 00:05:00      NaT
        2014-10-07 00:10:00    7.404
        2014-10-07 00:15:00      NaT
Max     2014-10-07 00:00:00        2
        2014-10-07 00:05:00        4
        2014-10-07 00:10:00        7
        2014-10-07 00:15:00        9
Mean    2014-10-07 00:00:00        1
        2014-10-07 00:05:00      3.5
        2014-10-07 00:10:00        6
        2014-10-07 00:15:00      8.5
dtype: object
In [108]: ts
Out[108]:
2014-10-07 00:00:00 0
2014-10-07 00:02:00 1
2014-10-07 00:04:00 2
2014-10-07 00:06:00 3
2014-10-07 00:08:00 4
2014-10-07 00:10:00 5
2014-10-07 00:12:00 6
2014-10-07 00:14:00 7
2014-10-07 00:16:00 8
2014-10-07 00:18:00 9
Freq: 2T, dtype: int64
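The layout returned when a dict is passed to apply may differ between pandas versions (the session above shows one stacked Series). As a minimal sketch, assuming a recent pandas, the same three statistics can be built as explicit columns. The names my_cust, bins and summary are made up for this illustration, and NaN stands in for the NaT of the original so the Custom column stays numeric.

```python
import numpy as np
import pandas as pd

# Same data as the session above: ten values spaced two minutes apart.
rng = pd.date_range(start="2014-10-07", periods=10, freq="2min")
ts = pd.Series(data=list(range(10)), index=rng)


def my_cust(x):
    """Second element of the bin times 1.234, or NaN when the bin has 2 or fewer rows."""
    if len(x) > 2:
        return x.iloc[1] * 1.234
    return np.nan  # assumption: NaN instead of the original pd.NaT


# One resampler, reused for each statistic; each result becomes one column.
bins = ts.resample("5min")
summary = pd.DataFrame(
    {
        "Mean": bins.mean(),
        "Max": bins.max(),
        "Custom": bins.apply(my_cust),
    }
)
print(summary)
```

Each column here is computed independently, so a statistic can be added or dropped without touching the others.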
Add a new column holding how many times each value of a column occurs:
In [109]: df = pd.DataFrame({'Color': 'Red Red Red Blue'.split(),
.....: 'Value': [100, 150, 50, 50]}); df
.....:
Out[109]:
  Color  Value
0   Red    100
1   Red    150
2   Red     50
3  Blue     50
In [110]: df['Counts'] = df.groupby(['Color']).transform(len)
In [111]: df
Out[111]:
  Color  Value  Counts
0   Red    100       3
1   Red    150       3
2   Red     50       3
3  Blue     50       1
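The transform(len) call above runs on every non-key column of the frame, which is fine here because Value is the only one. A small sketch of a variant that selects a single column first so the result is a Series, plus an equivalent built from value_counts and map. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"Color": "Red Red Red Blue".split(), "Value": [100, 150, 50, 50]}
)

# Select one column first so transform returns a Series rather than a DataFrame.
counts_transform = df.groupby("Color")["Value"].transform(len)

# Alternative: count each colour once, then map the counts back onto the rows.
counts_map = df["Color"].map(df["Color"].value_counts())

df["Counts"] = counts_transform
print(df)
```

Both approaches give every row the size of its Color group. The map version never touches the Value column.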