from datetime import datetime
import pandas as pd
import numpy as np
1. Building a Series with a datetime index
dates = [datetime(2018, 7, 1), datetime(2018, 7, 3), datetime(2018, 7, 5),
         datetime(2018, 7, 7), datetime(2018, 7, 9), datetime(2018, 7, 11)]
ts = pd.Series(np.random.randn(6), index=dates)
ts
2018-07-01 0.578942
2018-07-03 0.465359
2018-07-05 0.037308
2018-07-07 -2.784810
2018-07-09 -0.053657
2018-07-11 -0.421860
dtype: float64
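As a quick check on what the constructor does with a list of datetime objects, the sketch below inspects the resulting index. It uses a shortened three-element list and fixed values (both are assumptions made for illustration, not part of the notes above) so the result is reproducible.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Shortened, deterministic stand-in for the series built above (illustrative only).
dates = [datetime(2018, 7, 1), datetime(2018, 7, 3), datetime(2018, 7, 5)]
ts = pd.Series(np.arange(3.0), index=dates)

# pandas converts the list of datetime objects into a DatetimeIndex.
index_type = type(ts.index).__name__

# Each individual label comes back as a pandas Timestamp.
first_label = ts.index[0]

print(index_type, first_label)
```

Because a Timestamp compares equal to the matching datetime, either type can be used as a lookup key.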
2. Indexing
stamp = ts.index[2]
ts[stamp]
0.037308214634515072
Indexing with different string spellings of the same date returns the same result
print(ts['2018-07-01'])
print(ts['20180701'])
0.578941720925
0.578941720925
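The equivalence comes from pandas parsing each string into the same Timestamp before the lookup. A minimal sketch, assuming a small series with fixed values and adding a third spelling ("7/1/2018", month first) that does not appear in the notes above:

```python
import pandas as pd

# Small deterministic series (illustrative values, not the random data above).
ts = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2018-07-01", "2018-07-03", "2018-07-05"]),
)

spellings = ["2018-07-01", "20180701", "7/1/2018"]

# Every spelling parses to the same Timestamp ...
parsed = [pd.Timestamp(s) for s in spellings]

# ... so every spelling selects the same element.
values = [ts.loc[s] for s in spellings]

print(parsed)
print(values)
```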
Passing only a year, or a year and month, selects the matching slice
longer_ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
longer_ts.shape
(1000,)
longer_ts['2001']
2001-01-01 -0.183849
2001-01-02 0.393347
2001-01-03 -1.777193
2001-01-04 0.113331
2001-01-05 -0.283405
2001-01-06 0.448394
2001-01-07 -0.062456
2001-01-08 1.162382
2001-01-09 0.912657
2001-01-10 -1.054139
2001-01-11 1.639748
2001-01-12 0.026114
2001-01-13 1.526940
2001-01-14 0.590537
2001-01-15 -2.123738
2001-01-16 0.836568
2001-01-17 -0.741981
2001-01-18 0.039363
2001-01-19 -1.251572
2001-01-20 -0.223897
2001-01-21 -1.306646
2001-01-22 0.977636
2001-01-23 -1.874165
2001-01-24 0.074528
2001-01-25 -0.547662
2001-01-26 -1.464749
2001-01-27 -0.400133
2001-01-28 1.082618
2001-01-29 0.370398
2001-01-30 -0.745193
...
2001-12-02 -0.699917
2001-12-03 -0.127025
2001-12-04 0.126134
2001-12-05 0.906386
2001-12-06 0.534549
2001-12-07 0.419908
2001-12-08 1.415131
2001-12-09 -1.601909
2001-12-10 -1.232961
2001-12-11 -0.676176
2001-12-12 -0.714718
2001-12-13 1.143975
2001-12-14 -1.087204
2001-12-15 1.752753
2001-12-16 -3.039599
2001-12-17 -0.597569
2001-12-18 0.055790
2001-12-19 0.379972
2001-12-20 -1.410376
2001-12-21 -2.095945
2001-12-22 -0.035397
2001-12-23 -0.202549
2001-12-24 0.377027
2001-12-25 -0.820194
2001-12-26 -1.138857
2001-12-27 0.491915
2001-12-28 1.188331
2001-12-29 -0.680069
2001-12-30 1.608267
2001-12-31 1.723339
Freq: D, Length: 365, dtype: float64
longer_ts['2001-05']
2001-05-01 -1.440953
2001-05-02 -2.132345
2001-05-03 1.132536
2001-05-04 -0.365506
2001-05-05 0.997308
2001-05-06 0.017255
2001-05-07 1.880290
2001-05-08 0.819983
2001-05-09 1.697819
2001-05-10 -3.067531
2001-05-11 0.637673
2001-05-12 -0.587333
2001-05-13 0.518774
2001-05-14 0.823871
2001-05-15 -0.474210
2001-05-16 -0.746972
2001-05-17 0.822030
2001-05-18 2.103642
2001-05-19 1.074490
2001-05-20 1.012978
2001-05-21 0.324720
2001-05-22 -0.096673
2001-05-23 -0.085382
2001-05-24 1.455619
2001-05-25 0.120917
2001-05-26 -0.639450
2001-05-27 0.804710
2001-05-28 0.721796
2001-05-29 -0.887137
2001-05-30 0.416457
2001-05-31 0.960286
Freq: D, dtype: float64
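The two selections above can be verified by counting rows rather than reading the printed output. This sketch rebuilds the same date range with sequential integers in place of random values (an assumption made so the result is reproducible) and uses `.loc`, which accepts the same partial date strings.

```python
import numpy as np
import pandas as pd

# Same 1000-day range as above, with deterministic values for illustration.
longer_ts = pd.Series(np.arange(1000), index=pd.date_range("2000-01-01", periods=1000))

# A year string selects every day in that year.
year_2001 = longer_ts.loc["2001"]

# A year-month string selects every day in that month.
may_2001 = longer_ts.loc["2001-05"]

print(len(year_2001), len(may_2001))
```

The range starts on 2000-01-01 and runs for 1000 days, so it covers all of 2001; the year selection has 365 rows and the May selection has 31.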
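Slicing with dates works just as it does on an ordinary Series (row slicing by date also works on a DataFrame with a DatetimeIndex, for example through .loc)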
ts[datetime(2018,7,3):]
2018-07-03 0.465359
2018-07-05 0.037308
2018-07-07 -2.784810
2018-07-09 -0.053657
2018-07-11 -0.421860
dtype: float64
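Slice bounds do not have to be labels that exist in the index, provided the index is sorted, and both endpoints are inclusive. A minimal sketch with fixed values (an assumption for reproducibility); `truncate` is shown as an alternative way to cut the series at a date:

```python
import pandas as pd

# Same six dates as ts above, with deterministic values for illustration.
ts = pd.Series(
    range(6),
    index=pd.to_datetime(
        ["2018-07-01", "2018-07-03", "2018-07-05",
         "2018-07-07", "2018-07-09", "2018-07-11"]
    ),
)

# 2018-07-04 is not in the index; the slice starts at the next label on or after it.
# The end label 2018-07-09 is included.
window = ts.loc["2018-07-04":"2018-07-09"]

# truncate drops everything after the given date (the date itself is kept).
trimmed = ts.truncate(after="2018-07-05")

print(window)
print(trimmed)
```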
Checking for duplicate index values
dates = pd.DatetimeIndex(['7/4/2018','7/5/2018','7/5/2018','7/5/2018','7/6/2018'])
dup_ts = pd.Series(np.arange(5),index=dates)
dup_ts.index.is_unique
False
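`is_unique` only reports that duplicates exist. To see which labels are repeated, one option is `Index.duplicated`, sketched below on the same data; the variable names `dup_mask` and `repeated_labels` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["7/4/2018", "7/5/2018", "7/5/2018", "7/5/2018", "7/6/2018"])
dup_ts = pd.Series(np.arange(5), index=dates)

# keep=False marks every occurrence of a repeated label, not just the later ones.
dup_mask = dup_ts.index.duplicated(keep=False)

# The distinct labels that occur more than once.
repeated_labels = dup_ts.index[dup_mask].unique()

print(dup_mask)
print(repeated_labels)
```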
Collapsing duplicate index values with groupby
# Indexing a duplicated date returns every matching row, not a scalar
dup_ts['7/5/2018']
2018-07-05 1
2018-07-05 2
2018-07-05 3
dtype: int32
# level=0 groups on the index itself, so each distinct date becomes one row
dup_ts.groupby(level=0).mean()
2018-07-04 0
2018-07-05 2
2018-07-06 4
dtype: int32
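Averaging is only one way to resolve duplicates. As a sketch of two alternatives on the same data, `count` shows how many rows share each date, and a boolean mask from `Index.duplicated` keeps only the first row per date; which one is appropriate depends on the data, and neither is implied by the notes above.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["7/4/2018", "7/5/2018", "7/5/2018", "7/5/2018", "7/6/2018"])
dup_ts = pd.Series(np.arange(5), index=dates)

# Number of rows that share each date.
counts = dup_ts.groupby(level=0).count()

# Keep the first row for each date and drop the later repeats.
first_only = dup_ts[~dup_ts.index.duplicated(keep="first")]

print(counts)
print(first_only)
```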