文章目录
时间日期
- 时间戳 tiimestamp:固定的时刻 -> pd.Timestamp
- 固定时期 period:比如 2016年3月份,再如2015年销售额 -> pd.Period
- 时间间隔 interval:由起始时间和结束时间来表示,固定时期是时间间隔的一个特殊
时间日期在 Pandas 里的作用
- 分析金融数据,如股票交易数据
- 分析服务器日志
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import timedelta
时间差
date1 = datetime(2016, 3, 20)
date2 = datetime(2016, 3, 16)
delta = date1 - date2
delta
[Out:]
datetime.timedelta(4)
delta.days
[Out:]
4
delta.total_seconds()
[Out:]
345600.0
date2 + delta
[Out:]
datetime.datetime(2016, 3, 20, 0, 0)
date2 + timedelta(4.5)
[Out:]
datetime.datetime(2016, 3, 20, 12, 0)
字符串和 datetime 转换
关于 datetime 格式定义,可以参阅 python 官方文档
date = datetime(2016, 3, 20, 8, 30)
date
[Out:]
datetime.datetime(2016, 3, 20, 8, 30)
str(date)
[Out:]
'2016-03-20 08:30:00'
date.strftime('%Y-%m-%d %H:%M:%S')
[Out:]
'2016-03-20 08:30:00'
datetime.strptime('2016-03-20 09:30', '%Y-%m-%d %H:%M')
[Out:]
datetime.datetime(2016, 3, 20, 9, 30)
Pandas 里的时间序列
Pandas 里使用 Timestamp 来表达时间
dates = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 3), datetime(2016, 3, 4)]
s = pd.Series(np.random.randn(4), index=dates)
s
[Out:]
2016-03-01 1.650889
2016-03-02 -0.328463
2016-03-03 1.674872
2016-03-04 -0.310849
dtype: float64
type(s.index)
[Out:]
pandas.tseries.index.DatetimeIndex
type(s.index[0])
[Out:]
pandas.tslib.Timestamp
日期范围
生成日期范围
pd.date_range('20160320', '20160331')
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29', '2016-03-30', '2016-03-31'],
dtype='datetime64[ns]', freq='D')
pd.date_range(start='20160320', periods=10)
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29'],
dtype='datetime64[ns]', freq='D')
## 规则化时间戳
pd.date_range(start='2016-03-20 16:23:32', periods=10, normalize=True)
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-21', '2016-03-22', '2016-03-23',
'2016-03-24', '2016-03-25', '2016-03-26', '2016-03-27',
'2016-03-28', '2016-03-29'],
dtype='datetime64[ns]', freq='D')
时间频率
## 星期
pd.date_range(start='20160320', periods=10, freq='W')
[Out:]
DatetimeIndex(['2016-03-20', '2016-03-27', '2016-04-03', '2016-04-10',
'2016-04-17', '2016-04-24', '2016-05-01', '2016-05-08',
'2016-05-15', '2016-05-22'],
dtype='datetime64[ns]', freq='W-SUN')
# 月
pd.date_range(start='20160320', periods=10, freq='M')
[Out:]
DatetimeIndex(['2016-03-31', '2016-04-30', '2016-05-31', '2016-06-30',
'2016-07-31', '2016-08-31', '2016-09-30', '2016-10-31',
'2016-11-30', '2016-12-31'],
dtype='datetime64[ns]', freq='M')
## 每个月最后一个工作日组成的索引
pd.date_range(start='20160320', periods=10, freq='BM')
[Out:]
DatetimeIndex(['2016-03-31', '2016-04-29', '2016-05-31', '2016-06-30',
'2016-07-29', '2016-08-31', '2016-09-30', '2016-10-31',
'2016-11-30', '2016-12-30'],
dtype='datetime64[ns]', freq='BM')
# 小时
pd.date_range(start='20160320', periods=10, freq='4H')
[Out:]
DatetimeIndex(['2016-03-20 00:00:00', '2016-03-20 04:00:00',
'2016-03-20 08:00:00', '2016-03-20 12:00:00',
'2016-03-20 16:00:00', '2016-03-20 20:00:00',
'2016-03-21 00:00:00', '2016-03-21 04:00:00',
'2016-03-21 08:00:00', '2016-03-21 12:00:00'],
dtype='datetime64[ns]', freq='4H')
时期及算术运算
- pd.Period 表示时期,比如几日,月或几个月等。比如用来统计每个月的销售额,就可以用时期作为单位。
p1 = pd.Period(2010)
p1
[Out:]
Period('2010', 'A-DEC')
p2 = p1 + 2
p2
[Out:]
Period('2012', 'A-DEC')
p2 - p1
[Out:]
2L
p1 = pd.Period(2016, freq='M')
p1
[Out:]
Period('2016-01', 'M')
p1 + 3
[Out:]
Period('2016-04', 'M')
时期序列
pd.period_range(start='2016-01', periods=12, freq='M')
[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
'2016-07', '2016-08', '2016-09', '2016-10', '2016-11', '2016-12'],
dtype='int64', freq='M')
pd.period_range(start='2016-01', end='2016-10', freq='M')
[Out:]
PeriodIndex(['2016-01', '2016-02', '2016-03', '2016-04', '2016-05', '2016-06',
'2016-07', '2016-08', '2016-09', '2016-10'],
dtype='int64', freq='M')
# 直接用字符串
index = pd.PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], freq='Q-DEC')
index
[Out:]
PeriodIndex(['2016Q1', '2016Q2', '2016Q3'], dtype='int64', freq='Q-DEC')
时期的频率转换
asfreq
- A-DEC: 以 12 月份作为结束的年时期
- A-NOV: 以 11 月份作为结束的年时期
- Q-DEC: 以 12 月份作为结束的季度时期
p = pd.Period('2016', freq='A-DEC')
p
[Out:]
Period('2016', 'A-DEC')
p.asfreq('M', how='start')
[Out:]
Period('2016-01', 'M')
p.asfreq('M', how='end')
[Out:]
Period('2016-12', 'M')
p = pd.Period('2016-04', freq='M')
p
[Out:]
Period('2016-04', 'M')
p.asfreq('A-DEC')
[Out:]
Period('2016', 'A-DEC')
# 以年为周期,以一年中的 3 月份作为年的结束(财年)
p.asfreq('A-MAR')
[Out:]
Period('2017', 'A-MAR')
季度时间频率
- Pandas 支持 12 种季度型频率,从 Q-JAN 到 Q-DEC
p = pd.Period('2016Q4', 'Q-JAN')
p
[Out:]
Period('2016Q4', 'Q-JAN')
# 以 1 月份结束的财年中,2016Q4 的时期是指 2015-11-1 到 2016-1-31
p.asfreq('D', how='start'), p.asfreq('D', how='end')
[Out:]
(Period('2015-11-01', 'D'), Period('2016-01-31', 'D'))
# 获取该季度倒数第二个工作日下午4点的时间戳
p4pm = (p.asfreq('B', how='end') - 1).asfreq('T', 'start') + 16 * 60
p4pm
[Out:]
Period('2016-01-28 16:00', 'T')
# 转换为 timestamp
p4pm.to_timestamp()
[Out:]
Timestamp('2016-01-28 16:00:00')
Timestamp 和 Period 相互转换
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-01-01', periods=5, freq='M'))
ts
[Out:]
2016-01-31 -0.773323
2016-02-29 0.215953
2016-03-31 1.301631
2016-04-30 -0.066134
2016-05-31 1.651792
Freq: M, dtype: float64
ts.to_period()
[Out:]
2016-01 -0.773323
2016-02 0.215953
2016-03 1.301631
2016-04 -0.066134
2016-05 1.651792
Freq: M, dtype: float64
ts = pd.Series(np.random.randn(5), index = pd.date_range('2016-12-29', periods=5, freq='D'))
ts
[Out:]
2016-12-29 -0.110462
2016-12-30 -0.265792
2016-12-31 -0.382456
2017-01-01 -0.036111
2017-01-02 -1.029658
Freq: D, dtype: float64
pts = ts.to_period(freq='M')
pts
[Out:]
2016-12 -0.110462
2016-12 -0.265792
2016-12 -0.382456
2017-01 -0.036111
2017-01 -1.029658
Freq: M, dtype: float64
pts.groupby(level=0).sum()
[Out:]
2016-12 -0.758711
2017-01 -1.065769
Freq: M, dtype: float64
# 转换为时间戳时,细部时间会丢失
pts.to_timestamp(how='end')
[Out:]
2016-12-31 -0.110462
2016-12-31 -0.265792
2016-12-31 -0.382456
2017-01-31 -0.036111
2017-01-31 -1.029658
dtype: float64
重采样
- 高频率 -> 低频率 -> 降采样:5 分钟股票交易数据转换为日交易数据
- 低频率 -> 高频率 -> 升采样
- 其他重采样:每周三 (W-WED) 转换为每周五 (W-FRI)
ts = pd.Series(np.random.randint(0, 50, 60), index=pd.date_range('2016-04-25 09:30', periods=60, freq='T'))
ts
[Out:]
2016-04-25 09:30:00 18
2016-04-25 09:31:00 41
2016-04-25 09:32:00 49
2016-04-25 09:33:00 26
2016-04-25 09:34:00 5
2016-04-25 09:35:00 12
2016-04-25 09:36:00 6
2016-04-25 09:37:00 47
2016-04-25 09:38:00 16
2016-04-25 09:39:00 37
2016-04-25 09:40:00 44
2016-04-25 09:41:00 8
2016-04-25 09:42:00 22
2016-04-25 09:43:00 24
2016-04-25 09:44:00 12
2016-04-25 09:45:00 26
2016-04-25 09:46:00 30
2016-04-25 09:47:00 38
2016-04-25 09:48:00 5
2016-04-25 09:49:00 26
2016-04-25 09:50:00 39
2016-04-25 09:51:00 7
2016-04-25 09:52:00 6
2016-04-25 09:53:00 12
2016-04-25 09:54:00 24
2016-04-25 09:55:00 0
2016-04-25 09:56:00 12
2016-04-25 09:57:00 27
2016-04-25 09:58:00 10
2016-04-25 09:59:00 26
2016-04-25 10:00:00 27
2016-04-25 10:01:00 18
2016-04-25 10:02:00 27
2016-04-25 10:03:00 25
2016-04-25 10:04:00 25
2016-04-25 10:05:00 35
2016-04-25 10:06:00 28
2016-04-25 10:07:00 3
2016-04-25 10:08:00 20
2016-04-25 10:09:00 48
2016-04-25 10:10:00 5
2016-04-25 10:11:00 48
2016-04-25 10:12:00 30
2016-04-25 10:13:00 2
2016-04-25 10:14:00 11
2016-04-25 10:15:00 18
2016-04-25 10:16:00 21
2016-04-25 10:17:00 32
2016-04-25 10:18:00 43
2016-04-25 10:19:00 10
2016-04-25 10:20:00 5
2016-04-25 10:21:00 45
2016-04-25 10:22:00 3
2016-04-25 10:23:00 30
2016-04-25 10:24:00 3
2016-04-25 10:25:00 24
2016-04-25 10:26:00 46
2016-04-25 10:27:00 2
2016-04-25 10:28:00 33
2016-04-25 10:29:00 25
Freq: T, dtype: int32
# 0-4 分钟为第一组
ts.resample('5min', how='sum')
[Out:]
2016-04-25 09:30:00 139
2016-04-25 09:35:00 118
2016-04-25 09:40:00 110
2016-04-25 09:45:00 125
2016-04-25 09:50:00 88
2016-04-25 09:55:00 75
2016-04-25 10:00:00 122
2016-04-25 10:05:00 134
2016-04-25 10:10:00 96
2016-04-25 10:15:00 124
2016-04-25 10:20:00 86
2016-04-25 10:25:00 130
Freq: 5T, dtype: int32
# 0-4 分钟为第一组
ts.resample('5min', how='sum', label='right'
[Out:]
2016-04-25 09:35:00 139
2016-04-25 09:40:00 118
2016-04-25 09:45:00 110
2016-04-25 09:50:00 125
2016-04-25 09:55:00 88
2016-04-25 10:00:00 75
2016-04-25 10:05:00 122
2016-04-25 10:10:00 134
2016-04-25 10:15:00 96
2016-04-25 10:20:00 124
2016-04-25 10:25:00 86
2016-04-25 10:30:00 130
Freq: 5T, dtype: int32
OHLC 重采样
金融数据专用:Open/High/Low/Close
ts.resample('5min', how='ohlc')
[Out:]
open high low close
2016-04-25 09:30:00 18 49 5 5
2016-04-25 09:35:00 12 47 6 37
2016-04-25 09:40:00 44 44 8 12
2016-04-25 09:45:00 26 38 5 26
2016-04-25 09:50:00 39 39 6 24
2016-04-25 09:55:00 0 27 0 26
2016-04-25 10:00:00 27 27 18 25
2016-04-25 10:05:00 35 48 3 48
2016-04-25 10:10:00 5 48 2 11
2016-04-25 10:15:00 18 43 10 10
2016-04-25 10:20:00 5 45 3 3
2016-04-25 10:25:00 24 46 2 25
### 通过 groupby 重采样
ts = pd.Series(np.random.randint(0, 50, 100), index=pd.date_range('2016-03-01', periods=100, freq='D'))
ts
[Out:]
2016-03-01 13
2016-03-02 21
2016-03-03 26
2016-03-04 3
2016-03-05 31
2016-03-06 29
2016-03-07 42
2016-03-08 24
2016-03-09 10
2016-03-10 42
2016-03-11 42
2016-03-12 7
2016-03-13 10
2016-03-14 48
2016-03-15 12
2016-03-16 15
2016-03-17 16
2016-03-18 34
2016-03-19 45
2016-03-20 40
2016-03-21 45
2016-03-22 46
2016-03-23 21
2016-03-24 27
2016-03-25 10
2016-03-26 47
2016-03-27 8
2016-03-28 9
2016-03-29 0
2016-03-30 20
..
2016-05-10 38
2016-05-11 46
2016-05-12 8
2016-05-13 15
2016-05-14 13
2016-05-15 30
2016-05-16 25
2016-05-17 15
2016-05-18 3
2016-05-19 5
2016-05-20 21
2016-05-21 18
2016-05-22 11
2016-05-23 47
2016-05-24 14
2016-05-25 33
2016-05-26 37
2016-05-27 40
2016-05-28 5
2016-05-29 27
2016-05-30 2
2016-05-31 31
2016-06-01 31
2016-06-02 41
2016-06-03 28
2016-06-04 2
2016-06-05 21
2016-06-06 10
2016-06-07 21
2016-06-08 18
Freq: D, dtype: int32
ts.groupby(lambda x: x.month).sum()
[Out:]
3 759
4 648
5 748
6 172
dtype: int32
ts.groupby(ts.index.to_period('M')).sum()
[Out:]
2016-03 759
2016-04 648
2016-05 748
2016-06 172
Freq: M, dtype: int32
升采样和插值
# 以周为单位,每周五采样
df = pd.DataFrame(np.random.randint(1, 50, 2), index=pd.date_range('2016-04-22', periods=2, freq='W-FRI'))
df
[Out:]
0
2016-04-22 10
2016-04-29 6
df.resample('D')
[Out:]
0
2016-04-22 10
2016-04-23 NaN
2016-04-24 NaN
2016-04-25 NaN
2016-04-26 NaN
2016-04-27 NaN
2016-04-28 NaN
2016-04-29 6
df.resample('D', fill_method='ffill', limit=3)
[Out:]
0
2016-04-22 10
2016-04-23 10
2016-04-24 10
2016-04-25 10
2016-04-26 NaN
2016-04-27 NaN
2016-04-28 NaN
2016-04-29 6
# 以周为单位,每周一采样
df.resample('W-MON', fill_method='ffill')
[Out:]
0
2016-04-25 10
2016-05-02 6
时期重采样
df = pd.DataFrame(np.random.randint(2, 30, (24, 4)),
index=pd.period_range('2015-01', '2016-12', freq='M'),
columns=list('ABCD'))
df
[Out:]
A B C D
2015-01 20 7 22 18
2015-02 2 28 21 19
2015-03 13 17 12 7
2015-04 24 17 20 14
2015-05 15 13 15 20
2015-06 19 28 2 22
2015-07 20 7 2 27
2015-08 10 18 2 16
2015-09 17 24 11 9
2015-10 23 2 21 25
2015-11 24 3 19 8
2015-12 7 16 6 12
2016-01 18 13 8 15
2016-02 17 14 2 21
2016-03 17 6 5 24
2016-04 24 14 22 14
2016-05 16 14 20 14
2016-06 26 29 14 15
2016-07 2 11 11 2
2016-08 12 11 17 18
2016-09 19 21 4 16
2016-10 21 16 11 7
2016-11 16 23 2 22
2016-12 21 9 27 11
adf = df.resample('A-DEC', how='mean')
adf
[Out:]
A B C D
2015 16.166667 15.000000 12.750000 16.416667
2016 17.416667 15.083333 11.916667 14.916667
df.resample('A-MAY', how='mean')
[Out:]
A B C D
2015 14.800000 16.400000 18.000000 15.60
2016 17.666667 13.250000 10.000000 17.25
2017 16.714286 17.142857 12.285714 13.00
# 升采样
adf.resample('Q-DEC')
[Out:]
A B C D
2015Q1 16.166667 15.000000 12.750000 16.416667
2015Q2 NaN NaN NaN NaN
2015Q3 NaN NaN NaN NaN
2015Q4 NaN NaN NaN NaN
2016Q1 17.416667 15.083333 11.916667 14.916667
2016Q2 NaN NaN NaN NaN
2016Q3 NaN NaN NaN NaN
2016Q4 NaN NaN NaN NaN
adf.resample('Q-DEC', fill_method='ffill')
[Out:]
A B C D
2015Q1 16.166667 15.000000 12.750000 16.416667
2015Q2 16.166667 15.000000 12.750000 16.416667
2015Q3 16.166667 15.000000 12.750000 16.416667
2015Q4 16.166667 15.000000 12.750000 16.416667
2016Q1 17.416667 15.083333 11.916667 14.916667
2016Q2 17.416667 15.083333 11.916667 14.916667
2016Q3 17.416667 15.083333 11.916667 14.916667
2016Q4 17.416667 15.083333 11.916667 14.916667
性能
n = 1000000
ts = pd.Series(np.random.randn(n),
index=pd.date_range('2000-01-01', periods=n, freq='10ms'))
len(ts)
[Out:]
1000000
%timeit ts.resample('10min', how='ohlc')
# out=> 10 loops, best of 3: 21.9 ms per loop
ts.resample('D', how='ohlc')
[Out:]
open high low close
2000-01-01 1.161091 4.551988 -4.660681 -0.406231
从文件中读取日期序列
数据在这:练习数据下载,也可以不用,自己用循环生成一些就好
df = pd.read_csv('data/002001.csv', index_col='Date')
df
[Out:]
Open High Low Close Volume Adj Close
Date
2015-12-22 16.86 17.13 16.48 16.95 13519900 16.95
2015-12-21 16.31 17.00 16.20 16.85 14132200 16.85
2015-12-18 16.59 16.70 16.21 16.31 10524300 16.31
2015-12-17 16.28 16.75 16.16 16.60 12326500 16.60
2015-12-16 16.23 16.42 16.05 16.28 8026000 16.28
2015-12-15 16.06 16.31 15.95 16.18 6647500 16.18
2015-12-14 15.60 16.06 15.45 16.06 8355200 16.06
2015-12-11 15.50 15.80 15.41 15.62 7243500 15.62
2015-12-10 15.99 16.05 15.51 15.56 7654900 15.56
2015-12-09 16.00 16.19 15.80 15.83 7926900 15.83
2015-12-08 16.54 16.55 16.01 16.05 7640100 16.05
2015-12-07 16.50 17.04 16.48 16.63 11917200 16.63
2015-12-04 16.13 16.85 16.01 16.62 14011100 16.62
2015-12-03 15.97 16.34 15.88 16.21 9504000 16.21
2015-12-02 15.89 16.04 15.50 15.88 11229600 15.88
2015-12-01 15.67 15.96 15.50 15.85 7192200 15.85
2015-11-30 15.54 15.90 15.05 15.70 11615200 15.70
2015-11-27 16.61 16.99 15.10 15.54 15177000 15.54
2015-11-26 16.98 17.22 16.62 16.78 13196300 16.78
2015-11-25 16.15 17.04 16.03 16.94 18600100 16.94
2015-11-24 15.90 16.20 15.70 16.15 8561200 16.15
2015-11-23 16.09 16.32 16.00 16.05 9441700 16.05
2015-11-20 15.96 16.17 15.81 16.08 8022200 16.08
2015-11-19 15.75 16.05 15.71 16.02 5193300 16.02
2015-11-18 16.26 16.30 15.72 15.75 7318500 15.75
2015-11-17 16.41 16.47 16.11 16.22 11479800 16.22
2015-11-16 15.70 16.22 15.61 16.21 9083200 16.21
2015-11-13 16.36 16.47 15.90 15.95 12924400 15.95
2015-11-12 16.23 16.92 16.00 16.59 16492800 16.59
2015-11-11 16.16 16.28 15.81 16.22 15661900 16.22
2015-11-10 16.29 16.69 16.04 16.15 21457600 16.15
2015-11-09 15.70 16.29 15.56 16.02 20842600 16.02
2015-11-06 15.53 16.01 15.41 15.86 17735800 15.86
2015-11-05 15.33 15.79 15.21 15.52 19051400 15.52
2015-11-04 14.65 15.35 14.65 15.33 14578200 15.33
2015-11-03 14.84 14.96 14.44 14.62 6576300 14.62
2015-11-02 14.91 15.18 14.74 14.74 9487800 14.74
2015-10-30 15.25 15.52 14.81 15.22 12908500 15.22
2015-10-29 15.01 15.36 14.96 15.30 11177100 15.30
2015-10-28 15.14 15.50 14.96 15.02 11373200 15.02
2015-10-27 15.10 15.17 14.51 15.15 12950400 15.15
2015-10-26 15.41 15.55 14.87 15.18 15844500 15.18
2015-10-23 14.80 15.23 14.75 15.20 14769000 15.20
2015-10-22 14.28 14.82 14.25 14.73 10428900 14.73
2015-10-21 15.24 15.70 14.08 14.26 21113500 14.26
2015-10-20 14.99 15.24 14.89 15.22 11935800 15.22
2015-10-19 15.27 15.35 14.85 15.03 11601300 15.03
2015-10-16 15.23 15.35 14.82 15.25 14168700 15.25
2015-10-15 14.73 15.15 14.60 15.12 11177700 15.12
2015-10-14 14.99 15.12 14.72 14.73 10368900 14.73
2015-10-13 15.02 15.19 14.85 15.07 13408200 15.07
2015-10-12 14.63 15.43 14.41 15.30 24110800 15.30
2015-10-09 14.50 14.79 14.11 14.62 23818500 14.62
2015-10-08 14.75 14.75 14.65 14.75 18317200 14.75
2015-10-07 13.41 13.41 13.41 13.41 0 13.41
2015-10-06 13.41 13.41 13.41 13.41 0 13.41
2015-10-05 13.41 13.41 13.41 13.41 0 13.41
2015-10-02 13.41 13.41 13.41 13.41 0 13.41
2015-10-01 13.41 13.41 13.41 13.41 0 13.41
df.index
[Out:]
Index([u'2015-12-22', u'2015-12-21', u'2015-12-18', u'2015-12-17',
u'2015-12-16', u'2015-12-15', u'2015-12-14', u'2015-12-11',
u'2015-12-10', u'2015-12-09', u'2015-12-08', u'2015-12-07',
u'2015-12-04', u'2015-12-03', u'2015-12-02', u'2015-12-01',
u'2015-11-30', u'2015-11-27', u'2015-11-26', u'2015-11-25',
u'2015-11-24', u'2015-11-23', u'2015-11-20', u'2015-11-19',
u'2015-11-18', u'2015-11-17', u'2015-11-16', u'2015-11-13',
u'2015-11-12', u'2015-11-11', u'2015-11-10', u'2015-11-09',
u'2015-11-06', u'2015-11-05', u'2015-11-04', u'2015-11-03',
u'2015-11-02', u'2015-10-30', u'2015-10-29', u'2015-10-28',
u'2015-10-27', u'2015-10-26', u'2015-10-23', u'2015-10-22',
u'2015-10-21', u'2015-10-20', u'2015-10-19', u'2015-10-16',
u'2015-10-15', u'2015-10-14', u'2015-10-13', u'2015-10-12',
u'2015-10-09', u'2015-10-08', u'2015-10-07', u'2015-10-06',
u'2015-10-05', u'2015-10-02', u'2015-10-01'],
dtype='object', name=u'Date')
df = pd.read_csv('data/002001.csv', index_col='Date', parse_dates=True)
df.index
[Out:]
DatetimeIndex(['2015-12-22', '2015-12-21', '2015-12-18', '2015-12-17',
'2015-12-16', '2015-12-15', '2015-12-14', '2015-12-11',
'2015-12-10', '2015-12-09', '2015-12-08', '2015-12-07',
'2015-12-04', '2015-12-03', '2015-12-02', '2015-12-01',
'2015-11-30', '2015-11-27', '2015-11-26', '2015-11-25',
'2015-11-24', '2015-11-23', '2015-11-20', '2015-11-19',
'2015-11-18', '2015-11-17', '2015-11-16', '2015-11-13',
'2015-11-12', '2015-11-11', '2015-11-10', '2015-11-09',
'2015-11-06', '2015-11-05', '2015-11-04', '2015-11-03',
'2015-11-02', '2015-10-30', '2015-10-29', '2015-10-28',
'2015-10-27', '2015-10-26', '2015-10-23', '2015-10-22',
'2015-10-21', '2015-10-20', '2015-10-19', '2015-10-16',
'2015-10-15', '2015-10-14', '2015-10-13', '2015-10-12',
'2015-10-09', '2015-10-08', '2015-10-07', '2015-10-06',
'2015-10-05', '2015-10-02', '2015-10-01'],
dtype='datetime64[ns]', name=u'Date', freq=None)
wdf = df['Adj Close'].resample('W-FRI', how='ohlc')
wdf
[Out:]
open high low close
Date
2015-10-02 13.41 13.41 13.41 13.41
2015-10-09 13.41 14.75 13.41 14.62
2015-10-16 15.30 15.30 14.73 15.25
2015-10-23 15.03 15.22 14.26 15.20
2015-10-30 15.18 15.30 15.02 15.22
2015-11-06 14.74 15.86 14.62 15.86
2015-11-13 16.02 16.59 15.95 15.95
2015-11-20 16.21 16.22 15.75 16.08
2015-11-27 16.05 16.94 15.54 15.54
2015-12-04 15.70 16.62 15.70 16.62
2015-12-11 16.63 16.63 15.56 15.62
2015-12-18 16.06 16.60 16.06 16.31
2015-12-25 16.85 16.95 16.85 16.95
wdf['Volume'] = df['Volume'].resample('W-FRI', how='sum')
wdf
[Out:]
open high low close Volume
Date
2015-10-02 13.41 13.41 13.41 13.41 0
2015-10-09 13.41 14.75 13.41 14.62 42135700
2015-10-16 15.30 15.30 14.73 15.25 73234300
2015-10-23 15.03 15.22 14.26 15.20 69848500
2015-10-30 15.18 15.30 15.02 15.22 64253700
2015-11-06 14.74 15.86 14.62 15.86 67429500
2015-11-13 16.02 16.59 15.95 15.95 87379300
2015-11-20 16.21 16.22 15.75 16.08 41097000
2015-11-27 16.05 16.94 15.54 15.54 64976300
2015-12-04 15.70 16.62 15.70 16.62 53552100
2015-12-11 16.63 16.63 15.56 15.62 42382600
2015-12-18 16.06 16.60 16.06 16.31 45879500
2015-12-25 16.85 16.95 16.85 16.95 27652100
自定义时间日期解析函数
def date_parser(s):
s = '2016/' + s
d = datetime.strptime(s, '%Y/%m/%d')
return d
df = pd.read_csv('data/custom_date.csv', parse_dates=True, index_col='Date', date_parser=date_parser)
df
[Out:]
Price
Date
2016-01-01 10.2
2016-01-02 10.4
2016-01-03 10.5
2016-01-04 10.8
2016-01-05 11.2
2016-01-06 10.6
df.index
[Out:]
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', name=u'Date', freq=None)