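5-5 Indexes and Hierarchical Indexes
Viewing the index
df.index
- Returns the row index
- Note: individual index entries cannot be assigned; the index can only be replaced as a whole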
In [6]: import pandas as pd
In [7]: import numpy as np
In [8]: df = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'), columns=list('qwer'))
In [9]: df
Out[9]:
q w e r
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
In [10]: # View the index
In [11]: df.index
Out[11]: Index(['a', 'b', 'c'], dtype='object')
In [12]: # A single index entry cannot be assigned or modified
In [13]: df.index[0] = 'e'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-57fd5743f906> in <module>
----> 1 df.index[0] = 'e'
d:\python3.6.5\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
4258
4259 def __setitem__(self, key, value):
-> 4260 raise TypeError("Index does not support mutable operations")
4261
4262 def __getitem__(self, key):
TypeError: Index does not support mutable operations
In [14]: # The index can only be changed by assigning a complete new index
In [16]: df.index = list('nms')
In [17]: df
Out[17]:
q w e r
n 0 1 2 3
m 4 5 6 7
s 8 9 10 11
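Because an Index is immutable, changing one label means building a new index. The following is a minimal sketch, assuming the goal is only to relabel 'a' as 'e'; using rename with a mapping is one possible way to do that, and the variable name renamed is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=list('qwer'))

# rename returns a new DataFrame in which only the mapped label changes
renamed = df.rename(index={'a': 'e'})
print(renamed.index.tolist())  # ['e', 'b', 'c']

# the original frame keeps its index, because rename is not in-place by default
print(df.index.tolist())  # ['a', 'b', 'c']
```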
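Reindexing
df.reindex()
- Labels in the new index that have no matching row are filled with NaN by default
- Passing fewer labels than the original index keeps only those rows, much like slicing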
In [22]: df
Out[22]:
q w e r
n 0 1 2 3
m 4 5 6 7
s 8 9 10 11
In [23]: # Reindex df
In [24]: df.reindex(list('nma'))
Out[24]:
q w e r
n 0.0 1.0 2.0 3.0
m 4.0 5.0 6.0 7.0
a NaN NaN NaN NaN
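In [25]: # Labels in the new index that have no matching row show up as NaN
In [26]: # If the new index has fewer labels than the original, the result is like a slice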
In [27]: df.reindex(list('ns'))
Out[27]:
q w e r
n 0 1 2 3
s 8 9 10 11
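The NaN rows are also why the first reindex result above switched to floats. A small sketch, assuming a placeholder of 0 is acceptable for missing labels: the fill_value parameter of reindex supplies that placeholder (the name filled is illustrative).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('nms'), columns=list('qwer'))

# 'a' does not exist in df, so its row is filled with fill_value instead of NaN
filled = df.reindex(list('nma'), fill_value=0)
print(filled)

# no NaN was introduced, so the columns are still integers
print(filled['q'].dtype.kind)  # i
```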
Setting the index
df.set_index()
- Converts a column of the DataFrame into the row index
In [29]: df
Out[29]:
q w e r
n 0 1 2 3
m 4 5 6 7
s 8 9 10 11
In [30]: # set_index turns a DataFrame column into the row index
In [31]: df.set_index('q')
Out[31]:
w e r
q
0 1 2 3
4 5 6 7
8 9 10 11
In [32]: # set_index has a drop parameter
In [33]: # drop: defaults to True; with drop=False the column used as the index is also kept as a regular column
In [34]: df.set_index('q', drop=False)
Out[34]:
q w e r
q
0 0 1 2 3
4 4 5 6 7
8 8 9 10 11
Returning the unique values of the index
df.set_index("M").index.unique()
- df.set_index('q').index : returns the index
- unique : filters out duplicate index values
In [48]: df
Out[48]:
q w e r
n 0 1 2 3
m 8 5 6 7
s 8 9 10 11
In [49]: # unique lists the distinct values, which helps check whether a field is unique
In [50]: df.set_index('q').index.unique()
Out[50]: Int64Index([0, 8], dtype='int64', name='q')
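To go with unique, here is a short sketch of two related checks on the same kind of index. The data is a cut-down stand-in for the frame above, and idx is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'q': [0, 8, 8], 'w': [1, 5, 9]}, index=list('nms'))
idx = df.set_index('q').index

print(idx.is_unique)              # False, because 8 appears twice
print(idx.unique().tolist())      # [0, 8]
# duplicated marks every repeat after the first occurrence
print(idx.duplicated().tolist())  # [False, False, True]
```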
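Hierarchical indexes
Hierarchical indexing is an important pandas feature that lets you have multiple (two or more) index levels on a single axis.
In [52]: # Repeated values in an outer index level are displayed as blanks; pass a list to set_index to build a multi-level index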
In [53]: df.set_index(['q','w'])
Out[53]:
e r
q w
0 1 2 3
8 5 6 7
9 10 11
In [55]: df1 = pd.DataFrame({'a': range(7),'b':range(7,0,-1),'c':['one','one','one','two','two',
...: 'two','two'],'d':list('hjklmno')})
In [56]: df1
Out[56]:
a b c d
0 0 7 one h
1 1 6 one j
2 2 5 one k
3 3 4 two l
4 4 3 two m
5 5 2 two n
6 6 1 two o
In [57]: df2 = df1.set_index(['c','d'])
In [58]: df2
Out[58]:
a b
c d
one h 0 7
j 1 6
k 2 5
two l 3 4
m 4 3
n 5 2
o 6 1
Selecting and slicing with a hierarchical index
- loc
- iloc
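A minimal sketch of how loc and iloc behave on the two-level df2 built above. The variable names outer, row, both and first are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                    'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                    'd': list('hjklmno')})
df2 = df1.set_index(['c', 'd'])

# outer label only: all rows under 'one', indexed by the remaining level d
outer = df2.loc['one']

# (outer, inner) tuple: a single row, returned as a Series
row = df2.loc[('two', 'm')]

# label slice on the outer level (works here because the index is sorted)
both = df2.loc['one':'two']

# iloc ignores the labels and selects by position
first = df2.iloc[0]

print(outer)
print(row)
```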
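Swapping index levels
The swap exchanges the inner and outer index levels.
- df.swaplevel(i=level1, j=level2)
- Swaps the inner and outer levels of the index created with set_index
- level1 and level2 identify the levels to swap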
In [21]: # Create a two-dimensional array
In [22]: df = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'), columns=list('qwer'))
In [23]: # Set a multi-level index
In [24]: df
Out[24]:
q w e r
a 0 1 2 3
b 4 5 6
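The session above is cut off before the swap itself. The following is a sketch of what that step could look like, assuming the same df and a two-level index built from the q and w columns; swapped is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=list('qwer'))

# build a two-level index: q is the outer level, w the inner level
df1 = df.set_index(['q', 'w'])
print(list(df1.index.names))      # ['q', 'w']

# swap the two levels; they can be given by name or by position
swapped = df1.swaplevel('q', 'w')
print(list(swapped.index.names))  # ['w', 'q']
```

Only the order of the levels changes; the rows stay in their original order.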
Hierarchical indexes can also be sorted
sort_index(ascending=True)
- ascending : defaults to True, which sorts in ascending order; set it to False for descending order
In [32]: df1
Out[32]:
e r
w q
1 0 2 3
5 4 6 7
9 8 10 11
In [33]: df1.sort_index()
Out[33]:
e r
w q
1 0 2 3
5 4 6 7
9 8 10 11
In [33]: # View the source and signature
In [34]: df1.sort_index??
Signature:
df1.sort_index(
axis=0,
level=None,
ascending=True,
inplace=False,
kind='quicksort',
na_position='last',
sort_remaining=True,
by=None,
)
Docstring:
Sort object by labels (along an axis).
Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
The axis along which to sort. The value 0 identifies the rows,
and 1 identifies the columns.
level : int or level name or list of ints or list of level names
If not None, sort on values in specified index level(s).
ascending : bool, default True
Sort ascending vs. descending.
inplace : bool, default False
If True, perform operation in-place.
kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'
Choice of sorting algorithm. See also ndarray.np.sort for more
information. `mergesort` is the only stable algorithm. For
DataFrames, this option is only applied when sorting on a single
column or label.
na_position : {'first', 'last'}, default 'last'
Puts NaNs at the beginning if `first`; `last` puts NaNs at the end.
Not implemented for MultiIndex.
sort_remaining : bool, default True
If True and sorting by level and index is multilevel, sort by other
levels too (in order) after sorting by specified level.
Returns
-------
sorted_obj : DataFrame or None
In [35]: df1.sort_index(ascending=False)
Out[35]:
e r
w q
9 8 10 11
5 4 6 7
1 0 2 3
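In [36]: # The data is already in ascending order, so a plain sort_index() shows no visible change
In [37]: # That is why we sort in descending order here
In [38]: # sort_index()
In [39]: # takes an ascending parameter
In [40]: # It defaults to True (ascending order); set it to False for descending order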
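Since the data above was already sorted, the effect is easier to see on shuffled values. This sketch uses made-up numbers and also shows the level parameter from the signature; the names asc, desc and by_q are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'w': [5, 9, 1], 'q': [0, 4, 8], 'e': [2, 6, 10]})
df1 = df.set_index(['w', 'q'])

# default: ascending, compared on the outer level w first
asc = df1.sort_index()
# descending on the whole index
desc = df1.sort_index(ascending=False)
# sort on one named level only
by_q = df1.sort_index(level='q')

print(asc['e'].tolist())   # [10, 2, 6]
print(desc['e'].tolist())  # [6, 2, 10]
print(by_q['e'].tolist())  # [2, 6, 10]
```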
Aggregation functions
- Aggregations such as mean and sum can be applied
In [53]: df1
Out[53]:
e r
w q
1 0 2 3
5 4 6 7
9 8 10 11
In [54]: df1.sum()
Out[54]:
e 18
r 21
dtype: int64
# level=1 selects the inner index level, so the aggregation is computed per value of the inner level
In [55]: df1.sum(level=1)
Out[55]:
e r
q
0 2 3
4 6 7
8 10 11
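The level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0, so df1.sum(level=1) only runs on older releases. A sketch of the equivalent using groupby on the same level, assuming df1 is built as in the earlier examples (inner_sum is an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=list('qwer'))
df1 = df.set_index(['w', 'q'])

# group rows by the inner level (position 1, named 'q') and sum each group
inner_sum = df1.groupby(level=1).sum()

# the level can also be given by name
assert inner_sum.equals(df1.groupby(level='q').sum())
print(inner_sum)
```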
Moving a multi-level index back into the data
reset_index()
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'), columns=list('qwer'))
In [4]: df
Out[4]:
q w e r
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
In [5]: # Set a multi-level index
In [6]: df1 = df.set_index(['q','w','r'])
In [7]: df1
Out[7]:
e
q w r
0 1 3 2
4 5 7 6
8 9 11 10
In [8]: # reset_index: converts the index levels back into ordinary columns
In [9]: df1 = df1.reset_index()
In [10]: df1
Out[10]:
q w r e
0 0 1 3 2
1 4 5 7 6
2 8 9 11 10
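reset_index does not have to restore every level. A brief sketch, using the same three-level index, of the level and drop parameters; partial and dropped are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4), index=list('abc'), columns=list('qwer'))
df1 = df.set_index(['q', 'w', 'r'])

# move only level r back into the columns; q and w stay in the index
partial = df1.reset_index(level='r')
print(list(partial.index.names))  # ['q', 'w']
print(partial.columns.tolist())   # ['r', 'e']

# drop=True discards the index levels instead of turning them into columns
dropped = df1.reset_index(drop=True)
print(dropped.columns.tolist())   # ['e']
```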
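5-6 Time Series
Introduction to time series
Time series data is an important form of structured data in many fields, such as finance, ecology, and physics. Data observed at multiple points in time forms a time series. A time series can be fixed-frequency or irregular.
A time series index built from plain datetime objects, without pandas date helpers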
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: from datetime import datetime
In [4]: dates = [datetime(2020,5,18),datetime(2020,5,19),datetime(2020,5,20)]
In [5]: Sr = pd.Series(np.random.randint(20,40, size=3), index=dates)
In [6]: Sr
Out[6]:
2020-05-18 34
2020-05-19 33
2020-05-20 33
dtype: int32
In [7]: Sr.index
Out[7]: DatetimeIndex(['2020-05-18', '2020-05-19', '2020-05-20'], dtype='datetime64[ns]', freq=None)
In [8]: # Take out part of the data for a calculation
In [9]: Sr[::2]
Out[9]:
2020-05-18 34
2020-05-20 33
dtype: int32
In [10]: Sr1 = Sr[::2]
In [11]: # Arithmetic aligns the two Series on their index automatically; labels without a matching value yield NaN
In [12]: Sr + Sr1
Out[12]:
2020-05-18 68.0
2020-05-19 NaN
2020-05-20 66.0
dtype: float64
In [13]: # The dtype has nanosecond resolution
In [14]: Sr.index
Out[14]: DatetimeIndex(['2020-05-18', '2020-05-19', '2020-05-20'], dtype='datetime64[ns]', freq=None)
In [15]: Sr.index.dtype
Out[15]: dtype('<M8[ns]')
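Time series basics
Overview
The basic kind of time series in pandas is a Series indexed by timestamps, which outside pandas are usually represented as Python strings or datetime objects.
Notes
- datetime objects can be used as the index; the resulting index is a DatetimeIndex
- The <M8[ns] dtype is a timestamp with nanosecond resolution
- Each element of the time series index is a Timestamp object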
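The notes above can be checked directly. A small sketch, assuming a Series built from plain datetime objects as in the earlier session (sr and first are illustrative names):

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2020, 5, 18), datetime(2020, 5, 19), datetime(2020, 5, 20)]
sr = pd.Series(np.arange(3), index=dates)

# the list of datetime objects was converted to a DatetimeIndex
print(type(sr.index).__name__)  # DatetimeIndex

# each element of that index is a pandas Timestamp
first = sr.index[0]
print(type(first).__name__)     # Timestamp

# Timestamp subclasses datetime, so it compares equal to the original object
print(first == datetime(2020, 5, 18))  # True
```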
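Generating a time series index
pd.date_range(start=None,end=None,periods=None,freq=None,tz=None,normalize=False,name=None,closed=None)
- start : start time
- end : end time
- periods : number of timestamps to generate
- freq : date offset (frequency)
  - h : hour
  - min : minute
  - s : second
  - D : day
  - W : week
  - M : month end (spelled 'ME' in pandas 2.2 and later)
  - Y : year end (spelled 'YE' in pandas 2.2 and later)
- normalize : reset the time component of start and end to midnight (00:00:00)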
In [40]: dt = pd.date_range(start='20200101', end='20200520',freq='1h')
In [41]: dt
Out[41]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:00:00',
'2020-01-01 02:00:00', '2020-01-01 03:00:00',
'2020-01-01 04:00:00', '2020-01-01 05:00:00',
'2020-01-01 06:00:00', '2020-01-01 07:00:00',
'2020-01-01 08:00:00', '2020-01-01 09:00:00',
...
'2020-05-19 15:00:00', '2020-05-19 16:00:00',
'2020-05-19 17:00:00', '2020-05-19 18:00:00',
'2020-05-19 19:00:00', '2020-05-19 20:00:00',
'2020-05-19 21:00:00', '2020-05-19 22:00:00',
'2020-05-19 23:00:00', '2020-05-20 00:00:00'],
dtype='datetime64[ns]', length=3361, freq='H')
In [42]: # Specifying minutes
In [43]: dt = pd.date_range(start='20200101', end='20200520',freq='1h30min')
In [44]: dt
Out[44]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:30:00',
'2020-01-01 03:00:00', '2020-01-01 04:30:00',
'2020-01-01 06:00:00', '2020-01-01 07:30:00',
'2020-01-01 09:00:00', '2020-01-01 10:30:00',
'2020-01-01 12:00:00', '2020-01-01 13:30:00',
...
'2020-05-19 10:30:00', '2020-05-19 12:00:00',
'2020-05-19 13:30:00', '2020-05-19 15:00:00',
'2020-05-19 16:30:00', '2020-05-19 18:00:00',
'2020-05-19 19:30:00', '2020-05-19 21:00:00',
'2020-05-19 22:30:00', '2020-05-20 00:00:00'],
dtype='datetime64[ns]', length=2241, freq='90T')
In [45]: # Specifying seconds
In [46]: dt = pd.date_range(start='20200101', end='20200520',freq='1h30min30s')
In [47]: dt
Out[47]:
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 01:30:30',
'2020-01-01 03:01:00', '2020-01-01 04:31:30',
'2020-01-01 06:02:00', '2020-01-01 07:32:30',
'2020-01-01 09:03:00', '2020-01-01 10:33:30',
'2020-01-01 12:04:00', '2020-01-01 13:34:30',
...
'2020-05-19 09:29:00', '2020-05-19 10:59:30',
'2020-05-19 12:30:00', '2020-05-19 14:00:30',
'2020-05-19 15:31:00', '2020-05-19 17:01:30',
'2020-05-19 18:32:00', '2020-05-19 20:02:30',
'2020-05-19 21:33:00', '2020-05-19 23:03:30'],
dtype='datetime64[ns]', length=2228, freq='5430S')
In [48]: # Specifying days
In [49]: dt = pd.date_range(start='20200101', end='20200520',freq='1D')
In [50]: dt
Out[50]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
'2020-01-09', '2020-01-10',
...
'2020-05-11', '2020-05-12', '2020-05-13', '2020-05-14',
'2020-05-15', '2020-05-16', '2020-05-17', '2020-05-18',
'2020-05-19', '2020-05-20'],
dtype='datetime64[ns]', length=141, freq='D')
In [51]: # Specifying weeks
In [52]: dt = pd.date_range(start='20200101', end='20200520',freq='1W')
In [53]: dt
Out[53]:
DatetimeIndex(['2020-01-05', '2020-01-12', '2020-01-19', '2020-01-26',
'2020-02-02', '2020-02-09', '2020-02-16', '2020-02-23',
'2020-03-01', '2020-03-08', '2020-03-15', '2020-03-22',
'2020-03-29', '2020-04-05', '2020-04-12', '2020-04-19',
'2020-04-26', '2020-05-03', '2020-05-10', '2020-05-17'],
dtype='datetime64[ns]', freq='W-SUN')
In [54]: # Specifying months
In [55]: dt = pd.date_range(start='20200101', end='20200520',freq='1M')
In [56]: dt
Out[56]: DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30'], dtype='datetime64[ns]', freq='M')
# periods=5 generates five timestamps
# When end is omitted, periods determines how many timestamps are generated; when freq is omitted, the default 'D' (daily) is used
In [57]: dt = pd.date_range(start='20200101',periods=5)
In [58]: dt
Out[58]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05'],
dtype='datetime64[ns]', freq='D')
In [21]: # periods fixes the number of timestamps
In [22]: # normalize=True resets the time component to midnight
In [23]: df = pd.date_range(start='2020-05-21', periods=5, normalize=True)
In [24]: df
Out[24]:
DatetimeIndex(['2020-05-21', '2020-05-22', '2020-05-23', '2020-05-24',
'2020-05-25'],
dtype='datetime64[ns]', freq='D')
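In the example above the start value has no time of day, so normalize makes no visible difference. The following sketch uses a start value with a time component to show the effect, and adds the start/end/periods combination; the names with_time, midnight and even are illustrative.

```python
import pandas as pd

# the start value carries a time of day, which every generated timestamp keeps
with_time = pd.date_range(start='2020-05-21 14:30:00', periods=3)
# normalize=True moves the start to midnight before generating the dates
midnight = pd.date_range(start='2020-05-21 14:30:00', periods=3, normalize=True)

print(with_time[0])  # 2020-05-21 14:30:00
print(midnight[0])   # 2020-05-21 00:00:00

# start, end and periods without freq gives evenly spaced timestamps
even = pd.date_range(start='2020-01-01', end='2020-01-05', periods=3)
print(even.tolist())
```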
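Indexing and selecting time series data
- Values are selected from a time series with []
- The year, month, and day can be separated by spaces
- They can also be joined with -
- loc, iloc, and similar operations are supported as well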
In [25]: ts = pd.Series(np.random.randint(20,50,size=100),index=pd.date_range(start='20200521',periods=100))
In [26]: ts
Out[26]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
..
2020-08-24 34
2020-08-25 25
2020-08-26 44
2020-08-27 23
2020-08-28 41
Freq: D, Length: 100, dtype: int32
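In [27]: # periods is the number of timestamps; since end is not given and freq defaults to D, there is one entry per day
In [28]: # Index into the time series
In [29]: # Select the data for 2020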
In [30]: ts['2020']
Out[30]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
..
2020-08-24 34
2020-08-25 25
2020-08-26 44
2020-08-27 23
2020-08-28 41
Freq: D, Length: 100, dtype: int32
In [31]: # Select the data for May 2020
In [33]: ts['2020 05']
Out[33]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
2020-05-26 29
2020-05-27 27
2020-05-28 32
2020-05-29 40
2020-05-30 38
2020-05-31 35
Freq: D, dtype: int32
In [34]: # Year, month, and day are separated by spaces here
In [35]: # Select May 1 to May 10, 2020 (empty, because the data starts on May 21)
In [36]: ts['2020 05 01' : '2020 05 10']
Out[36]: Series([], Freq: D, dtype: int32)
In [37]: ts
Out[37]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
..
2020-08-24 34
2020-08-25 25
2020-08-26 44
2020-08-27 23
2020-08-28 41
Freq: D, Length: 100, dtype: int32
In [38]: # Select all of the May 2020 data
In [39]: ts['2020 05 21':'2020 05 31']
Out[39]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
2020-05-26 29
2020-05-27 27
2020-05-28 32
2020-05-29 40
2020-05-30 38
2020-05-31 35
Freq: D, dtype: int32
In [40]: ts.loc['2020-05']
Out[40]:
2020-05-21 48
2020-05-22 49
2020-05-23 23
2020-05-24 26
2020-05-25 30
2020-05-26 29
2020-05-27 27
2020-05-28 32
2020-05-29 40
2020-05-30 38
2020-05-31 35
Freq: D, dtype: int32
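A compact sketch of the same selections with loc, using deterministic values so the results are easy to verify. The Series contents and the names may, window and one_day are illustrative.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(100), index=pd.date_range(start='20200521', periods=100))

# a year-month string selects every row in that month
may = ts.loc['2020-05']
# string slices include both endpoints
window = ts.loc['2020-06-01':'2020-06-07']
# a full date on this daily, unique index returns a single value
one_day = ts.loc['2020-05-21']

print(len(may))     # 11
print(len(window))  # 7
print(one_day)      # 0
```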
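Time series can also contain duplicate index values
df.index.is_unique
- Checks whether the index contains duplicate values
- True means there are no duplicate index values
- False means there are duplicate index values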
In [51]: dates = [datetime(2020, 5, 21),datetime(2020, 5, 21),datetime(2020, 5, 22),datetime(2020, 5, 23)]
In [52]: dates
Out[52]:
[datetime.datetime(2020, 5, 21, 0, 0),
datetime.datetime(2020, 5, 21, 0, 0),
datetime.datetime(2020, 5, 22, 0, 0),
datetime.datetime(2020, 5, 23, 0, 0)]
In [53]: st = pd.Series(np.random.randint(20,30,size=4),index=dates)
In [54]: st
Out[54]:
2020-05-21 23
2020-05-21 28
2020-05-22 29
2020-05-23 24
dtype: int32
In [55]: # Check for duplicate index values
In [56]: # False means there are duplicates
In [57]: # True means there are no duplicates
In [59]: st.index.is_unique
Out[59]: False
In [61]: # Selecting a label that has duplicates does not raise an error; all matching rows are returned
In [62]: st.loc['2020-05-21']
Out[62]:
2020-05-21 23
2020-05-21 28
dtype: int32
Grouping on duplicate index values
In [70]: dates = [datetime(2020, 5, 21),datetime(2020, 5, 21),datetime(2020, 5, 22),datetime(2020, 5, 22)]
In [71]: st = pd.Series(np.random.randint(20,30,size=4),index=dates)
In [72]: st
Out[72]:
2020-05-21 29
2020-05-21 20
2020-05-22 29
2020-05-22 25
dtype: int32
In [73]: # Group the duplicate index values, then sum each group
In [74]: st = st.groupby(level=0).sum()
In [75]: st
Out[75]:
2020-05-21 49
2020-05-22 54
dtype: int32
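Summing is only one way to resolve duplicate timestamps. This sketch shows two other options under the same setup: averaging each group, or keeping just the first observation. The values are fixed rather than random so the result is reproducible, and means and first_only are illustrative names.

```python
from datetime import datetime

import pandas as pd

dates = [datetime(2020, 5, 21), datetime(2020, 5, 21),
         datetime(2020, 5, 22), datetime(2020, 5, 22)]
st = pd.Series([29, 20, 29, 25], index=dates)

# aggregate each group of identical timestamps with a mean
means = st.groupby(level=0).mean()
print(means.tolist())       # [24.5, 27.0]

# or keep only the first observation for every timestamp
first_only = st[~st.index.duplicated(keep='first')]
print(first_only.tolist())  # [29, 29]
print(first_only.index.is_unique)  # True
```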
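Shifting dates
"Shifting" means moving data backward or forward through time. Both Series and DataFrame have a shift method for simple forward or backward shifts that leave the index unchanged.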
In [77]: import pandas as pd
In [78]: import numpy as np
In [79]: st = pd.Series(np.random.randint(20,30,size=100),index=pd.date_range(start='20200521',periods=100))
In [80]: st
Out[80]:
2020-05-21 27
2020-05-22 25
2020-05-23 21
2020-05-24 23
2020-05-25 23
..
2020-08-24 25
2020-08-25 21
2020-08-26 27
2020-08-27 21
2020-08-28 25
Freq: D, Length: 100, dtype: int32
In [81]: # A positive shift moves each value to a later date; the first positions have no data and are filled with NaN
In [82]: st.shift(2)
Out[82]:
2020-05-21 NaN
2020-05-22 NaN
2020-05-23 27.0
2020-05-24 25.0
2020-05-25 21.0
...
2020-08-24 21.0
2020-08-25 28.0
2020-08-26 25.0
2020-08-27 21.0
2020-08-28 27.0
Freq: D, Length: 100, dtype: float64
In [83]: # A negative shift moves the values in the opposite direction
In [84]: st.shift(-2)
Out[84]:
2020-05-21 21.0
2020-05-22 23.0
2020-05-23 23.0
2020-05-24 22.0
2020-05-25 22.0
...
2020-08-24 27.0
2020-08-25 21.0
2020-08-26 25.0
2020-08-27 NaN
2020-08-28 NaN
Freq: D, Length: 100, dtype: float64
Use cases
- Computing the growth rate
- (current day - previous day) / previous day
- current day / previous day - 1
Series.pct_change()
In [85]: st.pct_change()
Out[85]:
2020-05-21 NaN
2020-05-22 -0.074074
2020-05-23 -0.160000
2020-05-24 0.095238
2020-05-25 0.000000
...
2020-08-24 -0.107143
2020-08-25 -0.160000
2020-08-26 0.285714
2020-08-27 -0.222222
2020-08-28 0.190476
Freq: D, Length: 100, dtype: float64
# The same result can be obtained with shift
In [86]: st/st.shift(1)-1
Out[86]:
2020-05-21 NaN
2020-05-22 -0.074074
2020-05-23 -0.160000
2020-05-24 0.095238
2020-05-25 0.000000
...
2020-08-24 -0.107143
2020-08-25 -0.160000
2020-08-26 0.285714
2020-08-27 -0.222222
2020-08-28 0.190476
Freq: D, Length: 100, dtype: float64
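The two growth-rate formulas listed above and pct_change agree, which can be checked on a few fixed values. In this sketch the first five numbers are taken from the Series shown earlier; via_method, via_shift and via_diff are illustrative names.

```python
import pandas as pd

st = pd.Series([27, 25, 21, 23, 23],
               index=pd.date_range(start='20200521', periods=5))

# built-in percent change
via_method = st.pct_change()
# current / previous - 1
via_shift = st / st.shift(1) - 1
# (current - previous) / previous
via_diff = (st - st.shift(1)) / st.shift(1)

# all three agree up to floating point error (NaN in the first row)
pd.testing.assert_series_equal(via_method, via_shift)
pd.testing.assert_series_equal(via_method, via_diff)
print(via_method.round(6).tolist())
```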
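5-7 Resampling
Introduction to resampling
Resampling is the process of converting a time series from one frequency to another. Converting higher-frequency data to a lower frequency is called downsampling, and converting lower-frequency data to a higher frequency is called upsampling.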
In [87]: import pandas as pd
In [88]: import numpy as np
In [89]: df = pd.DataFrame(np.random.randint(20,30,size=10),index=pd.date_range(start='20200521',periods=10))
In [90]: df
Out[90]:
0
2020-05-21 25
2020-05-22 25
2020-05-23 23
2020-05-24 23
2020-05-25 20
2020-05-26 22
2020-05-27 23
2020-05-28 28
2020-05-29 23
2020-05-30 22
In [91]: # resample takes the target frequency as its argument
In [92]: df.resample('d').mean()
Out[92]:
0
2020-05-21 25
2020-05-22 25
2020-05-23 23
2020-05-24 23
2020-05-25 20
2020-05-26 22
2020-05-27 23
2020-05-28 28
2020-05-29 23
2020-05-30 22
# Resampling by week
In [93]: df.resample('w').mean()
Out[93]:
0
2020-05-24 24
2020-05-31 23
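The session above only downsamples. The following sketch puts downsampling and upsampling side by side on the same ten daily values; forward filling is just one possible rule for the new slots, and daily, weekly and half_day are illustrative names.

```python
import pandas as pd

daily = pd.Series([25, 25, 23, 23, 20, 22, 23, 28, 23, 22],
                  index=pd.date_range(start='20200521', periods=10))

# downsampling: the daily values collapse into one mean per week
weekly = daily.resample('W').mean()
print(weekly.tolist())  # [24.0, 23.0]

# upsampling: each day gains a 12:00 slot; the new slots are empty
# until a fill rule is chosen, here forward fill
half_day = daily.resample('12h').ffill()
print(len(half_day))    # 19
```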
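Exercise
Given air quality data for five cities (Beijing, Shanghai, Guangzhou, Shenzhen, and Shenyang), plot how Beijing's PM2.5 changes over time.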
# @Time : 2020/5/21 14:25
# @Author : SmallJ
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
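# Read the CSV file
df = pd.read_csv('PM2.5/BeijingPM20100101_20151231.csv')
# Show every row and column when printing
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
# Look at the first row
df.head(1)
# PeriodIndex represents time spans (periods) rather than points in time
# (recent pandas releases deprecate these keyword arguments in favour of PeriodIndex.from_fields)
datetime = pd.PeriodIndex(year=df.year, month=df.month, day=df.day, hour=df.hour, freq="h")
# Add it as a new column
df['datetime'] = datetime
# Use the datetime column as the index, modifying df in place
df.set_index(df.datetime, inplace=True)
# The raw data is hourly (freq="h")
# Resample to 7-day bins and take the mean of each bin
df = df.resample('7D').mean()
# Drop missing values
data = df['PM_US Post'].dropna()
# Data for the plot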
x = data.index
y = data.values
# Font setting so that Chinese characters render correctly
font = {'family': 'SimHei'}
matplotlib.rc('font', **font)
# Set the figure size
plt.figure(figsize=(15, 8), dpi=80)
# Set the title
plt.title('PM2.5 in Beijing')
# Draw the line chart
# x cannot be passed to plot directly because its dtype is period[7D]
plt.plot(range(len(x)), y, color='blue')
# Set the x-axis ticks
# ticks=None, labels=None
# ticks: tick positions
# labels: tick labels
plt.xticks(ticks=range(0, len(x))[::10], labels=x[::10], rotation=45)
# Save the figure
plt.savefig('beijingpm.png')
# Display the figure
plt.show()