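Study notes; these notes are built mainly around examples.
Development tool: Spyder
Introduction to pandas
pandas is a tool built on NumPy, created to handle data-analysis tasks. pandas incorporates a number of libraries and several standard data models, and provides the tools needed to manipulate large structured datasets efficiently.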
Series
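A Series can be thought of as a one-dimensional array whose index labels you can set yourself. It behaves like a fixed-length ordered dictionary: it has an index and values.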
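A minimal sketch of the "ordered dictionary" analogy. The labels `x`, `y`, `z` and the values are made up for illustration; `in` and `.get()` are standard Series methods that look at the index, just as a dict looks at its keys.

```python
import pandas as pd

# A Series pairs each value with an index label, much like an ordered dict
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

print(list(s.index))   # the labels: ['x', 'y', 'z']
print(s.values)        # the underlying values as a NumPy array
print("y" in s)        # membership checks the index, like dict keys -> True
print(20 in s)         # values are NOT searched -> False
print(s.get("w", 0))   # dict-style lookup with a default -> 0
```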
Creating a Series
- Syntax
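import numpy as np
import pandas as pd
# create an empty Series (passing dtype avoids a warning on pandas 1.x)
s = pd.Series(dtype=object)
# create a Series from an ndarray
data = np.array(['a','b','c','d'])
s = pd.Series(data)
s = pd.Series(data, index=[100,101,102,103])   # with a custom index
# create a Series from a dictionary
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
# create a Series from a scalar
s = pd.Series(5, index=[0, 1, 2, 3])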
- Examples
Code 1 (create a Series from an ndarray):
import numpy as np
import pandas as pd
data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])
s1 = pd.Series(data)
print(s1)
Output 1:
0 Ada
1 Bunny
2 Jack
3 Black
dtype: object
Code 2 (custom index):
s2 = pd.Series(data, index = [10, 20, 30, 40])
print(s2)
Output 2:
10 Ada
20 Bunny
30 Jack
40 Black
dtype: object
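Since the notes say index labels can be changed, here is a short sketch showing that the index of an existing Series can be reassigned, and that a custom index must supply exactly one label per value. The replacement labels are chosen only for illustration.

```python
import pandas as pd

data = ["Ada", "Bunny", "Jack", "Black"]
s = pd.Series(data, index=[10, 20, 30, 40])

# The index of an existing Series can be replaced by assigning to .index
s.index = ["a", "b", "c", "d"]
print(s)

# A custom index needs one label per value; a length mismatch raises ValueError
try:
    pd.Series(data, index=[10, 20, 30])
except ValueError as err:
    print("ValueError:", err)
```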
Code 3 (create a Series from a dictionary):
data = {"a":0, "b":1, "c":2, 'e':3}
# the dictionary keys become the Series index
s3 = pd.Series(data)
print(s3)
Output 3:
a 0
b 1
c 2
e 3
dtype: int64
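When a dictionary is combined with an explicit `index`, the index decides which keys are used and in what order. A minimal sketch (the chosen labels are illustrative): a label without a matching key becomes `NaN`, which turns the integer values into floats.

```python
import pandas as pd

data = {"a": 0, "b": 1, "c": 2, "e": 3}

# index= selects and orders entries by key; "d" has no key, so it becomes NaN
s = pd.Series(data, index=["b", "d", "a"])
print(s)
print(s.dtype)   # NaN forces a float dtype -> float64
```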
Code 4 (create a Series from a scalar):
s4 = pd.Series(10, index = [0, 1, 2, 3])
print(s4)
Output 4:
0 10
1 10
2 10
3 10
dtype: int64
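Accessing data in a Series
- Syntax
# retrieve elements by position (.iloc; plain s[0] on a non-integer index is deprecated in recent pandas)
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0], s.iloc[:3], s.iloc[-3:])
# retrieve data by label
print(s['a'], s[['a','c','d']])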
- Examples
Code 1:
import numpy as np
import pandas as pd
data = np.array(['Ada', 'Bunny', 'Jack', 'Black'])
s = pd.Series(data, index = ["a", "b", "c", "d"])
print(s.iloc[0], '\n\n', s.iloc[:3], '\n\n', s.iloc[-3:])
Output 1:
Ada
a Ada
b Bunny
c Jack
dtype: object
b Bunny
c Jack
d Black
dtype: object
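One detail worth adding to the slicing example: positional slices exclude the stop position (like Python lists), but label slices include the stop label. A small sketch on the same data:

```python
import pandas as pd

s = pd.Series(["Ada", "Bunny", "Jack", "Black"], index=["a", "b", "c", "d"])

# Positional slice: stop position 2 is excluded -> labels a, b
print(s.iloc[0:2])
# Label slice: the stop label "c" is included -> labels a, b, c
print(s.loc["a":"c"])
```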
Code 2:
print(s["a"], '\n\n',s[["a", "b", "c"]])
Output 2:
Ada
a Ada
b Bunny
c Jack
dtype: object
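Square-bracket access becomes ambiguous when the index itself is made of integers, as with the custom index `[10, 20, 30, 40]` from the creation examples. A sketch using `.loc` (by label) and `.iloc` (by position) to make the intent explicit:

```python
import pandas as pd

s2 = pd.Series(["Ada", "Bunny", "Jack", "Black"], index=[10, 20, 30, 40])

# With an integer index, s2[10] is a LABEL lookup, and s2[0] raises KeyError.
print(s2.loc[10])          # by label -> Ada
print(s2.iloc[0])          # by position -> Ada
print(s2.loc[[20, 40]])    # several labels at once -> Bunny, Black
```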
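Date handling in pandas
- Syntax
# date-string formats that pandas can recognize
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01', '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts them to the datetime dtype;
# pandas >= 2.0 needs format='mixed' when the formats differ (omit it on 1.x)
dates = pd.to_datetime(dates, format='mixed')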
- Examples
Code 1 (parsing dates):
import numpy as np
import pandas as pd
dates = pd.Series(['1997', '2015-09', '2019-03-01',
'2019/04/01', '2019/05/01 01:01:01',
'01 Jun 2019'])
print(dates)
print("-"*20)
dates = pd.to_datetime(dates, format='mixed')   # pandas >= 2.0; on 1.x drop format='mixed'
print(dates)
Output 1:
0 1997
1 2015-09
2 2019-03-01
3 2019/04/01
4 2019/05/01 01:01:01
5 01 Jun 2019
dtype: object
--------------------
0 1997-01-01 00:00:00
1 2015-09-01 00:00:00
2 2019-03-01 00:00:00
3 2019-04-01 00:00:00
4 2019-05-01 01:01:01
5 2019-06-01 00:00:00
dtype: datetime64[ns]
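When all strings share one known layout, passing `format=` explicitly avoids guessing (and day/month ambiguity), and `errors="coerce"` turns unparsable entries into `NaT` instead of raising. A sketch with illustrative day/month/year strings:

```python
import pandas as pd

raw = pd.Series(["01/06/2019", "15/08/2019"])   # illustrative day/month/year strings

# An explicit format: "01/06/2019" is read as 1 June, not 6 January
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed)

# errors="coerce" replaces strings that do not match with NaT
mixed = pd.to_datetime(pd.Series(["01/06/2019", "not a date"]),
                       format="%d/%m/%Y", errors="coerce")
print(mixed)
```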
Code 2 (date arithmetic):
delta = dates - pd.to_datetime('1970-01-01')
print(delta)
print("-"*20)
# the Series .dt accessor exposes the components of each time offset
print(delta.dt.days)
Output 2:
0 9862 days 00:00:00
1 16679 days 00:00:00
2 17956 days 00:00:00
3 17987 days 00:00:00
4 18017 days 01:01:01
5 18048 days 00:00:00
dtype: timedelta64[ns]
--------------------
0 9862
1 16679
2 17956
3 17987
4 18017
5 18048
dtype: int64
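Note that `.dt.days` keeps only the whole-day part: for row 4 above, the extra 01:01:01 disappears. A sketch using two of the same timestamps shows where the rest of the offset lives:

```python
import pandas as pd

# Two timestamps taken from the example above
dates = pd.Series([pd.Timestamp("2019-05-01 01:01:01"), pd.Timestamp("2019-06-01")])
delta = dates - pd.Timestamp("1970-01-01")

print(delta.dt.days)             # whole days only: 18017, 18048
print(delta.dt.seconds)          # seconds left within the day: 3661 (1 h 1 min 1 s), 0
print(delta.dt.total_seconds())  # the complete offset in seconds, as floats
```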
Series.dt provides many date-related operations; some of them are listed below:
Series.dt attribute | Meaning |
---|---|
Series.dt.year | The year of the datetime. |
Series.dt.month | The month as January=1, December=12. |
Series.dt.day | The days of the datetime. |
Series.dt.hour | The hours of the datetime. |
Series.dt.minute | The minutes of the datetime. |
Series.dt.second | The seconds of the datetime. |
Series.dt.microsecond | The microseconds of the datetime. |
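Series.dt.week | The week ordinal of the year (deprecated since pandas 1.1; use Series.dt.isocalendar().week). |
Series.dt.weekofyear | The week ordinal of the year (deprecated alias of Series.dt.week). |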
Series.dt.dayofweek | The day of the week with Monday=0, Sunday=6. |
Series.dt.weekday | The day of the week with Monday=0, Sunday=6. |
Series.dt.dayofyear | The ordinal day of the year. |
Series.dt.quarter | The quarter of the date. |
Series.dt.is_month_start | Indicates whether the date is the first day of the month. |
Series.dt.is_month_end | Indicates whether the date is the last day of the month. |
Series.dt.is_quarter_start | Indicator for whether the date is the first day of a quarter. |
Series.dt.is_quarter_end | Indicator for whether the date is the last day of a quarter. |
Series.dt.is_year_start | Indicate whether the date is the first day of a year. |
Series.dt.is_year_end | Indicate whether the date is the last day of the year. |
Series.dt.is_leap_year | Boolean indicator if the date belongs to a leap year. |
Series.dt.days_in_month | The number of days in the month. |
Code 3 (demonstrating the .dt accessor):
print(dates.dt.month)
Output 3:
0 1
1 9
2 3
3 4
4 5
5 6
dtype: int64
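Code 3 shows only `.dt.month`. The sketch below exercises several more attributes from the table on three illustrative dates (not taken from the notes), including `isocalendar().week` as the replacement for the deprecated `.dt.week`.

```python
import pandas as pd

# Three illustrative dates: a month start, a month end, and a leap day
dates = pd.Series(pd.to_datetime(["2019-03-01", "2019-06-30", "2020-02-29"]))

print(dates.dt.year)              # 2019, 2019, 2020
print(dates.dt.quarter)           # 1, 2, 1
print(dates.dt.dayofweek)         # Monday=0 ... Sunday=6 -> 4 (Fri), 6 (Sun), 5 (Sat)
print(dates.dt.is_month_start)    # True, False, False
print(dates.dt.is_month_end)      # False, True, True
print(dates.dt.days_in_month)     # 31, 30, 29
print(dates.dt.is_leap_year)      # False, False, True
# ISO week number, the replacement for .dt.week / .dt.weekofyear
print(dates.dt.isocalendar().week)
```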
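DatetimeIndex
Given a start date, a number of periods and a frequency, the pd.date_range() function creates a sequence of dates.
- Syntax
import pandas as pd
from datetime import datetime
# daily frequency (the default): 5 timestamps starting from 2019/08/21
datelist = pd.date_range('2019/08/21', periods = 5)
# monthly frequency ('M' = month end; pandas >= 2.2 prefers the alias 'ME')
datelist = pd.date_range('2019/08/21', periods=5,freq='M')
# build the date sequence for a given interval
# (pd.datetime is deprecated; use the standard-library datetime class instead)
start = datetime(2017, 11, 1)
end = datetime(2017, 11, 5)
dates = pd.date_range(start, end)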
- Examples
Code 1:
import numpy as np
import pandas as pd
dates1 = pd.date_range('2020-01-01', periods = 5,
freq = 'D')
print(dates1)
print("-"*20)
dates2 = pd.date_range('2015-01-10', periods = 5,
freq = 'M')
print(dates2)
print("-"*20)
from datetime import datetime   # pd.datetime is deprecated; use the standard-library class
start_num = datetime(2019, 1, 1)
end_num = datetime(2019, 1, 5)
dates3 = pd.date_range(start_num, end_num)
print(dates3)
Output 1:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
'2020-01-05'],
dtype='datetime64[ns]', freq='D')
--------------------
DatetimeIndex(['2015-01-31', '2015-02-28', '2015-03-31', '2015-04-30',
'2015-05-31'],
dtype='datetime64[ns]', freq='M')
--------------------
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
'2019-01-05'],
dtype='datetime64[ns]', freq='D')
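Besides `start` + `periods`, `date_range()` also accepts `start` + `end` together with either a `freq` or a `periods` count. A sketch with illustrative dates:

```python
import pandas as pd

# start + end + freq: every second day from Jan 1 up to and including Jan 9
every_two_days = pd.date_range("2019-01-01", "2019-01-09", freq="2D")
print(every_two_days)

# start + end + periods (no freq): evenly spaced points between the two ends
three_points = pd.date_range("2019-01-01", "2019-01-09", periods=3)
print(three_points)

# weekly frequency: "W" is anchored on Sundays by default ("W-SUN")
weekly = pd.date_range("2019-01-01", periods=3, freq="W")
print(weekly)
```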
Code 2:
dates1 = pd.bdate_range('2020-01-01', periods = 10)
print(dates1)
Note: bdate_range() produces a range of business days; unlike date_range(), it skips Saturdays and Sundays.
Output 2:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-06',
'2020-01-07', '2020-01-08', '2020-01-09', '2020-01-10',
'2020-01-13', '2020-01-14'],
dtype='datetime64[ns]', freq='B')
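To see what "business day" means here, the sketch below rebuilds the same ten dates by taking calendar days and dropping Saturdays (5) and Sundays (6). With the default frequency only weekends are skipped; 2020-01-01 is a public holiday in many countries but still appears in the output above.

```python
import pandas as pd

business = pd.bdate_range("2020-01-01", periods=10)

# The same days by hand: all calendar days, keeping Monday (0) to Friday (4)
calendar = pd.date_range("2020-01-01", "2020-01-14")
weekdays_only = calendar[calendar.dayofweek < 5]

print(list(business) == list(weekdays_only))   # True: same timestamps
```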