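NumPy library
NumPy is a library for high-performance scientific computing and data analysis, and it is the foundation package that many higher-level data analysis libraries build on.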
import numpy as np

# 1. One-dimensional arrays
arr1 = np.array([1, 2, 3])
print(arr1, arr1.dtype)
arr2 = np.array([1.2, 2.3, 3.4])
print(arr2, arr2.dtype)
arr3 = arr1 + arr2
print(arr3)
# [1 2 3] int32  (the default integer dtype is platform-dependent; most 64-bit systems print int64)
# [1.2 2.3 3.4] float64
# [2.2 4.3 6.4]
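The addition above mixes an integer array with a float array, and the result comes back as floats. A minimal sketch of that type promotion follows; the variable names and the explicit `dtype=np.int64` are illustrative choices, used here so the printed integer dtype does not depend on the platform default.

```python
import numpy as np

# Pin the integer dtype explicitly instead of relying on the platform default.
ints = np.array([1, 2, 3], dtype=np.int64)
floats = np.array([1.2, 2.3, 3.4])

total = ints + floats                  # element-wise addition
print(ints.dtype)                      # int64
print(total.dtype)                     # float64: the integers are promoted to float
print(np.result_type(ints, floats))    # float64, the dtype NumPy picks for this pair
```

`np.result_type` is a convenient way to check the promoted dtype without performing the arithmetic.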
# 2. Two-dimensional arrays
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr1, arr1.dtype)
# [[1 2 3]
# [4 5 6]] int32
# 3. Create all-zero arrays
print(np.zeros(5))
# [0. 0. 0. 0. 0.]
print(np.zeros([3, 4]))
# [[0. 0. 0. 0.]
# [0. 0. 0. 0.]
# [0. 0. 0. 0.]]
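`np.zeros` returns `float64` values by default, which is why the output above shows `0.` rather than `0`. The sketch below, with illustrative variable names, shows how the `dtype` argument changes that, and uses `np.ones` as the analogous constructor. The shape is passed as a tuple here; a list such as `[3, 4]` works as well.

```python
import numpy as np

zeros_int = np.zeros((2, 3), dtype=np.int64)  # integer zeros instead of the float64 default
ones = np.ones((2, 3))                        # same interface, filled with 1.0

print(zeros_int.shape, zeros_int.dtype)       # (2, 3) int64
print(ones)
# [[1. 1. 1.]
#  [1. 1. 1.]]
```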
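pandas library
pandas provides two main data structures, Series and DataFrame, which handle one-dimensional and two-dimensional data respectively.
Basic Series operations
from pandas import Series

# 1. Create a Series from a list; a default integer index is generated automatically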
obj = Series([4, 5, 6, 8])
print(obj.index)
print(obj.values)
print(obj)
# RangeIndex(start=0, stop=4, step=1)
# [4 5 6 8]
# 0 4
# 1 5
# 2 6
# 3 8
# dtype: int64
# 2. Specify a custom index
dic = Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(dic)
print(dic['a'])  # look up a value by label, as with a dict
# a 1
# b 2
# c 3
# d 4
# dtype: int64
# 1
# 3. Convert a dict to a Series; the keys become the index
dic = {'name': 'jack', 'sex': 'male', 'age': 13}
obj = Series(dic)
print(obj)
# name jack
# sex male
# age 13
# dtype: object
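Because a Series keeps its labels in the index, several dict-style operations carry over. The sketch below reuses the same sample dict; the `'city'` label and the `'n/a'` default are hypothetical values chosen only to show a missing key.

```python
import pandas as pd

s = pd.Series({'name': 'jack', 'sex': 'male', 'age': 13})

print('age' in s)             # True: membership tests the index labels, not the values
print(s.get('city', 'n/a'))   # n/a: get() returns the default for a missing label
print(list(s.index))          # ['name', 'sex', 'age']
print(s.to_dict())            # {'name': 'jack', 'sex': 'male', 'age': 13}
```

Note that `in` checks labels only, so `'jack' in s` is `False` even though `'jack'` is one of the values.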
Basic DataFrame operations
from pandas import DataFrame

# 1. Build a table with DataFrame
data = {
    'city': ['beijing', 'shanghai', 'tianjin'],
    'year': [2015, 2016, 2014],
    'gdp': [1, 2, 3]
}
# index: the row labels
# columns: the order of the columns
frame = DataFrame(data, index=range(1, 4), columns=['year', 'city', 'gdp'])
print(frame)
# year city gdp
# 1 2015 beijing 1
# 2 2016 shanghai 2
# 3 2014 tianjin 3
# 2. Select data from the table
print(frame[0:2])  # slice rows by position (not by index label)
# year city gdp
# 1 2015 beijing 1
# 2 2016 shanghai 2
print(frame['city'])  # select a column
# 1 beijing
# 2 shanghai
# 3 tianjin
# Name: city, dtype: object
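The slice `frame[0:2]` selects rows by position even though the index labels start at 1, which is easy to misread. A minimal sketch of the explicit alternatives, `iloc` for positions and `loc` for labels, is shown below; it rebuilds the same sample table so it runs on its own.

```python
import pandas as pd

frame = pd.DataFrame(
    {'city': ['beijing', 'shanghai', 'tianjin'],
     'year': [2015, 2016, 2014],
     'gdp': [1, 2, 3]},
    index=range(1, 4),
)

first_two = frame.iloc[0:2]     # by position, end exclusive: rows labeled 1 and 2
by_label = frame.loc[1:2]       # by label, end inclusive: also rows labeled 1 and 2
row = frame.loc[3]              # a single row, returned as a Series
cell = frame.loc[2, 'city']     # a single value: row label 2, column 'city'
print(cell)                     # shanghai
```

Both slices return the same two rows here, but for different reasons: `iloc` stops before position 2, while `loc` includes label 2.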
# 3. Add a new column
frame['pop'] = [100, 200, 300]
print(frame)
# year city gdp pop
# 1 2015 beijing 1 100
# 2 2016 shanghai 2 200
# 3 2014 tianjin 3 300
frame['capital'] = frame['city'] == 'beijing'  # derive a new column from an existing one
print(frame)
# year city gdp pop capital
# 1 2015 beijing 1 100 True
# 2 2016 shanghai 2 200 False
# 3 2014 tianjin 3 300 False
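The comparison `frame['city'] == 'beijing'` produces a boolean Series, and the same kind of Series can be used to filter rows. The sketch below assumes the sample columns from above; the `gdp_per_pop` column name is hypothetical and only illustrates element-wise arithmetic between two columns.

```python
import pandas as pd

frame = pd.DataFrame({
    'city': ['beijing', 'shanghai', 'tianjin'],
    'gdp': [1, 2, 3],
    'pop': [100, 200, 300],
})

# Arithmetic between columns is element-wise, row by row.
frame['gdp_per_pop'] = frame['gdp'] / frame['pop']

# A boolean Series used as a mask keeps only the rows where it is True.
non_capital = frame[frame['city'] != 'beijing']
print(non_capital['city'].tolist())   # ['shanghai', 'tianjin']
```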
# 4. Build a DataFrame from nested dicts (outer keys become columns, inner keys become the row index)
data2 = {
    'beijing': {2008: 1, 2009: 2, 2010: 3},
    'shanghai': {2008: 2, 2009: 3, 2010: 4}
}
frame2 = DataFrame(data2)
print(frame2)
# beijing shanghai
# 2008 1 2
# 2009 2 3
# 2010 3 4
print(frame2.T)  # transpose: swap rows and columns
# 2008 2009 2010
# beijing 1 2 3
# shanghai 2 3 4
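In the nested-dict example every city has the same years. When the inner dicts do not share all keys, pandas builds the row index from the union of the keys and fills the gaps with `NaN`. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical data: each city is missing one of the three years.
partial = {
    'beijing': {2008: 1, 2009: 2},
    'shanghai': {2009: 3, 2010: 4},
}
frame3 = pd.DataFrame(partial)

print(list(frame3.index))              # [2008, 2009, 2010]
print(frame3.isna().sum().to_dict())   # {'beijing': 1, 'shanghai': 1}
print(frame3.dtypes.to_dict())         # columns become float64 because NaN is a float
```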
Matplotlib library
Matplotlib is a 2D plotting library for Python.
import matplotlib.pyplot as plt

# Plot a simple line
plt.plot([1, 2, 3], [4, 9, 6])
plt.show()
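The same plot can also be drawn through an explicit figure and axes, which makes it easier to add labels or to write the image out instead of opening a window. This is a sketch under assumptions: the axis labels and title are illustrative, the `Agg` backend is selected so the code runs without a display, and the image is written to an in-memory buffer (a file path such as `'plot.png'` could be passed to `savefig` instead).

```python
import io

import matplotlib
matplotlib.use('Agg')            # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [4, 9, 6], marker='o')  # same data as the simple example
ax.set_xlabel('x')               # illustrative labels, not from the original example
ax.set_ylabel('y')
ax.set_title('Simple line plot')

buf = io.BytesIO()
fig.savefig(buf, format='png')   # render the figure as PNG bytes
plt.close(fig)                   # release the figure once it has been saved
```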
...