7. Python Data Analysis and Presentation: Getting Started with Pandas

1. Introduction to the Pandas Library

Pandas is a third-party Python library that provides high-performance, easy-to-use data types and data analysis tools.

import pandas as pd

Pandas is built on NumPy and is commonly used together with NumPy and Matplotlib.

import pandas as pd

d = pd.Series(range(20))  # values 0..19 with an automatic integer index
print(d)
# 0      0
# 1      1
# 2      2
# 3      3
# 4      4
# 5      5
# 6      6
# 7      7
# 8      8
# 9      9
# 10    10
# 11    11
# 12    12
# 13    13
# 14    14
# 15    15
# 16    16
# 17    17
# 18    18
# 19    19
# dtype: int64
d = d.cumsum()  # cumulative (running) sum of the values
print(d)
# 0       0
# 1       1
# 2       3
# 3       6
# 4      10
# 5      15
# 6      21
# 7      28
# 8      36
# 9      45
# 10     55
# 11     66
# 12     78
# 13     91
# 14    105
# 15    120
# 16    136
# 17    153
# 18    171
# 19    190
# dtype: int64
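Because Pandas is built on NumPy, a Series wraps a NumPy array. The sketch below (assuming a recent pandas that provides `Series.to_numpy()`) checks that claim on the example above: the data comes back as an `ndarray`, and `cumsum()` gives the same numbers as NumPy's own `np.cumsum`.

```python
import numpy as np
import pandas as pd

d = pd.Series(range(20))
running = d.cumsum()

# The underlying data is a NumPy ndarray
arr = running.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# Same numbers as NumPy's cumulative sum over 0..19
print(np.array_equal(arr, np.cumsum(np.arange(20))))  # True
print(running.iloc[-1])  # 190
```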

Two data types: Series and DataFrame

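Operations on these data types fall into four groups: basic operations, arithmetic operations, feature operations, and association operations.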
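A rough sketch of one operation from each group on two small Series. The four-way grouping comes from the text; treating sums, maxima, and sorting as "feature" operations and correlation as an "association" operation is an assumption about what each group covers.

```python
import pandas as pd

s = pd.Series([3, 1, 4, 2, 5], index=list("abcde"))
t = pd.Series([2, 7, 1, 8, 2], index=list("abcde"))

# Basic operations: indexing and selection
print(s["c"])            # 4

# Arithmetic operations: element-wise, matched by index label
print((s + t).tolist())  # [5, 8, 5, 10, 7]

# Feature operations (assumed: summaries and ordering of one object)
print(s.sum(), s.max())                # 15 5
print(s.sort_values().index.tolist())  # ['b', 'd', 'a', 'c', 'e']

# Association operations (assumed: relationships between objects)
print(s.corr(t))  # Pearson correlation, negative for this data
```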

2. The Series Type in Pandas

A Series consists of a set of data values and an associated index.

The index can be customized:

b=pd.Series([9,8,7,6],index=['a','b','c','d'])
print(b)
# a    9
# b    8
# c    7
# d    6
# dtype: int64
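For comparison, here is a short sketch of what happens without `index=`: pandas generates a default integer index. It also shows that `index` can be passed as the second positional argument, which is the form used in the alignment example later on.

```python
import pandas as pd

# Without index=, pandas generates a default integer index 0..n-1
a = pd.Series([9, 8, 7, 6])
print(a.index.tolist())  # [0, 1, 2, 3]

# index= supplies custom labels; it is also the second positional argument
b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])
b2 = pd.Series([9, 8, 7, 6], ["a", "b", "c", "d"])
print(b.equals(b2))      # True
print(b["c"])            # 7
```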

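A Series can be created from:

• a Python list: if index is given, it must have the same number of elements as the list

• a scalar value: index determines the size of the Series, and the scalar is repeated for every label

• a Python dict: the keys become the index; if index is also given, it selects (and orders) entries from the dict, and labels not in the dict get NaN

• an ndarray: both the data and the index can be created from ndarrays

• other functions, such as range()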
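A minimal sketch of each creation route listed above; the values and labels are illustrative only. A list whose length differs from the given index raises a `ValueError`, so the list example keeps them equal.

```python
import numpy as np
import pandas as pd

# From a list: the index (if given) must match the list length
s_list = pd.Series([9, 8, 7], index=["a", "b", "c"])

# From a scalar: the index decides the size; the value is repeated
s_scalar = pd.Series(25, index=["a", "b", "c"])

# From a dict: keys become the index
s_dict = pd.Series({"a": 9, "b": 8, "c": 7})

# From a dict with index=: index selects and orders keys; unknown keys become NaN
s_pick = pd.Series({"a": 9, "b": 8, "c": 7}, index=["c", "a", "b", "d"])

# From ndarrays: both the data and the index can be ndarrays
s_nd = pd.Series(np.arange(5), index=np.arange(9, 4, -1))

# From other functions such as range()
s_range = pd.Series(range(3))

print(s_scalar.tolist())  # [25, 25, 25]
print(s_pick)             # c 7.0, a 9.0, b 8.0, d NaN (dtype float64)
```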

Basic operations on Series

A Series has two parts: index and values.

Operations on a Series are similar to those on an ndarray.

Operations on a Series are also similar to those on a Python dict.

import pandas as pd
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
print(b)
# a    9
# b    8
# c    7
# d    6
# dtype: int64
print(b.index)
#Index(['a', 'b', 'c', 'd'], dtype='object')
print(b.values)
#[9 8 7 6]
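To make the two-part structure concrete, this sketch checks the types of `.index` and `.values` and pairs them up like dict keys and values. (Current pandas documentation recommends `.to_numpy()` over `.values`; both return an ndarray for this integer data.)

```python
import numpy as np
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

# .index is a pandas Index object; .values is a NumPy ndarray
print(isinstance(b.index, pd.Index))     # True
print(isinstance(b.values, np.ndarray))  # True

# The two parts line up element by element, like dict keys and values
print(dict(zip(b.index, b.values.tolist())))  # {'a': 9, 'b': 8, 'c': 7, 'd': 6}
```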

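Series operations that resemble ndarray operations:

• Indexing uses the same [] syntax

• NumPy operations and functions can be applied to a Series

• A list of custom index labels can be used to select elements

• Slicing by the automatic (positional) index also works; when a custom index exists, it is sliced along with the data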

import pandas as pd
import numpy as np
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
print(b)
# a    9
# b    8
# c    7
# d    6
# dtype: int64
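print(b.iloc[3])  # by position; plain b[3] relies on a positional fallback that recent pandas deprecates for label indexes
#6
print(b[:3])
# a    9
# b    8
# c    7
# dtype: int64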
print(b[b>b.median()])
# a    9
# b    8
# dtype: int64
print(np.exp(b))
# a    8103.083928
# b    2980.957987
# c    1096.633158
# d     403.428793
# dtype: float64
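The bullets above also mention selecting with a list of custom labels, which the example does not show. This sketch covers that case and uses the standard `.iloc` (by position) and `.loc` (by label) accessors to make slicing explicit. Note that a label slice includes its end point, unlike a positional slice.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

# Select with a list of custom labels; the result follows the list's order
print(b[["c", "d", "a"]].tolist())  # [7, 6, 9]

# Positional slice: end position excluded, labels come along with the data
print(b.iloc[1:3].index.tolist())   # ['b', 'c']

# Label slice: the end label IS included
print(b.loc["b":"d"].tolist())      # [8, 7, 6]

# NumPy-style boolean masks and vectorized arithmetic still apply
print((b[b > 7] * 10).tolist())     # [90, 80]
```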

Series operations that resemble Python dict operations:

• Access by custom index label

• Membership tests with the reserved word in

• The .get() method

import pandas as pd
import numpy as np
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
print(b['b'])
#8
print('c' in b)
#True
print(0 in b)  # in checks index labels (like dict keys), not positions or values
#False
print(b.get('f',100))
#100
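The dict analogy goes a bit further than the example shows. This sketch highlights that `in` tests labels rather than values, contrasts `.get()` with `[]` for a missing label, and iterates label/value pairs with `Series.items()`.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])

# 'in' looks at index labels (like dict keys), not at the values
print(9 in b)         # False: 9 is a value, not a label
print(9 in b.values)  # True: test membership among the values explicitly

# .get() returns None (or the given default) instead of raising KeyError
print(b.get("f"))     # None
try:
    b["f"]
except KeyError:
    print("KeyError for missing label 'f'")

# Iterate label/value pairs like dict.items()
for label, value in b.items():
    print(label, value)
```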

import pandas as pd
import numpy as np
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
a=pd.Series([1,2,3],['c','d','e'])
print(a+b)
# a    NaN
# b    NaN
# c    8.0
# d    8.0
# e    NaN
# dtype: float64

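In arithmetic, Series automatically align data by index label: the result covers the union of both indexes, and a label present in only one operand produces NaN.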
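When NaN for unmatched labels is not wanted, the method form `Series.add()` accepts a `fill_value` that stands in for the missing side. The sketch below also shows that alignment is by label, not position: reordering one operand leaves the result unchanged once both results are sorted by index.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])
a = pd.Series([1, 2, 3], index=["c", "d", "e"])

# The + operator aligns on labels; the result index is the union of both
print((a + b).index.tolist())           # ['a', 'b', 'c', 'd', 'e']

# .add() with fill_value treats a label missing on one side as 0
print(b.add(a, fill_value=0).tolist())  # [9.0, 8.0, 8.0, 8.0, 3.0]

# Alignment is by label: shuffling a's order gives the same values per label
a_shuffled = a[["e", "c", "d"]]
print((b + a_shuffled).sort_index().equals((b + a).sort_index()))  # True
```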

import pandas as pd
import numpy as np
b=pd.Series([9,8,7,6],index=['a','b','c','d'])
print(b.name)
#None
b.name='Series object'
b.index.name='index column'
print(b)
# index column
# a    9
# b    8
# c    7
# d    6
# Name: Series object, dtype: int64
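The names matter beyond printing. This sketch uses the hypothetical names `score` and `key` to show that `rename()` returns a renamed copy and that `to_frame()` turns the Series name into the DataFrame's column label while keeping the index name.

```python
import pandas as pd

b = pd.Series([9, 8, 7, 6], index=["a", "b", "c", "d"])
b.name = "score"      # hypothetical Series name, for illustration
b.index.name = "key"  # hypothetical index name

# rename() with a string returns a new Series with that name; b is unchanged
c = b.rename("points")
print(b.name, c.name)         # score points

# When the Series becomes a DataFrame, its name becomes the column label
# and the index name carries over
df = b.to_frame()
print(df.columns.tolist())    # ['score']
print(df.index.name)          # key
```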