pandas Basics: Selecting Data with loc and iloc, Arithmetic and Data Alignment, Arithmetic Methods with Fill Values, Operations Between DataFrame and Series

Selecting data with loc and iloc

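As mentioned earlier, these indexers give a more concise way to do label indexing. For selecting rows (and columns) of a DataFrame, you can use axis labels (loc) or integer positions (iloc) to pick a subset of rows and columns with NumPy-like syntax.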

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'b', 'c'],
                     columns=['ohio', 'texas', 'california'])
frame
	ohio	texas	california
a	0	1	2
b	3	4	5
c	6	7	8

frame.loc['a', ['ohio', 'california']]   # select a single row and several columns by label
ohio          0
california    2
Name: a, dtype: int32

frame.iloc[1, [2, 1, 0]]   # select by integer position
california    5
texas         4
ohio          3
Name: b, dtype: int32

frame.loc[:'b', :'texas']   # slicing works too; loc slices include the end label
	ohio	texas
a	0	1
b	3	4

frame.iloc[:1, :1]   # iloc slices, by contrast, exclude the end position
	ohio
a	0

frame.loc[:, :'texas'][frame.texas > 2]   # label slice, then filter rows with a boolean condition
	ohio	texas
b	3	4
c	6	7
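A small sketch, reusing the same 3x3 frame, that checks the two slicing rules side by side: a loc slice ending at label 'b' selects the same block as an iloc slice ending at position 2, because loc includes its end label and iloc excludes its end position. It also shows that a boolean condition can go directly into loc as the row selector instead of being chained afterwards; the variable names by_label, by_position and filtered are just for illustration.

```python
import numpy as np
import pandas as pd

# Same 3x3 frame as in the examples above
frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['ohio', 'texas', 'california'])

# loc slices by label and INCLUDES the end label 'b' / 'texas'
by_label = frame.loc[:'b', :'texas']

# iloc slices by position and EXCLUDES end position 2
by_position = frame.iloc[:2, :2]

print(by_label.equals(by_position))  # True

# A boolean mask can be passed to loc directly as the row selector
filtered = frame.loc[frame['texas'] > 2, :'texas']
print(filtered)
```

Putting the mask inside a single loc call avoids chained indexing, which matters once you start assigning values rather than just reading them.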

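(Image omitted: a summary of the indexing options above.)

The distinction between loc and iloc matters most when the index itself contains integers: plain [] slicing is then positional, while loc slicing is label-based and includes the end label.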

ser=pd.Series(np.arange(3.))
ser
0    0.0
1    1.0
2    2.0
dtype: float64

ser[:1]   # plain [] slicing on an integer index is positional
0    0.0
dtype: float64

ser.loc[:1]   # loc is label-based, so the slice includes label 1
0    0.0
1    1.0
dtype: float64

ser.iloc[:1]   # iloc is always positional and excludes the end
0    0.0
dtype: float64
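In the example above the labels 0, 1, 2 coincide with the positions, which hides the difference. A minimal sketch with a hypothetical Series whose integer labels are deliberately out of order makes it visible: loc and plain [] with an integer look up the label, while iloc looks up the position.

```python
import pandas as pd

# Hypothetical Series: integer labels that do NOT match positions
ser = pd.Series([10.0, 20.0, 30.0], index=[2, 0, 1])

print(ser.loc[0])   # label 0    -> 20.0
print(ser.iloc[0])  # position 0 -> 10.0
print(ser[0])       # on an integer index, [] with an integer is a label lookup -> 20.0
```

Because plain [] can mean either labels or positions depending on the index type, using loc or iloc explicitly is the safer habit.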

Arithmetic and data alignment

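How arithmetic behaves between objects with different indexes is an important pandas feature for many applications. When you add objects together and some index labels are not shared, the index of the result is the union of both indexes, and labels that appear in only one of the objects get NaN in the result.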

s1=pd.Series([7.3,-2.5,3.4,1.5],index=['a','c','d','e'])
s2=pd.Series([-2.1,3.6,-1.5,4,3.1],index=['a','c','e','f','g'])
s1
a    7.3
c   -2.5
d    3.4
e    1.5
dtype: float64
s2
a   -2.1
c    3.6
e   -1.5
f    4.0
g    3.1
dtype: float64

s2+s1
a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN
dtype: float64
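A short check of the rule just described, using the same s1 and s2: the index of the sum equals the union of the two indexes, and exactly the labels that exist in only one Series (d, f, g) come out as NaN.

```python
import pandas as pd

s1 = pd.Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])
s2 = pd.Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

result = s1 + s2

# The result index is the (sorted) union of both indexes
print(result.index.equals(s1.index.union(s2.index)))  # True

# Labels present in only one Series produce NaN
print(result[result.isna()].index.tolist())  # ['d', 'f', 'g']
```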

# In a DataFrame, alignment is performed on both the rows and the columns
df1 = pd.DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),
                   index=['Ohio', 'Texas', 'Colorado'])
df2 = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                   index=['Utah', 'Ohio', 'Texas', 'Oregon'])

df1
       b	c	d
Ohio	0.0	1.0	2.0
Texas	3.0	4.0	5.0
Colorado	6.0	7.0	8.0
df2
       b	d	e
Utah	0.0	1.0	2.0
Ohio	3.0	4.0	5.0
Texas	6.0	7.0	8.0
Oregon	9.0	10.0	11.0

df2 + df1   # the result's row index and columns are the unions of those of the two DataFrames

          b	c	d	e
Colorado	NaN	NaN	NaN	NaN
Ohio	3.0	NaN	6.0	NaN
Oregon	NaN	NaN	NaN	NaN
Texas	9.0	NaN	12.0	NaN
Utah	NaN	NaN	NaN	NaN

# If you add DataFrame objects that have no column labels (or no row labels) in common, the result is entirely NaN.
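A minimal sketch of that last point, using two hypothetical one-column frames whose column names do not overlap. The rows line up, but since no column exists in both objects, every cell of the result is NaN.

```python
import pandas as pd

# Hypothetical frames with no column labels in common
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})

result = df1 - df2
print(result)

# Every cell is missing because no column appears in both frames
print(result.isna().to_numpy().all())  # True
```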

Arithmetic methods with fill values

In arithmetic between differently indexed objects, you may want to fill in a special value, such as 0, when an axis label is found in one object but not the other.

df1=pd.DataFrame(np.arange(12.).reshape((3,4)),columns=list('abcd'))
df2=pd.DataFrame(np.arange(20.).reshape((4,5)),columns=list('abcde'))
df2.loc[1, 'b'] = np.nan   # introduce a missing value
df1
	a	b	c	d
0	0.0	1.0	2.0	3.0
1	4.0	5.0	6.0	7.0
2	8.0	9.0	10.0	11.0

df2
   a	b	c	d	e
0	0.0	1.0	2.0	3.0	4.0
1	5.0	NaN	7.0	8.0	9.0
2	10.0	11.0	12.0	13.0	14.0
3	15.0	16.0	17.0	18.0	19.0


df2 + df1   # labels that do not overlap produce NA values
   a	b	c	d	e
0	0.0	2.0	4.0	6.0	NaN
1	9.0	NaN	13.0	15.0	NaN
2	18.0	20.0	22.0	24.0	NaN
3	NaN	NaN	NaN	NaN	NaN

df1.add(df2, fill_value=0)   # a value missing on one side is treated as 0 before adding
	a	b	c	d	e
0	0.0	2.0	4.0	6.0	4.0
1	9.0	5.0	13.0	15.0	9.0
2	18.0	20.0	22.0	24.0	14.0
3	15.0	16.0	17.0	18.0	19.0
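Two related details, sketched with the same df1 and df2. First, reindex also accepts a fill_value, so you can give df1 the columns of df2 with the new column filled by 0. Second, per the pandas documentation, fill_value only substitutes for a value that is missing on one side; if both operands are missing at a location, the result stays NaN. The variable names reindexed and summed are just for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12.).reshape((3, 4)), columns=list('abcd'))
df2 = pd.DataFrame(np.arange(20.).reshape((4, 5)), columns=list('abcde'))
df2.loc[1, 'b'] = np.nan

# reindex with fill_value: the new column 'e' is filled with 0
reindexed = df1.reindex(columns=df2.columns, fill_value=0)
print(reindexed)

# Make (1, 'b') missing in df1 too: with NaN on BOTH sides,
# fill_value does not apply and the sum stays NaN
df1.loc[1, 'b'] = np.nan
summed = df1.add(df2, fill_value=0)
print(summed.loc[1, 'b'])  # nan

# Row 3 exists only in df2, so df1's side is filled with 0
print(summed.loc[3, 'a'])  # 15.0
```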

Method	Description
add, radd	Addition (+)
sub, rsub	Subtraction (-)
div, rdiv	Division (/)
floordiv, rfloordiv	Floor division (//)
mul, rmul	Multiplication (*)
pow, rpow	Exponentiation (**)

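Each of these methods has a counterpart starting with the letter r whose arguments are flipped. So 1 / df1 is equivalent to df1.rdiv(1), and df1.rfloordiv(df2) is equivalent to df2.floordiv(df1).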
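A quick sketch that checks these equivalences. The frames here are hypothetical and start at 1 instead of 0 only to avoid division by zero in the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; values start at 1 to avoid dividing by zero
df1 = pd.DataFrame(np.arange(1., 13.).reshape((3, 4)), columns=list('abcd'))
df2 = pd.DataFrame(np.arange(12., 24.).reshape((3, 4)), columns=list('abcd'))

# rdiv flips the operands: df1.rdiv(1) computes 1 / df1
print((1 / df1).equals(df1.rdiv(1)))  # True

# rfloordiv(other) computes other // self
print(df1.rfloordiv(df2).equals(df2.floordiv(df1)))  # True

# The same pattern holds for the other pairs, e.g. sub / rsub
print(df1.rsub(10).equals(10 - df1))  # True
```

The r-versions are mainly useful when you want the fill_value or axis arguments of the method form while the DataFrame is on the right-hand side of the operator.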

Operations between DataFrame and Series

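Arithmetic between a DataFrame and a Series is similar to arithmetic between NumPy arrays of different dimensions. For example, consider the difference between a 2D array and one of its rows: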

arr=np.arange(12.).reshape((3,4))
arr
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])

arr - arr[0]   # the subtraction is applied to every row; this is called broadcasting
array([[0., 0., 0., 0.],
       [4., 4., 4., 4.],
       [8., 8., 8., 8.]])


frame = pd.DataFrame(np.arange(9).reshape((3,3)),index=['a','b','c'],
                     columns=['ohio','texas','california'])
frame
	ohio	texas	california
a	0	1	2
b	3	4	5
c	6	7	8

series=frame.iloc[1]

frame - series   # by default, the Series index is matched against the DataFrame's columns and broadcast down the rows
	ohio	texas	california
a	-3	-3	-3
b	0	0	0
c	3	3	3
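To connect this back to the NumPy example, the sketch below checks that frame - series produces the same numbers as broadcasting the row across the underlying 2D array, and that the operator form matches the method form frame.sub(series, axis='columns'), where axis='columns' is the default.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'b', 'c'],
                     columns=['ohio', 'texas', 'california'])
series = frame.iloc[1]

result = frame - series

# Same numbers as NumPy broadcasting the row across the 2D array
print(np.array_equal(result.to_numpy(), frame.to_numpy() - series.to_numpy()))  # True

# Operator form equals the method form; axis='columns' matches on column labels
print(result.equals(frame.sub(series, axis='columns')))  # True
```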

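# If an index value is not found in both the DataFrame's columns and the Series' index, the objects are reindexed to form the union
series = pd.Series(range(3), index=['ohio', 'bob', 'texas'])
frame + series
	bob	california	ohio	texas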
a	NaN	NaN	0.0	3.0
b	NaN	NaN	3.0	6.0
c	NaN	NaN	6.0	9.0

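# To broadcast across the columns instead, matching on the rows, you must use one of the arithmetic methods
series = frame['ohio']
frame.sub(series, axis='index')   # axis=0 also works; axis names the axis to match on, here the row index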
	ohio	texas	california
a	0	1	2
b	0	1	2
c	0	1	2
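A sketch of why the axis argument is needed here. axis='index' and axis=0 give the same result, while the plain operator would try to match the Series labels a, b, c against the column labels; since none overlap, the result is the union of both label sets and every cell is NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'b', 'c'],
                     columns=['ohio', 'texas', 'california'])
series = frame['ohio']

by_index = frame.sub(series, axis='index')
by_zero = frame.sub(series, axis=0)
print(by_index.equals(by_zero))  # True

# Without axis='index', labels 'a', 'b', 'c' are matched against the columns,
# nothing overlaps, and every cell of the result is NaN
print((frame - series).isna().to_numpy().all())  # True
```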

Sigh... I still don't have much of a practical feel for data analysis yet.
