Using apply, map, and applymap in Python's pandas Library: A Detailed Guide

Official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/index.html

Purpose: applying functions to data in pandas

1. Using apply (the most flexible of the three)

Common usage:

import pandas as pd
import numpy as np
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))  # index and columns set the row and column labels
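df

(Note: the outputs that follow were produced after df had been modified from its initial random values. Judging from the results, the first row of A and B is 0, column D is 5 throughout, and a column F has been added. The code for those changes is not shown. The frame actually in use is:)

                   A         B         C  D    F
2013-01-01  0.000000  0.000000  0.579335  5  3.0
2013-01-02  0.345413 -1.789114 -1.291485  5  1.0
2013-01-03 -0.463054  0.836781 -0.534286  5  2.0
2013-01-04  1.056680 -2.314360 -0.148108  5  3.0
2013-01-05 -1.387264 -1.029623 -1.805952  5  4.0
2013-01-06 -0.073426  0.413364 -0.371006  5  5.0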

df.apply(sum)
A    -0.521652
B    -3.882952
C    -3.571501
D    30.000000
F    18.000000
dtype: float64

df.apply(sum,axis=0)
A    -0.521652
B    -3.882952
C    -3.571501
D    30.000000
F    18.000000
dtype: float64

df.apply(sum,axis=1)
2013-01-01    8.579335
2013-01-02    3.264814
2013-01-03    6.839441
2013-01-04    6.594212
2013-01-05    4.777161
2013-01-06    9.968932
Freq: D, dtype: float64

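Summary: by default apply passes each column to the function. Set axis to choose the direction: 0 applies the function column by column, 1 applies it row by row.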
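To make the axis behavior easier to check by hand, here is a minimal sketch on a small fixed frame. The name small and its values are illustrative assumptions, not the data used in this article.

```python
import pandas as pd

# Small, fixed frame (illustrative values, not the article's data).
small = pd.DataFrame(
    {"A": [1, 2, 3], "B": [10, 20, 30]},
    index=["r1", "r2", "r3"],
)

# axis=0 (the default): each column is passed to sum, one result per column.
col_sums = small.apply(sum)          # A -> 6, B -> 60

# axis=1: each row is passed to sum, one result per row.
row_sums = small.apply(sum, axis=1)  # r1 -> 11, r2 -> 22, r3 -> 33

print(col_sums)
print(row_sums)
```

The result is indexed by column labels for axis=0 and by row labels for axis=1, which is a quick way to tell which direction was used.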

Applying a function to a specific column or row with apply (continuing from the code above)

df["B"].apply(sum)  # intended to apply sum to column B

TypeError: 'float' object is not iterable

I expected the code above to apply sum to column B, but it raised an error, so I tried another form:

df[["B"]].apply(sum)     # select column B with double brackets

B   -3.882952
dtype: float64

That version worked, so I set out to find out why.

print(type(df["B"]))
<class 'pandas.core.series.Series'>

print(type(df[["B"]]))
<class 'pandas.core.frame.DataFrame'>

The output shows the difference: single brackets return a Series, while double brackets return a DataFrame.
That made me wonder whether apply works only on a DataFrame and not on a Series, so I ran another test:

s = pd.Series([1, 3, 5, 6, 6, 8])
s

0    1
1    3
2    5
3    6
4    6
5    8

s.apply(lambda x:x+50)

0    51
1    53
2    55
3    56
4    56
5    58

This test rules that out: apply does work on a Series. So where is the problem? The exception reads TypeError: 'float' object is not iterable, which points at the sum function itself, so I tested further:

b=pd.DataFrame(s)
b

	0
0	1
1	3
2	5
3	6
4	6
5	8

b.apply(sum)

0    29
dtype: int64

s.apply(sum)
TypeError: 'int' object is not iterable

sum([1,2])

3

sum(1,2)

TypeError: 'int' object is not iterable

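The code above shows that sum requires an iterable argument. In sum(1, 2), the first argument 1 is an int, which is not iterable, hence the exception. The same thing happens with the Series s: Series.apply calls the function on one element at a time, so sum receives the int 1 and fails. With the DataFrame b, DataFrame.apply passes each column to the function (here b[0], which is a Series). A Series is iterable, so the call succeeds.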
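The experiment above can be condensed into one script. This is a minimal sketch that reuses the Series s from the test; the names error_message, frame_total, and series_total are mine. It also shows Series.sum(), which is probably what was wanted for a single column in the first place.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 6, 6, 8])

# Series.apply hands each element (a scalar) to the function,
# so the built-in sum receives a non-iterable int and raises TypeError.
try:
    s.apply(sum)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# DataFrame.apply hands each column (a Series) to the function,
# so the built-in sum can iterate over it.
frame_total = pd.DataFrame(s).apply(sum)

# To total a single column, call the Series method directly.
series_total = s.sum()

print(error_message)
print(frame_total)
print(series_total)
```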

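Summary of apply:

On a DataFrame, apply passes each column or row to the function. The axis parameter selects the direction: the default 0 works column by column, 1 works row by row. On a Series, apply passes one element at a time.
The function you pass in must accept the kind of argument it will receive: a Series from DataFrame.apply, a scalar from Series.apply.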
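As an illustration of matching the function to what apply hands it, here is a minimal sketch. The helper value_range and its scale parameter are hypothetical, made up for this example. It expects a Series, so it works with DataFrame.apply on either axis, and extra keyword arguments given to apply are forwarded to it.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})  # illustrative data

def value_range(values, scale=1):
    """Hypothetical helper: (max - min) of a Series, multiplied by scale."""
    return (values.max() - values.min()) * scale

# Each column arrives as a Series.
col_ranges = frame.apply(value_range)            # A -> 2, B -> 20

# Each row arrives as a Series.
row_ranges = frame.apply(value_range, axis=1)    # 9, 18, 27

# Extra keyword arguments are passed through to the function.
scaled = frame.apply(value_range, scale=10)      # A -> 20, B -> 200

print(col_ranges, row_ranges, scaled, sep="\n")
```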

2. Using map

The examples continue from the code above.

df     # current state: first row of A and B is 0, D is 5, column F added

                   A         B         C  D    F
2013-01-01  0.000000  0.000000  0.579335  5  3.0
2013-01-02  0.345413 -1.789114 -1.291485  5  1.0
2013-01-03 -0.463054  0.836781 -0.534286  5  2.0
2013-01-04  1.056680 -2.314360 -0.148108  5  3.0
2013-01-05 -1.387264 -1.029623 -1.805952  5  4.0
2013-01-06 -0.073426  0.413364 -0.371006  5  5.0

df["A"].map(lambda x:x+50)     # applied to each element of the selected column

2013-01-01    50.000000
2013-01-02    50.345413
2013-01-03    49.536946
2013-01-04    51.056680
2013-01-05    48.612736
2013-01-06    49.926574
Freq: D, Name: A, dtype: float64

df[["A"]].map(lambda x:x+50) 

AttributeError: 'DataFrame' object has no attribute 'map'

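Summary of map: map applies a function to each element of a Series. In the pandas version used here it is not available on a DataFrame, as the exception shows. (Since pandas 2.1, DataFrame.map does exist, as the element-wise replacement for applymap.) The difference between single and double brackets was explained in the code above.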
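Here is a minimal sketch of Series.map on a column taken from a DataFrame. The frame and its values are illustrative assumptions. Besides a function, map also accepts a dict, which this article does not cover: each element is looked up as a key, and elements without a matching key become NaN.

```python
import pandas as pd

# Illustrative data, not the article's frame.
frame = pd.DataFrame({"grade": ["a", "b", "a", "c"], "score": [90, 80, 95, 60]})

# Single brackets give a Series, which has a map method.
upper = frame["grade"].map(str.upper)              # function applied per element
plus_fifty = frame["score"].map(lambda x: x + 50)  # same pattern as in the article

# A dict works as a lookup table; "c" has no entry, so it maps to NaN.
points = frame["grade"].map({"a": 4, "b": 3})

print(upper.tolist())
print(plus_fifty.tolist())
print(points)
```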

3. Using applymap

df

                   A         B         C  D    F
2013-01-01  0.000000  0.000000  0.579335  5  3.0
2013-01-02  0.345413 -1.789114 -1.291485  5  1.0
2013-01-03 -0.463054  0.836781 -0.534286  5  2.0
2013-01-04  1.056680 -2.314360 -0.148108  5  3.0
2013-01-05 -1.387264 -1.029623 -1.805952  5  4.0
2013-01-06 -0.073426  0.413364 -0.371006  5  5.0

df.applymap(lambda x:x+50)     # applied to every element in the table

                    A          B          C   D     F
2013-01-01  50.000000  50.000000  50.579335  55  53.0
2013-01-02  50.345413  48.210886  48.708515  55  51.0
2013-01-03  49.536946  50.836781  49.465714  55  52.0
2013-01-04  51.056680  47.685640  49.851892  55  53.0
2013-01-05  48.612736  48.970377  48.194048  55  54.0
2013-01-06  49.926574  50.413364  49.628994  55  55.0

df[["A","B"]].applymap(lambda x:x+50)     # applied to the elements of the selected columns

                    A          B
2013-01-01  50.000000  50.000000
2013-01-02  50.345413  48.210886
2013-01-03  49.536946  50.836781
2013-01-04  51.056680  47.685640
2013-01-05  48.612736  48.970377
2013-01-06  49.926574  50.413364

df["A"].applymap(lambda x:x+50)     # a single column is a Series, so this fails
AttributeError: 'Series' object has no attribute 'applymap'

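Summary of applymap: applymap applies a function to every cell of a table, but it exists only on DataFrame. A Series has no such method, which is the reverse of map. (pandas 2.1 deprecated applymap in favor of DataFrame.map.)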
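Because the name of the element-wise method depends on the pandas version, here is a minimal sketch that runs on either side of that change. The frame and the helper add_fifty are illustrative assumptions. It checks for DataFrame.map (added in pandas 2.1) and falls back to applymap otherwise.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})  # illustrative data

def add_fifty(x):
    """Element-wise function: receives one cell value at a time."""
    return x + 50

# pandas 2.1 added DataFrame.map and deprecated DataFrame.applymap;
# use whichever one the installed version provides.
if hasattr(pd.DataFrame, "map"):
    shifted = frame.map(add_fifty)
else:
    shifted = frame.applymap(add_fifty)

# For plain arithmetic like this, the vectorized form gives the same result.
vectorized = frame + 50

print(shifted)
```

For simple arithmetic the vectorized expression is usually the better choice. The element-wise methods are mainly useful when the function cannot be written as an operation on whole columns.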

wtzhu_13
