dropna()
For Series objects
Drop every entry that is NaN (the sessions below assume `import numpy as np` and `import pandas as pd`)
In [152]: data=pd.Series([1,np.nan,5,np.nan])
In [153]: data
Out[153]:
0 1.0
1 NaN
2 5.0
3 NaN
dtype: float64
In [154]: data.dropna()
Out[154]:
0 1.0
2 5.0
dtype: float64
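For a Series, dropping NaN entries amounts to filtering with the boolean mask returned by `notnull()`. The sketch below is a minimal illustration, assuming pandas and NumPy are installed; the names `cleaned` and `masked` are illustrative and not part of the original session.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# dropna() returns a new Series; `data` itself is left unchanged
cleaned = data.dropna()

# Equivalent filtering with a boolean mask
masked = data[data.notnull()]

print(cleaned.equals(masked))
print(len(data), len(cleaned))
```

Note that the surviving entries keep their original index labels (0 and 2); call `reset_index(drop=True)` if a fresh 0-based index is wanted.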
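For DataFrame objects
Drop every row that contains at least one NaN (the default, how='any'). Column 3 is entirely NaN in this example, so no row survives.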
In [19]: data=pd.DataFrame([[1,5,9,np.nan],[np.nan,3,7,np.nan],[6,np.nan,2,np.nan]
...: ,[np.nan,np.nan,np.nan,np.nan],[1,2,3,np.nan]])
In [20]: data
Out[20]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 NaN 3.0 7.0 NaN
2 6.0 NaN 2.0 NaN
3 NaN NaN NaN NaN
4 1.0 2.0 3.0 NaN
In [21]: data.dropna()
Out[21]:
Empty DataFrame
Columns: [0, 1, 2, 3]
Index: []
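The empty result follows from the default `how='any'`: one NaN anywhere in a row is enough to drop it. One way to keep the all-NaN column from wiping out every row is the `subset` parameter of `dropna`, which restricts the check to the listed columns. A minimal sketch, assuming the same data as above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Default how='any': every row has a NaN in column 3, so nothing is left
all_dropped = data.dropna()

# Only look at columns 0, 1 and 2 when deciding which rows to drop
partial = data.dropna(subset=[0, 1, 2])

print(all_dropped.empty)
print(partial)
```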
Drop only the rows in which every element is NaN
In [22]: data.dropna(how='all')
Out[22]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 NaN 3.0 7.0 NaN
2 6.0 NaN 2.0 NaN
4 1.0 2.0 3.0 NaN
Drop only the columns in which every element is NaN
In [23]: data.dropna(axis=1,how='all')
Out[23]:
0 1 2
0 1.0 5.0 9.0
1 NaN 3.0 7.0
2 6.0 NaN 2.0
3 NaN NaN NaN
4 1.0 2.0 3.0
Keep only the rows that have at least 3 non-NaN values
In [24]: data.dropna(thresh=3)
Out[24]:
0 1 2 3
0 1.0 5.0 9.0 NaN
4 1.0 2.0 3.0 NaN
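`thresh` counts non-NaN values, not NaN values. A sketch of the same filter written out by hand, assuming the same data as above (the names `non_nan_counts`, `by_thresh` and `by_mask` are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

by_thresh = data.dropna(thresh=3)

# Count the non-NaN values in each row, then keep rows with at least 3
non_nan_counts = data.notnull().sum(axis=1)
by_mask = data[non_nan_counts >= 3]

print(non_nan_counts.tolist())
print(by_thresh.equals(by_mask))
```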
fillna()
Replace NaN values with a constant
In [25]: data.fillna(0)
Out[25]:
0 1 2 3
0 1.0 5.0 9.0 0.0
1 0.0 3.0 7.0 0.0
2 6.0 0.0 2.0 0.0
3 0.0 0.0 0.0 0.0
4 1.0 2.0 3.0 0.0
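The fill value does not have to be a single constant. `fillna` also accepts a dict or Series keyed by column label, which allows a different value per column. A minimal sketch, assuming the same data as above; the fill values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# A different constant per column; columns not listed are left untouched
by_dict = data.fillna({0: 0, 1: -1})

# Fill each column with its own mean; column 3 has no valid values,
# so its mean is NaN and the column stays NaN
by_mean = data.fillna(data.mean())

print(by_dict)
print(by_mean)
```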
Forward fill (each NaN takes the last valid value above it in the same column)
In [27]: data.fillna(method='ffill')
Out[27]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 1.0 3.0 7.0 NaN
2 6.0 3.0 2.0 NaN
3 6.0 3.0 2.0 NaN
4 1.0 2.0 3.0 NaN
Forward fill, filling at most 1 consecutive NaN
In [28]: data.fillna(method='ffill',limit=1)
Out[28]:
0 1 2 3
0 1.0 5.0 9.0 NaN
1 1.0 3.0 7.0 NaN
2 6.0 3.0 2.0 NaN
3 6.0 NaN 2.0 NaN
4 1.0 2.0 3.0 NaN
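In pandas 2.1 and later, passing `method=` to `fillna` is deprecated, and the dedicated `ffill()` and `bfill()` methods are the suggested replacement. The sketch below reproduces the two forward fills with `ffill()` and adds a backward fill for comparison, assuming the same data as above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame([[1, 5, 9, np.nan],
                     [np.nan, 3, 7, np.nan],
                     [6, np.nan, 2, np.nan],
                     [np.nan, np.nan, np.nan, np.nan],
                     [1, 2, 3, np.nan]])

# Forward fill: carry the last valid value downward
forward = data.ffill()

# Forward fill, but fill at most one NaN in a row of consecutive NaNs
limited = data.ffill(limit=1)

# Backward fill: pull the next valid value upward
backward = data.bfill()

print(forward)
print(limited)
print(backward)
```

Column 3 stays NaN in all three results because it contains no valid value to propagate.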
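Method | Description |
---|---|
dropna | Filter out missing data |
fillna | Fill missing data with a specified value or a fill method such as ffill |
isnull | Test whether each value is missing |
notnull | The negation of isnull |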
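
`isnull` and `notnull` are not demonstrated above, so here is a minimal sketch of how they might be used, assuming the Series from the first example; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, np.nan, 5, np.nan])

# Boolean Series: True where the value is missing
missing = data.isnull()

# Boolean Series: True where the value is present
present = data.notnull()

# True counts as 1, so summing the mask gives the number of missing values
n_missing = int(missing.sum())

print(missing.tolist())
print(n_missing)
```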