reindex - reindexing (2019/1/9)
Purpose: create a new object conformed to a new index
1. Function
df.reindex(labels = None,index = None,columns = None,axis = None,method = None,
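copy = True,level = None,fill_value = nan,limit = None,tolerance = None )  # returns a new object conformed to the new index
# A new object is produced unless the new index is equivalent to the current one and copy = False
# Reindexing changes the row labels and column labels. Data is matched to labels; positions for labels that were not present get a missing value (NA). pad or ffill fills forward; backfill or bfill fills backward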
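Parameters
labels = None | array-like; new labels/index for the axis selected by axis |
index, columns | array-like (should be passed by keyword); new labels/index. Preferably an Index object, to avoid duplicating data |
axis | int or str; the axis to target, by name ('index', 'columns') or number (0, 1) |
method = None | {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}; how to fill holes in the reindexed object. Only applicable when the index is monotonically increasing/decreasing |
None | default; do not fill gaps |
pad / ffill | propagate the last valid value forward |
backfill / bfill | use the next valid value to fill backward |
nearest | use the nearest valid value |
copy = True | return a new object even if the passed index is the same |
level | int or name; broadcast across a level, matching index values on the passed MultiIndex level |
fill_value | scalar, default np.nan; value to use for missing values. May be any compatible value |
limit | int; maximum number of consecutive elements to fill forward or backward |
tolerance | maximum distance between original and new labels for inexact matches |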
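The notes have no example for limit and tolerance, so here is a minimal sketch of how they constrain method-based filling. The Series and its sparse integer index are made up for illustration.

```python
import pandas as pd

# Hypothetical data: a Series on a sparse, monotonically increasing integer index.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])
new_index = list(range(0, 11))  # labels 0..10

# ffill with limit=2: each original value is carried forward over at most
# 2 consecutive new labels; labels further away stay NaN.
padded = s.reindex(new_index, method='ffill', limit=2)

# nearest with tolerance=1: a new label only receives a value when an
# original label lies within distance 1 of it.
nearest = s.reindex(new_index, method='nearest', tolerance=1)

print(padded)
print(nearest)
```

In this sketch labels 3, 4, 8, 9 stay NaN under ffill with limit=2, and labels 2, 3, 7, 8 stay NaN under nearest with tolerance=1.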
Calling conventions
(index=index_labels, columns=column_labels, ...)
(labels, axis={'index', 'columns'}, ...)
Keyword arguments are recommended, to make your intent clear
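A small sketch of the two calling conventions side by side, using a made-up two-row frame. It also shows why keywords are safer: a bare positional list is taken as row labels.

```python
import pandas as pd

# Hypothetical frame used only for this comparison.
df = pd.DataFrame({'A1': [10, 11], 'A2': [21, 22]}, index=['a1', 'a2'])

# Convention 1: name the axis through the keyword.
by_keyword = df.reindex(columns=['A1', 'A3'])

# Convention 2: pass labels plus axis.
by_axis = df.reindex(['A1', 'A3'], axis='columns')

# Both conventions give the same result.
print(by_keyword.equals(by_axis))

# Without a keyword the list is interpreted as row labels (axis 0).
rows = df.reindex(['a2', 'a3'])
print(rows)
```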
Example 1: Series
import numpy as np
import pandas as pd
s=pd.Series([11,12,13],index=list('abc'))
s.reindex(list('bcd'))
b 12.0
c 13.0
d NaN
dtype: float64
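The output above switches from integers to float64 because NaN cannot be stored in an integer Series. A minimal sketch of that promotion, and of how passing fill_value avoids it (the choice of 0 as the fill value is only for illustration):

```python
import pandas as pd

s = pd.Series([11, 12, 13], index=list('abc'))

# 'd' is not in the original index, so it gets NaN and the dtype becomes float64.
with_nan = s.reindex(list('bcd'))

# With an integer fill_value no NaN is needed, so the integer dtype is kept.
with_fill = s.reindex(list('bcd'), fill_value=0)

print(with_nan.dtype, with_fill.dtype)
```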
Example 2: DataFrame
Example 1: reindexing rows and columns
Example 1.1: reindexing rows
index = ['a1', 'a2', 'a3', 'a4', 'a5']
df = pd.DataFrame({ 'A1': [10,11,12,13,14], 'A2': [21, 22, 23, 24, 25]},index=index)
new_index= ['a7', 'a6', 'a4', 'a3','a2']
result1=df.reindex(new_index)
Example 1.2: reindexing columns
result2=df.reindex(columns=['A1', 'A3'])
result2=df.reindex(['A1', 'A3'], axis="columns")  # same result, using the axis-style keyword argument
Example 1.3: reindexing rows and columns (filling can only be done by row)
result3=df.reindex(index=['a1','a2','b1'],columns=['A1', 'A3'],fill_value=99)
df.loc[['a1','b1'],['A1','C1']]  # labels not in the index: a FutureWarning in older pandas, a KeyError since pandas 1.0
# df            result1           result2        result3
    A1  A2            A1    A2        A1   A3        A1  A3
a1  10  21      a7   NaN   NaN    a1  10  NaN    a1  10  99
a2  11  22      a6   NaN   NaN    a2  11  NaN    a2  11  99
a3  12  23      a4  13.0  24.0    a3  12  NaN    b1  99  99
a4  13  24      a3  12.0  23.0    a4  13  NaN
a5  14  25      a2  11.0  22.0    a5  14  NaN
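A sketch contrasting .loc with reindex when some labels are missing. It assumes pandas 1.0 or later, where .loc raises KeyError for a list containing unknown labels; the three-row frame is a shortened stand-in for the one above.

```python
import pandas as pd

# Shortened stand-in for the frame used in the examples above.
df = pd.DataFrame({'A1': [10, 11, 12], 'A2': [21, 22, 23]},
                  index=['a1', 'a2', 'a3'])

# .loc refuses labels that do not exist ('b1' and 'C1' here).
try:
    df.loc[['a1', 'b1'], ['A1', 'C1']]
    loc_raised = False
except KeyError:
    loc_raised = True

# reindex accepts them and inserts NaN at the new positions.
safe = df.reindex(index=['a1', 'b1'], columns=['A1', 'C1'])

print(loc_raised)
print(safe)
```

So reindex is the tool to reach for when the requested labels may or may not be present.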
Example 2: filling missing values with fill_value
df.reindex(['a7', 'a6', 'a4', 'a3','a2'], fill_value=99)
df.reindex(['a7', 'a6', 'a4', 'a3','a2'], fill_value='NG')
# fill_value=99     fill_value='NG'
    A1  A2              A1  A2
a7  99  99          a7  NG  NG
a6  99  99          a6  NG  NG
a4  13  24          a4  13  24
a3  12  23          a3  12  23
a2  11  22          a2  11  22
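One side effect worth checking: the type of fill_value decides the resulting column dtype. A minimal sketch on a made-up two-row frame; an integer fill keeps the integer columns, while a string fill turns them into object columns that mix numbers and text.

```python
import pandas as pd

# Hypothetical frame, smaller than the one in the example above.
df = pd.DataFrame({'A1': [10, 11], 'A2': [21, 22]}, index=['a1', 'a2'])

# Integer fill: columns stay integer.
num = df.reindex(['a3', 'a2'], fill_value=99)

# String fill: columns become object dtype, holding both 'NG' and numbers.
txt = df.reindex(['a3', 'a2'], fill_value='NG')

print(num.dtypes)
print(txt.dtypes)
```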
Example 3: method, using a monotonically increasing index
date_index = pd.date_range('1/1/2019', periods=4, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 103]},index=date_index)
date_index2 = pd.date_range('12/31/2018', periods=6, freq='D')
df2.reindex(date_index2)
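df2.reindex(date_index2, method='bfill')  # NaN values already in the original data are not filled; the index must be monotonically increasing or decreasing
df2.reindex(date_index2, method='pad')  # NaN values already in the original data are not filled; the index must be monotonically increasing or decreasing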
# df2                 no method             method='bfill'        method='pad'
            prices                prices                prices                prices
2019-01-01   100.0    2018-12-31     NaN    2018-12-31   100.0    2018-12-31     NaN
2019-01-02   101.0    2019-01-01   100.0    2019-01-01   100.0    2019-01-01   100.0
2019-01-03     NaN    2019-01-02   101.0    2019-01-02   101.0    2019-01-02   101.0
2019-01-04   103.0    2019-01-03     NaN    2019-01-03     NaN    2019-01-03     NaN
                      2019-01-04   103.0    2019-01-04   103.0    2019-01-04   103.0
                      2019-01-05     NaN    2019-01-05     NaN    2019-01-05   103.0
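As the output shows, method only fills the labels that reindex introduces; the NaN on 2019-01-03 was already in the data and is left alone. If that gap should be filled too, one option (a sketch, not the only approach) is to reindex first and then call ffill() on the result:

```python
import numpy as np
import pandas as pd

date_index = pd.date_range('2019-01-01', periods=4, freq='D')
df2 = pd.DataFrame({'prices': [100, 101, np.nan, 103]}, index=date_index)
date_index2 = pd.date_range('2018-12-31', periods=6, freq='D')

# method='pad' fills only the newly added label 2019-01-05;
# the original NaN on 2019-01-03 is kept.
padded = df2.reindex(date_index2, method='pad')

# Reindex, then forward-fill every NaN, including the original one.
filled = df2.reindex(date_index2).ffill()

print(padded)
print(filled)
```

2018-12-31 stays NaN in both results because there is no earlier value to carry forward.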
Example 4: a reindex pitfall
df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=['A', 'B'],index=list('abcdef'))
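df.reindex(['b', 'c', 'e'])  # the right way: pass labels; equivalent to df.iloc[[1, 2, 4]]
df.reindex([1, 2, 4])  # reindex is label-based, not positional: labels 1, 2, 4 do not exist, so every value is NaN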
# reindex(['b', 'c', 'e'])    reindex([1, 2, 4])
   A  B                            A    B
b  2  3                       1  NaN  NaN
c  4  5                       2  NaN  NaN
e  8  9                       4  NaN  NaN
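To avoid this pitfall when you only have integer positions, either select with iloc or translate the positions into labels before calling reindex. A minimal sketch of both, using the same frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=['A', 'B'],
                  index=list('abcdef'))

by_label = df.reindex(['b', 'c', 'e'])        # labels: works as intended
by_position = df.iloc[[1, 2, 4]]              # positions: use iloc
via_index = df.reindex(df.index[[1, 2, 4]])   # positions mapped to labels first
wrong = df.reindex([1, 2, 4])                 # 1, 2, 4 treated as labels: all NaN

print(by_label.equals(by_position), by_label.equals(via_index))
print(wrong)
```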