Reshaping with Hierarchical Indexing in pandas

When working with data, we sometimes need to rearrange its structure, an operation known as reshaping or pivoting.

Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. There are two main operations:

stack: rotates the columns of the data into rows
unstack: rotates the rows of the data into columns

A few simple examples illustrate this (they assume numpy and pandas have been imported as np and pd):

  
In [15]: data = pd.DataFrame(np.arange(6).reshape((2, 3)),
    ...:                     index=pd.Index(['Oh', 'Co'], name='state'),
    ...:                     columns=pd.Index(['one', 'two', 'three'], name='number'))

In [16]: data
Out[16]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [17]: res = data.stack()  # rotate the column labels into the row index; the result is a Series

In [18]: res
Out[18]: 
state  number
Oh     one       0
       two       1
       three     2
Co     one       3
       two       4
       three     5
dtype: int32

In [20]: res.unstack()  # the inverse operation: rotate the inner row labels back into columns
Out[20]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [21]: data
Out[21]: 
number  one  two  three
state
Oh        0    1      2
Co        3    4      5

In [24]: res1 = data.unstack()  # with a single-level row index, this also returns a Series

In [25]: res1
Out[25]: 
number  state
one     Oh       0
        Co       3
two     Oh       1
        Co       4
three   Oh       2
        Co       5
dtype: int32

In [41]: res1.stack()
---------------------------------------------------------------
AttributeError                Traceback (most recent call last)
<ipython-input-41-d2140643737a> in <module>()
----> 1 res1.stack()

D:\projects\env\Lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   4370             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4371                 return self[name]
-> 4372             return object.__getattribute__(self, name)
   4373
   4374     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'stack'

In [27]: res1.unstack()
Out[27]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5

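The examples above show that for a DataFrame whose row index and columns each have a single level, both unstack and stack return a Series. If the columns (for stack) or the row index (for unstack) have more than one level, the result is a DataFrame instead.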

A Series, by contrast, has only the unstack method.
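To illustrate when the result is a DataFrame rather than a Series, here is a minimal sketch. The frame `wide` and its level names `outer` and `inner` are made up for this illustration and do not come from the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two column levels, "outer" and "inner".
columns = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]],
                                     names=["outer", "inner"])
wide = pd.DataFrame(np.arange(8).reshape((2, 4)),
                    index=pd.Index(["Oh", "Co"], name="state"),
                    columns=columns)

# Only the innermost column level ("inner") moves to the rows,
# so "outer" remains as columns and the result is a DataFrame.
stacked = wide.stack()
print(type(stacked).__name__)
print(stacked)

# Unstacking a frame whose row index has two levels also gives a DataFrame.
print(type(stacked.unstack()).__name__)
```

Some pandas releases may print a FutureWarning for `stack()` here; the shape of the result is the same either way.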

By default, both unstack and stack operate on the innermost level. To operate on a different level, pass its number or name:

In [63]: res.unstack(level=0)
Out[63]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5

In [64]: res.unstack(level='state')
Out[64]: 
state   Oh  Co
number
one      0   3
two      1   4
three    2   5
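With only two index levels, the choice of level is easy to miss, so the following sketch uses a Series with three levels. The third level, `letter`, and the variable names are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical Series with a three-level row index.
index = pd.MultiIndex.from_product(
    [["Oh", "Co"], ["one", "two"], ["a", "b"]],
    names=["state", "number", "letter"],
)
s = pd.Series(range(8), index=index)

# The default rotates the innermost level ("letter").
default = s.unstack()

# The middle level can be selected by position or by name.
by_position = s.unstack(level=1)
by_name = s.unstack(level="number")

print(default)
print(by_name)
```

Referring to a level by name tends to be more robust than using its position, since the code keeps working if the order of the levels changes.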

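unstack can introduce missing values when some label combinations are absent from the data. stack drops missing values by default, which makes it the inverse of unstack; pass dropna=False to stack to keep them. (Newer pandas releases have reworked stack and its dropna parameter, so check the documentation for your version.)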
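The sketch below shows where such missing values come from. Because the behaviour of `dropna` depends on the pandas version, it does not call `stack(dropna=False)`; it uses the `fill_value` parameter of `unstack` instead, which is a different technique from the one described above. The Series `s` is made up for illustration.

```python
import pandas as pd

# Hypothetical Series in which the pair ("Co", "two") is absent.
index = pd.MultiIndex.from_tuples(
    [("Oh", "one"), ("Oh", "two"), ("Co", "one")],
    names=["state", "number"],
)
s = pd.Series([1, 2, 3], index=index)

# The missing combination becomes NaN after unstacking.
wide = s.unstack()
print(wide)

# fill_value replaces the gaps created by unstack with a chosen value.
filled = s.unstack(fill_value=0)
print(filled)
```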

Pivoting

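pivot(index, columns, values): index names the column to use for the row labels, columns names the column to use for the column labels, and values names the column(s) that fill in the values of the resulting DataFrame.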

In [77]: df = pd.DataFrame({'book': ['java', 'java', 'R', 'R', 'py', 'py'],
    ...:                    'info': ['P', 'Q', 'P', 'Q', 'P', 'Q'],
    ...:                    'val': [46, 33, 50, 44, 66, 55]})

In [78]: df
Out[78]: 
   book info  val
0  java    P   46
1  java    Q   33
2     R    P   50
3     R    Q   44
4    py    P   66
5    py    Q   55

In [79]: df.pivot(index='book', columns='info')    # book becomes the row index, info the column index
Out[79]: 
     val
info   P   Q
book
R     50  44
java  46  33
py    66  55
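When values is omitted, as above, the remaining columns are kept as an extra outer column level (here `val`). A small sketch of passing values explicitly, using the same data, follows; the variable name `flat` is chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "book": ["java", "java", "R", "R", "py", "py"],
    "info": ["P", "Q", "P", "Q", "P", "Q"],
    "val": [46, 33, 50, 44, 66, 55],
})

# Naming the values column yields single-level columns (P and Q only).
# Keyword arguments are used because recent pandas versions require them.
flat = df.pivot(index="book", columns="info", values="val")
print(flat)
```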

pivot can be implemented equivalently with set_index followed by unstack:

In [84]: df.set_index(['book','info']).unstack()
Out[84]: 
     val
info   P   Q
book
R     50  44
java  46  33
py    66  55
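The equivalence can also be checked programmatically instead of comparing the printed output by eye. This is a minimal sketch on the same data; the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "book": ["java", "java", "R", "R", "py", "py"],
    "info": ["P", "Q", "P", "Q", "P", "Q"],
    "val": [46, 33, 50, 44, 66, 55],
})

via_pivot = df.pivot(index="book", columns="info")
via_unstack = df.set_index(["book", "info"]).unstack()

# DataFrame.equals compares the labels and the values of both frames.
print(via_pivot.equals(via_unstack))
```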
