A Collection of DataFrame Operations in Python's pandas Module

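Preface:
While learning Python I kept running into the DataFrame, and it really is a very handy structure. This post collects the material I looked up for the problems I ran into. If anything is missing, additions are welcome.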

Creating a DataFrame:
A DataFrame can be built from a list, an array, or a dictionary.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([1, 2, 3, 4], columns=['one'], index=['a','b','c','d'])
>>> df
   one
a    1
b    2
c    3
d    4

>>> df = pd.DataFrame(np.array([[1,2,3,4],[5,6,7,8]]), columns=['one','two','three','four'])
>>> df
   one  two  three  four
0    1    2      3     4
1    5    6      7     8

>>> df = pd.DataFrame({'one':[1,2],'two':[3,4]},index=['a','b'])
>>> df
   one  two
a    1    3
b    2    4

Viewing and selecting data:
1. head(num) shows the first num rows; tail(num) shows the last num rows

>>> df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[11,22,33,44],[55,66,77,88]], columns=['one','two','three','four'])
>>> df
   one  two  three  four
0    1    2      3     4
1    5    6      7     8
2   11   22     33    44
3   55   66     77    88
>>> df.head(2)
   one  two  three  four
0    1    2      3     4
1    5    6      7     8
>>> df.tail(3)
   one  two  three  four
1    5    6      7     8
2   11   22     33    44
3   55   66     77    88

2. Selecting the last column, or the last few columns

>>> df[df.columns[-1]]  # select the last column
0     4
1     8
2    44
3    88
Name: four, dtype: int64

>>> df.iloc[:,-1]  # select the last column
0     4
1     8
2    44
3    88
Name: four, dtype: int64

>>> df.iloc[:,-3:-1]  # columns -3 up to, but not including, -1
   two  three
0    2      3
1    6      7
2   22     33
3   66     77

3. df.values returns all the data (the values) as an array

>>> df.values
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [11, 22, 33, 44],
       [55, 66, 77, 88]], dtype=int64)
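
As a side note, the pandas documentation recommends DataFrame.to_numpy() over .values for new code. A minimal sketch on a small frame made up for this illustration:

```python
import pandas as pd

# Small illustrative frame (not the one used in the transcript above)
df = pd.DataFrame([[1, 2], [3, 4]], columns=["one", "two"])

# to_numpy() returns the underlying data as a NumPy array, like .values
arr = df.to_numpy()
print(arr.shape)
```

Both calls give the same array here; to_numpy() additionally accepts options such as dtype and copy.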

4. Selecting rows or columns

>>> df[1:3]  # select rows by position (end excluded)
   one  two  three  four
1    5    6      7     8
2   11   22     33    44
>>> df.loc[1:3]  # select rows by label (end label included)
   one  two  three  four
1    5    6      7     8
2   11   22     33    44
3   55   66     77    88

>>> df.one  # select a column by its label
0     1
1     5
2    11
3    55
Name: one, dtype: int64
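
The two row slices above return different numbers of rows because a plain slice works by position and excludes its end, while a label-based slice includes its end label. A short sketch of that difference, rebuilding the same frame so it runs on its own (the variable names are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [11, 22, 33, 44], [55, 66, 77, 88]],
    columns=["one", "two", "three", "four"],
)

by_position = df[1:3]    # rows at positions 1 and 2; the end is excluded
by_label = df.loc[1:3]   # rows labeled 1, 2 and 3; the end is included

col_attr = df.one        # attribute access to a column
col_key = df["one"]      # bracket access; also works for names that are not valid identifiers

print(len(by_position), len(by_label))
```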

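5. When you know the labels:

a.loc['one'] selects the row labeled 'one';

a.loc[:, ['a','b']] selects all rows and the columns labeled a and b;

a.loc[['one','two'], ['a','b']] selects the rows 'one' and 'two' and the columns a and b;

a.loc['one','a'] and a.loc[['one'],['a']] pick out the same cell, but the former returns only the value, while the latter returns it together with its row and column labels.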
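
The label-based forms above can be tried on a small frame. The frame a below is hypothetical: its row labels 'one' to 'three' and column labels 'a' to 'c' are chosen only to match the expressions in this list.

```python
import pandas as pd

# Hypothetical frame with string labels on both axes
a = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["one", "two", "three"],
    columns=["a", "b", "c"],
)

row = a.loc["one"]                          # Series: the row labeled 'one'
cols = a.loc[:, ["a", "b"]]                 # DataFrame: all rows, columns a and b
block = a.loc[["one", "two"], ["a", "b"]]   # DataFrame: two rows by two columns
scalar = a.loc["one", "a"]                  # the bare value in that cell
framed = a.loc[["one"], ["a"]]              # 1x1 DataFrame that keeps its labels

print(scalar)
print(framed)
```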

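6. When you know the positions:

a.iloc[1:2,1:2] returns the data at row position 1 and column position 1, counting from 0 (the end of a slice is excluded);

a.iloc[1:2] with no column part selects the row at position 1, with all of its columns;

a.iloc[[0,2],[1,2]] freely picks the data at the given row positions and column positions.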
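
The position-based forms can be checked the same way. This sketch assumes a holds the same four-by-four data used earlier in this post; the result names are only for illustration.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [11, 22, 33, 44], [55, 66, 77, 88]],
    columns=["one", "two", "three", "four"],
)

cell = a.iloc[1:2, 1:2]           # 1x1 DataFrame: row position 1, column position 1
rows = a.iloc[1:2]                # the row at position 1, all columns
picked = a.iloc[[0, 2], [1, 2]]   # rows 0 and 2, columns 1 and 2

print(cell)
print(picked)
```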

7. Filtering by condition

>>> df[df.two>20]  # rows where the 'two' column is greater than 20
   one  two  three  four
2   11   22     33    44
3   55   66     77    88

>>> df[df['two'].isin([6])]  # use isin() to pick the rows whose column holds one of the given values
   one  two  three  four
1    5    6      7     8

>>> df[df>6]  # keep every value greater than 6; the rest become NaN
    one   two  three  four
0   NaN   NaN    NaN   NaN
1   NaN   NaN    7.0   8.0
2  11.0  22.0   33.0  44.0
3  55.0  66.0   77.0  88.0
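
The filters above can also be combined. The following is a sketch rather than a complete guide; it rebuilds the same frame so it runs on its own, and the thresholds are arbitrary values picked for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [11, 22, 33, 44], [55, 66, 77, 88]],
    columns=["one", "two", "three", "four"],
)

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
both = df[(df["two"] > 5) & (df["four"] < 50)]

# isin() accepts several candidate values at once
several = df[df["two"].isin([6, 22])]

# Indexing with a boolean frame behaves like where(): non-matching cells become NaN
masked = df.where(df > 6)

print(both)
print(several)
```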

