1、Merge
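First, merge works much like a JOIN in SQL: it combines two DataFrames on one or more columns they share. In practice the shared column is usually an ID. The how parameter selects inner (the default), left, right, or outer, which correspond to an inner join, left join, right join, and full outer join respectively.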
import numpy as np
import pandas as pd
dframe1 = pd.DataFrame({'key':['X','Z','Y','Z','X','X'],'value_df1': np.arange(6)})
print(dframe1)
dframe2 = pd.DataFrame({'key':['Q','Y','Z'],'value_df2':[1,2,3]})
print(dframe2)
- InnerMerge (inner join)
pd.merge(dframe1,dframe2,on='key',how='inner') # how defaults to 'inner'
- LeftMerge (left join)
pd.merge(dframe1,dframe2,on='key',how='left')
- RightMerge (right join)
pd.merge(dframe1,dframe2,on='key',how='right')
- OuterMerge (full outer join)
pd.merge(dframe1,dframe2,on='key',how='outer')
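A quick way to see how the four modes differ on dframe1 and dframe2 is to compare row counts. The sketch below rebuilds the two frames from above so it runs on its own. The indicator=True argument is an addition that the examples above do not use; it adds a _merge column recording which side each row came from.

```python
import numpy as np
import pandas as pd

dframe1 = pd.DataFrame({'key': ['X', 'Z', 'Y', 'Z', 'X', 'X'],
                        'value_df1': np.arange(6)})
dframe2 = pd.DataFrame({'key': ['Q', 'Y', 'Z'], 'value_df2': [1, 2, 3]})

# Number of rows produced by each join mode
row_counts = {how: len(pd.merge(dframe1, dframe2, on='key', how=how))
              for how in ['inner', 'left', 'right', 'outer']}
print(row_counts)  # {'inner': 3, 'left': 6, 'right': 4, 'outer': 7}

# indicator=True labels each row as 'left_only', 'right_only', or 'both'
outer = pd.merge(dframe1, dframe2, on='key', how='outer', indicator=True)
print(outer)
```

The inner join keeps only Y and Z, the keys present in both frames. The outer join additionally keeps the three X rows from dframe1 and the single Q row from dframe2, filling the missing side with NaN.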
- MultipleKey Merge (merging on multiple keys)
df_left = pd.DataFrame({'key1': ['SF', 'SF', 'LA'],
                        'key2': ['one', 'two', 'one'],
                        'left_data': [10,20,30]})
df_right = pd.DataFrame({'key1': ['SF', 'SF', 'LA', 'LA'],
                         'key2': ['one', 'one', 'one', 'two'],
                         'right_data': [40,50,60,70]})
print(df_left)
print(df_right)
pd.merge(df_left,df_right,on=['key1','key2']) # pass a list to merge on both key columns
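When only some of the shared columns are used as keys, pandas keeps both copies of the remaining shared column and appends suffixes to tell them apart (_x and _y by default). Below is a minimal sketch with the same two frames; the suffix names '_left' and '_right' are chosen here purely for illustration.

```python
import pandas as pd

df_left = pd.DataFrame({'key1': ['SF', 'SF', 'LA'],
                        'key2': ['one', 'two', 'one'],
                        'left_data': [10, 20, 30]})
df_right = pd.DataFrame({'key1': ['SF', 'SF', 'LA', 'LA'],
                         'key2': ['one', 'one', 'one', 'two'],
                         'right_data': [40, 50, 60, 70]})

# Rows must agree on BOTH key1 and key2 to be paired
both_keys = pd.merge(df_left, df_right, on=['key1', 'key2'])
print(both_keys)

# Merging on key1 only: key2 exists on both sides, so it gets suffixed
one_key = pd.merge(df_left, df_right, on='key1',
                   suffixes=('_left', '_right'))
print(one_key.columns.tolist())
```

Merging on key1 alone pairs every SF row on the left with every SF row on the right, so it yields more rows (6) than the two-key merge (3).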
- Merge on Index (merging on the index)
df_left = pd.DataFrame({'key': ['X','Y','Z','X','Y'],
                        'data': range(5)})
df_right = pd.DataFrame({'group_data': [10, 20]}, index=['X', 'Y'])
print(df_left)
print(df_right)
pd.merge(df_left,df_right,left_on='key',right_index=True,how='outer')
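For comparison, here is a sketch of the same index-based lookup with how='left', which keeps every row of df_left in its original order and fills group_data with NaN where the key is missing from the index of df_right. It also shows DataFrame.join with on='key', which should give the same result as the left merge.

```python
import pandas as pd

df_left = pd.DataFrame({'key': ['X', 'Y', 'Z', 'X', 'Y'],
                        'data': range(5)})
df_right = pd.DataFrame({'group_data': [10, 20]}, index=['X', 'Y'])

# Match the 'key' column of df_left against the index of df_right
merged = pd.merge(df_left, df_right, left_on='key',
                  right_index=True, how='left')
print(merged)

# join(on=...) performs the same column-to-index left join
joined = df_left.join(df_right, on='key')
print(joined)
```

The row with key Z has no entry in df_right, so its group_data is NaN in both results.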
2、join
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']},
                    index=['K0', 'K1', 'K2', 'K3'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']},
                     index=['K0', 'K1', 'K2', 'K3'])
print(left)
print(right)
left.join(right)
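join aligns the two frames on their index and performs a left join by default. Since left and right above share exactly the same index, the how argument makes no visible difference there. The sketch below uses a hypothetical right_partial frame (not part of the example above) whose index only partly overlaps, so the effect of how becomes visible.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']},
                    index=['K0', 'K1', 'K2', 'K3'])

# Hypothetical frame for illustration: shares only K0 and K2 with left
right_partial = pd.DataFrame({'C': ['C0', 'C2', 'C4']},
                             index=['K0', 'K2', 'K4'])

default_join = left.join(right_partial)             # left join: 4 rows
inner_join = left.join(right_partial, how='inner')  # K0 and K2 only
outer_join = left.join(right_partial, how='outer')  # K0 through K4

print(default_join)
print(inner_join)
print(outer_join)
```

In the default left join, K1 and K3 get NaN in column C because right_partial has no rows for them.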
3、Concat
pd.concat([left,right],axis=1) # pass the DataFrames themselves, not strings; axis=0 stacks rows, axis=1 places columns side by side
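The two axis values behave quite differently, which a small sketch can make concrete. It reuses the left and right frames from the join section. The ignore_index=True variant is an addition not shown above, included because stacking rows otherwise leaves duplicate index labels.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']},
                    index=['K0', 'K1', 'K2', 'K3'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']},
                     index=['K0', 'K1', 'K2', 'K3'])

# axis=0 stacks rows; columns missing from one frame are filled with NaN
stacked = pd.concat([left, right], axis=0)
print(stacked.shape)  # (8, 4)

# axis=1 aligns on the index and places the columns side by side
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side.shape)  # (4, 4)

# ignore_index=True replaces the repeated K0..K3 labels with 0..7
renumbered = pd.concat([left, right], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```

With these two frames, the axis=1 result has the same rows and columns as left.join(right), since both align on the shared index.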