首页 > 编程笔记 > Python笔记 阅读:6

Pandas merge()函数的用法(附带实例)

Pandas 是具有功能全面的高性能内存中的连接操作的,与 SQL 等关系数据库非常相似。

Pandas 提供了一个单独的 merge() 函数,用作 DataFrame 对象之间所有标准数据库连接操作的入口。函数的格式为:
pd.merge(left,right,how='inner',on=None,left_on=None,right_on=None,left_index=False,right_index=False,sort=True)
其中:
【实例】创建两个不同的 DataFrame 并对其执行合并操作。
import pandas as pd
left = pd.DataFrame({
    'id':[1,2,3,4,5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
    'id':[1,2,3,4,5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print (left)
print("----------------------------")
print (right)
运行程序,输出如下:
   id   Name subject_id
0   1   Alex       sub1
1   2    Amy       sub2
2   3  Allen       sub4
3   4  Alice       sub6
4   5  Ayoung       sub5
----------------------------
   id      Name subject_id
0   1    Billy       sub2
1   2    Brian       sub4
2   3     Bran       sub3
3   4    Bryce       sub6
4   5    Betty       sub5

还可以在一个键上合并两个数据帧,例如:
import pandas as pd
left = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left,right,on='id')
print(rs)
运行程序,输出如下:
   id Name_x  subject_id_x Name_y subject_id_y
0   1  Alex   sub1         Billy  sub2
1   3  Amy    sub2         Brian  sub4
2   5  Allen  sub4         Bran   sub3
3   7  Alice  sub6         Bryce  sub6
4   9  Ayoung sub5         Betty  sub5

此外,还可以合并多个键上的两个数据帧,例如:
import pandas as pd
left = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left,right,on=['id','subject_id'])
print(rs)
运行程序,输出如下:
   id Name_x subject_id Name_y
0   7 Alice  sub6       Bryce
1   9 Ayoung sub5       Betty
在 merge() 函数中,参数 how 的取值不同,会得到不同的合并效果。如果组合键没有出现在左侧或右侧表中,则连接表中的值将为 NA。

【实例】使用 how 参数合并数据。
import pandas as pd
left = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
    'id':[1,3,5,7,9],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5']})
rs = pd.merge(left, right, on='subject_id', how='left')
print("how='left'连接结果:")
print(rs)
rs = pd.merge(left, right, on='subject_id', how='right')
print("how='right'连接结果:")
print(rs)
rs = pd.merge(left, right, how='outer', on='subject_id')
print("how='outer'连接结果:")
print(rs)
# 连接将在索引上进行
rs = pd.merge(left, right, on='subject_id', how='inner')
print("how='inner'连接结果:")
print(rs)
运行程序,输出如下:
how='left'连接结果:
   id_x    Name_x     subject_id  id_y   Name_y
0     1    Alex       sub1        NaN    NaN
1     3    Amy        sub2        1.0    Billy
2     5    Allen      sub4        3.0    Brian
3     7    Alice      sub6        7.0    Bryce
4     9    Ayoung     sub5        9.0    Betty
how='right'连接结果:
   id_x    Name_x     subject_id  id_y   Name_y
0   3.0    Amy        sub2        1.0    Billy
1   5.0    Allen      sub4        3.0    Brian
2   7.0    Alice      sub6        7.0    Bryce
3   9.0    Ayoung     sub5        9.0    Betty
4   NaN    NaN        sub3        5.0    Bran
how='outer'连接结果:
   id_x    Name_x     subject_id  id_y   Name_y
0   1.0    Alex       sub1        NaN    NaN
1   3.0    Amy        sub2        1.0    Billy
2   5.0    Allen      sub4        3.0    Brian
3   7.0    Alice      sub6        7.0    Bryce
4   9.0    Ayoung     sub5        9.0    Betty
5   NaN    NaN        sub3        5.0    Bran
how='inner'连接结果:
   id_x    Name_x     subject_id  id_y   Name_y
0     3    Amy        sub2        1.0    Billy
1     5    Allen      sub4        3.0    Brian
2     7    Alice      sub6        7.0    Bryce
3     9    Ayoung     sub5        9.0    Betty

相关文章