Pandas DataFrame DataFrame.merge() 函数
Suraj Joshi
2023年1月30日
2020年11月7日
-
pandas.DataFrame.merge()
语法 -
示例代码:
DataFrame.merge()
函数合并两个DataFrame
-
示例代码:在
merge
方法中设置how
参数,使用各种技术合并DataFrame
-
示例代码:在 Pandas 中使用
DataFrame.merge()
函数只合并特定的列 -
示例代码:使用索引作为连接键来合并
DataFrame
Python Pandas DataFrame.merge()
函数合并 DataFrame
或命名的 Series 对象。
pandas.DataFrame.merge()
语法
DataFrame.merge(right,
how='inner',
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes='_x', '_y',
copy=True,
indicator=False,
validate=None)
参数
right |
DataFrame 或命名的 Series 。要合并的对象 |
how |
left , right ,inner 或 outer 。如何进行合并操作 |
on |
标签或列表。要合并的列或索引名称 |
left_on |
标签或列表。要合并在左侧 DataFrame 中的列名或索引名。 |
right_on |
标签或列表。列名或索引名要合并到右边的 DataFrame 中。 |
left_index |
布尔型。使用左侧 DataFrame 的索引作为连接键(left_index=True ) |
right_index |
布尔型。使用右侧 DataFrame 的索引作为连接键(right_index=True ) |
sort |
布尔型。在输出中按字母顺序对连接键进行排序(sort=True ) |
suffixes |
后缀分别应用于左侧和右侧重叠的列名 |
copy |
布尔型。避免复制 copy=False 。 |
indicator |
在输出的 DataFrame 中增加一列名为 _merge 的列,其中包含每行的来源信息(indicator=True ),并在输出的 DataFrame 中增加一列名为 string 的列(indicator=string ) |
validate |
检查合并是否为指定类型的参数 |
返回值
它返回一个合并给定对象的 DataFrame
。
示例代码:DataFrame.merge()
函数合并两个 DataFrame
import pandas as pd
df1 = pd.DataFrame({"Name":
["Suraj","Zeppy","Alish","Sarah"],
'Working Hours':
[1, 2, 3, 5]})
df2 = pd.DataFrame({"Name":
["Suraj","Zack","Alish","Raphel"],
'Pay': [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df=df1.merge(df2)
print("Merged DataFrame:")
print(merged_df)
输出:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1 5
1 Alish 3 7
它使用 SQL
的内连接技术将 df1
和 df2
合并为一个 DataFrame
。
对于 inner-join
方法,我们必须确保两个 DataFrame
至少有一列是共同的。
在这里,merge()
函数将把具有相同值的公共列的行连接到两个 DataFrame
。
示例代码:在 merge
方法中设置 how
参数,使用各种技术合并 DataFrame
import pandas as pd
df1 = pd.DataFrame({"Name":
["Suraj","Zeppy","Alish","Sarah"],
'Working Hours':
[1, 2, 3, 5]})
df2 = pd.DataFrame({"Name":
["Suraj","Zack","Alish","Raphel"],
'Pay': [5, 6, 7, 8]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df=df1.merge(df2,how='right')
print("Merged DataFrame:")
print(merged_df)
输出:
1st DataFrame:
Name Working Hours
0 Suraj 1
1 Zeppy 2
2 Alish 3
3 Sarah 5
2nd DataFrame:
Name Pay
0 Suraj 5
1 Zack 6
2 Alish 7
3 Raphel 8
Merged DataFrame:
Name Working Hours Pay
0 Suraj 1.0 5
1 Alish 3.0 7
2 Zack NaN 6
3 Raphel NaN 8
它使用 SQL
的 right-join
技术将 df1
和 df2
合并为一个 DataFrame
。
在这里,merge()
函数从右边的 DataFrame
返回所有的行。然而,只存在于左侧 DataFrame
中的行将得到 NaN
值。
同样,我们也可以使用 how
参数的 left
和 outer
值。
示例代码:在 Pandas 中使用 DataFrame.merge()
函数只合并特定的列
import pandas as pd
df1 = pd.DataFrame({"Name":
["Suraj","Zeppy","Alish","Sarah"],
'Working Hours':
[1, 2, 3, 5],
"Position":["Salesman","CEO","Manager","Sales Head"]})
df2 = pd.DataFrame({"Name":
["Suraj","Zack","Alish","Raphel"],
"Pay": [5, 6, 7, 8],
"Position":["Salesman","CEO","Manager","Sales Head"]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df=df1.merge(df2,on="Name")
print("Merged DataFrame:")
print(merged_df)
输出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_x Pay Position_y
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
它只合并 df1
和 df2
的 Name
列。由于默认的连接方法是内连接,因此只有两个 DataFrame
的共同行才会被连接。position
列是两个 DataFrame
共同的,因此有两个位置列,即 Position_x
和 Position_y
。
默认情况下,_x
和 _y
后缀被附加到重叠列的名称中。我们可以使用 suffixes
参数指定后缀。
df1 = pd.DataFrame({"Name":
["Suraj","Zeppy","Alish","Sarah"],
'Working Hours':
[1, 2, 3, 5],
"Position":["Salesman","CEO","Manager","Sales Head"]})
df2 = pd.DataFrame({"Name":
["Suraj","Zack","Alish","Raphel"],
"Pay": [5, 6, 7, 8],
"Position":["Salesman","CEO","Manager","Sales Head"]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df=df1.merge(df2,on="Name",suffixes=('_left', '_right'))
print("Merged DataFrame:")
print(merged_df)
输出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name Working Hours Position_left Pay Position_right
0 Suraj 1 Salesman 5 Salesman
1 Alish 3 Manager 7 Manager
示例代码:使用索引作为连接键来合并 DataFrame
import pandas as pd
df1 = pd.DataFrame({"Name":
["Suraj","Zeppy","Alish","Sarah"],
'Working Hours':
[1, 2, 3, 5],
"Position":["Salesman","CEO","Manager","Sales Head"]})
df2 = pd.DataFrame({"Name":
["Suraj","Zack","Alish","Raphel"],
"Pay": [5, 6, 7, 8],
"Position":["Salesman","CEO","Manager","Sales Head"]})
print("1st DataFrame:")
print(df1)
print("2nd DataFrame:")
print(df2)
merged_df=df1.merge(df2,left_index=True,right_index=True,suffixes=("_left","_right"))
print("Merged DataFrame:")
print(merged_df)
输出:
1st DataFrame:
Name Working Hours Position
0 Suraj 1 Salesman
1 Zeppy 2 CEO
2 Alish 3 Manager
3 Sarah 5 Sales Head
2nd DataFrame:
Name Pay Position
0 Suraj 5 Salesman
1 Zack 6 CEO
2 Alish 7 Manager
3 Raphel 8 Sales Head
Merged DataFrame:
Name_left Working Hours Position_left Name_right Pay Position_right
0 Suraj 1 Salesman Suraj 5 Salesman
1 Zeppy 2 CEO Zack 6 CEO
2 Alish 3 Manager Alish 7 Manager
3 Sarah 5 Sales Head Raphel 8 Sales Head
它合并两个 DataFrame
的相应行,不考虑列的相似性。如果两个 DataFrame
上出现相同的列名,则在合并后将后缀附加到列名上,使之成为不同的列。
Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn