在 Pandas join 方法中指定后缀
本教程解释了如何在 Pandas 中使用 DataFrame.join()
方法加入两个 DataFrame,并在加入时指定后缀。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df)
输出:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
我们将通过演示连接 students_df
和 grades_df
两个 DataFrame 来解释 DataFrame.join()
方法。
使用 DataFrame.join()
方法连接两个 DataFrame
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
joined_df = student_df.join(grades_df)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
输出:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Name Gender Age Roll No Grades
0 Jennifer Female 17 501 A
1 Travis Male 18 502 B+
2 Bob Male 17 503 A-
3 Emma Female 16 504 A
4 Luna Female 18 505 B
5 Anish Male 16 506 A+
它将 student_df
和 grades_df
连接起来,并创建 joined_df
。默认情况下,join()
方法使用两个 DataFrame 的索引来连接它们。加入方法默认为 Left Join
。在这里,左边 DataFrame 中的所有行,即 student_df
被保存在 joined_df
中,而右边 DataFrame 中的一行与左边 DataFrame 中的一行具有相同的索引值,被加入并放置在同一行中。
使用 DataFrame.join()
方法连接具有共同列名的 DataFrame
如果我们使用 DataFrame.join()
方法试图连接的两个 DataFrames 中都有一个名称相同的列,我们会得到一个错误信息 ValueError: columns overlap but no suffix specified
。我们可以在 DataFrame.join()
方法中设置 lsuffix
和 rsuffix
参数的值来解决这个错误。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
joined_df = student_df.join(grades_df, lsuffix="_left", rsuffix="_right")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
输出:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Roll No_left Name Gender Age Roll No_right Grades
0 501 Jennifer Female 17 501 A
1 502 Travis Male 18 502 B+
2 503 Bob Male 17 503 A-
3 504 Emma Female 16 504 A
4 505 Luna Female 18 505 B
5 506 Anish Male 16 506 A+
它将 grades_df
加入到 student_df
的右边。DataFrame.join()
并不能合并各个 DataFrame,也就是说,即使 Roll No
列是两个 DataFrame 共同的,它们也会在 join()
之后被作为单独的字段放置。为了区分具有共同名称的列名,我们使用 lsuffix
和 rsuffix
参数为左右两个 DataFrame 中的列提供后缀。
另外,我们也可以通过 DataFrame.merge()
方法,将常用的列名作为 on
参数传入该方法中来解决这个问题。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
merged_df = student_df.merge(grades_df, on="Roll No")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Merged DataFrame:")
print(merged_df, "\n")
输出:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Merged DataFrame:
Roll No Name Gender Age Grades
0 501 Jennifer Female 17 A
1 502 Travis Male 18 B+
2 503 Bob Male 17 A-
3 504 Emma Female 16 A
4 505 Luna Female 18 B
5 506 Anish Male 16 A+
它将 DataFrames student_df
和 grades_df
合并成一个 DataFrame。在这种情况下,Roll No
列将被合并为两个 DataFrame 的单一列。
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn