在 Pandas join 方法中指定字尾
本教程解釋瞭如何在 Pandas 中使用 DataFrame.join()
方法加入兩個 DataFrame,並在加入時指定字尾。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df)
輸出:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
我們將通過演示連線 students_df
和 grades_df
兩個 DataFrame 來解釋 DataFrame.join()
方法。
使用 DataFrame.join()
方法連線兩個 DataFrame
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
joined_df = student_df.join(grades_df)
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
輸出:
Student DataFrame:
Name Gender Age
0 Jennifer Female 17
1 Travis Male 18
2 Bob Male 17
3 Emma Female 16
4 Luna Female 18
5 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Name Gender Age Roll No Grades
0 Jennifer Female 17 501 A
1 Travis Male 18 502 B+
2 Bob Male 17 503 A-
3 Emma Female 16 504 A
4 Luna Female 18 505 B
5 Anish Male 16 506 A+
它將 student_df
和 grades_df
連線起來,並建立 joined_df
。預設情況下,join()
方法使用兩個 DataFrame 的索引來連線它們。加入方法預設為 Left Join
。在這裡,左邊 DataFrame 中的所有行,即 student_df
被儲存在 joined_df
中,而右邊 DataFrame 中的一行與左邊 DataFrame 中的一行具有相同的索引值,被加入並放置在同一行中。
使用 DataFrame.join()
方法連線具有共同列名的 DataFrame
如果我們使用 DataFrame.join()
方法試圖連線的兩個 DataFrames 中都有一個名稱相同的列,我們會得到一個錯誤資訊 ValueError: columns overlap but no suffix specified
。我們可以在 DataFrame.join()
方法中設定 lsuffix
和 rsuffix
引數的值來解決這個錯誤。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
joined_df = student_df.join(grades_df, lsuffix="_left", rsuffix="_right")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Joined DataFrame:")
print(joined_df, "\n")
輸出:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Joined DataFrame:
Roll No_left Name Gender Age Roll No_right Grades
0 501 Jennifer Female 17 501 A
1 502 Travis Male 18 502 B+
2 503 Bob Male 17 503 A-
3 504 Emma Female 16 504 A
4 505 Luna Female 18 505 B
5 506 Anish Male 16 506 A+
它將 grades_df
加入到 student_df
的右邊。DataFrame.join()
並不能合併各個 DataFrame,也就是說,即使 Roll No
列是兩個 DataFrame 共同的,它們也會在 join()
之後被作為單獨的欄位放置。為了區分具有共同名稱的列名,我們使用 lsuffix
和 rsuffix
引數為左右兩個 DataFrame 中的列提供字尾。
另外,我們也可以通過 DataFrame.merge()
方法,將常用的列名作為 on
引數傳入該方法中來解決這個問題。
import pandas as pd
roll_no = [501, 502, 503, 504, 505]
student_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
'Age': [17, 18, 17, 16, 18, 16]
})
grades_df = pd.DataFrame({
"Roll No": [501, 502, 503, 504, 505, 506],
"Grades": ["A", "B+", "A-", "A", "B", "A+"]
}
)
merged_df = student_df.merge(grades_df, on="Roll No")
print("Student DataFrame:")
print(student_df, "\n")
print("Grades DataFrame:")
print(grades_df, "\n")
print("Merged DataFrame:")
print(merged_df, "\n")
輸出:
Student DataFrame:
Roll No Name Gender Age
0 501 Jennifer Female 17
1 502 Travis Male 18
2 503 Bob Male 17
3 504 Emma Female 16
4 505 Luna Female 18
5 506 Anish Male 16
Grades DataFrame:
Roll No Grades
0 501 A
1 502 B+
2 503 A-
3 504 A
4 505 B
5 506 A+
Merged DataFrame:
Roll No Name Gender Age Grades
0 501 Jennifer Female 17 A
1 502 Travis Male 18 B+
2 503 Bob Male 17 A-
3 504 Emma Female 16 A
4 505 Luna Female 18 B
5 506 Anish Male 16 A+
它將 DataFrames student_df
和 grades_df
合併成一個 DataFrame。在這種情況下,Roll No
列將被合併為兩個 DataFrame 的單一列。
Suraj Joshi is a backend software engineer at Matrice.ai.
LinkedIn