Compare Pandas DataFrame Objects

Suraj Joshi, January 22, 2021

This tutorial explains how to compare Pandas DataFrame objects in Python. We can use the == operator to compare DataFrames.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print("df_1:")
print(df_1)

print("")

print("df_2:")
print(df_2)

Output:

df_1:
        Player  Goals
0  Lewandowski     10
1       Haland      8
2      Ronaldo      6
3        Messi      5
4       Mbappe      4

df_2:
        Player  Goals
0  Lewandowski      7
1       Haland      8
2      Ronaldo      6
3        Messi      7
4       Mbappe      4

In this article, we use the DataFrames df_1 and df_2 to demonstrate how DataFrames are compared.

Compare Pandas DataFrame Objects Using the == Operator

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print(df_1 == df_2)

Output:

   Player  Goals
0    True  False
1    True   True
2    True   True
3    True  False
4    True   True

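The == operator compares df_1 and df_2 element by element: each position in the result is True if the corresponding elements are equal and False otherwise.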
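
Because the result of == is itself a DataFrame of booleans, it can be aggregated like any other DataFrame. As a minimal sketch (the name mismatches_per_column is ours and does not appear in the original example), the snippet below uses != together with sum() to count how many elements differ in each column.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# != is the element-wise opposite of ==, so True marks a differing element.
# Summing booleans counts the True values in each column.
mismatches_per_column = (df_1 != df_2).sum()

print(mismatches_per_column)
```

For this data, the Player column has no mismatches and the Goals column has two.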

We can use the pandas.DataFrame.all() method with axis=1 to find out which rows of df_1 and df_2 are identical.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print((df_1 == df_2).all(axis=1))

Output:

0    False
1     True
2     True
3    False
4     True
dtype: bool

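In the output, True marks a row in which every value equals the corresponding value in the other DataFrame. False marks a row in which at least one value differs.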
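
The same reduction can also run in the other direction. The sketch below, which is our own illustration and not part of the original example, uses axis=0 to check each column instead of each row, and chains two all() calls to reduce the whole comparison to a single boolean. The names columns_identical and frames_identical are hypothetical.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

matches = df_1 == df_2

# axis=0 reduces down each column: True only if every row matches in that column.
columns_identical = matches.all(axis=0)

# The first all() reduces each column, the second reduces the resulting Series.
frames_identical = bool(matches.all().all())

print(columns_identical)
print(frames_identical)
```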

We can use boolean indexing to list all rows whose values differ between df_1 and df_2.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

print(df_1[~(df_1 == df_2).all(axis=1)])

Output:

        Player  Goals
0  Lewandowski     10
3        Messi      5

It lists all rows of df_1 whose values differ from the corresponding rows of df_2.
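
The listing above shows only the df_1 side of each differing row. As a rough sketch, the same boolean mask can select the matching rows from both DataFrames and place them side by side with DataFrame.join(). The suffixes _season1 and _season2 and the variable names are our own choices for illustration.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# True for every row in which at least one value differs.
differs = ~(df_1 == df_2).all(axis=1)

# join() aligns on the index; the suffixes keep the overlapping column names apart.
side_by_side = df_1[differs].join(df_2[differs],
                                  lsuffix="_season1",
                                  rsuffix="_season2")

print(side_by_side)
```

The result keeps rows 0 and 3 and shows the goals from both seasons in the same row, which makes the differences easier to read.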

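If df_1 and df_2 have different indexes, the comparison fails with ValueError: Can only compare identically-labeled DataFrame objects (the exact wording of the message may vary between pandas versions).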

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])

print(df_1 == df_2)

Output:

Traceback (most recent call last):
...
ValueError: Can only compare identically-labeled DataFrame objects
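
One way to avoid running into this error is to check the labels before comparing. The sketch below assumes a small helper of our own, labels_match(), which is hypothetical and not part of pandas; it relies only on Index.equals(), which reports whether two indexes hold the same labels in the same order.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])


def labels_match(left, right):
    """Hypothetical helper: True if row and column labels are identical."""
    return left.index.equals(right.index) and left.columns.equals(right.columns)


if labels_match(df_1, df_2):
    result = df_1 == df_2
else:
    # The row labels differ here, so == would raise ValueError.
    result = None
    print("Labels differ; align the DataFrames before comparing.")
```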

We can reset the index with the pandas.DataFrame.reset_index() method to overcome this problem.

import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])
df_2 = df_2.reset_index(drop=True)

print(df_1 == df_2)

Output:

   Player  Goals
0    True  False
1    True   True
2    True   True
3    True  False
4    True   True

It resets the index of df_2 before comparing df_1 and df_2, so both DataFrames have the same index and the comparison becomes possible.
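
If the original index of df_2 should be kept, a possible alternative (not the method used in this article) is to compare the underlying values by position with DataFrame.to_numpy(). This is only a sketch: it assumes both DataFrames have the same shape and the same column order, and the name matches is ours.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2, index=['a', 'b', 'c', 'd', 'e'])

# to_numpy() drops the labels, so the arrays are compared purely by position.
# Assumption: both DataFrames have the same shape and column order.
matches = pd.DataFrame(df_1.to_numpy() == df_2.to_numpy(),
                       index=df_1.index,
                       columns=df_1.columns)

print(matches)
print(df_2.index.tolist())  # df_2 keeps its original labels
```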

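Before comparing the DataFrames, we must also make sure that they have the same number of rows; otherwise their indexes cannot match and the same ValueError is raised.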
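
As a small illustration of the row-count requirement, the sketch below builds a shortened copy of df_2 (df_short, a hypothetical name), checks the shapes first, and then compares only the rows that both DataFrames share. Restricting the comparison to the common index is our own suggestion, not something the original example does.

```python
import pandas as pd

data_season1 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [10, 8, 6, 5, 4]}

data_season2 = {"Player": ["Lewandowski", "Haland", "Ronaldo", "Messi", "Mbappe"],
                "Goals": [7, 8, 6, 7, 4]}

df_1 = pd.DataFrame(data_season1)
df_2 = pd.DataFrame(data_season2)

# Hypothetical shorter DataFrame: only the first four rows of df_2.
df_short = df_2.iloc[:4]

# shape is a (rows, columns) tuple, so this checks both dimensions at once.
same_shape = df_1.shape == df_short.shape

try:
    df_1 == df_short
    raised = False
except ValueError:
    # Different row counts mean different indexes, so == refuses to compare.
    raised = True

# Compare only the rows whose labels exist in both DataFrames.
common = df_1.index.intersection(df_short.index)
shared_rows_equal = df_1.loc[common] == df_short.loc[common]

print(same_shape, raised)
print(shared_rows_equal)
```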

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
