Count Unique Values in a Pandas DataFrame

Suraj Joshi, January 30, 2023, January 22, 2021
  1. Use Series.value_counts() to Count Unique Values in a DataFrame
  2. Use DataFrame.nunique() to Count Unique Values in a DataFrame

This tutorial explains how to count the unique values in a DataFrame using the Series.value_counts() and DataFrame.nunique() methods.

import pandas as pd

patients_df = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Date':  ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02", "2020-12-02", "2020-12-03"],
    'Age': [17, 18, 17, 16, 18, 16]
})

print(patients_df)

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

We will use the DataFrame patients_df, which holds each patient's name, appointment date, and age, to show how to count the unique values in a DataFrame.

Use Series.value_counts() to Count Unique Values in a DataFrame

import pandas as pd

patients_df = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Date':  ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02", "2020-12-02", "2020-12-03"],
    'Age': [17, 18, 17, 16, 18, 16]
})

print("The DataFrame is:")
print(patients_df, "\n")

print("No of appointments for each date:")
print(patients_df["Date"].value_counts())

Output:

The DataFrame is:
       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

No of appointments for each date:
2020-12-02    3
2020-12-01    2
2020-12-03    1
Name: Date, dtype: int64

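It shows how many times each unique value occurs in the Date column of the DataFrame. By default the result is sorted by count in descending order.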
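
The default ordering by frequency is not always what a report needs. The following is a minimal sketch, assuming the same patients_df as above, of two common variations: sort_index() orders the counts by date, and normalize=True returns shares instead of raw counts. The variable names counts_by_date and shares are illustrative only.

```python
import pandas as pd

patients_df = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Date': ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02", "2020-12-02", "2020-12-03"],
    'Age': [17, 18, 17, 16, 18, 16]
})

# Counts ordered by the Date value instead of by frequency
counts_by_date = patients_df["Date"].value_counts().sort_index()
print(counts_by_date)

# Fraction of all appointments that fall on each date (fractions sum to 1)
shares = patients_df["Date"].value_counts(normalize=True)
print(shares)
```

Here counts_by_date lists the dates in chronological order because the date strings sort lexicographically, and shares reports 0.5 for 2020-12-02 since three of the six appointments fall on that day.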

Use DataFrame.nunique() to Count Unique Values in a DataFrame

import pandas as pd

patients_df = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Date':  ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02", "2020-12-02", "2020-12-03"],
    'Age': [17, 18, 17, 16, 18, 16]
})

print(patients_df, "\n")

print(patients_df.groupby('Date').Name.nunique())

Output:

       Name        Date  Age
0  Jennifer  2020-12-01   17
1    Travis  2020-12-01   18
2       Bob  2020-12-02   17
3      Emma  2020-12-02   16
4      Luna  2020-12-02   18
5     Anish  2020-12-03   16 

Date
2020-12-01    2
2020-12-02    3
2020-12-03    1
Name: Name, dtype: int64

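It splits the DataFrame by the values in the Date column, placing rows with the same Date in the same group, and then counts the distinct names in each group. Because every name in this data is different, the result equals the number of rows for each unique value in the Date column.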
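
The example above calls nunique() on a grouped column. As a minimal sketch, assuming the same patients_df, the code below also calls DataFrame.nunique() directly to get the number of distinct values in every column. It then adds one hypothetical row (a second appointment for Bob, which is not part of the original data) to show where nunique() and count() diverge within a group.

```python
import pandas as pd

patients_df = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Date': ["2020-12-01", "2020-12-01", "2020-12-02", "2020-12-02", "2020-12-02", "2020-12-03"],
    'Age': [17, 18, 17, 16, 18, 16]
})

# Number of distinct values in each column of the DataFrame
distinct_per_column = patients_df.nunique()
print(distinct_per_column, "\n")

# Hypothetical extra row: Bob has a second appointment on 2020-12-02
extra = pd.DataFrame({'Name': ["Bob"], 'Date': ["2020-12-02"], 'Age': [17]})
with_repeat = pd.concat([patients_df, extra], ignore_index=True)

grouped = with_repeat.groupby('Date')['Name']
print(grouped.nunique(), "\n")  # distinct names per date
print(grouped.count())          # rows (non-null names) per date
```

With the repeated name, 2020-12-02 has four rows but only three distinct names, so count() and nunique() no longer agree for that date.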

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
