Pandas Groupby Two Columns

Suraj Joshi Jan 30, 2023 Jan 22, 2021
  1. Pandas Groupby Multiple Columns
  2. Count the Number of Rows in Each Group in Pandas

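This tutorial explains how to use the DataFrame.groupby() method in Pandas to split a DataFrame into groups based on the values of two columns. We can also extract further information from the resulting groups.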

We will use the following DataFrame throughout this article.

import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Pandas Groupby Multiple Columns

import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(['Gender', 'Employed'])
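# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# so each group can be printed directly without calling get_group().
for group_key, group_df in groups:
    print(group_df)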
    print("")

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27

This creates 4 groups from the DataFrame. All rows that share the same values in both the Gender and Employed columns are placed in the same group.
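
To confirm the grouping without printing every group, the GroupBy object can also be inspected directly. The following is a minimal sketch, assuming the same DataFrame as above; it relies on the ngroups attribute and the groups mapping of the GroupBy object, and the variable name group_keys is introduced here only for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

groups = data.groupby(['Gender', 'Employed'])

# Number of distinct (Gender, Employed) combinations present in the data
print(groups.ngroups)

# groups.groups maps each (Gender, Employed) tuple to the row labels in that group
group_keys = sorted(groups.groups.keys())
for key in group_keys:
    print(key, groups.groups[key].tolist())
```

Each key is a tuple because the grouping uses two columns; with a single grouping column the keys would be plain values.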

Count the Number of Rows in Each Group in Pandas

To count the number of rows in each group created by the DataFrame.groupby() method, we can use the size() method.

import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Count of Each group:")
grouped_df = (
    data.groupby(['Gender', 'Employed'])
    .size()
    .reset_index(name="Count")
)
print(grouped_df)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1

The output shows the DataFrame, followed by each group formed from it and the number of rows in that group.
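
The row count can also be computed alongside other per-group statistics in a single step. The sketch below is an illustrative addition, not part of the original example: it assumes the same DataFrame and uses named aggregation in agg() to produce a Count column and a MeanAge column, both of which are column names chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

# Named aggregation: output_column=(input_column, aggregation)
summary = (
    data.groupby(['Gender', 'Employed'])
    .agg(Count=('Name', 'size'), MeanAge=('Age', 'mean'))
    .reset_index()
)
print(summary)
```

Here size counts every row in the group, so any column could stand in for Name; count would instead skip missing values in the chosen column.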

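If we want the largest group size for each value of the Employed column, we can group the sizes computed above a second time, this time by the Employed level of the index, and then call max() to get the maximum count.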

import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")

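# size() returns a Series indexed by (Gender, Employed);
# level=1 regroups it by the second index level, Employed.
groups = data.groupby(['Gender', 'Employed']).size().groupby(level=1)
print(groups.max())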

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64

The output shows, for each value of the Employed column, the largest row count among the groups formed from the Gender and Employed columns.
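
The index level can also be referred to by name rather than by position, which may be easier to read. The following sketch is an illustrative variant under the same assumptions as the example above; the variable names sizes, max_by_name and table are introduced here only for illustration. It also uses unstack() to lay the counts out as a Gender by Employed table.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

# Series of group sizes with a two-level index: (Gender, Employed)
sizes = data.groupby(['Gender', 'Employed']).size()

# Same regrouping as level=1, but selecting the index level by its name
max_by_name = sizes.groupby(level='Employed').max()
print(max_by_name)

# Move the Employed level into columns to view all counts side by side
table = sizes.unstack(level='Employed')
print(table)
```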

Author: Suraj Joshi

Suraj Joshi is a backend software engineer at Matrice.ai.
