Pandas Groupby Two Columns

Suraj Joshi Jan 30, 2023 Jan 22, 2021
  1. Group by Multiple Columns in Pandas
  2. Count the Number of Rows in Each Group in Pandas

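This tutorial explains how to use the DataFrame.groupby() method in Pandas to split a DataFrame into groups based on two columns. We can also extract further information from the groups that are created.

We will use the following DataFrame throughout this article.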

import pandas as pd

# Sample data: six people with their gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Group by Multiple Columns in Pandas

import pandas as pd

# Sample data: six people with their gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Groups in DataFrame:")
groups = data.groupby(['Gender', 'Employed'])
# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs
for group_key, group_df in groups:
    print(group_df)
    print("")

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Groups in DataFrame:
   Name  Gender Employed  Age
3  Emma  Female       No   24

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
4      Luna  Female      Yes   28

     Name Gender Employed  Age
1  Travis   Male       No   28
5   Anish   Male       No   25

  Name Gender Employed  Age
2  Bob   Male      Yes   27

This creates 4 groups from the DataFrame. All rows that share the same values in both the Gender and Employed columns are placed in the same group.
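
To check the grouping without printing every sub-DataFrame, one option is to inspect the GroupBy object directly. The following is a minimal sketch using the same sample data; the variable names group_keys and female_yes are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

groups = data.groupby(['Gender', 'Employed'])

# Number of distinct (Gender, Employed) combinations present in the data
print(groups.ngroups)

# Each key is a (Gender, Employed) tuple
group_keys = list(groups.groups.keys())
print(group_keys)

# Fetch a single group by its key (illustrative variable name)
female_yes = groups.get_group(('Female', 'Yes'))
print(female_yes)
```

Because two columns are used for grouping, each key is a tuple, so get_group() also expects a tuple.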

Count the Number of Rows in Each Group in Pandas

To count the number of rows in each group created by the DataFrame.groupby() method, we can use the size() method.

import pandas as pd

# Sample data: six people with their gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")
print("Count of Each group:")
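# size() returns a Series indexed by (Gender, Employed); reset_index turns it into a DataFrame
counts = data.groupby(['Gender', 'Employed']).size()
grouped_df = counts.reset_index(name="Count")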
print(grouped_df)

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Count of Each group:
   Gender Employed  Count
0  Female       No      1
1  Female      Yes      2
2    Male       No      2
3    Male      Yes      1

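It displays the DataFrame, followed by each (Gender, Employed) group and the number of rows in it.

If we want the maximum count for each value of the Employed column, we can group the counts obtained above again by Employed (the second index level), and then use the max() method to get the largest count in each of these groups.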
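
The row count is only one of the things we can compute per group. As a rough sketch, named aggregation can return the count together with another statistic in a single call; the output column names Count and MeanAge are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

# One output column per (source column, aggregation) pair
summary = (
    data.groupby(['Gender', 'Employed'])
    .agg(Count=('Name', 'size'), MeanAge=('Age', 'mean'))
    .reset_index()
)
print(summary)
```

This keeps one row per (Gender, Employed) pair, as in the size() example, and adds the average age of the rows in each group.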

import pandas as pd

# Sample data: six people with their gender, employment status, and age
data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender':  ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

print(data)
print("")

# Count rows per (Gender, Employed) pair, then regroup those counts by Employed (index level 1)
groups = data.groupby(['Gender', 'Employed']).size().groupby(level=1)
print(groups.max())

Output:

       Name  Gender Employed  Age
0  Jennifer  Female      Yes   30
1    Travis    Male       No   28
2       Bob    Male      Yes   27
3      Emma  Female       No   24
4      Luna  Female      Yes   28
5     Anish    Male       No   25

Employed
No     2
Yes    2
dtype: int64

It shows, for each value of the Employed column, the largest count among the groups created from the Gender and Employed columns.
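
The max() result gives the largest count but not which Gender it belongs to. If that is also needed, one possible approach is to use idxmax() on the flattened counts and look up the matching rows. This is a sketch only; counts_df, top_rows and largest are illustrative names.

```python
import pandas as pd

data = pd.DataFrame({
    'Name': ["Jennifer", "Travis", "Bob", "Emma", "Luna", "Anish"],
    'Gender': ["Female", "Male", "Male", "Female", "Female", "Male"],
    'Employed': ["Yes", "No", "Yes", "No", "Yes", "No"],
    'Age': [30, 28, 27, 24, 28, 25]
})

# One row per (Gender, Employed) pair with its row count
counts_df = data.groupby(['Gender', 'Employed']).size().reset_index(name='Count')

# Row label of the largest Count within each Employed value
top_rows = counts_df.groupby('Employed')['Count'].idxmax()

# Look up the full rows so the Gender column is kept
largest = counts_df.loc[top_rows].reset_index(drop=True)
print(largest)
```

Note that idxmax() returns the first matching row when several groups tie for the largest count.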

Author: Suraj Joshi
Suraj Joshi is a backend software engineer at Matrice.ai.
