Remove Duplicate Rows by Column in R

Jinku Hu, January 30, 2023



This article demonstrates how to remove duplicate rows by column in R.

Use the `distinct` Function of the `dplyr` Package to Remove Duplicate Rows by Column in R

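The `distinct` function comes from `dplyr`, one of the most widely used data manipulation packages in R. `distinct` selects the unique rows of a given data frame. It takes the data frame as its first argument, followed by the variables to consider when determining uniqueness. Several column variables can be supplied, in which case rows are compared on the combination of those columns; the following code snippet demonstrates the single-variable case. The optional `.keep_all` argument defaults to `FALSE`, so only the listed columns are returned; passing `.keep_all = TRUE` keeps every variable in the data frame, with values taken from the first row found for each unique value. Note that `dplyr` code is commonly written with the pipe operator `%>%`, which passes the value on its left as the first argument of the function on its right. That is, `x %>% f(y)` is equivalent to `f(x, y)`.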
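To make the pipe rule concrete, the following minimal sketch checks that the piped and the direct form of a call return the same result. The data frame `df` is a made-up toy example and is not part of the article's data.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
df <- data.frame(id = c(1, 1, 2), value = c("a", "b", "c"))

# Piped form: df becomes the first argument of distinct()
piped <- df %>% distinct(id)

# Direct form: the same call written without the pipe
direct <- distinct(df, id)

# Both forms should produce the same data frame
identical(piped, direct)
```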

library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                 gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                 variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

# Keep the first row for each unique value of the given column
t1 <- df1 %>% distinct(id, .keep_all = TRUE)
t2 <- df1 %>% distinct(gender, .keep_all = TRUE)
t3 <- df1 %>% distinct(variant, .keep_all = TRUE)

# The same approach applied to the built-in mtcars data set
df2 <- mtcars

tmp1 <- df2 %>% distinct(cyl, .keep_all = TRUE)
tmp2 <- df2 %>% distinct(mpg, .keep_all = TRUE)
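The snippet above only covers the single-variable case with `.keep_all = TRUE`. As a small sketch of the other options described earlier, the following example reuses the same `df1` to contrast the default `.keep_all = FALSE` with `.keep_all = TRUE`, and then passes two columns at once. The result names `ids_only`, `first_rows`, and `id_gender_pairs` are chosen here for illustration.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                  gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                  variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

# Default .keep_all = FALSE: only the id column is returned
ids_only <- df1 %>% distinct(id)

# .keep_all = TRUE: all columns are kept, using the first row found for each id
first_rows <- df1 %>% distinct(id, .keep_all = TRUE)

# Two columns: a row is dropped only if the (id, gender) pair repeats.
# No pair repeats in df1, so every row is kept here.
id_gender_pairs <- df1 %>% distinct(id, gender, .keep_all = TRUE)

ncol(ids_only)
nrow(first_rows)
nrow(id_gender_pairs)
```

With multiple columns, uniqueness is judged on the combination of values, so adding columns to the call can only keep the same number of rows or more, never fewer.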

Use the `group_by`, `filter`, and `duplicated` Functions to Remove Duplicate Rows by Column in R

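Another way to remove duplicate rows by column value is to group the data frame by the column variable and then filter the rows with the `filter` and `duplicated` functions. The first step uses `group_by`, which is part of the `dplyr` package. The grouped output is then piped into `filter`, where the base R function `duplicated` returns `TRUE` for every value that has already appeared; negating it with `!` therefore keeps only the first row of each group. Note that the result is a grouped tibble rather than a plain data frame.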

library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                 gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                 variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

# Group by the column, then drop rows whose value has already appeared
t1 <- df1 %>% group_by(id) %>% filter(!duplicated(id))
t2 <- df1 %>% group_by(gender) %>% filter(!duplicated(gender))
t3 <- df1 %>% group_by(variant) %>% filter(!duplicated(variant))

# The same approach applied to the built-in mtcars data set
df2 <- mtcars

tmp3 <- df2 %>% group_by(cyl) %>% filter(!duplicated(cyl))
tmp4 <- df2 %>% group_by(mpg) %>% filter(!duplicated(mpg))
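To see what `filter` actually receives, it can help to inspect the logical vector that `duplicated` produces. The sketch below, again using `df1`, prints those flags and then compares the grouped pipeline with an ungrouped variant. The ungrouped variant and the trailing `ungroup()` call are additions made for illustration and are not part of the approach described above.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                  gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                  variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

# duplicated() flags every occurrence of a value after the first one
dup_flags <- duplicated(df1$id)
dup_flags
# FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE

# Grouped pipeline; ungroup() removes the grouping metadata afterwards
grouped <- df1 %>% group_by(id) %>% filter(!duplicated(id)) %>% ungroup()

# Ungrouped variant (an assumption for comparison): filter on the whole column
ungrouped <- df1 %>% filter(!duplicated(id))

# Both variants keep the same rows
identical(grouped$variant, ungrouped$variant)
```

Calling `ungroup()` at the end is optional, but it avoids surprises if later steps are not meant to operate group by group.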

Use the `group_by` and `slice` Functions to Remove Duplicate Rows by Column in R

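Alternatively, `group_by` can be combined with `slice` to remove duplicate rows by column value. `slice` is also part of the `dplyr` package and selects rows by index. When the data frame is grouped, `slice` selects the rows at the given index within each group, so `slice(1)` keeps the first row of every group, as the following example code shows.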

library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                 gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                 variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

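# Group by the column, then keep the first row of each group
t1 <- df1 %>% group_by(id) %>% slice(1)
t2 <- df1 %>% group_by(gender) %>% slice(1)
t3 <- df1 %>% group_by(variant) %>% slice(1)

# The same approach applied to the built-in mtcars data set
df2 <- mtcars

tmp5 <- df2 %>% group_by(cyl) %>% slice(1)
tmp6 <- df2 %>% group_by(mpg) %>% slice(1)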
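As a rough cross-check, the sketch below compares the `slice(1)` result with the `distinct` approach on the `gender` column of `df1`. Row order can differ between the two approaches, so the comparison ignores order. It also shows that changing the index selects a different row per group. The names `by_slice`, `by_distinct`, and `second_rows` are chosen here for illustration.

```r
library(dplyr)

df1 <- data.frame(id = c(1, 2, 2, 3, 3, 4, 5, 5),
                  gender = c("F", "F", "M", "F", "B", "B", "F", "M"),
                  variant = c("a", "b", "c", "d", "e", "f", "g", "h"))

# First row of each gender group; ungroup() drops the grouping metadata
by_slice <- df1 %>% group_by(gender) %>% slice(1) %>% ungroup()

# The distinct() approach applied to the same column
by_distinct <- df1 %>% distinct(gender, .keep_all = TRUE)

# Same rows selected by both approaches, ignoring row order
setequal(by_slice$variant, by_distinct$variant)

# slice(2) takes the second row of each group instead;
# a group with fewer than two rows would contribute nothing
second_rows <- df1 %>% group_by(gender) %>% slice(2) %>% ungroup()
second_rows
```

Because `slice` works with an index, it is the more flexible of the three approaches when a row other than the first one should represent each group.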
