
R - How do I filter out duplicate data that has a unique identifier?

Reposted. Author: 行者123. Updated: 2023-12-01 23:39:51

I have a dataset of survey data. If I use distinct(x), no duplicates are found, because the "Survey ID" column is different in every row.

# Example survey data; data.frame() renames "Survey ID" to Survey.ID
x <- data.frame("Survey ID" = 1001:1008,
                "First Initial" = c("M", "P", "S", "B", "H", "P", "L", "A"),
                "Last Initial" = c("S", "J", "A", "P", "Q", "J", "P", "C"),
                "Age" = c(34, 41, 52, 61, 25, 41, 19, 58),
                "Gender" = c("M", "M", "M", "F", "M", "M", "F", "M"),
                "Ethnicity" = c(2, 2, 1, 1, 3, 2, 1, 4),
                "Veteran Status" = c("A", "Y", "N", "Y", "N", "Y", "N", "N")
)
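A quick base R check makes the problem concrete. This sketch rebuilds the example data; note that data.frame() converts the name "Survey ID" to Survey.ID by default (check.names = TRUE), which is the name used below. Comparing whole rows finds nothing, while leaving out column 1 flags the repeated answers.

```r
# Rebuild the example data ("Survey ID" becomes Survey.ID)
x <- data.frame("Survey ID" = 1001:1008,
                "First Initial" = c("M", "P", "S", "B", "H", "P", "L", "A"),
                "Last Initial" = c("S", "J", "A", "P", "Q", "J", "P", "C"),
                "Age" = c(34, 41, 52, 61, 25, 41, 19, 58),
                "Gender" = c("M", "M", "M", "F", "M", "M", "F", "M"),
                "Ethnicity" = c(2, 2, 1, 1, 3, 2, 1, 4),
                "Veteran Status" = c("A", "Y", "N", "Y", "N", "Y", "N", "N"))

# Whole-row comparison: Survey.ID is unique, so no row is a duplicate
anyDuplicated(x)      # 0

# Compare every column except the first (Survey.ID)
dup <- duplicated(x[, -1])
x$Survey.ID[dup]      # 1006, which repeats the answers of 1002
```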

I can use

library(dplyr)
y <- distinct(x[, -1])  # drop column 1 (Survey ID), then remove duplicate rows

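That removes the duplicates, but it also drops the Survey ID column, and I need the Survey IDs in the new dataset. How can I remove the duplicates but keep the Survey ID of one of the duplicated rows?

Thanks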

Best answer

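We can use distinct_at(). Passing -1 checks for duplicates on every column except the first (Survey ID), and .keep_all = TRUE keeps all columns in the output. For each group of duplicates only the first row, and therefore its Survey ID, is kept.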

library(dplyr)
x %>%
  distinct_at(-1, .keep_all = TRUE)  # compare all columns but the 1st; keep every column
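For comparison, here is a minimal base R sketch of the same idea, with no packages needed: duplicated() is run on every column except the first, and the resulting logical vector subsets the full data frame, so Survey.ID stays in the result. Like distinct(), duplicated() marks the later copies, so the first occurrence's ID (1002 rather than 1006) is the one kept.

```r
# Example data from the question ("Survey ID" becomes Survey.ID)
x <- data.frame("Survey ID" = 1001:1008,
                "First Initial" = c("M", "P", "S", "B", "H", "P", "L", "A"),
                "Last Initial" = c("S", "J", "A", "P", "Q", "J", "P", "C"),
                "Age" = c(34, 41, 52, 61, 25, 41, 19, 58),
                "Gender" = c("M", "M", "M", "F", "M", "M", "F", "M"),
                "Ethnicity" = c(2, 2, 1, 1, 3, 2, 1, 4),
                "Veteran Status" = c("A", "Y", "N", "Y", "N", "Y", "N", "N"))

# TRUE for rows whose non-ID columns repeat an earlier row
dup <- duplicated(x[, -1])

# Keep only first occurrences; all columns, including Survey.ID, are retained
y <- x[!dup, ]
y$Survey.ID   # 1001 1002 1003 1004 1005 1007 1008
```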

If we want to specify the columns by name (here, checking for duplicates on Age and Gender only):

x %>%
  distinct_at(vars(Age, Gender), .keep_all = TRUE)
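In base R the columns can likewise be chosen by name, which keeps working if the column order changes. If you would rather keep the ID of the last duplicate instead of the first, duplicated() accepts fromLast = TRUE. A sketch using the Age and Gender columns from the example above (key_cols is just an illustrative variable name):

```r
# Example data from the question ("Survey ID" becomes Survey.ID)
x <- data.frame("Survey ID" = 1001:1008,
                "First Initial" = c("M", "P", "S", "B", "H", "P", "L", "A"),
                "Last Initial" = c("S", "J", "A", "P", "Q", "J", "P", "C"),
                "Age" = c(34, 41, 52, 61, 25, 41, 19, 58),
                "Gender" = c("M", "M", "M", "F", "M", "M", "F", "M"),
                "Ethnicity" = c(2, 2, 1, 1, 3, 2, 1, 4),
                "Veteran Status" = c("A", "Y", "N", "Y", "N", "Y", "N", "N"))

key_cols <- c("Age", "Gender")   # columns that define a duplicate

# Default: keep the first row of each Age/Gender combination
first_kept <- x[!duplicated(x[, key_cols]), ]

# fromLast = TRUE scans from the bottom, so the last row of each group is kept
last_kept <- x[!duplicated(x[, key_cols], fromLast = TRUE), ]

first_kept$Survey.ID   # 1001 1002 1003 1004 1005 1007 1008
last_kept$Survey.ID    # 1001 1003 1004 1005 1006 1007 1008
```

In this data only rows 1002 and 1006 share an Age/Gender pair, so both versions drop exactly one row; they differ only in which Survey ID survives.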

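Another option is unique() from data.table, where by = names(x)[-1] compares every column except Survey ID. Note that setDT() converts x to a data.table in place.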

library(data.table)
unique(setDT(x), by = names(x)[-1])

A similar question, "R - How do I filter out duplicate data that has a unique identifier?", can be found on Stack Overflow: https://stackoverflow.com/questions/59271041/
