
Replace data entry errors with the most common value - dplyr

Reposted. Author: 行者123. Updated: 2023-12-02 18:51:18

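I have a data frame that contains some data entry errors.

Within each group, I want to replace these outlying values with the group's most common value.

My data looks like this: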

df <- data.frame(
  CODE   = c("J1745","J1745","J1745","J1745","J1100","J1100","J1100","J1100","J1100","J1100"),
  NDC    = c(1234,1234,1234,1234,5678,5678,5678,5678,5678,5678),
  DOSAGE = c("10ML","10 ML","10 ML","10 ML","5 ML","5 ML","5 ML","5 ML","50 ML","5 ML"),
  DESC   = c("TEXT1","TEXT 1","TEXT 1","TEXT 1","TEXT 2","TEXT 2","TEXT 2","TEXT 2","TEXT 10","TEXT 2")
)


As you can see, my DOSAGE and DESC columns contain some inconsistencies, which I would like to replace with the most common value in each group.

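My desired output (an image in the original post) is the same data frame with the inconsistent entries corrected: every J1745 row has DOSAGE "10 ML" and DESC "TEXT 1", and every J1100 row has DOSAGE "5 ML" and DESC "TEXT 2".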

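Best answer

I agree with the comments that this is potentially dangerous.

The code below replaces every element that occurs at most a specified number of times (n, default 1) with the most common value. I use base-R machinery inside the replacement function because that is what I know how to do.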

library(dplyr)  ## provides %>%, group_by(), mutate(), across()

repl_common <- function(x, n = 1) {
  tt <- tapply(x, x, length)        ## count number of instances of each value
  m <- names(tt)[which.max(tt)]     ## find mode (first one wins on ties)
  x[tt[as.character(x)] <= n] <- m  ## replace values seen at most n times
  return(x)
}
## apply by group across specified columns
df %>% group_by(CODE) %>% mutate(across(c(DOSAGE, DESC), repl_common))
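
To see what the helper does on the question's data, and why it can be dangerous, here is a minimal base-R sketch. It is not part of the original answer. It assumes `ave()` as a stand-in for the `group_by()`/`mutate()` step, and the `DOSAGE_FIXED` column and `all_rare` variable are made up for illustration.

```r
# Same helper as in the answer: values seen <= n times become the group's mode.
repl_common <- function(x, n = 1) {
  tt <- tapply(x, x, length)        # count occurrences of each value
  m <- names(tt)[which.max(tt)]     # most frequent value (first one on ties)
  x[tt[as.character(x)] <= n] <- m  # overwrite the rare values
  x
}

df <- data.frame(
  CODE   = c(rep("J1745", 4), rep("J1100", 6)),
  DOSAGE = c("10ML", "10 ML", "10 ML", "10 ML",
             "5 ML", "5 ML", "5 ML", "5 ML", "50 ML", "5 ML"),
  stringsAsFactors = FALSE
)

# ave() applies the function within each CODE group and keeps the row order
# (hypothetical output column, used here so the original stays visible)
df$DOSAGE_FIXED <- ave(df$DOSAGE, df$CODE, FUN = repl_common)
print(df)

# Caveat: when every value in a group occurs only once, all of them count as
# "rare" and are overwritten by whichever value which.max() picks first.
all_rare <- repl_common(c("1 ML", "2 ML", "3 ML"))
print(all_rare)
```

On this data the "10ML" and "50 ML" entries are overwritten with their group's mode. The last call shows the risk: three distinct values, each possibly legitimate, collapse into a single one, so it is worth checking the group counts before applying the replacement.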

For more on replacing data entry errors with the most common value - dplyr, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/66739165/
