r - Filter IDs based on a specified difference in values

Reposted. Author: 行者123. Updated: 2023-12-01 15:50:59

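I am trying to filter IDs based on a specified condition. For example, I want to filter IDs whose questionnaire scores differ by a specific amount between pre- and post-treatment. The idea is to get the IDs whose scores have improved, stayed the same, or worsened. Here is a mock dataset for what I am trying to achieve: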

    ID <- c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
            "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
            "aaa", "bbb", "ccc", "aaa", "bbb", "ccc")
    Condition <- c("Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Pre", "Pre",
                   "Post", "Post", "Post", "Post", "Post", "Post", "Post", "Post", "Post",
                   "Pre", "Pre", "Pre", "Post", "Post", "Post")
    Score <- c(23, 20, 19, 15, 22, 22, 20, 19, 18,
               17, 17, 19, 20, 22, 22, 14, 15, 10,
               23, 23, 21, 20, 18, 11)
    df <- cbind(ID, Condition, Score)
    df <- as.data.frame(df)
    df$Condition <- as.factor(df$Condition)

The main problem here is that some IDs appear twice in the data, both for Pre and for Post.

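I tried a dplyr solution: select the appropriate columns from the main data frame, then use the tidyverse spread function to convert to wide format, since from there I could easily work out the difference. However, I ran into a particular problem. It does not work because some IDs are repeated in the data (for example, IDs aaa, bbb and ccc).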

    df2 <- df %>%
      group_by(ID) %>%
      spread(Condition, Score)

This gives me the following error message:

    Error: Each row of output must be identified by a unique combination of keys.
    Keys are shared for 12 rows:
    * 10, 22
    * 11, 23
    * 12, 24
    * 1, 19
    * 2, 20
    * 3, 21
    Do you need to create unique ID with tibble::rowid_to_column()?
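
The row pairs listed in the error (1 and 19, 10 and 22, and so on) are rows that share the same ID and Condition. A minimal sketch, assuming dplyr is installed and rebuilding the mock data directly with data.frame(), that lists those repeated combinations. The name dupes is only illustrative and is not part of the original post:

```r
library(dplyr)

# Mock data from the question, built directly so Score stays numeric
df <- data.frame(
  ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
         "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
         "aaa", "bbb", "ccc", "aaa", "bbb", "ccc"),
  Condition = rep(c("Pre", "Post", "Pre", "Post"), times = c(9, 9, 3, 3)),
  Score = c(23, 20, 19, 15, 22, 22, 20, 19, 18,
            17, 17, 19, 20, 22, 22, 14, 15, 10,
            23, 23, 21, 20, 18, 11)
)

# Count rows per ID/Condition pair and keep the pairs that occur more than once;
# these are the combinations spread() cannot place in a single output cell
dupes <- df %>%
  count(ID, Condition) %>%
  filter(n > 1)

print(dupes)
```

With the mock data, only aaa, bbb and ccc show up, each twice under Pre and twice under Post, which accounts for the 12 rows named in the error.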

Ideally, the result I want looks like this:

    #improved
    ID   Pre  Post  Difference
    aaa  23   17    -6
    bbb  20   17    -3
    ggg  20   14    -6
    hhh  19   15    -4
    iii  18   10    -8
    aaa  23   20    -3
    bbb  23   18    -5
    ccc  21   11    -10

    #no improvement
    ID   Pre  Post  Difference
    ccc  19   19    0
    eee  22   22    0
    fff  22   22    0

    #worsened
    ID   Pre  Post  Difference
    ddd  15   20    +5

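Or something along those lines, as long as it lets me include the repeated IDs. Ideally, I would also like to be able to filter further based on the size of the difference, for example to subset/filter the IDs whose score improved by more than 5 or worsened by more than 5. Bear in mind that my actual dataset has many more IDs than this example, which I just made up. As always, any help would be greatly appreciated.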

Thank you in advance :)

Best answer

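One option is to first convert 'Score' from factor to numeric, group by 'ID' and 'Condition', create a sequence column ('rn'), spread to 'wide' format, take the difference between the 'Post' and 'Pre' scores, and split by the sign of the 'Difference' column to create a list of tibbles.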

    library(tidyverse)
    df %>%
      mutate(Score = as.numeric(as.character(Score))) %>%  # factor -> numeric
      group_by(ID, Condition) %>%
      mutate(rn = row_number()) %>%       # occurrence index within each ID/Condition pair
      spread(Condition, Score) %>%        # wide format: one row per ID and rn
      mutate(Difference = Post - Pre) %>%
      ungroup %>%
      select(-rn) %>%
      # newer dplyr releases spell this argument .keep
      group_split(grp = sign(Difference), keep = FALSE)
    #[[1]]
    # A tibble: 8 x 4
    #  ID     Post   Pre Difference
    #  <fct> <dbl> <dbl>      <dbl>
    #1 aaa      17    23         -6
    #2 aaa      20    23         -3
    #3 bbb      17    20         -3
    #4 bbb      18    23         -5
    #5 ccc      11    21        -10
    #6 ggg      14    20         -6
    #7 hhh      15    19         -4
    #8 iii      10    18         -8

    #[[2]]
    # A tibble: 3 x 4
    #  ID     Post   Pre Difference
    #  <fct> <dbl> <dbl>      <dbl>
    #1 ccc      19    19          0
    #2 eee      22    22          0
    #3 fff      22    22          0

    #[[3]]
    # A tibble: 1 x 4
    #  ID     Post   Pre Difference
    #  <fct> <dbl> <dbl>      <dbl>
    #1 ddd      20    15          5
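
The question also asks about filtering on the size of the difference. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes dplyr and tidyr (version 1.0.0 or later, for pivot_wider(), which is used here in place of spread()), and it assumes that the first Pre row of an ID pairs with its first Post row, the second with the second, and so on. The names wide and big_change are illustrative only.

```r
library(dplyr)
library(tidyr)

# Mock data from the question, built directly so Score stays numeric
df <- data.frame(
  ID = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
         "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
         "aaa", "bbb", "ccc", "aaa", "bbb", "ccc"),
  Condition = rep(c("Pre", "Post", "Pre", "Post"), times = c(9, 9, 3, 3)),
  Score = c(23, 20, 19, 15, 22, 22, 20, 19, 18,
            17, 17, 19, 20, 22, 22, 14, 15, 10,
            23, 23, 21, 20, 18, 11)
)

wide <- df %>%
  group_by(ID, Condition) %>%
  mutate(rn = row_number()) %>%   # assumption: n-th Pre pairs with n-th Post
  ungroup() %>%
  pivot_wider(names_from = Condition, values_from = Score) %>%
  mutate(Difference = Post - Pre) %>%
  select(-rn)

# Keep only the rows whose score changed by more than 5 in either direction
big_change <- wide %>%
  filter(abs(Difference) > 5)

print(big_change)
```

With the mock data this keeps four rows (differences of -6, -6, -8 and -10); ddd, at exactly +5, is excluded by the strict comparison. Replacing abs(Difference) > 5 with Difference < -5 or Difference > 5 would select only one direction.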

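NOTE: Using as.data.frame(cbind(...)) is not recommended, because cbind converts its arguments to a matrix, and a matrix can hold only a single class. If there is one character column, all the other columns are converted to character as well, and wrapping the result in as.data.frame (where stringsAsFactors = TRUE was the default before R 4.0.0) then turns them into factors. Create the data frame directly instead: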

    df <- data.frame(...) # create directly
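
As an illustrative sketch of that note, using base R only and the vectors from the question: the matrix m is introduced here purely for comparison and does not appear in the original post.

```r
ID <- c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
        "aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii",
        "aaa", "bbb", "ccc", "aaa", "bbb", "ccc")
Condition <- rep(c("Pre", "Post", "Pre", "Post"), times = c(9, 9, 3, 3))
Score <- c(23, 20, 19, 15, 22, 22, 20, 19, 18,
           17, 17, 19, 20, 22, 22, 14, 15, 10,
           23, 23, 21, 20, 18, 11)

# cbind() on vectors of mixed type builds a single-class (character) matrix,
# so the numeric scores are stored as strings
m <- cbind(ID, Condition, Score)
print(class(m[, "Score"]))

# data.frame() keeps each column's own type; Condition is made a factor explicitly
df <- data.frame(ID = ID, Condition = factor(Condition), Score = Score)
str(df)
```

Built this way, Score is numeric from the start, so the as.numeric(as.character(Score)) step is no longer needed.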

Regarding r - Filter IDs based on a specified difference in values, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56317134/
