
Remove duplicates that occur within a specific time period

Reposted. Author: 行者123. Updated: 2023-12-04 13:31:43

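I have a dataset with an ID variable, a date, and several agents (see the example below). Each patient has been tested for the agents multiple times. For each ID, I want to keep the first occurrence and remove all other tests that fall within 4 weeks after it. After that, I again want to keep the next first occurrence and remove everything within 4 weeks of it, and so on through the entire dataset. I have also generated variables holding the week, month, and year.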

ID <- rep(1, times = 20)
Date <- c("2010-12-09", "2010-12-09", "2010-12-09", "2010-12-09", "2010-12-09", "2010-12-09", "2010-12-09", "2010-12-09", "2010-12-27", "2010-12-27", "2010-12-27", "2010-12-27", "2011-01-14", "2011-01-14", "2011-01-14", "2011-01-14", "2011-01-14", "2011-01-14", "2011-01-14", "2011-01-14")
Agent <- c("Agent1", "Agent2", "Agent3", "Agent4", "Agent1", "Agent2", "Agent3", "Agent4", "Agent1", "Agent2", "Agent3", "Agent4", "Agent1", "Agent2", "Agent3", "Agent4", "Agent1", "Agent2", "Agent3", "Agent4")

df <- data.frame(ID, Date, Agent)
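
The snippet above does not create the Week, Month and Year columns that appear in the printout below. A minimal sketch of how they might be derived in base R follows. It assumes ISO 8601 week numbers (`%V`), which is a guess: the question does not say how the week was computed, although ISO weeks do reproduce the values 49, 52 and 2 shown for these dates.

```r
# Rebuild the example data from the question so this block runs on its own.
df <- data.frame(
  ID    = rep(1, times = 20),
  Date  = rep(c("2010-12-09", "2010-12-27", "2011-01-14"), times = c(8, 4, 8)),
  Agent = rep(paste0("Agent", 1:4), times = 5)
)

# Derive the calendar columns from the parsed date.
# Assumption: Week is the ISO 8601 week number (%V).
d <- as.Date(df$Date)
df$Week  <- as.integer(format(d, "%V"))
df$Month <- as.integer(format(d, "%m"))
df$Year  <- as.integer(format(d, "%Y"))

# Put the columns in the same order as the printout.
df <- df[, c("ID", "Date", "Week", "Month", "Year", "Agent")]
```

None of these helper columns is needed for the de-duplication itself, which only depends on ID, Date and Agent.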


   ID       Date Week Month Year  Agent
1   1 2010-12-09   49    12 2010 Agent1
2   1 2010-12-09   49    12 2010 Agent2
3   1 2010-12-09   49    12 2010 Agent3
4   1 2010-12-09   49    12 2010 Agent4
5   1 2010-12-09   49    12 2010 Agent1
6   1 2010-12-09   49    12 2010 Agent2
7   1 2010-12-09   49    12 2010 Agent3
8   1 2010-12-09   49    12 2010 Agent4
9   1 2010-12-27   52    12 2010 Agent1
10  1 2010-12-27   52    12 2010 Agent2
11  1 2010-12-27   52    12 2010 Agent3
12  1 2010-12-27   52    12 2010 Agent4
13  1 2011-01-14    2     1 2011 Agent1
14  1 2011-01-14    2     1 2011 Agent2
15  1 2011-01-14    2     1 2011 Agent3
16  1 2011-01-14    2     1 2011 Agent4
17  1 2011-01-14    2     1 2011 Agent1
18  1 2011-01-14    2     1 2011 Agent2
19  1 2011-01-14    2     1 2011 Agent3
20  1 2011-01-14    2     1 2011 Agent4

What I need is this:

   ID       Date Week Month Year  Agent
1   1 2010-12-09   49    12 2010 Agent1
2   1 2010-12-09   49    12 2010 Agent2
3   1 2010-12-09   49    12 2010 Agent3
4   1 2010-12-09   49    12 2010 Agent4
13  1 2011-01-14    2     1 2011 Agent1
14  1 2011-01-14    2     1 2011 Agent2
15  1 2011-01-14    2     1 2011 Agent3
16  1 2011-01-14    2     1 2011 Agent4

I would be grateful for any help!

Best answer

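You can subtract the minimum Date for each ID to assign every row to a group covering 4 weeks of data, then select the row with the minimum date for each combination of ID, group and Agent.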

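library(dplyr)

df %>%
  # Parse the character dates so date arithmetic works.
  mutate(Date = as.Date(Date)) %>%
  group_by(ID) %>%
  # Weeks elapsed since the ID's first test, divided into 4-week bins.
  # as.integer() truncates, so ceiling() has no further effect here.
  mutate(group = ceiling(as.integer(difftime(Date, min(Date), units = 'weeks') / 4))) %>%
  # Keep the earliest row per ID, 4-week bin and Agent.
  group_by(ID, group, Agent) %>%
  slice(which.min(Date))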

#      ID Date       Agent  group
#   <dbl> <date>     <chr>  <dbl>
# 1     1 2010-12-09 Agent1     0
# 2     1 2010-12-09 Agent2     0
# 3     1 2010-12-09 Agent3     0
# 4     1 2010-12-09 Agent4     0
# 5     1 2011-01-14 Agent1     1
# 6     1 2011-01-14 Agent2     1
# 7     1 2011-01-14 Agent3     1
# 8     1 2011-01-14 Agent4     1
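
Note that the answer measures every 4-week bin from each ID's first date. If the window should instead restart at the first test that falls outside the previous window, which is how the question can be read, the two approaches can disagree: tests on days 0, 30 and 57 land in three different fixed bins, yet day 57 is only 27 days after day 30. The following is a rough sketch of that restarting variant, not part of the original answer. `rolling_window()` and its `width_days` argument are hypothetical names introduced here, and treating a test exactly 28 days later as the start of a new window is an assumption.

```r
library(dplyr)

# Hypothetical helper: returns a window index for each date. A new window
# opens at the first date that is at least `width_days` after the start of
# the current window.
rolling_window <- function(dates, width_days = 28) {
  ord <- order(dates)
  window <- integer(length(dates))
  start <- dates[ord[1]]
  current <- 0L
  for (i in ord) {
    if (as.numeric(dates[i] - start) >= width_days) {
      start <- dates[i]        # this test opens the next window
      current <- current + 1L
    }
    window[i] <- current
  }
  window
}

# Example data from the question, rebuilt so the block runs on its own.
df <- data.frame(
  ID    = rep(1, times = 20),
  Date  = as.Date(rep(c("2010-12-09", "2010-12-27", "2011-01-14"),
                      times = c(8, 4, 8))),
  Agent = rep(paste0("Agent", 1:4), times = 5)
)

result <- df %>%
  group_by(ID) %>%
  mutate(window = rolling_window(Date)) %>%
  # Keep one earliest row per ID, window and Agent.
  group_by(ID, window, Agent) %>%
  slice_min(Date, n = 1, with_ties = FALSE) %>%
  ungroup()
```

On the example data this keeps the same eight rows as the fixed-bin answer, because the only test outside the first window (2011-01-14) is also the first date in the second fixed bin. The loop is written for clarity rather than speed.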

On removing duplicates that occur within a specific time period, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64696494/
