Return the dates of the first and last occurrence of a factor variable

Reposted. Author: 行者123. Updated: 2023-12-03 23:31:45

Question

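I have a data frame in which each row records an exchange between companies: on a given date, one company gives something and another company receives it (a company can give to a different company or to itself). From this I want to build a new data frame with columns indicating when each company first gave, when it last gave, when it first received, and when it last received. Here is the sample data frame I'm starting with: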

Sample starting data

samp <- structure(list(giver = structure(c(1L, 2L, 6L, 3L, 1L, 3L, 4L, 1L, 6L, 1L, 5L), .Label = c("A", "B", "C", "X", "Y", "Z"), class = "factor"), receiver = structure(c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 1L, 2L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor"), date = structure(c(1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 9L), .Label = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07", "2000-01-08", "2000-01-09"), class = "factor")), .Names = c("giver", "receiver", "date"), class = "data.frame", row.names = c(NA, -11L))
samp$date <- as.Date(samp$date, "%Y-%m-%d") # Convert the date factor to class Date

samp
giver receiver       date
    A        A 2000-01-01
    B        B 2000-01-01
    Z        B 2000-01-02
    C        C 2000-01-03
    A        A 2000-01-04
    C        C 2000-01-05
    X        C 2000-01-06
    A        A 2000-01-07
    Z        B 2000-01-08
    A        A 2000-01-09
    Y        B 2000-01-09
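The as.Date() conversion in the sample code is what makes a per-company min()/max() possible: those functions are not defined for an unordered factor, but they do work on Date vectors. A minimal illustration, using a small made-up factor f rather than samp:

```r
# Dates stored as an unordered factor, as in the dput() output above
f <- factor(c("2000-01-03", "2000-01-01", "2000-01-02"))

# min() is not defined for factors: capture the error condition instead
factor_min <- tryCatch(min(f), error = function(e) e)

# After conversion to Date, min()/max() compare calendar dates
d <- as.Date(f, "%Y-%m-%d")
first_date <- min(d)   # "2000-01-01"
last_date  <- max(d)   # "2000-01-03"
```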

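However, I can't figure out how to scan one column for each company's first and last occurrence and return the corresponding value from a different column (the date). I found similar questions here and here that use match, duplicated, or tapply, but I couldn't quite adapt them to what I'm trying to do. Here is the data frame I'd like to end up with: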

Desired end data
desire <- structure(list(company = structure(1:6, .Label = c("A", "B", "C", "X", "Y", "Z"), class = "factor"), start.giving = structure(c(1L, 1L, 3L, 4L, 5L, 2L), .Label = c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-06", "2000-01-09"), class = "factor"), stop.giving = structure(c(5L, 1L, 2L, 3L, 5L, 4L), .Label = c("2000-01-01", "2000-01-05", "2000-01-06", "2000-01-08", "2000-01-09"), class = "factor"), start.receiving = structure(c(1L, 1L, 2L, NA, NA, NA), .Label = c("2000-01-01", "2000-01-03"), class = "factor"), stop.receiving = structure(c(2L, 2L, 1L, NA, NA, NA), .Label = c("2000-01-06", "2000-01-09"), class = "factor")), .Names = c("company", "start.giving", "stop.giving", "start.receiving", "stop.receiving"), class = "data.frame", row.names = c(NA, -6L))

desire
company start.giving stop.giving start.receiving stop.receiving
      A   2000-01-01  2000-01-09      2000-01-01     2000-01-09
      B   2000-01-01  2000-01-01      2000-01-01     2000-01-09
      C   2000-01-03  2000-01-05      2000-01-03     2000-01-06
      X   2000-01-06  2000-01-06            <NA>           <NA>
      Y   2000-01-09  2000-01-09            <NA>           <NA>
      Z   2000-01-02  2000-01-08            <NA>           <NA>
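Since the question mentions duplicated(): once the rows are sorted by date, !duplicated(x) flags the first occurrence of each value of x and !duplicated(x, fromLast = TRUE) flags the last. A base R sketch of that idea for the giving side, assuming the sample data rebuilt by hand from the dput() above; first_last() is a hypothetical helper, not part of the question or the answer:

```r
# Sample data rebuilt from the question's dput(), with date already a Date
samp <- data.frame(
  giver    = factor(c("A", "B", "Z", "C", "A", "C", "X", "A", "Z", "A", "Y")),
  receiver = factor(c("A", "B", "B", "C", "A", "C", "C", "A", "B", "A", "B")),
  date     = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
                       "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07",
                       "2000-01-08", "2000-01-09", "2000-01-09"))
)

# Hypothetical helper: first and last date for each value of column `col`
first_last <- function(df, col) {
  ord <- df[order(df$date), ]   # earliest rows first
  # !duplicated() keeps the first row per company; fromLast = TRUE keeps the last
  first <- ord[!duplicated(ord[[col]]), c(col, "date")]
  last  <- ord[!duplicated(ord[[col]], fromLast = TRUE), c(col, "date")]
  # Shared "date" column gets the suffixes: date.start, date.stop
  merge(first, last, by = col, suffixes = c(".start", ".stop"))
}

giving <- first_last(samp, "giver")
giving
```

Calling first_last(samp, "receiver") gives the receiving side in the same shape.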

Best answer

dplyr version:

library("dplyr")
giving <- samp %>% group_by(giver) %>%
summarise(start.giving=min(date),
stop.giving=max(date)) %>%
rename(company=giver)
receiving <- samp %>% group_by(receiver) %>%
summarise(start.receiving=min(date),
stop.receiving=max(date)) %>%
rename(company=receiver)
full_join(giving,receiving)
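For comparison, the same two-summaries-then-full-join structure can be sketched in base R without dplyr. This assumes the sample data rebuilt by hand from the dput() above; summarise_role() is a hypothetical helper, and merge(all = TRUE) stands in for full_join():

```r
# Sample data rebuilt from the question's dput(), with date already a Date
samp <- data.frame(
  giver    = factor(c("A", "B", "Z", "C", "A", "C", "X", "A", "Z", "A", "Y")),
  receiver = factor(c("A", "B", "B", "C", "A", "C", "C", "A", "B", "A", "B")),
  date     = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
                       "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07",
                       "2000-01-08", "2000-01-09", "2000-01-09"))
)

# Hypothetical helper: min/max date per company for one role (giver or receiver)
summarise_role <- function(df, col, role) {
  groups <- split(df$date, df[[col]], drop = TRUE)   # one Date vector per company
  out <- data.frame(company = names(groups), stringsAsFactors = FALSE)
  # do.call(c, ...) keeps the Date class, which sapply()/unlist() would drop
  out[[paste0("start.", role)]] <- do.call(c, unname(lapply(groups, min)))
  out[[paste0("stop.", role)]]  <- do.call(c, unname(lapply(groups, max)))
  out
}

giving    <- summarise_role(samp, "giver", "giving")
receiving <- summarise_role(samp, "receiver", "receiving")

# all = TRUE is a full outer join: companies that never receive (X, Y, Z)
# get NA in the receiving columns
result <- merge(giving, receiving, by = "company", all = TRUE)
result
```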

With a little more work, you can condense the code further and avoid repeating the summarise() call (similar to the foo() function in @Arun's answer)...
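# Needs dplyr >= 1.0 (rename_with() and the .data pronoun); the older
# group_by_()/rename_() underscore verbs are deprecated in current dplyr
foo <- function(x, f) {
    x %>%
        group_by(company = .data[[f]]) %>%   # group by the column named in f
        summarise(start = min(date),
                  stop = max(date)) %>%
        rename_with(~ paste(.x, f, sep = "."), c(start, stop))  # e.g. start.giver
}
full_join(foo(samp, "giver"), foo(samp, "receiver"), by = "company")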

...although the code is now less transparent and not actually any shorter... it may still be worthwhile if you do this kind of thing often.
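Another way to avoid repeating the summarise() call, not taken from the answer above, is to reshape to long format first so that giving and receiving become values of a single role column. A sketch assuming dplyr >= 1.0 and tidyr >= 1.0 (for across(), pivot_longer() and pivot_wider()) and the sample data rebuilt by hand; the resulting columns are named start_giver, stop_receiver and so on, rather than start.giving:

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question's dput(), with date already a Date
samp <- data.frame(
  giver    = factor(c("A", "B", "Z", "C", "A", "C", "X", "A", "Z", "A", "Y")),
  receiver = factor(c("A", "B", "B", "C", "A", "C", "C", "A", "B", "A", "B")),
  date     = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
                       "2000-01-04", "2000-01-05", "2000-01-06", "2000-01-07",
                       "2000-01-08", "2000-01-09", "2000-01-09"))
)

result <- samp %>%
  # Character columns avoid combining two factors with different levels
  mutate(across(c(giver, receiver), as.character)) %>%
  # One row per (company, role, date); role is "giver" or "receiver"
  pivot_longer(c(giver, receiver), names_to = "role", values_to = "company") %>%
  group_by(company, role) %>%
  summarise(start = min(date), stop = max(date), .groups = "drop") %>%
  # Wide again: start_giver, start_receiver, ...; missing combinations become NA
  pivot_wider(names_from = role, values_from = c(start, stop))

result
```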

A similar question about returning the dates of the first and last occurrence of a factor variable can be found on Stack Overflow: https://stackoverflow.com/questions/29784270/
