gpt4 book ai didi

r - 每年使用 dplyr 或数据表的公司数量

转载 作者:行者123 更新时间:2023-12-02 07:24:43 25 4
gpt4 key购买 nike

假设我有数据框:

df <- data.frame(City = c("NY", "NY", "NY", "NY", "NY", "LA", "LA", "LA", "LA"),
YearFrom = c("2001", "2003", "2002", "2006", "2008", "2004", "2005", "2005", "2002"),
YearTo = c(NA, "2005", NA, NA, "2009", NA, "2008", NA, NA))

其中 YearFrom 是年份,例如公司成立,YearTo 是公司注销的年份。如果 YearTo 为 NA,那么它仍然有效。

我想计算每年的公司数量。

表格应该是这样的

City    |"Year"   |"Count"
"NY" |2001 1
"NY" |2002 2
"NY" |2003 3
"NY" |2004 3
"NY" |2005 2
"NY" |2006 3
"NY" |2007 3
"NY" |2008 4
"NY" |2009 3
"LA" |2001 0
"LA" |2002 1
"LA" |2003 1
"LA" |2004 2
"LA" |2005 4
"LA" |2006 4
"LA" |2007 4
"LA" |2008 2
"LA" |2009 2

我想通过 dplyr 或 datatable 包解决这个问题,但我不知道如何解决?

最佳答案

首先,清理数据...

curr_year = as.integer(year(Sys.Date()))

library(data.table)
setDT(df)
df[, YearTo := as.integer(as.character(YearTo)) ]
df[, YearFrom := as.integer(as.character(YearFrom)) ]
df[, quasiYearTo := YearTo ]
df[is.na(YearTo), quasiYearTo := curr_year ]

然后,非等值连接:

df[CJ(City = City, Year = min(YearFrom):max(YearTo, na.rm=TRUE), unique=TRUE), 
on=.(City, YearFrom <= Year, quasiYearTo > Year), allow.cartesian = TRUE,
.N
, by=.EACHI][, .(City, Year = YearFrom, N)]

City Year N
1: LA 2001 0
2: LA 2002 1
3: LA 2003 1
4: LA 2004 2
5: LA 2005 4
6: LA 2006 4
7: LA 2007 4
8: LA 2008 3
9: LA 2009 3
10: NY 2001 1
11: NY 2002 2
12: NY 2003 3
13: NY 2004 3
14: NY 2005 2
15: NY 2006 3
16: NY 2007 3
17: NY 2008 4
18: NY 2009 3

关于r - 每年使用 dplyr 或数据表的公司数量,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/43072063/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com