
r - How do I perform a data.table rolling join?

Reposted. Author: 行者123. Updated: 2023-12-04 10:16:12

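I have two data tables that I want to merge. One holds company market values over time; the other holds the companies' dividend history over time. I am trying to find out how much each company paid each quarter and to place that amount next to the market value data over time.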

library(magrittr)
library(data.table)
library(zoo)
library(lubridate)

set.seed(1337)
# data table of company market values
companies <-
  data.table(companyID = 1:10,
             Sedol = rep(c("91772E", "7A662B"), each = 5),
             Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
             MktCap = c(100 + cumsum(rnorm(5, 5)),
                        50 + cumsum(rnorm(5, 1, 5)))) %>%
  setkey(Sedol, Date)

# data table of dividends
dividends <-
  data.table(DivID = 1:7,
             Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
             Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29',
                              '2005-10-01', '2005-06-29', '2005-06-30',
                              '2006-04-17')),
             DivAmnt = rnorm(7, .8, .3)) %>%
  setkey(Sedol, Date)

I believe this is a situation where a data.table rolling join could be used, something like:
dividends[companies, roll = "nearest"]

to try to get a data set that looks like this:
    DivID  Sedol       Date   DivAmnt companyID    MktCap
 1:    NA 7A662B       <NA>        NA         6  61.21061
 2:     5 7A662B 2005-06-29 0.7772631         7  66.92951
 3:     6 7A662B 2005-06-30 1.1815343         7  66.92951
 4:    NA 7A662B       <NA>        NA         8  78.33914
 5:    NA 7A662B       <NA>        NA         9  88.92473
 6:    NA 7A662B       <NA>        NA        10  87.85067
 7:     2 91772E 2005-01-13 0.2964291         1 105.19249
 8:     3 91772E 2005-01-29 0.8472649         1 105.19249
 9:    NA 91772E       <NA>        NA         2 108.74579
10:     4 91772E 2005-10-01 1.2467408         3 113.42261
11:    NA 91772E       <NA>        NA         4 120.04491
12:    NA 91772E       <NA>        NA         5 124.35588

(Note that I have matched the dividends to the company market values.)

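But I am not sure how to carry it out. The CRAN pdf is rather vague about what the number is or should be when roll is a value (can you pass dates? does a number quantify how many days to carry forward? the number of observations?), and changing rollends around does not seem to get me what I want.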

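In the end, I mapped the dividend dates to their quarter ends and then joined. A fine solution, but not useful if I ever need to know how to perform rolling joins. In your answer, could you describe a situation in which a rolling join is the only solution, and help me understand how to perform one?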

Best answer

Instead of a rolling join, you may want to use an overlap join with the foverlaps function from data.table:

# create an interval in the 'companies' data.table
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
# create a second date in the 'dividends' data.table, so that each
# dividend date becomes the zero-width interval [divDate, Date2]
dividends[, Date2 := divDate]

# set the keys for the two data.tables; the last two key columns are the interval
setkey(companies, Sedol, start, end)
setkey(dividends, Sedol, divDate, Date2)

# create a vector of column names which can be removed afterwards
deletecols <- c("Date2","start","end")

# perform the overlap join and remove the helper columns
res <- foverlaps(companies, dividends)[, (deletecols) := NULL]

Result:

> res
     Sedol DivID    divDate   DivAmnt companyID   compDate    MktCap
 1: 7A662B    NA       <NA>        NA         6 2005-03-31  61.21061
 2: 7A662B     5 2005-06-29 0.7772631         7 2005-06-30  66.92951
 3: 7A662B     6 2005-06-30 1.1815343         7 2005-06-30  66.92951
 4: 7A662B    NA       <NA>        NA         8 2005-09-30  78.33914
 5: 7A662B    NA       <NA>        NA         9 2005-12-31  88.92473
 6: 7A662B    NA       <NA>        NA        10 2006-03-31  87.85067
 7: 91772E     2 2005-01-13 0.2964291         1 2005-03-31 105.19249
 8: 91772E     3 2005-01-29 0.8472649         1 2005-03-31 105.19249
 9: 91772E    NA       <NA>        NA         2 2005-06-30 108.74579
10: 91772E     4 2005-10-01 1.2467408         3 2005-09-30 113.42261
11: 91772E    NA       <NA>        NA         4 2005-12-31 120.04491
12: 91772E    NA       <NA>        NA         5 2006-03-31 124.35588
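
The reason for the Date2 helper column may be easier to see on a tiny case. The following is a minimal sketch, not the answer's code: the tables reports and payments and all of their columns are hypothetical, and base R date arithmetic is used instead of lubridate. foverlaps compares intervals with intervals, so a single date has to be expressed as an interval whose start and end are the same day.

```r
library(data.table)

# Hypothetical toy data: one reporting date and three payment dates
reports  <- data.table(id = "A", repDate = as.Date("2005-03-31"))
payments <- data.table(id = "A",
                       payDate = as.Date(c("2004-11-19", "2005-01-13", "2005-04-20")))

# Window around the reporting date: 90 days back, 15 days forward
# (2004-12-31 to 2005-04-15)
reports[, `:=`(start = repDate - 90L, end = repDate + 15L)]

# A point in time becomes the zero-width interval [payDate, payDate2]
payments[, payDate2 := payDate]

# The last two key columns of each table are treated as the interval
setkey(reports, id, start, end)
setkey(payments, id, payDate, payDate2)

# Only the payment inside the window is matched to the report
hits <- foverlaps(reports, payments)
print(hits)
```

In this sketch only the 2005-01-13 payment lies inside the window, so the other two payments do not appear in hits.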


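In the meantime, the data.table authors have introduced non-equi joins (in v1.9.8). You can use those to solve this problem as well. With a non-equi join you only need: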
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]

to get the intended result.
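
One detail worth checking when using a non-equi join: in the joined result, the columns named after the join column of the outer table generally carry the bound values from the inner table (here start and end) rather than the original dates. The sketch below shows one way to keep the real date by referring to it with the x. prefix in j. The tables reports and payments and their columns are hypothetical stand-ins, not the question's data.

```r
library(data.table)

# Hypothetical toy data: one reporting date and two payments
reports  <- data.table(id = "A", repDate = as.Date("2005-03-31"), cap = 105)
payments <- data.table(id = "A",
                       payDate = as.Date(c("2004-11-19", "2005-01-13")),
                       amount = c(0.5, 0.3))

# Window around the reporting date: 90 days back, 15 days forward
reports[, `:=`(start = repDate - 90L, end = repDate + 15L)]

# x.payDate names the payDate column of payments (the x table) explicitly,
# so the actual payment date is returned instead of the window bounds
res <- payments[reports,
                on = .(id, payDate >= start, payDate <= end),
                .(id, repDate, cap, payDate = x.payDate, amount)]
print(res)
```

Selecting the columns in j also removes the need to delete helper columns afterwards.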

Data used (the same as in the question, but without creating keys):
set.seed(1337)
companies <- data.table(companyID = 1:10,
                        Sedol = rep(c("91772E", "7A662B"), each = 5),
                        compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
                        MktCap = c(100 + cumsum(rnorm(5, 5)), 50 + cumsum(rnorm(5, 1, 5))))
dividends <- data.table(DivID = 1:7,
                        Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
                        divDate = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29', '2005-10-01',
                                            '2005-06-29', '2005-06-30', '2006-04-17')),
                        DivAmnt = rnorm(7, .8, .3))
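
On the question of what a numeric roll means: it is measured in the units of the last join column, so for Date columns it is a number of days, not a number of observations. The sketch below illustrates this on hypothetical toy tables (reports, payments and their columns are assumptions, not the question's data). roll = TRUE carries the most recent earlier row forward without limit, while roll = 90 carries it forward at most 90 days.

```r
library(data.table)

# Hypothetical toy data: two reporting dates and two earlier payments
reports  <- data.table(id = "A",
                       repDate = as.Date(c("2005-03-31", "2005-06-30")),
                       cap = c(105, 109))
payments <- data.table(id = "A",
                       payDate = as.Date(c("2005-01-13", "2005-01-29")),
                       amount = c(0.3, 0.8))

# roll = TRUE: for each report, take the last payment on or before repDate,
# however old it is
unlimited <- payments[reports, on = .(id, payDate = repDate), roll = TRUE]

# roll = 90: same, but only if that payment is at most 90 days old;
# 2005-01-29 is 61 days before the first report and 152 days before the second
limited <- payments[reports, on = .(id, payDate = repDate), roll = 90]

print(unlimited)
print(limited)
```

In both results the 2005-01-13 payment never appears, because a rolled match picks a single row per reporting date. That is one reason an overlap or non-equi join suits the question better when several dividends can fall in the same quarter.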

Regarding "r - How do I perform a data.table rolling join?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/35046161/
