
Rolling cross join with data.table using nested IDs and missing data

Reposted. Author: 行者123. Last updated: 2023-12-03 20:20:39

I have a dataset of portfolio holdings:

# Input test data
library(data.table)
portolios <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "C", "C", "A"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"), shares = c(100L, 100L, 130L, 50L, 75L, 80L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))

   portfolioid secid reportdate report_type shares
1:           1     A 2010-03-31        Full    100
2:           1     B 2010-03-31        Full    100
3:           1     A 2010-06-30        Full    130
4:           1     C 2010-06-30        Full     50
5:           1     C 2010-07-15     Partial     75
6:           1     A 2010-08-31        Full     80

I need to impute the following missing records:
7:           1     B 2010-06-30        Full      0
8:           1     C 2010-08-31        Full      0

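The business problem is that the sale of a position (shares = 0) is sometimes not reported in a Full report_type, so the missing secids have to be imputed from the previous report.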

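Ultimately, I am trying to calculate the change in shares for each secid relative to the previous report, within each portfolioid, so that my dataset looks like this: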
changes <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "B", "C", "C", "A", "C"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Full", "Partial", "Full", "Full"), shares = c(100L, 100L, 130L, 0L, 50L, 75L, 80L, 0L), change = c(100L, 100L, 30L, -100L, 50L, 25L, -50L, -75L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares", "change"), row.names = c(NA, -8L), class = c("data.table", "data.frame"))

   portfolioid secid reportdate report_type shares change
1:           1     A 2010-03-31        Full    100    100
2:           1     B 2010-03-31        Full    100    100
3:           1     A 2010-06-30        Full    130     30
4:           1     B 2010-06-30        Full      0   -100
5:           1     C 2010-06-30        Full     50     50
6:           1     C 2010-07-15     Partial     75     25
7:           1     A 2010-08-31        Full     80    -50
8:           1     C 2010-08-31        Full      0    -75

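I am stuck on how to create the i for the outer join portfolios[i]. I don't want to use i <- CJ(reportdate, secid), because it would produce too many unnecessary records: not every secid exists at every reportdate, so the full cross join does not correctly represent the data that needs to be filled in.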
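To make the concern about CJ() concrete, here is a minimal sketch on the sample data from the question. The names `holdings` and `grid` are introduced only for this illustration. It counts how many rows a full cross join of dates and secids would create, compared with the 8 rows in the desired result.

```r
library(data.table)

# Sample (secid, reportdate) pairs from the question; `holdings` is a
# hypothetical name used only in this sketch.
holdings <- data.table(
  secid      = c("A", "B", "A", "C", "C", "A"),
  reportdate = as.IDate(c("2010-03-31", "2010-03-31", "2010-06-30",
                          "2010-06-30", "2010-07-15", "2010-08-31"))
)

# Full cross join of every distinct date with every distinct secid
grid <- CJ(reportdate = unique(holdings$reportdate),
           secid      = unique(holdings$secid))

nrow(grid)      # 4 dates x 3 secids = 12 rows
nrow(holdings)  # 6 reported rows; only 2 more need to be imputed
```

On this toy data the grid is only 4 rows larger than the target, but the gap grows with the number of dates and securities, which is presumably why the asker wants to avoid it.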

I think I need a rolling cross join between reportdate, reportdate[-1,secid]

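When a secid is missing from a Full report but is present in the prior report (Partial or Full), I want to roll the secid forward and set shares := 0. I believe I would use the option roll=1 to do this, but I am not sure where or how to implement it.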
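As background on the `roll` argument mentioned above, here is a small sketch of how a rolling join behaves in data.table. It uses toy data and is not the asker's final solution. The tables `holdings` and `lookup` are hypothetical names for this illustration. With `roll = TRUE` the last observation is carried forward without limit, while a finite positive value such as `roll = 1` caps how far it is carried, in the units of the join column (days for IDate).

```r
library(data.table)

# Toy holdings: B was last reported on 2010-03-31
holdings <- data.table(
  secid      = c("A", "A", "B"),
  reportdate = as.IDate(c("2010-03-31", "2010-06-30", "2010-03-31")),
  shares     = c(100L, 130L, 100L)
)

# We want the prevailing holding of each secid as of 2010-06-30
lookup <- data.table(
  secid      = c("A", "B"),
  reportdate = as.IDate("2010-06-30")
)

# roll = TRUE: carry the last known value forward indefinitely
rolled <- holdings[lookup, on = .(secid, reportdate), roll = TRUE]

# roll = 1: carry forward at most 1 day, so B (91 days stale) gets NA
rolled_1 <- holdings[lookup, on = .(secid, reportdate), roll = 1]
```

In `rolled`, B picks up its 2010-03-31 value of 100 even though it has no row on 2010-06-30. This suggests that `roll = 1` on a date column would be too narrow for quarterly reports.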

I think my problem is similar to:

How to Calculate a rolling statistic in R using data.table on unevenly spaced data

I am sure I am missing some basic understanding, or a trick with CJ(), that would create the necessary i.

Best answer

Something like this should work (if I understand the problem correctly).

First, convert reportdate to a date class, and get the unique report dates:

portolios[, reportdate := as.IDate(reportdate)]
uniq.dts <- unique(portolios$reportdate)
uniq.dts <- uniq.dts[order(uniq.dts)]

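Perform a self join, evaluated for each row of i, and extract only the dates that occur after the last date on which we know the secid existed (this should be more memory efficient than CJ):
# Sort first, then key by secid so the self join below matches on secid only
setorder(portolios,secid,reportdate)
setkey(portolios,secid)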

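# For each secid, take the first report date after its last reported holding.
# portfolioid is hard-coded because the sample data has a single portfolio.
impute <- portolios[portolios, {
tmp = max(reportdate) < uniq.dts
list(portfolioid=1L,reportdate=uniq.dts[tmp][1],report_type="Full",shares=0L)
},by=.EACHI,on="secid"][!is.na(reportdate)][,.SD[1],by=secid]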
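For readers unfamiliar with `by=.EACHI`, the following minimal sketch shows what it does in isolation. It uses toy tables `x` and `i`, which are hypothetical names and not part of the answer. The j expression is evaluated once per row of i, over the rows of x that match that row.

```r
library(data.table)

x <- data.table(secid = c("A", "A", "B"), shares = c(100L, 130L, 100L))
i <- data.table(secid = c("A", "B"))

# For each row of i, sum the shares of the matching rows of x
per_i <- x[i, .(total = sum(shares)), on = "secid", by = .EACHI]
per_i
#    secid total
# 1:     A   230
# 2:     B   100
```

In the self join above, i has one row per holding rather than one per secid, so the same result is produced several times per secid. That is why the answer finishes with `.SD[1]` by secid.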

Next, rbindlist the original table and the impute table.
portolios <- rbindlist(list(portolios,impute),fill=TRUE)

#Order data by secid and reportdate
portolios <- portolios[order(secid,reportdate)]

#Lag data by group
portolios[, prev.shares := shift(shares), by=secid]

#Calculate change WHEN a previous share amount exists
portolios[, change := ifelse(is.na(prev.shares),shares,shares-prev.shares), by=secid]

print(portolios[order(reportdate)])
   portfolioid secid reportdate report_type shares prev.shares change
1:           1     A 2010-03-31        Full    100          NA    100
2:           1     B 2010-03-31        Full    100          NA    100
3:           1     A 2010-06-30        Full    130         100     30
4:           1     B 2010-06-30        Full      0         100   -100
5:           1     C 2010-06-30        Full     50          NA     50
6:           1     C 2010-07-15     Partial     75          50     25
7:           1     A 2010-08-31        Full     80         130    -50
8:           1     C 2010-08-31        Full      0          75    -75
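As a possible simplification of the lag-and-difference step, the sketch below computes the change in one pass by filling the first lag of each secid with 0, so no intermediate NA handling is needed. It assumes data.table's `shift()` with its `fill` argument. The table `dt` is a hypothetical toy example, already ordered by secid and report date.

```r
library(data.table)

# Toy data, already sorted by secid and then by report date
dt <- data.table(
  secid  = c("A", "A", "A", "B", "B"),
  shares = c(100L, 130L, 80L, 100L, 0L)
)

# The first report of a secid is compared against 0 shares
dt[, change := shares - shift(shares, fill = 0L), by = secid]
dt$change
# 100  30 -50 100 -100
```

This treats the first report of each secid as a purchase from zero, which matches the change column in the expected output above.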

This question, on rolling cross joins with data.table using nested IDs and missing data, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/27471461/
