
r - Is there a better solution than a for loop when you have to keep track of a running balance?

Repost. Author: 行者123. Updated: 2023-12-04 11:29:32

I have a large data frame with millions of rows. It is time series data. For example:

dates <- c(1,2,3)
purchase_price <- c(5,2,1)
income <- c(2,2,2)
df <- data.frame(dates=dates,price=purchase_price,income=income)

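I want to create a new column that tells me how much money I spent each day, subject to a rule such as "if I have enough money, buy it; otherwise, save the money".

I am currently looping over every row of the data frame and keeping track of the total money. For large datasets, however, this takes forever. As far as I can tell, I cannot use vectorized operations because I have to keep track of this running variable.

Inside the for loop I am doing: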
balance = balance + row$income
buy_amt = min(balance,row$price)
balance = balance - buy_amt
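For reference, the loop described above could be written out in full roughly as follows. This is only a sketch: the function name `running_spend_loop` is hypothetical, and the starting balance of 0 is an assumption, since the question does not state one. It loops over plain vectors rather than data frame rows, which is usually much cheaper than extracting `row$income` on every iteration.

```r
# Hypothetical helper: row-by-row version of the rule in the question.
# `start_balance = 0` is an assumption; the question gives no starting value.
running_spend_loop <- function(income, price, start_balance = 0) {
  n <- length(income)
  spent <- numeric(n)          # amount spent on each day
  balance <- start_balance     # running balance carried between days
  for (i in seq_len(n)) {
    balance <- balance + income[i]
    spent[i] <- min(balance, price[i])  # spend at most what is available
    balance <- balance - spent[i]
  }
  spent
}

df <- data.frame(dates = c(1, 2, 3), price = c(5, 2, 1), income = c(2, 2, 2))
df$spent <- running_spend_loop(df$income, df$price)
print(df)
```

With the three example rows and a starting balance of 0, the amounts spent are 2, 2 and 1.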

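Is there a faster solution?

Thanks!

Best answer

As Paul pointed out, some iteration is necessary. Each row depends on the state left by the previous one.

However, that dependency only matters when a purchase is made (read: you only need to recalculate the balance when...). You can therefore iterate in "batches".

The code below does exactly that: it identifies the next row with enough balance to make a purchase, handles all the rows before it in a single call, and then continues from that point.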

library(data.table)
DT <- as.data.table(df)

## Initial balance
b.init <- 2

setattr(DT, "Starting Balance", b.init)

## Raw balance for each day, regardless of purchases
DT[, balance := b.init + cumsum(income)]
DT[, buying := FALSE]

## Cache N so that nrow(DT) is not called several times
N <- nrow(DT)

## Rows that still need to be processed
ind <- seq_len(N)

## Identify where the next purchase is
while (length(buys <- DT[ind, ind[which(price <= balance)]]) > 0L) {
  next.buy <- buys[[1L]]  # only grab the first one

  ## Rows before next.buy keep buying = FALSE and their current balance
  DT[next.buy, `:=`(buying = TRUE, balance = balance - price)]

  ## Stop after the last row, otherwise (N + 1):N would count backwards
  ## and the same item would be bought again
  if (next.buy == N) break

  ## Recalculate the balance for every row after next.buy
  ind <- (next.buy + 1L):N
  DT[ind, balance := cumsum(income) + DT[["balance"]][[next.buy]]]
}

Results:
## Show output
{
  print(DT)
  cat("Starting Balance was", attr(DT, "Starting Balance"), "\n")
}


## Starting with 3:
   dates price income balance buying
1:     1     5      2       0   TRUE
2:     2     2      2       0   TRUE
3:     3     3      2       2  FALSE
4:     4     5      2       4  FALSE
5:     5     2      2       4   TRUE
6:     6     1      2       5   TRUE
Starting Balance was 3

## Starting with 2:
   dates price income balance buying
1:     1     5      2       4  FALSE
2:     2     2      2       4   TRUE
3:     3     3      2       3   TRUE
4:     4     5      2       0   TRUE
5:     5     2      2       0   TRUE
6:     6     1      2       1   TRUE
Starting Balance was 2
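
The two tables above can be cross-checked against a plain row-by-row loop. The sketch below assumes the rule the tables imply: buy only when the whole price is affordable, otherwise save. The function name `buy_if_affordable` is hypothetical, and the six test rows are typed in directly so the block runs on its own.

```r
# Hypothetical reference implementation of the all-or-nothing rule:
# buy the item only if the full price fits in the current balance.
buy_if_affordable <- function(income, price, start_balance) {
  n <- length(income)
  balance <- numeric(n)   # end-of-day balance
  buying <- logical(n)    # whether a purchase happened that day
  current <- start_balance
  for (i in seq_len(n)) {
    current <- current + income[i]
    buying[i] <- price[i] <= current
    if (buying[i]) current <- current - price[i]
    balance[i] <- current
  }
  data.frame(balance = balance, buying = buying)
}

# Same six rows as the test data used for the tables
price  <- c(5, 2, 3, 5, 2, 1)
income <- rep(2, 6)

res2 <- buy_if_affordable(income, price, start_balance = 2)
res3 <- buy_if_affordable(income, price, start_balance = 3)
print(res2)
print(res3)
```

Both results should agree with the `balance` and `buying` columns shown above, which makes this slow loop a convenient check when changing the batched version.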


# I modified your original data slightly, for testing
df <- rbind(df, df)
df$dates <- seq_along(df$dates)
df[["price"]][[3]] <- 3
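
As a side note that is not part of the original answer: for the exact rule in the question, where the amount spent is `min(balance, price)`, the running balance satisfies `balance[t] = max(balance[t-1] + income[t] - price[t], 0)`. A recursion of that shape can be expressed with `cumsum` and `cummin`, so no loop is needed. The sketch below assumes that rule; the function name `running_spend_vec` and the `start_balance` argument are hypothetical. The identity relies on `min()`, so it should not be assumed to apply to the all-or-nothing rule used in the answer's code.

```r
# Hypothetical vectorized version of the question's rule
# (spend min(balance, price) each day).
running_spend_vec <- function(income, price, start_balance = 0) {
  s <- cumsum(income - price)                  # balance if it could go negative
  # Subtract the largest shortfall seen so far, so the balance is floored at 0
  balance <- s - pmin(-start_balance, cummin(s))
  prev <- c(start_balance, head(balance, -1))  # balance at the start of each day
  data.frame(balance = balance, spent = prev + income - balance)
}

res <- running_spend_vec(income = c(2, 2, 2), price = c(5, 2, 1))
print(res)
```

For the three rows in the question and a starting balance of 0, this gives balances 0, 0, 1 and spending 2, 2, 1, the same as stepping through the loop by hand. With non-integer amounts, results may differ from the loop by floating-point rounding.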

Regarding "r - Is there a better solution than a for loop when you have to keep track of a running balance?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/19621560/
