
r - Reshape transactions from long to wide, joining buy and sell data frames

Reposted. Author: 行者123. Last updated: 2023-12-01 11:38:51

I have buy and sell transactions in long format and want to convert them to wide format. See the example:


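For every BUY transaction in a given stock there must be a SELL transaction in the same stock that closes the position. If no SELL transaction exists, or its remaining share count is zero, NA is put in the sell price.

Explanation:

We bought 100 shares of AIG at 34.56. Next we have to find the exit (SELL) transaction for this BUY, for the same ticker AIG. That transaction exists further down, with 600 shares. So we close our AIG buy with 100 shares, reduce the sell transaction's share count from 600 to 500, and write the trade in wide format with both the buy price and the sell price.

The next transaction is GOOG. For this stock we find two SELL transactions and write both in wide format, but 100 shares remain unsold, so we record that part as "unfinished" with a sell price of NA.

If necessary, I can put the algorithm into pseudocode later, but I hope my explanation is clear.

My question is: can this be done easily in R with clean vectorized code? The algorithm is easy to program in an imperative language such as C++, but I am having trouble doing it in R.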

Edit 1: added input and output data frames for R:

inputDF1 <- data.frame(Ticker = c("AIG", "GOOG", rep("AIG", 3), rep("GOOG", 2), rep("NEM", 3)), Side = c(rep("BUY", 4), rep("SELL", 3), "BUY", rep("SELL", 2)), Shares = c(100, 400, 200, 400, 600, 200, 100, 100, 50, 50), Price = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461, 45, 56, 78))
inputDF2 <- data.frame(Ticker = c(rep("AIG", 3), rep("GOOG", 3)), Side = c(rep("BUY", 2), "SELL", "BUY", rep("SELL", 2)), Shares = c(100, 100, 200, 300, 200, 100), Price = c(34, 35, 36, 457, 458, 459))
inputDF3 <- data.frame(Ticker = c(rep("AIG", 3), rep("GOOG", 3)), Side = c(rep("BUY", 2), "SELL", "BUY", rep("SELL", 2)), Shares = c(100, 100, 100, 300, 100, 100), Price = c(34, 35, 36, 457, 458, 459))

outputDF1 <- data.frame(Ticker = c("AIG", rep("GOOG", 3), rep("AIG", 3), rep("NEM", 2)), Side = rep("BUY", 9), Shares = c(100, 200, 100, 100, 200, 300, 100, 50, 50), BuyPrice = c(34.56, 457, 457, 457, 28.56, 24.65, 24.65, 45, 45), SellPrice = c(30.02, 460, 461, NA, 30.02, 30.02, NA, 56, 78))
outputDF2 <- data.frame(Ticker = c(rep("AIG", 2), rep("GOOG", 2)), Side = rep("BUY", 4), Shares = c(100, 100, 200, 100), BuyPrice = c(34, 35, 457, 457), SellPrice = c(36, 36, 458, 459))
outputDF3 <- data.frame(Ticker = c(rep("AIG", 2), rep("GOOG", 3)), Side = rep("BUY", 5), Shares = rep(100, 5), BuyPrice = c(34, 35, rep(457, 3)), SellPrice = c(36, NA, 458, 459, NA))

Edit 2: updated the example and the input/output data for R

Best answer

Original answer (I did not pay close enough attention while the question was still being developed)

Using dcast from reshape2:

> t <- c("AIG", "GOOG", "AIG", "AIG", "AIG", "GOOG", "GOOG")
> sd <- c(rep("BUY", 4), rep("SELL", 3))
> sh <- c(100, 400, 200, 400, 600, 200, 100)
> pr <- c(34.56, 457, 28.56, 24.65, 30.02, 460, 461)
> df <- data.frame(Ticker = t, Side = sd, Shares = sh, Price = pr)
>
> library(reshape2)
> df
Ticker Side Shares Price
1 AIG BUY 100 34.56
2 GOOG BUY 400 457.00
3 AIG BUY 200 28.56
4 AIG BUY 400 24.65
5 AIG SELL 600 30.02
6 GOOG SELL 200 460.00
7 GOOG SELL 100 461.00
> dcast(df, Ticker*Shares ~ Side, value.var="Price")
Ticker Shares BUY SELL
1 AIG 100 34.56 NA
2 AIG 200 28.56 NA
3 AIG 400 24.65 NA
4 AIG 600 NA 30.02
5 GOOG 100 NA 461.00
6 GOOG 200 NA 460.00
7 GOOG 400 457.00 NA
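
The dcast() call above pivots prices by Ticker and Shares but does not pair any buy with a sell. As a quick sanity check of what a correct matching has to leave open, the sketch below (an illustration that is not part of the original answer) totals bought and sold shares per ticker with base R's xtabs(), using the same data as df above. The names totals and open_shares are assumptions made for this example.

```r
# Same data as the df used in the dcast() session above
df <- data.frame(
  Ticker = c("AIG", "GOOG", "AIG", "AIG", "AIG", "GOOG", "GOOG"),
  Side   = c(rep("BUY", 4), rep("SELL", 3)),
  Shares = c(100, 400, 200, 400, 600, 200, 100),
  Price  = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461)
)

# Sum of Shares for every Ticker/Side combination (a Ticker x Side table)
totals <- xtabs(Shares ~ Ticker + Side, data = df)

# Shares bought but never sold, per ticker (hypothetical helper variable)
open_shares <- totals[, "BUY"] - totals[, "SELL"]
print(open_shares)
```

Both tickers end up with 100 unmatched shares, so a correct wide-format result needs one row with an NA sell price for AIG and one for GOOG.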

New answer

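The key sticking point here is that "vector-based" in R usually goes hand in hand with "functional" (for example the apply() family), but a purely functional approach does not fit well here, because you have to update the list of sells for every buy transaction (or for every part of one). I do think you could do something clever with aggregate or by and a carefully crafted function, but the most readable solution I came up with uses a simple for loop.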

Version with for
inputDF <- data.frame(Ticker = c("AIG", "GOOG", "AIG", "AIG", "AIG", "GOOG", "GOOG"),
                      Side = c(rep("BUY", 4), rep("SELL", 3)),
                      Shares = c(100, 400, 200, 400, 600, 200, 100),
                      Price = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461))
buys <- subset(inputDF, Side == "BUY")
sells <- subset(inputDF, Side == "SELL")
transactions <- NULL

# go through every buy operation
for (i in seq_len(nrow(buys))) {
  ticker <- buys[i, "Ticker"]
  bp <- buys[i, "Price"]
  shares <- buys[i, "Shares"]

  # keep going as long as we can find sellers
  while (shares > 0 && any(sells$Ticker == ticker & sells$Shares > 0)) {
    # first sell of this ticker that still has shares left
    j <- which(sells$Ticker == ticker & sells$Shares > 0)[1]
    sp <- sells[j, "Price"]
    shares.sold <- min(shares, sells[j, "Shares"])
    shares <- shares - shares.sold
    # reduce the same sell row that supplied the price
    sells[j, "Shares"] <- sells[j, "Shares"] - shares.sold
    transactions <- rbind(transactions, data.frame("Ticker" = ticker,
                                                   "Side" = "BUY",
                                                   "Shares" = shares.sold,
                                                   "BuyPrice" = bp,
                                                   "SellPrice" = sp))
  }
  # not enough sellers: record the remainder with a missing sell price
  if (shares > 0) {
    transactions <- rbind(transactions, data.frame("Ticker" = ticker,
                                                   "Side" = "BUY",
                                                   "Shares" = shares,
                                                   "BuyPrice" = bp,
                                                   "SellPrice" = NA))
  }
}

print(transactions)

Output:
  Ticker Side Shares BuyPrice SellPrice
1    AIG  BUY    100    34.56     30.02
2   GOOG  BUY    200   457.00    460.00
3   GOOG  BUY    100   457.00    461.00
4   GOOG  BUY    100   457.00        NA
5    AIG  BUY    200    28.56     30.02
6    AIG  BUY    300    24.65     30.02
7    AIG  BUY    100    24.65        NA

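The need to update state becomes obvious if we try to use the foreach package to parallelize the loop automatically: it quickly becomes clear that we have a race condition on the sells data frame.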
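
The race condition arises because each buy depends on how much the earlier buys have already taken from sells. One way to sidestep that shared state, sketched here as an alternative and not as part of the original answer, is to express the matching through cumulative share counts per ticker: every point where a buy or a sell is used up starts a new output row. The sketch assumes positive share counts and first-in-first-out matching within a ticker, and like the loop it ignores whether a sell comes before or after a buy. The names match_fifo and count_le are hypothetical; inputDF1 is taken from the question.

```r
inputDF1 <- data.frame(
  Ticker = c("AIG", "GOOG", rep("AIG", 3), rep("GOOG", 2), rep("NEM", 3)),
  Side   = c(rep("BUY", 4), rep("SELL", 3), "BUY", rep("SELL", 2)),
  Shares = c(100, 400, 200, 400, 600, 200, 100, 100, 50, 50),
  Price  = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461, 45, 56, 78),
  stringsAsFactors = FALSE
)

# For each value in x, count how many cumulative totals in ends are <= it
count_le <- function(x, ends) {
  vapply(x, function(v) sum(ends <= v), integer(1))
}

# Hypothetical helper: FIFO matching of buys to sells via cumulative sums
match_fifo <- function(df) {
  df$Row <- seq_len(nrow(df))                 # remember the input order
  per_ticker <- lapply(split(df, df$Ticker), function(d) {
    b <- d[d$Side == "BUY", ]
    s <- d[d$Side == "SELL", ]
    if (nrow(b) == 0) return(NULL)            # nothing to close for this ticker
    b_end <- cumsum(b$Shares)                 # shares bought so far
    s_end <- cumsum(s$Shares)                 # shares sold so far
    total <- b_end[length(b_end)]
    # segment boundaries: wherever a buy or a sell is exhausted
    cuts <- sort(unique(c(b_end, s_end[s_end < total])))
    lo <- c(0, cuts[-length(cuts)])           # start of each segment
    bi <- count_le(lo, b_end) + 1             # buy covering each segment
    si <- count_le(lo, s_end) + 1             # sell covering it; past the end gives NA
    data.frame(Ticker = b$Ticker[bi], Side = "BUY", Shares = cuts - lo,
               BuyPrice = b$Price[bi], SellPrice = s$Price[si],
               Row = b$Row[bi], stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, per_ticker)
  # restore the order of the buys, keeping segment order within each buy
  out <- out[order(out$Row, seq_len(nrow(out))), ]
  out$Row <- NULL
  row.names(out) <- NULL
  out
}

result <- match_fifo(inputDF1)
print(result)
```

Because each ticker is handled independently and nothing is mutated between buys, the per-ticker pieces could in principle be computed in any order, which is the property the for loop lacks.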

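Version with apply

The code above has some inefficiencies that could be improved. Appending with rbind() is not efficient and could be optimized somewhat, either by reducing the number of calls to rbind() or by eliminating them entirely. You could also wrap everything in a function and turn the loop into a call to apply(), which should be faster even for a serial apply() because the loop runs at a more optimized level. (The same holds in CPython: list comprehensions and str.join() are much faster than for loops, because they "know more" about the total size of the operation and because they are written in optimized C.)

Here is a first attempt. Note that we use do.call(rbind, list(...)) to flatten the list of small data frames returned by the original call to apply(). This is not very efficient (rbindlist from data.table is noticeably faster, see here), but it has no external dependencies. The list returned by apply() is actually interesting in its own right: each element is the set of transactions needed to complete one whole buy operation. If you add row names to the buys data frame, you can then retrieve each set of transactions by name.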
inputDF <- data.frame(Ticker = c("AIG", "GOOG", "AIG", "AIG", "AIG", "GOOG", "GOOG"),
                      Side = c(rep("BUY", 4), rep("SELL", 3)),
                      Shares = c(100, 400, 200, 400, 600, 200, 100),
                      Price = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461))
buys <- subset(inputDF, Side == "BUY")
sells <- subset(inputDF, Side == "SELL")
transactions <- NULL

# handle a single buy operation (one row of buys)
buy.operation <- function(x) {
  ticker <- x[["Ticker"]]
  # apply() converts to a matrix implicitly, and all the elements of a matrix
  # have the same data type, so everything gets converted to characters;
  # thus, we need to convert back
  bp <- as.numeric(x["Price"])
  shares <- as.numeric(x["Shares"])

  # keep going as long as we can find sellers
  while (shares > 0 && any(sells$Ticker == ticker & sells$Shares > 0)) {
    # first sell of this ticker that still has shares left
    j <- which(sells$Ticker == ticker & sells$Shares > 0)[1]
    sp <- sells[j, "Price"]
    shares.sold <- min(shares, sells[j, "Shares"])
    shares <- shares - shares.sold
    sells[j, "Shares"] <- sells[j, "Shares"] - shares.sold
    transactions <- rbind(transactions, data.frame("Ticker" = ticker,
                                                   "Side" = "BUY",
                                                   "Shares" = shares.sold,
                                                   "BuyPrice" = bp,
                                                   "SellPrice" = sp))
  }
  # not enough sellers: record the remainder with a missing sell price
  if (shares > 0) {
    transactions <- rbind(transactions, data.frame("Ticker" = ticker,
                                                   "Side" = "BUY",
                                                   "Shares" = shares,
                                                   "BuyPrice" = bp,
                                                   "SellPrice" = NA))
  }

  transactions
}

transactions <- do.call(rbind, apply(buys, 1, buy.operation))
# get rid of weird row names
row.names(transactions) <- NULL
print(transactions)

Output:
  Ticker Side Shares BuyPrice SellPrice
1    AIG  BUY    100    34.56     30.02
2   GOOG  BUY    200   457.00    460.00
3   GOOG  BUY    100   457.00    461.00
4   GOOG  BUY    100   457.00        NA
5    AIG  BUY    200    28.56     30.02
6    AIG  BUY    400    24.65     30.02

Unfortunately, the final unfilled AIG transaction is lost. I have not yet figured out how to fix this.
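
The row goes missing because R functions work on copies: an assignment to sells inside buy.operation() changes only a local copy, so every buy sees the full original sell quantities and the last AIG buy is matched against all 600 shares. One possible fix, offered as a sketch and not as the original author's solution, keeps sells in an enclosing function and updates it there with `<<-`. The sketch also iterates over row indices with lapply() instead of apply(), which avoids the conversion to character, and collects rows in a list so rbind() is called once per buy. The name match_trades is hypothetical.

```r
inputDF <- data.frame(
  Ticker = c("AIG", "GOOG", "AIG", "AIG", "AIG", "GOOG", "GOOG"),
  Side   = c(rep("BUY", 4), rep("SELL", 3)),
  Shares = c(100, 400, 200, 400, 600, 200, 100),
  Price  = c(34.56, 457, 28.56, 24.65, 30.02, 460, 461),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: sells lives here, so buy.operation() can update it
match_trades <- function(df) {
  buys  <- df[df$Side == "BUY", ]
  sells <- df[df$Side == "SELL", ]

  buy.operation <- function(i) {
    ticker <- as.character(buys$Ticker[i])
    bp     <- buys$Price[i]
    shares <- buys$Shares[i]
    rows   <- list()                          # pieces of this buy
    while (shares > 0 && any(sells$Ticker == ticker & sells$Shares > 0)) {
      j <- which(sells$Ticker == ticker & sells$Shares > 0)[1]
      shares.sold <- min(shares, sells$Shares[j])
      shares <- shares - shares.sold
      # `<<-` modifies sells in match_trades(), so later buys see the update
      sells$Shares[j] <<- sells$Shares[j] - shares.sold
      rows[[length(rows) + 1]] <- data.frame(
        Ticker = ticker, Side = "BUY", Shares = shares.sold,
        BuyPrice = bp, SellPrice = sells$Price[j], stringsAsFactors = FALSE)
    }
    if (shares > 0) {                         # not enough sellers
      rows[[length(rows) + 1]] <- data.frame(
        Ticker = ticker, Side = "BUY", Shares = shares,
        BuyPrice = bp, SellPrice = NA_real_, stringsAsFactors = FALSE)
    }
    do.call(rbind, rows)
  }

  # buys must be processed in order, because each one changes sells
  out <- do.call(rbind, lapply(seq_len(nrow(buys)), buy.operation))
  row.names(out) <- NULL
  out
}

transactions <- match_trades(inputDF)
print(transactions)
```

With the shared sells updated between calls, the last AIG buy is split into 300 shares sold at 30.02 and 100 unfilled shares, as in the for version. The side effect also means this is still sequential code; lapply() here is a loop in functional clothing and cannot be parallelized safely.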

Regarding "r - Reshape transactions from long to wide, joining buy and sell data frames", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23981074/
