
R Dataframe - Apply an expression across a time series and output the result to a new data frame

Reposted. Author: 行者123. Updated: 2023-12-02 02:21:57

I'm learning R and have run into a problem that I can't solve or find an answer to.

I have a data frame:

# ID and Value each have 24 entries and Date has 48, so data.frame()
# recycles ID and Value once to give 48 rows (6 IDs x 8 months).
df <- data.frame(
ID=c("a1","a1","a1","a1",
"a2","a2","a2","a2",
"a3","a3","a3","a3",
"b1","b1","b1","b1",
"b2","b2","b2","b2",
"b3","b3","b3","b3"),
Date=c("January-19", "February-19", "March-19", "April-19",
"January-19", "February-19", "March-19", "April-19",
"January-19", "February-19", "March-19", "April-19",
"January-19", "February-19", "March-19", "April-19",
"January-19", "February-19", "March-19", "April-19",
"January-19", "February-19", "March-19", "April-19",
"May-19", "June-19", "July-19", "August-19",
"May-19", "June-19", "July-19", "August-19",
"May-19", "June-19", "July-19", "August-19",
"May-19", "June-19", "July-19", "August-19",
"May-19", "June-19", "July-19", "August-19",
"May-19", "June-19", "July-19", "August-19"),
Value=c(1,2,5,4,7,3,9,8,9,10,44,3,15,16,17,2, 3, 22, 12, 3, 4, 44, 24, 5))

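The "ID" column is character, the "Date" column is Date, and the "Value" column is numeric.

Based on this data frame (df), I am trying to create a new data frame that shows the result of an expression in one column and the date it refers to in another.

For example, for a given date in "df", I want to evaluate the expression "(a1 + b1)/b1" on the "Value" entries and put the result in a new data frame, giving a single value per date, and do this for every date in the "Date" series.

Using the values in "df" and the example expression, the new data frame would look like this: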

January-19  | 1.06
February-19 | 1.13
March-19 | 1.29
April-19 | 3
May-19 | 1.06
June-19 | 1.13
July-19 | 1.29

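The real expressions are much more complex than this example, but I'm not sure that matters: what I'm trying to work out is how to apply any calculation and output it as a data frame against a new series of dates, regardless of complexity.

Apologies if this is a simple question, and thanks in advance.

Best answer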

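Here is a base R solution that works for all ID sets. It also assumes the entries are symmetric (presumably meaning that every ID has exactly one value for each date).

The important step is getting the data into the right order. The later steps only process the entries.

The benefits of this approach are scalable execution time, maximum control over the data, and independence from packages (a personal preference).

Data: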

df <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L), class = "factor", .Label = c("a1",
"a2", "a3", "b1", "b2", "b3")), Date = structure(c(4L, 3L, 7L,
1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L, 1L, 4L, 3L, 7L,
1L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L,
2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L, 8L, 6L, 5L, 2L), .Label = c("April-19",
"August-19", "February-19", "January-19", "July-19", "June-19",
"March-19", "May-19"), class = "factor"), Value = c(1, 2, 5,
4, 7, 3, 9, 8, 9, 10, 44, 3, 15, 16, 17, 2, 3, 22, 12, 3, 4,
44, 24, 5, 1, 2, 5, 4, 7, 3, 9, 8, 9, 10, 44, 3, 15, 16, 17,
2, 3, 22, 12, 3, 4, 44, 24, 5)), class = "data.frame", row.names = c(NA,
-48L))
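
The symmetry assumption can be checked before going further. The following is a minimal sketch, not part of the original answer: it rebuilds the same 48 rows with `rep()` so that it runs on its own, and the names `counts` and `is_symmetric` are illustrative only.

```r
# Rebuild the example data compactly (same 48 rows as the dput() output above).
ids <- c("a1", "a2", "a3", "b1", "b2", "b3")
months1 <- c("January-19", "February-19", "March-19", "April-19")
months2 <- c("May-19", "June-19", "July-19", "August-19")
vals <- c(1, 2, 5, 4, 7, 3, 9, 8, 9, 10, 44, 3,
          15, 16, 17, 2, 3, 22, 12, 3, 4, 44, 24, 5)
df <- data.frame(
  ID    = rep(rep(ids, each = 4), times = 2),
  Date  = c(rep(months1, times = 6), rep(months2, times = 6)),
  Value = rep(vals, times = 2)
)

# Cross-tabulate ID against Date: the data is "symmetric" in the sense assumed
# here if every ID/Date combination occurs exactly once.
counts <- table(df$ID, df$Date)
is_symmetric <- all(counts == 1)
is_symmetric
```

If `is_symmetric` is `FALSE`, the row pairing used in the steps below would silently combine the wrong entries, so it is worth failing early.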

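First, reorder the data frame:

# Sort by ID set number (the second character of ID), then chronologically.
# IDs are assumed to be exactly two characters long, e.g. "a1".
# "01-" is prepended so that "-19" is parsed as the year rather than the day.
df_reo <- df[ order( matrix( unlist( strsplit( as.character(df$ID), "" ) ),
ncol=2, byrow=TRUE )[,2],
as.Date(paste0("01-", df$Date), "%d-%B-%y") ), ]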
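
The reordering step splits each ID into single characters, which only works for two-character IDs, and parsing month names with `as.Date()` depends on the locale. As a hedged sketch of a more tolerant way to build the same sort keys (this is not the answerer's code; the sample IDs and dates are made up, and a two-digit year is assumed to mean 20xx):

```r
# Hypothetical IDs, including ones longer than two characters.
ids <- c("a1", "b1", "a2", "b12", "ab3")
id_letter <- sub("[0-9]+$", "", ids)             # leading letters
id_set    <- as.integer(sub("^[^0-9]+", "", ids)) # trailing set number

# Dates in "Month-yy" form, matched against R's built-in English month.name
# constant so that the result does not depend on the session locale.
dates     <- c("March-19", "January-19", "February-20")
month_idx <- match(sub("-.*$", "", dates), month.name)
year      <- 2000L + as.integer(sub("^.*-", "", dates)) # assumes 20xx

# order() accepts several keys; later keys break ties in earlier ones.
ord <- order(year, month_idx)
dates[ord]
```

In the answer's setting, `id_set` would replace the `matrix(...)[,2]` key and `year`, `month_idx` would replace the `as.Date()` key inside `order()`.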

Set up helper variables:

li <- matrix( 1:nrow(df_reo), ncol=2, byrow=TRUE ) # each row holds the row indices of an "a" entry and its matching "b" entry
colnames(li) <- c("a","b")

ds <- as.numeric( unlist(strsplit(sort(as.character( df$ID )), "" )[nrow(df)])[2] ) # number of ID sets (3 here), used only for nicer formatting
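
To make the role of `li` concrete, here is a small sketch on six values only. It assumes the rows have already been reordered so that each "a" value is immediately followed by its "b" value; the vector `values` holds the a1/b1 entries for January to March from the data above, and `a`, `b` and `ratio` are illustrative names.

```r
# a1 and b1 values for January, February and March, interleaved as a, b, a, b, ...
values <- c(1, 15, 2, 16, 5, 17)

# Each row of li pairs the position of an "a" value with its "b" value.
li <- matrix(seq_along(values), ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("a", "b")))

a <- values[li[, "a"]]  # 1, 2, 5
b <- values[li[, "b"]]  # 15, 16, 17

# The example expression (a1 + b1) / b1, evaluated for all dates at once.
ratio <- (a + b) / b
round(ratio, 6)
```

Indexing with whole columns of `li` evaluates the expression in one vectorised step, which is an alternative to looping over the rows of `li`.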

Then do the calculation:

df_fin <- matrix( vapply( 1:nrow(li), function(x){
( df_reo$Value[li[x,"a"]] + df_reo$Value[li[x,"b"]] ) /
df_reo$Value[li[x,"b"]] }, 1.0 ), ncol=ds )

rownames(df_fin) <- unique(df_reo$Date)
> data.frame( df_fin )
X1 X2 X3
January-19 1.066667 3.333333 3.250000
February-19 1.125000 1.136364 1.227273
March-19 1.294118 1.750000 2.833333
April-19 3.000000 3.666667 1.600000
May-19 1.066667 3.333333 3.250000
June-19 1.125000 1.136364 1.227273
July-19 1.294118 1.750000 2.833333
August-19 3.000000 3.666667 1.600000
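
Since the question asks about arbitrary expressions, one alternative worth sketching (not the answerer's method) is to pivot the data so that each ID becomes a column, then evaluate the expression with those columns as variables. This is a base R sketch under the same symmetry assumption; `wide`, `expr` and `result` are illustrative names, and the data is rebuilt with `rep()` so the block runs on its own.

```r
# Rebuild the example data (same 48 rows as above).
ids <- c("a1", "a2", "a3", "b1", "b2", "b3")
months1 <- c("January-19", "February-19", "March-19", "April-19")
months2 <- c("May-19", "June-19", "July-19", "August-19")
vals <- c(1, 2, 5, 4, 7, 3, 9, 8, 9, 10, 44, 3,
          15, 16, 17, 2, 3, 22, 12, 3, 4, 44, 24, 5)
df <- data.frame(
  ID    = rep(rep(ids, each = 4), times = 2),
  Date  = c(rep(months1, times = 6), rep(months2, times = 6)),
  Value = rep(vals, times = 2)
)

# Keep dates in order of first appearance (chronological in this data)
# instead of the alphabetical order a plain factor would give.
date_levels <- unique(as.character(df$Date))

# Date x ID matrix; each cell holds exactly one value under the symmetry
# assumption, so sum() simply returns that value.
wide <- tapply(df$Value,
               list(factor(df$Date, levels = date_levels), df$ID),
               sum)

# Any expression written in terms of the ID names can be evaluated row-wise.
expr <- quote((a1 + b1) / b1)
result <- data.frame(Date   = rownames(wide),
                     Result = eval(expr, as.data.frame(wide)))
result
```

Changing `expr` to, say, `quote((a2 + b2) / b2)` gives the second ID set without touching the rest of the code.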

A similar question to "R Dataframe - Apply an expression across a time series and output the result to a new data frame" can be found on Stack Overflow: https://stackoverflow.com/questions/66286961/
