r - How to lag an integer variable using R?

Reposted. Author: 行者123. Updated: 2023-12-04 10:56:29

Suppose I have the following historical league results:

Season <- c(1,1,2,2,3,3,4,4,5,5)
Team <- c("Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton","Diverpool","Deverton")
End.Rank <- c(8,17,4,15,3,6,4,16,3,17)
PLRank <- cbind(Season,Team,End.Rank)

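I want to (efficiently) create a one-year lagged variable for each team, based on two criteria:
  • Lag End.Rank by Season (i.e. t-1, with Season as the time variable)
  • Keep the teams separate (Deverton's lagged End.Rank vs. Diverpool's lagged End.Rank)

Essentially, I want the output to look like this:
    l.End.Rank <- c(NA,NA,8,17,4,15,3,6,4,16)

I tried lag(), and my current attempts to do this with a for() loop are failing.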

Best answer

You can try one of the following approaches.

Note that I have used a data.frame rather than the matrix you would get from cbind:

    PLRank <- data.frame(Season, Team, End.Rank)
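
A quick way to see why the data.frame matters (a sketch added for illustration, not part of the original answer; the names as_matrix and as_frame are made up here): cbind() on a mix of numeric and character vectors yields a character matrix, so End.Rank stops being numeric, whereas data.frame() keeps each column's type.

```r
Season   <- c(1, 1, 2, 2)
Team     <- c("Diverpool", "Deverton", "Diverpool", "Deverton")
End.Rank <- c(8, 17, 4, 15)

# cbind() builds a matrix, and a matrix holds a single type: everything becomes character
as_matrix <- cbind(Season, Team, End.Rank)

# data.frame() stores each column separately, so End.Rank stays numeric
as_frame <- data.frame(Season, Team, End.Rank)

typeof(as_matrix)         # "character"
class(as_frame$End.Rank)  # "numeric"
```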

Using "data.table":
    library(data.table)
    setDT(PLRank)[, l.End.Rank := shift(End.Rank), by = .(Team)][]
    #     Season      Team End.Rank l.End.Rank
    #  1:      1 Diverpool        8         NA
    #  2:      1  Deverton       17         NA
    #  3:      2 Diverpool        4          8
    #  4:      2  Deverton       15         17
    #  5:      3 Diverpool        3          4
    #  6:      3  Deverton        6         15
    #  7:      4 Diverpool        4          3
    #  8:      4  Deverton       16          6
    #  9:      5 Diverpool        3          4
    # 10:      5  Deverton       17         16

Or, using "dplyr":
    library(dplyr)
    PLRank %>%
      group_by(Team) %>%
      mutate(l.End.Rank = lag(End.Rank))
    # Source: local data frame [10 x 4]
    # Groups: Team [2]
    #
    #    Season      Team End.Rank l.End.Rank
    #     (dbl)    (fctr)    (dbl)      (dbl)
    # 1       1 Diverpool        8         NA
    # 2       1  Deverton       17         NA
    # 3       2 Diverpool        4          8
    # 4       2  Deverton       15         17
    # 5       3 Diverpool        3          4
    # 6       3  Deverton        6         15
    # 7       4 Diverpool        4          3
    # 8       4  Deverton       16          6
    # 9       5 Diverpool        3          4
    # 10      5  Deverton       17         16
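
If adding a package is not an option, the same grouped lag can be sketched in base R with ave(). This is an illustrative alternative, not part of the original answer; the helper name lag1 is made up here, and the approach assumes each team has one row per season. It also hints at why a bare lag() call probably did not help: without dplyr loaded, lag() is stats::lag(), which shifts the time base of a series and leaves the values where they are.

```r
Season   <- rep(1:5, each = 2)
Team     <- rep(c("Diverpool", "Deverton"), times = 5)
End.Rank <- c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
PLRank   <- data.frame(Season, Team, End.Rank)

# Hypothetical helper: shift a vector down by one position, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

# Make sure seasons are in chronological order before lagging
PLRank <- PLRank[order(PLRank$Season), ]

# ave() applies lag1 within each Team and returns values in the original row order
PLRank$l.End.Rank <- ave(PLRank$End.Rank, PLRank$Team, FUN = lag1)
PLRank$l.End.Rank
# NA NA 8 17 4 15 3 6 4 16

# stats::lag() only changes the "tsp" attribute; the values stay in place
unshifted <- as.vector(stats::lag(End.Rank))
identical(unshifted, End.Rank)  # TRUE
```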

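Update

To be honest, I completely misread the part about wanting to group by season.

If you want to lag by season, perhaps you should consider making your data wide, so that there is only one row per season. Lagging by season is then easy.

Example:

Here, we use dcast from "data.table" to spread the values of "End.Rank" across one column per "Team". Then we lag only the newly created columns.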
    library(data.table)
    teams <- as.character(unique(PLRank$Team))
    dcast(as.data.table(PLRank), Season ~ Team, value.var = "End.Rank")[
      , (teams) := lapply(.SD, shift), .SDcols = teams][]
    #    Season Deverton Diverpool
    # 1:      1       NA        NA
    # 2:      2       17         8
    # 3:      3       15         4
    # 4:      4        6         3
    # 5:      5       16         4

Or, if you want both the team names and the values in wide form, you can try the following:
    dcast(as.data.table(PLRank)[, ind := sequence(.N), by = Season],
          Season ~ ind, value.var = c("Team", "End.Rank"))[
      , c("End.Rank_1", "End.Rank_2") := lapply(.SD, shift),
      .SDcols = c("End.Rank_1", "End.Rank_2")][]
    #    Season    Team_1   Team_2 End.Rank_1 End.Rank_2
    # 1:      1 Diverpool Deverton         NA         NA
    # 2:      2 Diverpool Deverton          8         17
    # 3:      3 Diverpool Deverton          4         15
    # 4:      4 Diverpool Deverton          3          6
    # 5:      5 Diverpool Deverton          4         16

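The approach in "dplyr" is similar. Since you are going to a wide form, you also need to load "tidyr".
    library(dplyr)
    library(tidyr)
    PLRank %>%
      spread(Team, End.Rank) %>%      # one column per team, one row per season
      mutate(across(-Season, lag))    # lag every column except Season (dplyr >= 1.0)
    #   Season Deverton Diverpool
    # 1      1       NA        NA
    # 2      2       17         8
    # 3      3       15         4
    # 4      4        6         3
    # 5      5       16         4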
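
The widen-then-lag idea can also be sketched without any packages, using base R's reshape(). This is an illustrative alternative rather than part of the original answer; the variable names wide and rank_cols are made up here, and the sketch assumes the rows are already in season order. Note that reshape() names the new columns End.Rank.<Team> and orders them by first appearance, so the layout differs slightly from the outputs above.

```r
Season   <- rep(1:5, each = 2)
Team     <- rep(c("Diverpool", "Deverton"), times = 5)
End.Rank <- c(8, 17, 4, 15, 3, 6, 4, 16, 3, 17)
PLRank   <- data.frame(Season, Team, End.Rank)

# One row per Season, one End.Rank.<Team> column per team
wide <- reshape(PLRank, idvar = "Season", timevar = "Team", direction = "wide")

# Lag every column except Season by one row, padding the first season with NA
rank_cols <- setdiff(names(wide), "Season")
wide[rank_cols] <- lapply(wide[rank_cols], function(x) c(NA, head(x, -1)))
wide
```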

Regarding "r - How to lag an integer variable using R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34558637/
