
r - Efficient grouping with data.table in R

Reposted. Author: 行者123. Last updated: 2023-12-04 09:27:39

Can the following script be shortened?

  • Do not use so many chained operations.
  • Avoid .SD as far as possible.

    library(data.table)

    DT <- structure(list(
      title = c("a", "a", "a", "a", "b", "b", "b", "b",
                "c", "c", "c", "c", "d", "d", "d", "d"),
      date = c("12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020",
               "12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020",
               "12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020",
               "12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020"),
      bucket = c(1, 1, 1, 4, 9, 7, 10, 10, 8, 5, 5, 5, 8, 10, 9, 10),
      score = c(86, 22, 24, 54, 66, 76, 43, 97, 9, 53, 45, 40, 21, 99, 91, 90)),
      row.names = c(NA, -16L), class = c("data.table", "data.frame"))

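    # Rows with the minimum bucket within each title, selected via .I
    DT[DT[, .I[bucket == min(bucket)], by = title]$V1]

    # Chained version: filter, add the group mean, then keep the last row per title
    DT[, .SD[which(bucket == min(bucket))], by = title][,
      `:=`(avg_score = mean(score)), by = .(title)][,
      .SD[.N, c(1, 2, 4)], by = .(title)]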

The original code is a dplyr script, taken from RStudio Community:
    library(dplyr)

    # The original post used `data`; DT is the sample table defined above
    tt <- DT %>%
      group_by(title) %>%
      filter(bucket == min(bucket)) %>%
      mutate(avg_score = mean(score)) %>%
      slice_max(date) %>%
      select(-score)

    #   title date       bucket avg_score
    #   <chr> <chr>       <dbl>     <dbl>
    # 1 a     14-07-2020      1        44
    # 2 b     13-07-2020      7        76
    # 3 c     15-07-2020      5        46
    # 4 d     12-07-2020      8        21
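
One detail worth noting about the dplyr version: `date` is a character column there, so `slice_max(date)` compares strings rather than calendar dates. It picks the expected rows here only because every date shares the same month and year. A minimal base R sketch of the difference follows; the two sample dates are made up for illustration and are not part of the question's data.

```r
# Hypothetical dates in "dd-mm-yyyy" format, chosen so that string order
# and calendar order disagree.
d <- c("12-07-2020", "02-08-2020")

# Character comparison is lexicographic: "12..." sorts after "02...".
max_as_text <- max(d)

# Parsing to Date first compares actual calendar dates.
max_as_date <- max(as.Date(d, format = "%d-%m-%Y"))

print(max_as_text)
print(max_as_date)
```

With dates spanning several months or years, converting to `Date` before taking the maximum avoids this mismatch.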

Best answer

Here is a solution without chaining and without .SD:

    # Convert from character to Date to be able to select the max
    DT[, date := as.Date(date, "%d-%m-%Y")]

    DT[,
       {
         mb <- which(bucket == min(bucket))
         .(
           date = max(date[mb]), bucket = bucket[mb][1L], avg_score = mean(score[mb])
         )
       },
       by = title]

    #    title       date bucket avg_score
    # 1:     a 2020-07-14      1        44
    # 2:     b 2020-07-13      7        76
    # 3:     c 2020-07-15      5        46
    # 4:     d 2020-07-12      8        21
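
For comparison, here is a hedged sketch of one more variant. It keeps the `.I` row filter from the question and does the aggregation in the same `[` call, so it also needs neither chaining nor `.SD`. The sample data is rebuilt with `data.table()` so the block is self-contained. The names `min_rows` and `res` are only illustrative, and whether this is faster than the answer above has not been measured here.

```r
library(data.table)

# Sample data from the question, rebuilt with data.table()
DT <- data.table(
  title  = rep(c("a", "b", "c", "d"), each = 4),
  date   = rep(c("12-07-2020", "13-07-2020", "14-07-2020", "15-07-2020"), times = 4),
  bucket = c(1, 1, 1, 4, 9, 7, 10, 10, 8, 5, 5, 5, 8, 10, 9, 10),
  score  = c(86, 22, 24, 54, 66, 76, 43, 97, 9, 53, 45, 40, 21, 99, 91, 90)
)

# Convert to Date so that max() compares calendar dates
DT[, date := as.Date(date, "%d-%m-%Y")]

# Row numbers of the minimum-bucket rows within each title
min_rows <- DT[, .I[bucket == min(bucket)], by = title]$V1

# Subset those rows in i, then aggregate per title in j
res <- DT[min_rows,
          .(date = max(date), bucket = bucket[1L], avg_score = mean(score)),
          by = title]
print(res)
```

On the sample data, `res` holds the same four rows as the answer above. The trade-off is a separate grouped pass to compute `min_rows` before the aggregation.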

Regarding r - Efficient grouping with data.table in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62950785/
