
r - Merge two long-format data frames based on date

Reposted. Author: 行者123. Updated: 2023-12-03 17:12:10

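I have 2 data frames: one (df1) records the different events that happen each day, and the other (df2) records attributes of the events that happened during the day.

From df1 one can identify recurring (consecutive) occurrences of an event as well as their duration. The day on which a record starts is given by the Date variable.

For example: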

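  • id 12: the event starts on day 1 and ends on day 7. In this case the occurrence is 7 and the duration is 11 (2+1+2+1+1+3+1).
  • id 123: the week starts on day 5 and ends on day 7. Because day 6 is a gap day (value 0), there is no consecutive recurrence from the start day, even though the duration over days 5 to 7 is 6.
  • In df1, the variable Date defines the day on which the record starts. For example, the record for id 12 starts on day 1 (Monday), and so on.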
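To make the occurrence and duration definitions concrete, here is a minimal R sketch that, for one row of df1, counts the run of non-zero days beginning at the start day (Occurrence) and sums the values inside that run. The helper name `run_from_start` is hypothetical, and treating duration as the sum over the run is an assumption: it reproduces 11 for id 12, but the figures quoted for id 123 and id 10 seem to count a different span.

```r
# Hypothetical helper (not from the question or the answer): length and
# summed value of the run of non-zero days that begins on `start_day`
# (1 = Monday ... 7 = Sunday). Assumes duration = sum of values in the run.
run_from_start <- function(values, start_day) {
  tail_values <- values[start_day:length(values)]  # days from the start day on
  first_zero <- match(0, tail_values)              # first gap day, NA if none
  run_length <- if (is.na(first_zero)) length(tail_values) else first_zero - 1L
  list(occurrence = run_length,
       duration = sum(tail_values[seq_len(run_length)]))
}

# id 12: daily values from df1, record starts on Monday (day 1)
run_from_start(c(2, 1, 2, 1, 1, 3, 1), start_day = 1)
# occurrence 7, duration 11
```

For id 123 (start day 5) this returns a run of length 1; the expected result below treats a single day as "no consecutive occurrence" and reports 0, so that rule would be applied on top of this helper.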

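I want to determine whether df2 contains a record of the event's attributes during the consecutive occurrence.

For example, id 12 has 7 occurrences with a duration of 11, and df2 has a record on Wednesday (day 3 in df1), which corresponds to day 3 of the consecutive occurrence. For id 123 there is no data (there is no consecutive occurrence), whereas for id 10 the 6-day occurrence with a duration of 18 has a record on day 6.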
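The check described here amounts to converting both weekdays to day numbers and asking whether the df2 day falls inside the run. A minimal R sketch, assuming English weekday names and a run length (Occurrence) that is already known; `weekdays_en` and `record_position` are hypothetical names.

```r
# Hypothetical lookup: weekday name -> day number (1 = Monday ... 7 = Sunday)
weekdays_en <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# Hypothetical helper: 1-based position of the df2 record day inside the run
# that starts on the df1 day; NA when the record falls outside the run.
record_position <- function(start_day_name, record_day_name, occurrence) {
  start <- match(start_day_name, weekdays_en)
  record <- match(record_day_name, weekdays_en)
  pos <- record - start + 1L
  if (is.na(pos) || pos < 1L || pos > occurrence) NA_integer_ else pos
}

record_position("Monday", "Wednesday", occurrence = 7)   # id 12
record_position("Saturday", "Saturday", occurrence = 2)  # id 10
```

The expected result reports 0 rather than NA when there is no matching record, so NA would be replaced by 0 when building the final table.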

DF1:
     id day1 day2 day3 day4 day5 day6 day7 Date
     12    2    1    2    1    1    3    1 Mon
    123    0    3    0    3    3    0    3 Fri
     10    0    3    3    3    3    3    3 Sat

DF2:
     id c1 c2 Date
     12  3  3 Wednesday
    123  3  2 Fri
     10  3  1 Sat

Expected result:
     id c1 c2 Occurrence Position
     12  3  3          7        3
    123  0  0          0        0
     10  3  1          2        1

Sample data: df1
    structure(list(id = c(12L, 123L, 10L), day1 = c(2L, 0L, 3L), 
    day2 = c(1L, 3L, 3L), day3 = c(2L, 0L, 3L), day4 = c(1L,
    3L, 3L), day5 = c(1L, 3L, 3L), day6 = c(3L, 0L, 3L), day7 = c(1L,
    3L, 3L), Date = c("Monday", "Friday", "Saturday")), row.names = c(NA,
    -3L), class = c("data.table", "data.frame"))

    df2:
    structure(list(id = c(12, 123, 10), c1 = c(3, 3, 3), c2 = c(3, 
    2, 1), Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday",
    "Wednesday"), class = "factor")), row.names = c(NA, -3L), class = "data.frame")

Best answer

A dplyr solution (maybe not the shortest):

    # library
    library(tidyverse)

    # get data
    df1 <- structure(list(id = c(12L, 123L, 10L),
    day1 = c(2L, 0L, 3L),
    day2 = c(1L, 3L, 3L),
    day3 = c(2L, 0L, 3L),
    day4 = c(1L,3L, 3L),
    day5 = c(1L, 3L, 3L),
    day6 = c(3L, 0L, 3L),
    day7 = c(1L,3L, 3L),
    Date = c("Monday", "Friday", "Saturday")),
    row.names = c(NA,-3L), class = c("data.table", "data.frame"))


    df2 <- structure(list(id = c(12, 123, 10),
    c1 = c(3, 3, 3),
    c2 = c(3, 2, 1),
    Date = structure(c(3L, 1L, 2L), .Label = c("Friday", "Saturday","Wednesday"),
    class = "factor")), row.names = c(NA, -3L), class = "data.frame")


    # convert weekday names to numbers, 1 = Monday ... 7 = Sunday (used for positions later)
    df1 %>% mutate(
    Date_nr_df1=case_when(
    Date=="Monday" ~ 1,
    Date=="Tuesday" ~2,
    Date=="Wednesday" ~3,
    Date=="Thursday" ~4,
    Date=="Friday" ~5,
    Date=="Saturday" ~6,
    Date=="Sunday" ~7)) -> df1

    df2 %>% mutate(
    Date_nr_df2=case_when(
    Date=="Monday" ~ 1,
    Date=="Tuesday" ~2,
    Date=="Wednesday" ~3,
    Date=="Thursday" ~4,
    Date=="Friday" ~5,
    Date=="Saturday" ~6,
    Date=="Sunday" ~7)) -> df2

    # combine data by the id column
    left_join(df1,df2, by=c("id")) -> df

    # adjust data
    df %>%
      group_by(id) %>% # one row per id, so each step below works row by row
      mutate(days=paste0(day1,day2,day3,day4,day5,day6,day7)) %>% # paste the daily values into one string (assumes single-digit values)
      mutate(days_correct=substring(days,Date_nr_df1)) %>% # drop the days before the start day
      mutate(Occurrence_seq=str_split(days_correct, fixed("0"))[[1]][1]) %>% # keep the days before the first 0
      mutate(Occurrence=nchar(Occurrence_seq)) %>% # count these days
      mutate(Occurrence=case_when(Occurrence==1 ~ 0, TRUE ~ as.numeric(Occurrence))) %>% # set Occurrence to 0 if there is no consecutive occurrence
      mutate(Position=Date_nr_df2-Date_nr_df1+1) %>% # position of the df2 record relative to the start day
      mutate(c1=case_when(Occurrence==0 ~ 0, TRUE ~ c1),
             c2=case_when(Occurrence==0 ~ 0, TRUE ~ c2),
             Position=case_when(Occurrence==0 ~ 0, TRUE ~ as.numeric(Position))) %>%
      ungroup() %>% # ungroup the data frame
      select(id,c1,c2,Occurrence,Position) # select the wanted variables
    #> # A tibble: 3 x 5
    #>      id    c1    c2 Occurrence Position
    #>   <dbl> <dbl> <dbl>      <dbl>    <dbl>
    #> 1    12     3     3          7        3
    #> 2   123     0     0          0        0
    #> 3    10     3     1          2        1

Created on 2020-04-10 by the reprex package (v0.2.1)
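The string-pasting step in the dplyr pipeline only lines up with day positions when every daily value is a single digit. As an alternative, here is a base-R sketch that walks the numeric day columns directly and looks up the df2 row by id. It also treats a df2 record that falls outside the run as "no match", a check the pipeline does not make; that extra rule is an assumption about what the question wants, and `weekdays_en` is a hypothetical helper vector.

```r
# Hypothetical lookup: weekday name -> day number (1 = Monday ... 7 = Sunday)
weekdays_en <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# sample data from the question, as plain data frames
df1 <- data.frame(
  id = c(12L, 123L, 10L),
  day1 = c(2L, 0L, 3L), day2 = c(1L, 3L, 3L), day3 = c(2L, 0L, 3L),
  day4 = c(1L, 3L, 3L), day5 = c(1L, 3L, 3L), day6 = c(3L, 0L, 3L),
  day7 = c(1L, 3L, 3L),
  Date = c("Monday", "Friday", "Saturday")
)
df2 <- data.frame(
  id = c(12, 123, 10), c1 = c(3, 3, 3), c2 = c(3, 2, 1),
  Date = factor(c("Wednesday", "Friday", "Saturday"))
)

day_cols <- paste0("day", 1:7)
idx <- match(df1$id, df2$id)  # df2 row for each df1 id

result <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  j <- idx[i]
  values <- unlist(df1[i, day_cols])             # numeric daily values
  start <- match(df1$Date[i], weekdays_en)       # start day number
  tail_values <- values[start:7]                 # days from the start day on
  first_zero <- match(0, tail_values)            # first gap, NA if none
  occ <- if (is.na(first_zero)) length(tail_values) else first_zero - 1L
  if (occ < 2L) occ <- 0L                        # a single day is not consecutive
  pos <- match(as.character(df2$Date[j]), weekdays_en) - start + 1L
  hit <- occ > 0L && !is.na(pos) && pos >= 1L && pos <= occ  # record inside run?
  data.frame(id = df1$id[i],
             c1 = if (hit) df2$c1[j] else 0,
             c2 = if (hit) df2$c2[j] else 0,
             Occurrence = occ,
             Position = if (hit) pos else 0L)
}))
result
```

On the sample data this gives the same values as the expected result table in the question.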

For a similar question, "r - Merge two long-format data frames based on date", see Stack Overflow: https://stackoverflow.com/questions/61138677/
