r - How to pivot a data frame with columns that have sections and sub-sections in R

Reposted. Author: 行者123. Updated: 2023-12-03 14:43:24

I have the data frame below:

    df <- structure(
      list(ID = c("P-1", " P-1", "P-1", "P-2", "P-3", "P-4", "P-5", "P-6", "P-7",
                  "P-8"),
           Date = c("2020-03-16 12:11:33", "2020-03-16 13:16:04",
                    "2020-03-16 06:13:55", "2020-03-16 10:03:43",
                    "2020-03-16 12:37:09", "2020-03-16 06:40:24",
                    "2020-03-16 09:46:45", "2020-03-16 12:07:44",
                    "2020-03-16 14:09:51", "2020-03-16 09:19:23"),
           Status = c("SA", "SA", "SA", "RE", "RE", "RE", "RE", "XA", "XA", "XA"),
           Flag = c("L", "L", "L", NA, "K", "J", NA, NA, "H", "G"),
           Value = c(5929.81, 5929.81, 5929.81, NA, 6969.33, 740.08, NA, NA, 1524.8,
                     NA),
           Flag2 = c("CL", "CL", "CL", NA, "RY", "", NA, NA, "", NA),
           Flag3 = c(NA, NA, NA, NA, "RI", "PO", NA, "SS", "DDP", NA)),
      .Names = c("ID", "Date", "Status", "Flag", "Value", "Flag2", "Flag3"),
      row.names = c(NA, 10L), class = "data.frame")

I'm using the following code:

    library(dplyr)

    df %>%
      mutate(L = ifelse(Flag == "L", 1, 0),
             K = ifelse(Flag == "K", 1, 0)) %>%     # etc. for the other Flag values
      mutate(sub_status = NA) %>%
      mutate(sub_status = ifelse(!is.na(Flag2) & Flag3 == 0, "a", sub_status),
             sub_status = ifelse(is.na(Flag2) & Flag3 != 0, "b", sub_status)) %>%  # etc. for the other sub-statuses
      mutate(value_class = ifelse(0 <= Value & Value <= 15000, "0-15000",
                                  "15000-50000")) %>%
      group_by(Date, Status, sub_status, value_class) %>%
      summarise(L = sum(L),
                K = sum(K),   # etc.
                count = n())

This gives me the following output:
    Date        Status  sub_status  value_class  G  H  I  J  K  L  NA  Count
    2020-03-20  SA      a           0-15000      0  0  0  0  1  1  0   2
    2020-03-20  SA      b           0-15000      0  0  0  0  1  0  0   1
    ...
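In the attempt above, comparisons such as Flag == "L" and Flag3 == 0 return NA rather than FALSE whenever the flag is missing, so ifelse() produces NA and sum() over a group that contains it is NA as well. A minimal base R sketch on a made-up vector (flag is a hypothetical name, not a column of the real data) shows the effect and the %in% workaround:

```r
# Hypothetical flag vector with one missing value
flag <- c("L", "L", NA, "K")

# == propagates NA, so the indicator and its sum become NA
ind_eq <- ifelse(flag == "L", 1, 0)
ind_eq            # 1 1 NA 0
sum(ind_eq)       # NA

# %in% treats NA as "no match", so the indicator is always 0 or 1
ind_in <- as.integer(flag %in% "L")
ind_in            # 1 1 0 0
sum(ind_in)       # 2

# Missing flags can then be counted on their own, e.g. for a "[Null]" column
sum(is.na(flag))  # 1
```

Passing na.rm = TRUE to sum() is the other common fix, applied at summary time instead of when the indicator is built.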

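I want to produce the output below from DF, where the Status column has 3 distinct values, Flag2 holds either a value or [null]/NA, and the Flag3 column holds 7 distinct values or [null]/NA. A single distinct ID can have multiple entries in the Flag3 column.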
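Judging by the desired output (SA has a count of 1 although it has three rows), the counts appear to be of distinct IDs rather than rows. A base R sketch of that count, under that assumption. The sample data contains " P-1" with a leading space; the sketch trims whitespace so it counts as the same ID as "P-1", which is an assumption about the data rather than something the question states.

```r
# ID and Status columns from the sample data
ID     <- c("P-1", " P-1", "P-1", "P-2", "P-3", "P-4", "P-5", "P-6", "P-7", "P-8")
Status <- c("SA", "SA", "SA", "RE", "RE", "RE", "RE", "XA", "XA", "XA")

# Rows per Status versus distinct (whitespace-trimmed) IDs per Status
rows_per_status     <- table(Status)
distinct_per_status <- tapply(trimws(ID), Status, function(x) length(unique(x)))

rows_per_status            # RE 4, SA 3, XA 3
distinct_per_status        # RE 4, SA 1, XA 3
sum(distinct_per_status)   # 8, the Total in the desired output
```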

I need to build the data frame below by splitting Value into 3 groups, such as 0-15000 and 15000-50000.
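A sketch of the bucketing with base R's cut(), assuming the boundary value 15000 belongs to the lower bucket (as in 0 <= Value & Value <= 15000 from the attempt above) and that 50000 is the upper limit. Unlike the ifelse() version, values above 50000 become NA here, and missing values stay NA; the question does not say which bucket NA should go into.

```r
# Value column from the sample data
value <- c(5929.81, 5929.81, 5929.81, NA, 6969.33, 740.08, NA, NA, 1524.8, NA)

# Right-closed intervals [0, 15000] and (15000, 50000];
# include.lowest = TRUE makes exactly 0 fall into the first bucket
value_class <- cut(value,
                   breaks = c(0, 15000, 50000),
                   labels = c("0-15000", "15000-50000"),
                   right = TRUE,
                   include.lowest = TRUE)

value_class
levels(value_class)   # both buckets are levels, even though 15000-50000 is empty here
```

Because the result is a factor, the empty 15000-50000 level is kept, which helps with the fixed output layout the question asks for.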
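  • If, for a distinct ID, Flag2 has a value other than 0 or [null]/NA but Flag3 is 0 or [null]/NA, then it is a.
  • If, for a distinct ID, Flag3 has a value other than 0 or [null]/NA but Flag2 is 0 or [null]/NA, then it is b.
  • If, for a distinct ID, both Flag2 and Flag3 have values other than 0 or [Null]/NA, then it is c.
  • If, for a distinct ID, both Flag2 and Flag3 are 0 or [Null]/NA, then it is d.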
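The four rules can be sketched as a vectorised base R function. The names has_value() and sub_status() are hypothetical, and treating the empty string "" as [null] is an interpretation of the question's "[null]/NA". On the sample data this reproduces the a/b/c/d split of the desired output (for example, RE gives b = 1, c = 1, d = 2), whereas a check based only on is.na() classifies the rows with Flag2 == "" as c.

```r
# TRUE when x holds a real value, i.e. is not NA, not "" and not 0
has_value <- function(x) !is.na(x) & x != "" & x != "0"

# Classify each row into the sub-statuses a/b/c/d
sub_status <- function(flag2, flag3) {
  f2 <- has_value(flag2)
  f3 <- has_value(flag3)
  ifelse(f2 & !f3, "a",           # only Flag2 has a value
  ifelse(!f2 & f3, "b",           # only Flag3 has a value
  ifelse(f2 & f3, "c", "d")))     # c if both have values, d if neither has
}

# Flag2 and Flag3 from the sample data
Flag2 <- c("CL", "CL", "CL", NA, "RY", "", NA, NA, "", NA)
Flag3 <- c(NA, NA, NA, NA, "RI", "PO", NA, "SS", "DDP", NA)

sub_status(Flag2, Flag3)
# "a" "a" "a" "d" "c" "b" "d" "b" "b" "d"
```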

  • I want to arrange the data frame above in the following structure, with percent and Total columns.

I have written percentages like 2/5. For the Status rows the count is divided by the overall Total, while each sub_status is divided by its own Status.
    16/03/2020 | 0 - 15000                                   | 15000 - 50000                         |
    Status     | count  percent        L  K  J  H  G  [Null] | count  percent  L  K  J  H  G  [Null] | Total
    SA         | 1      1/8 (12.50%)   1  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 1
      a        | 1      1/1 (100.00%)  1  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 1
      b        | 0      -              0  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
      c        | 0      -              1  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
      d        | 0      -              0  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
    RE         | 4      50.00%         0  1  1  0  0  2      | 0      -        0  0  0  0  0  0      | 4
      a        | 0      -              0  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
      b        | 1      25.00%         0  0  1  0  0  1      | 0      -        0  0  0  0  0  0      | 1
      c        | 1      25.00%         0  1  0  0  0  1      | 0      -        0  0  0  0  0  0      | 1
      d        | 2      50.00%         0  0  0  0  0  2      | 0      -        0  0  0  0  0  0      | 2
    XA         | 3      37.50%         0  0  0  1  1  1      | 0      -        0  0  0  0  0  0      | 3
      a        | 0      -              0  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
      b        | 2      66.67%         0  0  0  1  0  1      | 0      -        0  0  0  0  0  0      | 2
      c        | 0      -              0  0  0  0  0  0      | 0      -        0  0  0  0  0  0      | 0
      d        | 1      33.33%         0  0  0  0  1  0      | 0      -        0  0  0  0  0  0      | 1
    Total      | 8      100.00%        1  1  0  0  1  3      | 0      -        0  0  0  0  0  0      | 8
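A sketch of the percent strings in the "n/total (xx.xx%)" style used in the table, with "-" for zero counts. fmt_share() is a hypothetical helper; the counts plugged in are the SA/RE/XA and RE sub-status counts from the desired output table.

```r
# Format counts as "n/total (xx.xx%)", or "-" when there is nothing to show
fmt_share <- function(n, total) {
  ifelse(n == 0 | total == 0, "-",
         sprintf("%d/%d (%.2f%%)", as.integer(n), as.integer(total), 100 * n / total))
}

status_n <- c(SA = 1, RE = 4, XA = 3)        # Status counts from the table
re_sub_n <- c(a = 0, b = 1, c = 1, d = 2)    # sub-status counts within RE

# Status rows are divided by the overall total ...
fmt_share(status_n, sum(status_n))
# "1/8 (12.50%)" "4/8 (50.00%)" "3/8 (37.50%)"

# ... and sub-status rows by their own Status
fmt_share(re_sub_n, status_n[["RE"]])
# "-" "1/4 (25.00%)" "1/4 (25.00%)" "2/4 (50.00%)"
```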

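I have shown the required output based on the latest date, 16/03/2020, for the case where the data frame does not contain the latest date according to startdate. Keep all 0 values in the output data frame. The percent column is for reference only; the percentage values are to be calculated.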
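Assuming "startdate" refers to the Date column (the sample data has no column of that name), picking the rows of the latest day could look like the base R sketch below. The second timestamp is a hypothetical value added so that the filter actually drops a row.

```r
# Two timestamps from the sample data plus one hypothetical earlier one
Date <- c("2020-03-16 12:11:33", "2020-03-15 23:59:59", "2020-03-16 06:13:55")

day    <- as.Date(Date)              # keeps only the calendar day
latest <- max(day, na.rm = TRUE)     # most recent day present in the data
keep   <- day == latest              # rows to use for the report

latest                               # "2020-03-16"
which(keep)                          # 1 3
format(latest, "%d/%m/%Y")           # "16/03/2020", the label used in the table
```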

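Also, I want to keep the structure static: if some parameter is absent on a given day, the output keeps the same structure and shows 0 values.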

For example, if the date 17/03/2020 has no rows with status SA or sub-status c, their placeholders should still appear in the output with a value of 0.
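One way to keep the layout fixed is to declare every allowed value as a factor level before counting, so combinations with no rows still appear with 0. A base R sketch on hypothetical data for a day with no SA rows and no sub-status c; in the tidyverse, tidyr::complete() (used in the answer below) fills in missing combinations in the same spirit.

```r
# All values that must always appear in the output
status_levels <- c("SA", "RE", "XA")
sub_levels    <- c("a", "b", "c", "d")

# Hypothetical rows for a day with no SA rows and no sub-status c
day_status <- c("RE", "RE", "XA")
day_sub    <- c("b", "d", "d")

# Fixed factor levels make table() report every combination, zeros included
counts <- table(Status     = factor(day_status, levels = status_levels),
                sub_status = factor(day_sub,    levels = sub_levels))
counts

# Long format: one row per Status/sub_status pair (3 x 4 = 12 rows)
as.data.frame(counts, responseName = "count")
```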

Best answer

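Hopefully this is enough to get you started. To go further, I would need an expected output that looks like it actually came from R, plus a better explanation of how the variables are calculated.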

    library(tidyverse)
    df <- structure(
      list(ID = c("P-1", " P-1", "P-1", "P-2", "P-3", "P-4", "P-5", "P-6", "P-7",
                  "P-8"),
           Date = c("2020-03-16 12:11:33", "2020-03-16 13:16:04",
                    "2020-03-16 06:13:55", "2020-03-16 10:03:43",
                    "2020-03-16 12:37:09", "2020-03-16 06:40:24",
                    "2020-03-16 09:46:45", "2020-03-16 12:07:44",
                    "2020-03-16 14:09:51", "2020-03-16 09:19:23"),
           Status = c("SA", "SA", "SA", "RE", "RE", "RE", "RE", "XA", "XA", "XA"),
           Flag = c("L", "L", "L", NA, "K", "J", NA, NA, "H", "G"),
           Value = c(5929.81, 5929.81, 5929.81, NA, 6969.33, 740.08, NA, NA, 1524.8,
                     NA),
           Flag2 = c("CL", "CL", "CL", NA, "RY", "", NA, NA, "", NA),
           Flag3 = c(NA, NA, NA, NA, "RI", "PO", NA, "SS", "DDP", NA)),
      .Names = c("ID", "Date", "Status", "Flag", "Value", "Flag2", "Flag3"),
      row.names = c(NA, 10L), class = "data.frame")

    df2 <- df %>%
      mutate(
        # add variables
        Value = ifelse(0 <= Value & Value <= 15000, "0-15000", "15000-50000"),
        substatus = case_when(
          !is.na(Flag2) & is.na(Flag3) ~ "a",
          !is.na(Flag3) & is.na(Flag2) ~ "b",
          !is.na(Flag3) & !is.na(Flag2) ~ "c",
          TRUE ~ "d"),
        # make Date an actual date rather than a timestamp
        Date = as.Date(Date),
        # remove obsolete columns
        Flag2 = NULL,
        Flag3 = NULL,
        ID = NULL,
        # rename NAs to the name of the desired column
        Flag = ifelse(is.na(Flag), "[Null]", Flag),
        # create a column of 1s for the pivot
        temp = 1,
        # and a row id
        id = row_number()
      ) %>%
      # create the new columns L, K, etc.; this also drops the Flag column
      pivot_wider(names_from = "Flag", values_from = "temp", values_fill = list(temp = 0)) %>%
      # move the `[Null]` column to the end
      select(everything(), -`[Null]`, `[Null]`) %>%
      mutate(
        id = NULL,
        count = 1,
        Total = rowSums(select(., L:`[Null]`)))
    df2
    #> # A tibble: 10 x 12
    #> Date Status Value substatus L K J H G `[Null]`
    #> <date> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1 2020-03-16 SA 0-15~ a 1 0 0 0 0 0
    #> 2 2020-03-16 SA 0-15~ a 1 0 0 0 0 0
    #> 3 2020-03-16 SA 0-15~ a 1 0 0 0 0 0
    #> 4 2020-03-16 RE <NA> d 0 0 0 0 0 1
    #> 5 2020-03-16 RE 0-15~ c 0 1 0 0 0 0
    #> 6 2020-03-16 RE 0-15~ c 0 0 1 0 0 0
    #> 7 2020-03-16 RE <NA> d 0 0 0 0 0 1
    #> 8 2020-03-16 XA <NA> b 0 0 0 0 0 1
    #> 9 2020-03-16 XA 0-15~ c 0 0 0 1 0 0
    #> 10 2020-03-16 XA <NA> d 0 0 0 0 1 0
    #> # ... with 2 more variables: count <dbl>, Total <dbl>

    # As you didn't say what to do with NA values, I left them as NA

    bind_rows(
      df2 %>%
        # add the missing combinations of a, b, c, d
        complete(nesting(Date, Status, Value), substatus) %>%
        group_by(Date, Value, Status, substatus) %>%
        summarize_all(~sum(., na.rm = TRUE)) %>%
        group_by(Status, Value) %>%
        mutate(percent = paste(round(100 * Total / sum(Total), 2), "%")) %>%
        ungroup(),
      df2 %>%
        mutate(substatus = Status, Status = paste0(Status, "_")) %>%
        group_by(Date, Value, Status, substatus) %>%
        mutate(count = n()) %>%
        group_by(count, .add = TRUE) %>%
        summarize_all(~sum(., na.rm = TRUE)) %>%
        group_by(Value) %>%
        mutate(percent = paste(round(100 * Total / sum(Total), 2), "%"))
    ) %>%
      arrange(Date, Value, desc(Status)) %>%
      mutate(Status = NULL) %>%
      rename(Status = substatus) %>%
      print(n = Inf)
    #> # A tibble: 25 x 12
    #> Date Value Status L K J H G `[Null]` count Total
    #> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    #> 1 2020-03-16 0-15~ XA 0 0 0 1 0 0 1 1
    #> 2 2020-03-16 0-15~ a 0 0 0 0 0 0 0 0
    #> 3 2020-03-16 0-15~ b 0 0 0 0 0 0 0 0
    #> 4 2020-03-16 0-15~ c 0 0 0 1 0 0 1 1
    #> 5 2020-03-16 0-15~ d 0 0 0 0 0 0 0 0
    #> 6 2020-03-16 0-15~ SA 3 0 0 0 0 0 3 3
    #> 7 2020-03-16 0-15~ a 3 0 0 0 0 0 3 3
    #> 8 2020-03-16 0-15~ b 0 0 0 0 0 0 0 0
    #> 9 2020-03-16 0-15~ c 0 0 0 0 0 0 0 0
    #> 10 2020-03-16 0-15~ d 0 0 0 0 0 0 0 0
    #> 11 2020-03-16 0-15~ RE 0 1 1 0 0 0 2 2
    #> 12 2020-03-16 0-15~ a 0 0 0 0 0 0 0 0
    #> 13 2020-03-16 0-15~ b 0 0 0 0 0 0 0 0
    #> 14 2020-03-16 0-15~ c 0 1 1 0 0 0 2 2
    #> 15 2020-03-16 0-15~ d 0 0 0 0 0 0 0 0
    #> 16 2020-03-16 <NA> XA 0 0 0 0 1 1 2 2
    #> 17 2020-03-16 <NA> a 0 0 0 0 0 0 0 0
    #> 18 2020-03-16 <NA> b 0 0 0 0 0 1 1 1
    #> 19 2020-03-16 <NA> c 0 0 0 0 0 0 0 0
    #> 20 2020-03-16 <NA> d 0 0 0 0 1 0 1 1
    #> 21 2020-03-16 <NA> RE 0 0 0 0 0 2 2 2
    #> 22 2020-03-16 <NA> a 0 0 0 0 0 0 0 0
    #> 23 2020-03-16 <NA> b 0 0 0 0 0 0 0 0
    #> 24 2020-03-16 <NA> c 0 0 0 0 0 0 0 0
    #> 25 2020-03-16 <NA> d 0 0 0 0 0 2 2 2
    #> # ... with 1 more variable: percent <chr>

For more on "r - How to pivot a data frame with columns that have sections and sub-sections in R", see the similar question on Stack Overflow: https://stackoverflow.com/questions/61145404/
