gpt4 book ai didi

r - 整理 "Benchmark"列

转载 作者:行者123 更新时间:2023-12-02 02:46:17 27 4
gpt4 key购买 nike

我有一个大部分整洁的数据框,但有 2 列包含基准,而不是将基准合并为观察结果。我该如何整理,以便将“Facility_score”和“TTP”col_names 添加为每个独特的 FYQ 和 Metric 组合的“Facility_label”下的观察值?

输入数据:

library(zoo)

dd <- structure(list(Facility_label = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A", "B", "C",
"D", "Nashville"), class = "factor"), FYQ = structure(c(2017.75,
2018, 2018.25, 2018.5, 2017.75, 2018, 2018.25, 2018.5, 2018.75,
2017.75, 2018, 2018.25, 2018.5, 2018.75, 2017.75, 2018, 2018.25,
2018.5, 2018.75, 2017.75, 2018, 2018.25, 2018.5, 2018.75, 2017.75,
2018, 2018.25, 2018.5, 2018.75, 2017.75), class = "yearqtr"),
Metric = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 2L), .Label = c("Safety Recall", "Turnaround days",
"Consult Active <= 30d", "Consult Pending <- 7d", "Consult Scheduling <- 90d",
"ICB Compliance Rate", "FCA Assessment", "Minor construction execution",
"NRM funding execution", "Deficincies", "%Deficienceis corrected among corrected or action plan",
"%Deficienceis corrected or action plan", "Ratio of Hospital Staff to HR FTE",
"Turnover Rate", "GEMS no Action", "Lost time claims", "RTMS risk score",
"DOC Control", "Loaner deficiencies", "Pretreatment", "RME rate",
"SPS staff vacany rate", "Stock Inactive", "MSPV-NG", "Days to close prosthetis consult",
"%Prosthetic PO using national contracts"), class = "factor"),
Facility_score = c(84.78802993, 95.59659091, 100, 100, 77.61732852,
57.87671233, 81.28898129, 33.33333333, 31.57894737, 10.2,
7.902356902, 8.62, 11.71, 13.15, 30.98236776, 33.26086957,
31.19584055, 54.54545455, 27.27272727, 11, 17.19132653, 26.02008197,
22.29, 30.41, 89.09090909, 93.47826087, 82.10735586, 91.66666667,
87.5, 3.2), `Facility mean` = c(85.35550152, 87.31899147,
93.11498231, 100, 85.35550152, 87.31899147, 93.11498231,
100, 100, 12, 13.06073298, 12.2, 11.51, 10.56, 85.35550152,
87.31899147, 93.11498231, 100, 100, 12, 13.06073298, 12.2,
11.51, 10.56, 85.35550152, 87.31899147, 93.11498231, 100,
100, 12), TTP_score = c(100, 100, 100, 100, 100, 100, 100,
100, 100, 5.65, 5.063953488, 4.779310345, 4.47, 4.545, 100,
100, 100, 100, 100, 5.65, 5.063953488, 4.779310345, 4.47,
4.545, 100, 100, 100, 100, 100, 5.65)), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))

期望的输出:

dd_output <- structure(list(Facility_label = c("A", "Facility mean", "TTP score", 
"A", "Facility mean", "TTP score", "A", "Facility mean", "TTP score",
"A", "Facility mean", "TTP score", "B", "B", "B", "B", "B", "B",
"Facility mean", "TTP score", "B", "Facility mean", "TTP score",
"B", "Facility mean", "TTP score", "B", "Facility mean", "TTP score",
"B", "Facility mean", "TTP score", "C", "C", "C", "C", "C", "C",
"C", "C", "C", "C", "D", "D", "D", "D", "D", "D"), FYQ = c("2017 Q4",
"2017 Q4", "2017 Q4", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q2",
"2018 Q2", "2018 Q2", "2018 Q3", "2018 Q3", "2018 Q3", "2017 Q4",
"2018 Q1", "2018 Q2", "2018 Q3", "2018 Q4", "2017 Q4", "2017 Q4",
"2017 Q4", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q2", "2018 Q2",
"2018 Q2", "2018 Q3", "2018 Q3", "2018 Q3", "2018 Q4", "2018 Q4",
"2018 Q4", "2017 Q4", "2018 Q1", "2018 Q2", "2018 Q3", "2018 Q4",
"2017 Q4", "2018 Q1", "2018 Q2", "2018 Q3", "2018 Q4", "2017 Q4",
"2018 Q1", "2018 Q2", "2018 Q3", "2018 Q4", "2017 Q4"), Metric = c("Safety Recall",
"Safety Recall", "safety Recall", "Safety Recall", "Safety Recall",
"Safety Recall", "Safety Recall", "Safety Recall", "Safety Recall",
"Safety Recall", "Safety Recall", "Safety Recall", "Safety Recall",
"Safety Recall", "Safety Recall", "Safety Recall", "Safety Recall",
"Turnaround days", "Turnaround days", "Turnaround days", "Turnaround days",
"Turnaround days", "Turnaround days", "Turnaround days", "Turnaround days",
"Turnaround days", "Turnaround days", "Turnaround days", "Turnaround days",
"Turnaround days", "Turnaround days", "Turnaround days", "Safety Recall",
"Safety Recall", "Safety Recall", "Safety Recall", "Safety Recall",
"Turnaround days", "Turnaround days", "Turnaround days", "Turnaround days",
"Turnaround days", "Safety Recall", "Safety Recall", "Safety Recall",
"Safety Recall", "Safety Recall", "Turnaround days"), Facility_score = c(84.78802993,
85.35550152, 100, 95.59659091, 87.31899147, 100, 100, 93.11498231,
100, 100, 100, 100, 77.61732852, 57.87671233, 81.28898129, 33.33333333,
31.57894737, 10.2, 12, 5.65, 7.902356902, 13.06073298, 5.063953488,
8.62, 12.2, 4.779310345, 11.71, 11.51, 4.47, 13.15, 10.56, 4.545,
30.98236776, 33.26086957, 31.19584055, 54.54545455, 27.27272727,
11, 17.19132653, 26.02008197, 22.29, 30.41, 89.09090909, 93.47826087,
82.10735586, 91.66666667, 87.5, 3.2)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -48L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_skip",
"collector")), Facility_label = structure(list(), class = c("collector_character",
"collector")), FYQ = structure(list(), class = c("collector_character",
"collector")), Metric = structure(list(), class = c("collector_character",
"collector")), Facility_score = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

最佳答案

我们可以通过 tidyr::gather 和一些 dplyr::mutate 操作来做到这一点:

library(tidyverse)

dd %>%
mutate(ID = row_number()) %>%
gather(var, Facility_score, Facility_score:TTP_score) %>%
group_by(FYQ, Metric,
temp_ID = case_when(var == "Facility mean" ~ 1,
var == "TTP_score" ~ 2,
TRUE ~ 0)) %>%
slice(if(any(temp_ID == 0)) row_number() else 1) %>%
mutate(Facility_label = if_else(var == "Facility_score", as.character(Facility_label), var)) %>%
ungroup() %>%
arrange(ID, temp_ID) %>%
select(ID, everything(), -var, -temp_ID)

请注意,我添加了 ID 列以指示原始行号。当同一行中的所有分数合并时,这会减少困惑。

注意事项:

  1. gather 将表格从宽格式 reshape 为长格式,处理 Facility_scoreFacility meanTTP_score< 的条目 作为新的 Facility_score。创建一个变量 var 来临时存储值标签。

  2. 然后我们创建了 group_by FYQMetric 和一个临时 ID 变量 (temp_ID)通过将 Facility mean 设置为 1,将 TTP_score 设置为 2,以及 var 中的所有其他内容> 到 0

  3. 根据 temp_ID,如果它是 0,我们使用 slice 来抓取所有行,否则只抓取第一行。这有效地返回了对应于 Facility_score 的所有行,但在每个 FYQ 中只有 Facility meanTTP_score 之一 + Metric 组合。

  4. 接下来,我们将 Facility_label 替换为 var 中的相应标签。

  5. 最后,取消分组,按IDtemp_ID排列,并重新排列列顺序,同时删除 vartemp_ID(everything 当我们想把一个变量放在前面同时保持其他变量不变时很有用)。

输出:

# A tibble: 50 x 5
ID Facility_label FYQ Metric Facility_score
<int> <chr> <S3: yearqtr> <fct> <dbl>
1 1 A 2017 Q4 Safety Recall 84.8
2 1 Facility mean 2017 Q4 Safety Recall 85.4
3 1 TTP_score 2017 Q4 Safety Recall 100
4 2 A 2018 Q1 Safety Recall 95.6
5 2 Facility mean 2018 Q1 Safety Recall 87.3
6 2 TTP_score 2018 Q1 Safety Recall 100
7 3 A 2018 Q2 Safety Recall 100
8 3 Facility mean 2018 Q2 Safety Recall 93.1
9 3 TTP_score 2018 Q2 Safety Recall 100
10 4 A 2018 Q3 Safety Recall 100
11 4 Facility mean 2018 Q3 Safety Recall 100
12 4 TTP_score 2018 Q3 Safety Recall 100
13 5 B 2017 Q4 Safety Recall 77.6
14 6 B 2018 Q1 Safety Recall 57.9
15 7 B 2018 Q2 Safety Recall 81.3
16 8 B 2018 Q3 Safety Recall 33.3
17 9 B 2018 Q4 Safety Recall 31.6
18 9 Facility mean 2018 Q4 Safety Recall 100
19 9 TTP_score 2018 Q4 Safety Recall 100
20 10 B 2017 Q4 Turnaround days 10.2
# ... with 30 more rows

关于r - 整理 "Benchmark"列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54753014/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com