
Reshaping from wide to long with multiple groups of variables

Reposted. Author: 行者123. Updated: 2023-12-05 08:55:01

This question is very similar to an existing question.

However, I have not been able to extend that approach to multiple groups of variables. Here is the dataset I am working with:

# A tibble: 12 x 9
Month Cabo_BU_PCT Acapulco_BU_PCT Cabo_LOS_AVG Acapulco_LOS_AVG BED_BUGS_Cabo BED_BUGS_Acapulco TOTAL_OCCUPIED_Cabo TOTAL_OCCUPIED_Acapulco
1 0.6470034 0.6260116 5.223000 4.307667 5 3 19216 6498
2 0.6167027 0.6777457 5.893571 4.247500 3 0 17095 6566
3 0.6372108 0.6348126 5.229677 4.327742 5 1 19556 6809
4 0.6357912 0.6548170 5.356667 4.220000 4 6 18883 6797
5 0.6449006 0.6409659 5.344194 4.162903 2 5 19792 6875
6 0.6747811 0.6935453 5.812667 4.362000 4 3 20041 7199
7 0.6697947 0.6932687 5.544516 4.462903 5 6 20556 7436
8 0.6595960 0.6777923 5.260323 4.135806 0 7 20243 7270
9 0.6792256 0.6863198 5.424333 4.133333 5 0 20173 7124
10 0.6976214 0.7370875 5.419677 4.350000 3 3 21410 7906
11 0.6600337 0.6615607 5.450000 4.184333 3 2 19603 6867
12 0.6761812 0.6773261 5.347097 4.318710 2 2 20752 7265

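My goal is to reshape it into the long format shown below, where the columns Cabo_BU_PCT and Acapulco_BU_PCT are stacked under a single column named BU_PCT, the columns Cabo_LOS_AVG and Acapulco_LOS_AVG are stacked under a column named LOS_AVG, and so on.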

Month  Location  BU_PCT     LOS_AVG   BED_BUGS  TOTAL_OCCUPIED
1      Cabo      0.6470034  5.223000  5         19216
1      Acapulco  0.6260116  4.307667  3         6498
2      Cabo      0.6167027  5.893571  3         17095
2      Acapulco  0.6777457  4.247500  0         6566
.
.
.
12     Cabo      0.6761812  5.347097  2         20752
12     Acapulco  0.6773261  4.318710  2         7265

Any help reshaping this data frame would be greatly appreciated. Thank you.

======== Dataset ===========

df_wide <- structure(list(Month = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
), Cabo_BU_PCT = c(0.647003367003367, 0.616702741702742, 0.637210817855979,
0.635791245791246, 0.644900619094168, 0.674781144781145, 0.669794721407625,
0.65959595959596, 0.679225589225589, 0.69762137504073, 0.66003367003367,
0.676181166503747), Acapulco_BU_PCT = c(0.626011560693642, 0.677745664739884,
0.634812604885325, 0.654816955684008, 0.640965877307477, 0.69354527938343,
0.693268692895767, 0.677792280440052, 0.686319845857418, 0.737087451053515,
0.661560693641619, 0.677326123438374), Cabo_LOS_AVG = c(5.223,
5.89357142857143, 5.22967741935484, 5.35666666666667, 5.3441935483871,
5.81266666666667, 5.54451612903226, 5.26032258064516, 5.42433333333333,
5.41967741935484, 5.45, 5.34709677419355), Acapulco_LOS_AVG = c(4.30766666666667,
4.2475, 4.32774193548387, 4.22, 4.16290322580645, 4.362, 4.46290322580645,
4.1358064516129, 4.13333333333333, 4.35, 4.18433333333333, 4.31870967741935
), BED_BUGS_Cabo = c(5, 3, 5, 4, 2, 4, 5, 0, 5, 3, 3, 2), BED_BUGS_Acapulco = c(3,
0, 1, 6, 5, 3, 6, 7, 0, 3, 2, 2), TOTAL_OCCUPIED_Cabo = c(19216,
17095, 19556, 18883, 19792, 20041, 20556, 20243, 20173, 21410,
19603, 20752), TOTAL_OCCUPIED_Acapulco = c(6498, 6566, 6809,
6797, 6875, 7199, 7436, 7270, 7124, 7906, 6867, 7265)), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("Month", "Cabo_BU_PCT", "Acapulco_BU_PCT",
"Cabo_LOS_AVG", "Acapulco_LOS_AVG", "BED_BUGS_Cabo", "BED_BUGS_Acapulco",
"TOTAL_OCCUPIED_Cabo", "TOTAL_OCCUPIED_Acapulco"), row.names = c(NA,
-12L))

Best answer

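If you only have two locations, you can put them in a regular expression, allowing for the fact that they may appear at either the start or the end of the column name: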

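library(tidyverse)

df_wide %>%
    # stack every column except Month into variable/value pairs
    gather(variable, value, -Month) %>%
    # pull the location out of the name, then strip it (and its underscore)
    mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable),
           variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>%
    # spread the remaining measure names back out into one column each
    spread(variable, value)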
#> # A tibble: 24 x 6
#>    Month location BED_BUGS    BU_PCT  LOS_AVG TOTAL_OCCUPIED
#>  * <dbl>    <chr>    <dbl>     <dbl>    <dbl>          <dbl>
#>  1     1 Acapulco        3 0.6260116 4.307667           6498
#>  2     1     Cabo        5 0.6470034 5.223000          19216
#>  3     2 Acapulco        0 0.6777457 4.247500           6566
#>  4     2     Cabo        3 0.6167027 5.893571          17095
#>  5     3 Acapulco        1 0.6348126 4.327742           6809
#>  6     3     Cabo        5 0.6372108 5.229677          19556
#>  7     4 Acapulco        6 0.6548170 4.220000           6797
#>  8     4     Cabo        4 0.6357912 5.356667          18883
#>  9     5 Acapulco        5 0.6409659 4.162903           6875
#> 10     5     Cabo        2 0.6449006 5.344194          19792
#> # ... with 14 more rows
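
As a possible alternative to the answer above (not part of it): in tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below assumes dplyr >= 1.0.0 for rename_with(). It first moves any trailing location to the front of the column name, then lets the ".value" placeholder in pivot_longer() create one output column per measure. The names df_small, locations, and loc_alt are illustrative, and df_small is just a two-month stand-in for df_wide so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Two-month stand-in for df_wide, with the same column names (illustrative only)
df_small <- tibble(
  Month = c(1, 2),
  Cabo_BU_PCT = c(0.6470034, 0.6167027),
  Acapulco_BU_PCT = c(0.6260116, 0.6777457),
  Cabo_LOS_AVG = c(5.223000, 5.893571),
  Acapulco_LOS_AVG = c(4.307667, 4.247500),
  BED_BUGS_Cabo = c(5, 3),
  BED_BUGS_Acapulco = c(3, 0),
  TOTAL_OCCUPIED_Cabo = c(19216, 17095),
  TOTAL_OCCUPIED_Acapulco = c(6498, 6566)
)

# Assumption: the locations are known up front; build the regex alternation once
locations <- c("Cabo", "Acapulco")
loc_alt <- paste(locations, collapse = "|")

df_long <- df_small %>%
  # Move a trailing location to the front so every name reads Location_MEASURE;
  # names that already start with the location do not match and stay unchanged
  rename_with(~ sub(paste0("^(.*)_(", loc_alt, ")$"), "\\2_\\1", .x), -Month) %>%
  # First capture group becomes the Location column; ".value" makes the
  # second capture group (the measure name) the name of an output column
  pivot_longer(-Month,
               names_to = c("Location", ".value"),
               names_pattern = paste0("^(", loc_alt, ")_(.*)$"))

df_long
```

To apply the same idea to the full data, replace df_small with df_wide. Adding a third location would only require extending the locations vector, provided no location name also occurs inside a measure name.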

For more on reshaping from wide to long with multiple groups of variables, see the similar question on Stack Overflow: https://stackoverflow.com/questions/47425451/
