
r - Tidy dataset with pivot_longer: Multiple columns into two columns

Reposted. Author: 行者123. Last updated: 2023-12-04 10:14:38

Hi everyone,

I am currently learning R and trying to tidy a dataset using pivot_longer() from the tidyverse package.

I have this tibble:

title               actor_1    actor_2    actor_3     actor_1_FB_likes actor_2_FB_likes actor_3_FB_likes
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Avatar  CCH Pound… Joel Davi… Wes Studi 1000 936 855
2 Pirates of the Car… Johnny De… Orlando B… Jack Daven… 40000 5000 1000
3 The Dark Knight Ri… Tom Hardy Christian… Joseph Gor… 27000 23000 23000
4 John Carter  Daryl Sab… Samantha … Polly Walk… 640 632 530
5 Spider-Man 3  J.K. Simm… James Fra… Kirsten Du… 24000 11000 4000
6 Tangled  Brad Garr… Donna Mur… M.C. Gainey 799 553 284

I would like to change it to the following format:
      title          actor_name    num_likes   
<chr> <chr> <dbl>
1 Avatar  CCH Pounder 1000
2 Avatar  Joel David Moore 936
3 Avatar  Wes Studi 855

And so on. Unfortunately, I am stuck.
Whatever I try, I somehow end up with a format like this:
title          actor_num actor_name       actor_likes      num_likes
<chr> <chr> <chr> <chr> <dbl>
1 Avatar actor_1 CCH Pounder actor_1_FB_likes 1000
2 Avatar actor_1 CCH Pounder actor_2_FB_likes 936
3 Avatar actor_1 CCH Pounder actor_3_FB_likes 855
4 Avatar actor_2 Joel David Moore actor_1_FB_likes 1000
5 Avatar actor_2 Joel David Moore actor_2_FB_likes 936
6 Avatar actor_2 Joel David Moore actor_3_FB_likes 855
7 Avatar actor_3 Wes Studi actor_1_FB_likes 1000
8 Avatar actor_3 Wes Studi actor_2_FB_likes 936
9 Avatar actor_3 Wes Studi actor_3_FB_likes 855

My last attempt consisted of the following steps:
exercise8 <- exercise8 %>% pivot_longer(cols= actor_1:actor_3, names_to='actor_num', values_to='actor_name')
exercise8 <- exercise8 %>% pivot_longer(cols= actor_1_FB_likes:actor_3_FB_likes, names_to='actor_likes', values_to='num_likes')

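I could of course drop the columns actor_num and actor_likes, but that still would not give the desired format.

Can anyone help? Did I start out completely wrong, or am I just missing a final step?
Thanks in advance!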

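Best answer

It would probably be easier if the numeric index sat consistently at the end of the column names, for both the name columns and the likes columns.

Add one line that renames the columns so that every name consistently ends in "_1", "_2", and so on.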
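To see what that renaming does before touching any data, the same sub() call can be tried on a plain character vector. This is only a small sketch: the vector `cols` is a hypothetical stand-in for names(df), typed out from the column names shown in the question.

```r
# Hypothetical stand-in for names(df), copied from the question's column names
cols <- c("title", "actor_1", "actor_2", "actor_3",
          "actor_1_FB_likes", "actor_2_FB_likes", "actor_3_FB_likes")

# "(\\d+)_(\\w*)" only matches where digits are followed by "_",
# so "title" and "actor_1" are left alone and only the likes columns change
renamed <- sub("(\\d+)_(\\w*)", "\\2_\\1", cols)

print(renamed)
# The likes columns become actor_FB_likes_1, actor_FB_likes_2, actor_FB_likes_3
```

After this step every actor column ends in an underscore followed by its index, which is what lets a single pattern split the prefix from the number.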

Then pivot_longer can use a regular-expression pattern that assumes the number is at the end of each name.

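library(tidyverse)

# Move the actor index to the end: actor_1_FB_likes -> actor_FB_likes_1
names(df) <- sub("(\\d+)_(\\w*)", "\\2_\\1", names(df))

# ".value" turns the name prefix into an output column; the trailing digits go to "group"
df %>%
  pivot_longer(starts_with("actor"),
               names_to = c(".value", "group"),
               names_pattern = "(\\w+)_(\\d+)$")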

Output
# A tibble: 18 x 4
title group actor actor_FB_likes
<chr> <chr> <chr> <int>
1 Avatar 1 CCH_Pound… 1000
2 Avatar 2 Joel_Davi… 936
3 Avatar 3 Wes_Studi 855
4 Pirates_of_the_Car… 1 Johnny_De… 40000
5 Pirates_of_the_Car… 2 Orlando_B… 5000
6 Pirates_of_the_Car… 3 Jack_Daven… 1000
7 The_Dark_Knight_Ri… 1 Tom_Hardy 27000
8 The_Dark_Knight_Ri… 2 Christian… 23000
9 The_Dark_Knight_Ri… 3 Joseph_Gor… 23000
10 John_Carter 1 Daryl_Sab… 640
11 John_Carter 2 Samantha_… 632
12 John_Carter 3 Polly_Walk… 530
13 Spider-Man_3 1 J.K._Simm… 24000
14 Spider-Man_3 2 James_Fra… 11000
15 Spider-Man_3 3 Kirsten_Du… 4000
16 Tangled 1 Brad_Garr… 799
17 Tangled 2 Donna_Mur… 553
18 Tangled 3 M.C._Gainey 284
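The output above still has a group column and the column names actor and actor_FB_likes, while the question asked for title, actor_name and num_likes. One way to get there, sketched under the assumption that dplyr is available, is a select() that renames while it picks columns. The one-row data frame `movies` is a hypothetical stand-in built from the Avatar row of the question, with the likes columns already renamed.

```r
library(tidyr)
library(dplyr)

# Hypothetical one-row stand-in for the renamed df (Avatar row from the question)
movies <- data.frame(
  title = "Avatar",
  actor_1 = "CCH Pounder",
  actor_2 = "Joel David Moore",
  actor_3 = "Wes Studi",
  actor_FB_likes_1 = 1000,
  actor_FB_likes_2 = 936,
  actor_FB_likes_3 = 855
)

tidy <- movies %>%
  pivot_longer(starts_with("actor"),
               names_to = c(".value", "group"),
               names_pattern = "(\\w+)_(\\d+)$") %>%
  # select() can rename on the fly (new_name = old_name); "group" is dropped
  select(title, actor_name = actor, num_likes = actor_FB_likes)

print(tidy)
```

Dropping group is optional; it may be worth keeping if the billing order of the actors matters later.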

Data
df <- structure(list(title = c("Avatar", "Pirates_of_the_Car…", "The_Dark_Knight_Ri…", 
"John_Carter", "Spider-Man_3", "Tangled"), actor_1 = c("CCH_Pound…",
"Johnny_De…", "Tom_Hardy", "Daryl_Sab…", "J.K._Simm…",
"Brad_Garr…"), actor_2 = c("Joel_Davi…", "Orlando_B…",
"Christian…", "Samantha_…", "James_Fra…", "Donna_Mur…"
), actor_3 = c("Wes_Studi", "Jack_Daven…", "Joseph_Gor…",
"Polly_Walk…", "Kirsten_Du…", "M.C._Gainey"), actor_1_FB_likes = c(1000L,
40000L, 27000L, 640L, 24000L, 799L), actor_2_FB_likes = c(936L,
5000L, 23000L, 632L, 11000L, 553L), actor_3_FB_likes = c(855L,
1000L, 23000L, 530L, 4000L, 284L)), class = "data.frame", row.names = c(NA,
-6L))
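As a side note on the attempt in the question: two successive pivot_longer() calls pair every actor row with every likes column, which is why each title ends up with 3 x 3 = 9 rows. If one wanted to keep that approach, a possible final step is to keep only the rows where the two index columns refer to the same actor. The sketch below assumes the original (not renamed) column names and uses a hypothetical one-row data frame built from the Avatar row.

```r
library(tidyr)
library(dplyr)

# Hypothetical one-row stand-in with the original column names from the question
exercise8 <- data.frame(
  title = "Avatar",
  actor_1 = "CCH Pounder",
  actor_2 = "Joel David Moore",
  actor_3 = "Wes Studi",
  actor_1_FB_likes = 1000,
  actor_2_FB_likes = 936,
  actor_3_FB_likes = 855
)

# The two pivots from the question, chained: every actor is paired with every likes column
crossed <- exercise8 %>%
  pivot_longer(cols = actor_1:actor_3,
               names_to = "actor_num", values_to = "actor_name") %>%
  pivot_longer(cols = actor_1_FB_likes:actor_3_FB_likes,
               names_to = "actor_likes", values_to = "num_likes")

# Keep only rows where the likes column belongs to the same actor,
# e.g. actor_2 together with actor_2_FB_likes
fixed <- crossed %>%
  filter(actor_likes == paste0(actor_num, "_FB_likes")) %>%
  select(title, actor_name, num_likes)

print(fixed)
```

This works, but it builds the full cross product first, so the single pivot_longer() call with ".value" is likely the more economical choice on larger data.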

Regarding "r - Tidy dataset with pivot_longer: Multiple columns into two columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61138600/
