r - Split comma-separated strings in a column into separate rows

Reposted. Author: 行者123. Updated: 2023-12-03 04:56:20

I have a data frame like this:

# assigned to 'v' because the answers below refer to the data frame by that name
v <- data.frame(director = c("Aaron Blaise,Bob Walker", "Akira Kurosawa", 
"Alan J. Pakula", "Alan Parker", "Alejandro Amenabar", "Alejandro Gonzalez Inarritu",
"Alejandro Gonzalez Inarritu,Benicio Del Toro", "Alejandro González Iñárritu",
"Alex Proyas", "Alexander Hall", "Alfonso Cuaron", "Alfred Hitchcock",
"Anatole Litvak", "Andrew Adamson,Marilyn Fox", "Andrew Dominik",
"Andrew Stanton", "Andrew Stanton,Lee Unkrich", "Angelina Jolie,John Stevenson",
"Anne Fontaine", "Anthony Harvey"), AB = c('A', 'B', 'A', 'A', 'B', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'A'))

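As you can see, some entries in the director column contain multiple names separated by commas. I would like to split these entries into separate rows while keeping the values of the other column. For example, the first row of the data frame above should become two rows, each with a single name in the director column and 'A' in the AB column.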

Best answer

Several alternatives:

1) Two ways with data.table:

library(data.table)
# method 1 (preferred)
setDT(v)[, lapply(.SD, function(x) unlist(tstrsplit(x, ",", fixed=TRUE))), by = AB
][!is.na(director)]
# method 2
setDT(v)[, strsplit(as.character(director), ",", fixed=TRUE), by = .(AB, director)
][,.(director = V1, AB)]

2) A dplyr / tidyr combination:

library(dplyr)
library(tidyr)
v %>%
mutate(director = strsplit(as.character(director), ",")) %>%
unnest(director)

3) With tidyr only: with tidyr 0.5.0 (and later), you can also just use separate_rows:

separate_rows(v, director, sep = ",")

You can use the convert = TRUE argument to automatically convert the split values to numeric columns when they are numbers.
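As a minimal sketch of what convert = TRUE does, consider a small made-up data frame (the names `d`, `id`, and `scores` are hypothetical and not part of the question) whose comma-separated column holds numbers stored as text. It assumes the tidyr package is installed:

```r
library(tidyr)

# Hypothetical data: 'scores' holds comma-separated numbers stored as text
d <- data.frame(id = c("a", "b"), scores = c("1,2", "3"))

# Without convert, 'scores' stays a character column after splitting
str(separate_rows(d, scores, sep = ","))

# With convert = TRUE, the split values are re-typed as numbers
out <- separate_rows(d, scores, sep = ",", convert = TRUE)
str(out)
```

For the director column in the question the argument makes no difference, since the names are not numbers.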

With tidyr 1.3.0 (and later), you can use separate_longer_delim (separate_rows is now superseded):

separate_longer_delim(v, director, delim = ",")

4) With base R:

# if 'director' is a character-column:
stack(setNames(strsplit(v$director,','), v$AB))

# if 'director' is a factor-column:
stack(setNames(strsplit(as.character(v$director),','), v$AB))
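Note that stack() returns its result in columns named values and ind, and it can only carry one other column along. If the data frame has several other columns, one possible base R variant (a sketch, not part of the original answer) is to repeat each row once per name and then overwrite the split column. The data frame `d` and its `id` column below are made up for illustration:

```r
# Hypothetical example data with an extra column ('id') to carry along
d <- data.frame(
  director = c("Aaron Blaise,Bob Walker", "Akira Kurosawa"),
  AB = c("A", "B"),
  id = c(1, 2),
  stringsAsFactors = FALSE
)

# Split each entry into a character vector of names
parts <- strsplit(d$director, ",", fixed = TRUE)

# Repeat each original row once per name, then overwrite 'director'
out <- d[rep(seq_len(nrow(d)), lengths(parts)), ]
out$director <- unlist(parts)
rownames(out) <- NULL
out
```

Because the rows are indexed rather than stacked, every other column keeps its original name and type.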

Regarding "r - Split comma-separated strings in a column into separate rows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13773770/
