
r - How to match strings that contain the same terms in a different order in R

Reposted. Author: 行者123. Updated: 2023-12-04 09:34:28

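I have a data frame df whose entries are words separated by +, but I don't want the order of the words to matter when I run my analysis. For example, I have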

df <- as.data.frame(
  c(("Yellow + Blue + Green"),
    ("Blue + Yellow + Green"),
    ("Green + Yellow + Blue")))

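Currently these are treated as three unique responses, but I want them to be treated as the same one. I have tried brute-force approaches such as ifelse statements, but they do not scale to large datasets.

Is there a way to rearrange the terms so that they match, or something like a reverse combn function that recognizes them as the same combination in a different order?

Thanks!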

Best answer

#DATA
df <- data.frame(cols =
  c(("Yellow + Blue + Green"),
    ("Blue + Yellow + Green"),
    ("Green + Yellow + Blue"),
    ("Green + Yellow + Red")), stringsAsFactors = FALSE)

#Split on " + ", sort the terms, and then paste them back together
df$group = sapply(df$cols, function(a)
  paste(sort(unlist(strsplit(a, " \\+ "))), collapse = ", "))
df
#                   cols               group
#1 Yellow + Blue + Green Blue, Green, Yellow
#2 Blue + Yellow + Green Blue, Green, Yellow
#3 Green + Yellow + Blue Blue, Green, Yellow
#4  Green + Yellow + Red  Green, Red, Yellow

#Or you can convert the sorted strings to a factor (and then to numeric group ids, if you like)
df$group2 = as.numeric(as.factor(sapply(df$cols, function(a)
  paste(sort(unlist(strsplit(a, " \\+ "))), collapse = ", "))))
df
#                   cols               group group2
#1 Yellow + Blue + Green Blue, Green, Yellow      1
#2 Blue + Yellow + Green Blue, Green, Yellow      1
#3 Green + Yellow + Blue Blue, Green, Yellow      1
#4  Green + Yellow + Red  Green, Red, Yellow      2
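
The split-sort-paste idea can also be wrapped in a small reusable function. The following is only a sketch, not part of the original answer: the name normalize_combo, its sep argument, and the unevenly spaced second input string are assumptions made for illustration. It trims whitespace around each term, so it should tolerate inputs such as "Blue+Yellow" where the spacing around + is inconsistent.

```r
# Hypothetical helper (not from the original answer): build an
# order-independent key for each "+"-separated string.
normalize_combo <- function(x, sep = "+") {
  # fixed = TRUE treats sep as a literal string, so "+" needs no escaping
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) {
    p <- trimws(p)        # drop whitespace around each term
    p <- p[nzchar(p)]     # drop empty terms left by stray separators
    paste(sort(p), collapse = " + ")
  }, character(1))
}

# Illustrative data; the second row deliberately has uneven spacing
df <- data.frame(cols = c("Yellow + Blue + Green",
                          "Blue+Yellow +  Green",
                          "Green + Yellow + Blue",
                          "Green + Yellow + Red"),
                 stringsAsFactors = FALSE)

df$group <- normalize_combo(df$cols)
# The first three rows now share the key "Blue + Green + Yellow"

# Count how often each combination occurs, regardless of term order
combo_counts <- table(df$group)
```

Note that sort() orders strings according to the current locale. If the keys need to be identical across machines, sort(p, method = "radix") may be a safer choice, since it uses the C locale for character vectors.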

A similar question about matching strings with different term orders in R can be found on Stack Overflow: https://stackoverflow.com/questions/44709253/
