
r - Match strings of words and return the non-matching words

Reposted. Author: 行者123. Last updated: 2023-12-03 23:07:29

I want to match a string of words between two columns and return the words that do not match.

Example data frame:

data = data.frame(
  animal1 = c("cat, dog, horse, mouse", "cat, dog, horse", "mouse, frog", "cat, dog, frog, cow"),
  animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog")
)

I want to add a new column, "unique_animal", so that the resulting data frame looks like this:
                 animal1           animal2 unique_animal
1 cat, dog, horse, mouse dog, horse, mouse           cat
2        cat, dog, horse        cat, horse           dog
3            mouse, frog              frog         mouse
4    cat, dog, frog, cow    cat, dog, frog           cow

I have already tried the code from this question: Matching similar string vectors and return non-matching element
library(qualV)
common <- function(a, b) {
  # Split both strings into single characters
  a2 <- strsplit(a, '')[[1]]
  b2 <- strsplit(b, '')[[1]]
  # Pad the shorter vector with spaces so both have the same length
  if (length(a2) < length(b2)) {
    a2[(length(a2) + 1):length(b2)] <- ' '
  } else if (length(a2) > length(b2)) {
    b2[(length(b2) + 1):length(a2)] <- ' '
  }
  LCS(a2, b2)
}

result <- NULL
data$animal1 <- as.character(data$animal1)
data$animal2 <- as.character(data$animal2)
for (i in 1:nrow(data)) {
  data_temp <- data[i, ]
  z <- common(data_temp$animal1, data_temp$animal2)
  paste0(z$LCS, collapse = '') # common string
  x <- z$a[which(!seq(1, max(z$va)) %in% z$va)] # non-matching elements in `a`
  x <- paste(x, collapse = '')
  data_temp$unique_animal <- x
  result <- rbind(data_temp, result)
}

This produces:
                 animal1           animal2 unique_animal
1 cat, dog, horse, mouse dog, horse, mouse          cat,
2        cat, dog, horse        cat, horse         , dog
3            mouse, frog              frog        mouse,
4    cat, dog, frog, cow    cat, dog, frog             ,

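The commas are not a problem; I can easily remove them. But the code does not work when the non-matching word is at the end of the string: for some reason it does not count the full number of elements in that case. Any ideas on how to change this code so that it handles that case? Or is there another approach?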

Thanks!
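One likely cause of the missing trailing word is the expression seq(1, max(z$va)), which only inspects positions up to the last matched character, so anything after that position is never examined. The sketch below illustrates this without qualV. The vector va is a hand-written stand-in for z$va (an assumption for illustration, not actual LCS output), and the variable names original and full are hypothetical.

```r
# Characters of row 4's animal1 string, split the same way common() does.
a <- strsplit("cat, dog, frog, cow", "")[[1]]

# ASSUMPTION: hand-written stand-in for z$va (matched positions in `a`).
# Positions 1-14 cover "cat, dog, frog"; 16 is the space after the next comma.
va <- c(1:14, 16)

# Indexing as in the loop above: only positions 1..max(va) are inspected,
# so the characters of "cow" (positions 17-19) are never considered.
original <- paste(a[which(!seq(1, max(va)) %in% va)], collapse = "")

# Alternative: inspect every position of `a`.
full <- paste(a[!seq_along(a) %in% va], collapse = "")

print(original)  # ","
print(full)      # ",cow"
```

Under that assumption, replacing seq(1, max(z$va)) with seq_along(z$a) in the loop would keep trailing unmatched characters, although the result would still need the stray commas removed.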

Best answer

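After splitting each column at a comma followed by whitespace (the pattern ,\\s+), we can use map2 to compare the corresponding list elements with setdiff.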

library(dplyr)
library(purrr)
library(stringr)
data %>%
  mutate(unique_animal = map2_chr(strsplit(as.character(animal1), ",\\s+"),
                                  strsplit(as.character(animal2), ",\\s+"),
                                  ~ str_c(setdiff(.x, .y), collapse = ", ")))
#                 animal1           animal2 unique_animal
#1 cat, dog, horse, mouse dog, horse, mouse           cat
#2        cat, dog, horse        cat, horse           dog
#3            mouse, frog              frog         mouse
#4    cat, dog, frog, cow    cat, dog, frog           cow
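If adding dplyr, purrr and stringr is not desirable, the same split-then-setdiff idea can be sketched in base R. This is only an illustration and not part of the original answer: the helper name words_only_in_first and its sep argument are hypothetical, and the separator is assumed to be a comma followed by optional whitespace.

```r
data <- data.frame(
  animal1 = c("cat, dog, horse, mouse", "cat, dog, horse", "mouse, frog", "cat, dog, frog, cow"),
  animal2 = c("dog, horse, mouse", "cat, horse", "frog", "cat, dog, frog"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each pair of strings, return the words that
# appear in `x` but not in `y`, joined back together with ", ".
words_only_in_first <- function(x, y, sep = ",\\s*") {
  x_words <- strsplit(x, sep)  # list of character vectors, one per row
  y_words <- strsplit(y, sep)
  mapply(
    function(a, b) paste(setdiff(a, b), collapse = ", "),
    x_words, y_words,
    USE.NAMES = FALSE
  )
}

data$unique_animal <- words_only_in_first(data$animal1, data$animal2)
print(data$unique_animal)  # "cat" "dog" "mouse" "cow"
```

Because setdiff compares whole words rather than characters, the position of the non-matching word in the string does not matter. A row with no non-matching words gets an empty string.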

Regarding "r - Match strings of words and return the non-matching words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61486876/
