
r - Match a list-type column against other columns in a DF

Reposted. Author: 行者123. Updated: 2023-12-01 23:29:48

I have a data frame with roughly the following structure:

                        C1    C2     C3
1           c("XXX", "Y3") "XXX"  "Y31"
2 c("SFM", "DD31", "DSDW") "SFF" "DD31"

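Column C1 is a list: each element was a string that I split into individual words. The other 2 columns are character. I need to match C2 and C3 against C1 so that, wherever there is a match (100% match), the value in C1 is replaced with a different value. For example: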

The first row has 2 matches, because a fuzzy match also counts as a match:

  1. C1~C2: replace "XXX" in C1 with the modified C2 value "XXX[TAG]"
  2. C1~C3: replace "Y3" in C1 with the modified C3 value "Y31[TAG]"

In general I know the pieces needed (a for loop, the match function and regular expressions), but I don't know how to put them all together. Thanks in advance!
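The loop-based approach described in the question could be sketched roughly as follows, using the C1/C2/C3 sample data from the answer. Note that match() only finds exact equalities, so the "Y3" inside "Y31" case needs a substring test; this sketch swaps in grepl(..., fixed = TRUE) for that. The tag_cols vector is a hypothetical helper, and when a word matches both columns the sketch simply keeps the first match.

```r
# Sample data, same as the answer's df
df <- structure(list(C1 = list(c("XXX", "Y3"), c("SFM", "DD31", "DSDW")),
                     C2 = c("XXX", "SFF"),
                     C3 = c("Y31", "DD31")),
                row.names = c(NA, -2L), class = "data.frame")

tag_cols <- c("C2", "C3")  # hypothetical: columns whose values are searched
df$C1_new <- df$C1         # copy the list column, then edit it row by row

for (i in seq_len(nrow(df))) {
  words <- df$C1[[i]]
  targets <- unlist(df[i, tag_cols])  # named vector, e.g. c(C2 = "XXX", C3 = "Y31")
  for (j in seq_along(words)) {
    # literal substring test: is this word contained in C2 and/or C3?
    hit <- grepl(words[j], targets, fixed = TRUE)
    if (any(hit)) {
      k <- which(hit)[1]  # first matching column wins
      words[j] <- paste0(targets[[k]], "[", tag_cols[k], "]")
    }
  }
  df$C1_new[[i]] <- words
}
```

The vectorised Map() version in the answer avoids the explicit index bookkeeping, but the loop can be easier to step through when debugging.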

Edit

What I have:

x <- structure(list(Description = list(c("2012", "Deere", "544K", 
"Wheel", "Loader,"), c("Caterpillar","Model", "988", "Year", "1972")),
Manufacturer = c("john deere", "caterpillar"),
Model = c("544k", "988")), .Names = c("Description", "Manufacturer", "Model"), row.names = 4:5, class = "data.frame")


#> Description Manufacturer Model
#> 4 2012, Deere, 544K, Wheel, Loader, john deere 544k
#> 5 Caterpillar, Model, 988, Year, 1972 caterpillar 988

What I want:

x.new <- structure(list(Description = list(c("2012", "john deere[Manufacturer]", "544k[Model]", 
"Wheel", "Loader,"), c("caterpillar[Manufacturer]","Model", "988[Model]", "Year", "1972")),
Manufacturer = c("john deere", "caterpillar"),
Model = c("544k", "988")), .Names = c("Description", "Manufacturer", "Model"), row.names = 4:5, class = "data.frame")

#> Description Manufacturer Model
#> 4 2012, john deere[Manufacturer], 544k[Model], Wheel, Loader, john deere 544k
#> 5 caterpillar[Manufacturer], Model, 988[Model], Year, 1972 caterpillar 988
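The question says the Description column came from splitting a string into words. A sketch of building such a list column with base R's strsplit(), assuming the raw descriptions were whitespace-separated; the raw strings below are reconstructed from the lists shown above, not taken from the asker's actual data:

```r
# Hypothetical raw strings, reconstructed from the Description lists above
raw <- c("2012 Deere 544K Wheel Loader,", "Caterpillar Model 988 Year 1972")

x <- data.frame(Manufacturer = c("john deere", "caterpillar"),
                Model = c("544k", "988"),
                row.names = 4:5,
                stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per input string;
# assigning that list with $<- stores it as a list column
x$Description <- strsplit(raw, "\\s+")
x <- x[c("Description", "Manufacturer", "Model")]  # put Description first
```

The split pattern "\\s+" treats any run of whitespace as one separator; a different delimiter would need a different pattern.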

Best answer

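With a list column you'll need plenty of lapply and its multivariate counterpart Map, which let you iterate over the list column and return a list that can be assigned back as a column. For example,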

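df <- structure(list(C1 = list(c("XXX", "Y3"), c("SFM", "DD31", "DSDW")),
                     C2 = c("XXX", "SFF"),
                     C3 = c("Y31", "DD31")),
                .Names = c("C1", "C2", "C3"), row.names = c(NA, -2L), class = "data.frame")

# Map() walks C1, C2 and C3 in parallel, one row at a time
df$C1_new <- Map(function(c1, c2, c3){
  # for each word in this row's C1 element...
  sapply(c1, function(x){
    # ...test (as a regex) whether it occurs inside C2 and/or C3
    mtch <- grepl(x, c(c2, c3))
    # tag a match with the name of the matching column, else keep the word
    if (any(mtch)) {paste0(c(c2, c3)[mtch], '[', names(df)[-1][mtch], ']')} else {x}
  })},
  df$C1, df$C2, df$C3)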

df
#> C1 C2 C3 C1_new
#> 1 XXX, Y3 XXX Y31 XXX[C2], Y31[C3]
#> 2 SFM, DD31, DSDW SFF DD31 SFM, DD31[C3], DSDW
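One caveat with the sapply() call above: if a single word matches both C2 and C3, paste0() returns two strings for that word and sapply() then returns a list for that row instead of a character vector. A possible variant, sketched on a hypothetical one-row example, uses vapply() with character(1) to force one string per word and joins multiple tags with "|" (an arbitrary separator). It also uses fixed = TRUE for a literal match and lists the tag columns in a hypothetical tag_cols vector instead of names(df)[-1], which would pick up extra columns such as C1_new if the code were run a second time.

```r
# Hypothetical one-row example where "AB" occurs in both C2 and C3
df2 <- structure(list(C1 = list(c("AB", "Q")),
                      C2 = "AB1",
                      C3 = "AB2"),
                 row.names = c(NA, -1L), class = "data.frame")

tag_cols <- c("C2", "C3")  # hypothetical: explicit tag names

df2$C1_new <- Map(function(c1, c2, c3) {
  targets <- c(c2, c3)
  # vapply() with character(1) guarantees exactly one string per word,
  # so each row's result is always a plain character vector
  vapply(c1, function(w) {
    hit <- grepl(w, targets, fixed = TRUE)
    if (any(hit)) {
      # several matching columns are collapsed into a single string
      paste0(targets[hit], "[", tag_cols[hit], "]", collapse = "|")
    } else {
      w
    }
  }, character(1), USE.NAMES = FALSE)
}, df2$C1, df2$C2, df2$C3)

df2$C1_new[[1]]
#> [1] "AB1[C2]|AB2[C3]" "Q"
```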

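There are many other ways to set this up, including packages like purrr and stringr, which make the syntax simpler and more consistent. Use whichever you prefer.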
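One possible purrr/stringr version, as a sketch rather than the answerer's own code: purrr::pmap() plays the role of Map(), purrr::map_chr() replaces sapply() and always returns a character vector, and stringr::str_detect() with fixed() performs a literal substring test. It assumes both packages are installed; tag_cols is again a hypothetical helper.

```r
library(purrr)
library(stringr)

df <- structure(list(C1 = list(c("XXX", "Y3"), c("SFM", "DD31", "DSDW")),
                     C2 = c("XXX", "SFF"),
                     C3 = c("Y31", "DD31")),
                row.names = c(NA, -2L), class = "data.frame")

tag_cols <- c("C2", "C3")  # hypothetical helper: tag text for each target column

# pmap() iterates over the three columns in parallel, like Map()
df$C1_new <- pmap(list(df$C1, df$C2, df$C3), function(c1, c2, c3) {
  targets <- c(c2, c3)
  # map_chr() returns exactly one string per word
  map_chr(c1, function(w) {
    # fixed(w): treat the word as a literal string, not a regex
    hit <- str_detect(targets, fixed(w))
    if (any(hit)) paste0(targets[hit], "[", tag_cols[hit], "]", collapse = "|") else w
  })
})
```

For the second dataset, stringr's fixed() also accepts ignore_case = TRUE, which would cover the "Deere" vs "john deere" case.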

Applying it to the second dataset in the question takes a few small edits:

x <- structure(list(Description = list(c("2012", "Deere", "544K", "Wheel", "Loader,"),
                                       c("Caterpillar", "Model", "988", "Year", "1972")),
                    Manufacturer = c("john deere", "caterpillar"),
                    Model = c("544k", "988")),
               .Names = c("Description", "Manufacturer", "Model"), row.names = 4:5, class = "data.frame")

# Same pattern as above, but overwriting Description in place
x$Description <- Map(function(desc, mfr, mdl){
  sapply(desc, function(wrd){
    # ignore.case = TRUE lets "Deere" match "john deere" and "544K" match "544k"
    mtch <- grepl(wrd, c(mfr, mdl), ignore.case = TRUE)
    # names(x)[-1] is c("Manufacturer", "Model"), used as the tag text
    if (any(mtch)) {paste0(c(mfr, mdl)[mtch], '[', names(x)[-1][mtch], ']')} else {wrd}
  })},
  x$Description, x$Manufacturer, x$Model)

x
#> Description Manufacturer Model
#> 4 2012, john deere[Manufacturer], 544k[Model], Wheel, Loader, john deere 544k
#> 5 caterpillar[Manufacturer], Model, 988[Model], Year, 1972 caterpillar 988
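Because grepl() treats each word as a regular expression, a description word containing metacharacters such as ( or + could match unexpectedly or raise an error. A sketch of a literal, case-insensitive alternative lower-cases both sides with tolower() and passes fixed = TRUE; in grepl(), fixed = TRUE overrides ignore.case, which is why the lower-casing is done by hand. tag_cols is a hypothetical helper, and multiple matches are joined with "|".

```r
x <- structure(list(Description = list(c("2012", "Deere", "544K", "Wheel", "Loader,"),
                                       c("Caterpillar", "Model", "988", "Year", "1972")),
                    Manufacturer = c("john deere", "caterpillar"),
                    Model = c("544k", "988")),
               row.names = 4:5, class = "data.frame")

tag_cols <- c("Manufacturer", "Model")  # hypothetical: tag text per target column

x$Description <- Map(function(desc, mfr, mdl) {
  targets <- c(mfr, mdl)
  vapply(desc, function(wrd) {
    # lower-case both sides, then do a literal (non-regex) substring test
    hit <- grepl(tolower(wrd), tolower(targets), fixed = TRUE)
    if (any(hit)) paste0(targets[hit], "[", tag_cols[hit], "]", collapse = "|") else wrd
  }, character(1), USE.NAMES = FALSE)
}, x$Description, x$Manufacturer, x$Model)
```

On this data it produces the same tagged words as the answer's version.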

A similar question to "r - Match a list-type column against other columns in a DF" can be found on Stack Overflow: https://stackoverflow.com/questions/41432385/
