
r - How to remove duplicate words within a specific pattern from strings in R

Reposted. Author: 行者123. Updated: 2023-12-04 09:15:00

My goal is to remove duplicate words only inside the parentheses in a set of strings.

a = c( 'I (have|has|have) certain (words|word|worded|word) certain',
'(You|You|Youre) (can|cans|can) do this (works|works|worked)',
'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)' )

What I want is this:

a
[1]'I (have|has) certain (words|word|worded) certain'
[2]'(You|Youre) (can|cans) do this (works|worked)'
[3]'I (am|are) (sure|surely) you know (what|when) (you|her) should (do|)'

To get that result, I used the following code:

a = gsub('\\|', " | ",  a)
a = gsub('\\(', "( ", a)
a = gsub('\\)', " )", a)
a = vapply(strsplit(a, " "), function(x) paste(unique(x), collapse = " "), character(1L))

However, it produced undesired output:

a    
[1] "I ( have | has ) certain words word worded"
[2] "( You | Youre ) can cans do this works worked"
[3] "I ( am | are ) sure surely you know what when her should do"

Why did my code remove the parentheses in the latter part of each string? What should I do to get the result I want?
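A short sketch of what happens to the first string may help with the first question. It only replays the tokenisation step from the attempt above; the variable names `s` and `tokens` are introduced here for illustration. Because `unique()` is applied to the whole token vector, every later repeat of "(", "|", ")" and of ordinary words such as "certain" counts as a duplicate and is dropped, not just the repeats inside one pair of parentheses.

```r
# Replay the tokenisation from the attempt above on the first string only.
s <- 'I (have|has|have) certain (words|word|worded|word) certain'
s <- gsub('\\|', " | ", s)
s <- gsub('\\(', "( ", s)
s <- gsub('\\)', " )", s)

# After the substitutions, "(", "|" and ")" are ordinary tokens.
tokens <- strsplit(s, " ")[[1]]

# Every token that repeats an earlier one, including the second "(" and ")".
tokens[duplicated(tokens)]

# unique() keeps only the first occurrence of each token across the whole string.
paste(unique(tokens), collapse = " ")
# "I ( have | has ) certain words word worded"
```

So deduplication has to be scoped to each parenthesised group rather than to the whole string.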

Best answer

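We can use gsubfn. The idea is to select the characters inside the parentheses by matching an opening parenthesis (\\(, which must be escaped because the parenthesis is a metacharacter) followed by one or more characters that are not a closing parenthesis ([^)]+), and to capture those characters as a group. In the replacement function, we split the captured group (x) with strsplit, unlist the resulting list, keep the unique elements, and paste them back together with "|". The closing parenthesis is not part of the match, so it stays in place.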

library(gsubfn)
gsubfn("\\(([^)]+)",
       ~ paste0("(", paste(unique(unlist(strsplit(x, "[|]"))), collapse = "|")),
       a)
#[1] "I (have|has) certain (words|word|worded) certain"
#[2] "(You|Youre) (can|cans) do this (works|worked)"
#[3] "I (am|are) (sure|surely) you know (what|when) (you|her) should (do)"
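If adding a package is not an option, the same per-group idea can probably be expressed in base R with `gregexpr()` and the replacement form of `regmatches()`. This is only a sketch, not part of the original answer; the helper name `dedupe_groups` is made up for illustration, and it assumes the groups are not nested and never empty.

```r
a <- c('I (have|has|have) certain (words|word|worded|word) certain',
       '(You|You|Youre) (can|cans|can) do this (works|works|worked)',
       'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)')

# Hypothetical helper: deduplicate the alternatives inside each "(...)" group.
dedupe_groups <- function(s) {
  # Positions of every parenthesised group in every string
  m <- gregexpr("\\([^)]+\\)", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(groups) {
    vapply(groups, function(g) {
      inner <- substr(g, 2, nchar(g) - 1)                      # strip "(" and ")"
      words <- unique(strsplit(inner, "|", fixed = TRUE)[[1]]) # split on literal "|"
      paste0("(", paste(words, collapse = "|"), ")")
    }, character(1), USE.NAMES = FALSE)
  })
  s
}

dedupe_groups(a)
# [1] "I (have|has) certain (words|word|worded) certain"
# [2] "(You|Youre) (can|cans) do this (works|worked)"
# [3] "I (am|are) (sure|surely) you know (what|when) (you|her) should (do)"
```

Here `unique()` only ever sees the words of one group, so text outside the parentheses, such as the second "certain", is left alone.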

Regarding "r - How to remove duplicate words within a specific pattern from strings in R", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41940894/
