
r - Split text in R based on exact matches from a list of patterns

Reposted. Author: 行者123. Updated: 2023-12-04 12:07:33

I have a text and a list of patterns.

text <- "It is only a very poor quality car that can give big problems with automatic gearbox" 
patterns <- c("very poor","big problems")

Splitting the text:

unlist(strsplit(text, "(\\s+)|(?!')(?=[[:punct:]])", perl = TRUE))

Output:

 [1] "It"        "is"        "only"      "a"         "very"      "poor"      "quality"   "car"       "that"      "can"      
[11] "give"      "big"       "problems"  "with"      "automatic" "gearbox"  

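What I need is for the patterns in the list to be matched in the sentence, so that instead of "very" "poor" I get "very poor", and likewise "big problems" instead of "big" "problems".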

Desired output:

[1] "It"     "is"     "only"    "a"    "very poor"   "quality"   "car"  "that"   "can"      
[10] "give" "big problems" "with" "automatic" "gearbox"

How can I do this?

Best answer

Here is one approach:

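library(stringr)
text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor","big problems")
# Map each pattern to a copy whose spaces are replaced by the placeholder "&&"
patterns_ns <- setNames(str_replace_all(patterns, " ", "&&"), patterns)
# Protect the multi-word patterns in the text so they survive the split
text_ns <- str_replace_all(text, patterns_ns)
# Split on whitespace, then turn the placeholder back into a space
text_split <- str_replace_all(unlist(str_split(text_ns, "\\s")), "&&", " ")
text_split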

I am assuming that "&&" is a string that does not actually occur in your source text, and that you want to split on whitespace.
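
If no safe placeholder string is available, the tokens can be extracted directly instead of split. The following is only a sketch in base R, not part of the original answer: `split_keep_phrases` is a hypothetical helper name, and it assumes each pattern is a literal phrase that starts and ends with a word character and does not contain the sequence `\E`.

```r
# Hypothetical helper: extract tokens, keeping listed phrases together.
split_keep_phrases <- function(text, patterns) {
  # With no patterns, fall back to a plain whitespace split.
  if (length(patterns) == 0) return(strsplit(text, "\\s+")[[1]])
  # Try longer phrases first so overlapping patterns prefer the longest one.
  patterns <- patterns[order(nchar(patterns), decreasing = TRUE)]
  # \Q...\E makes PCRE treat each phrase as literal text.
  literal <- paste0("\\Q", patterns, "\\E", collapse = "|")
  # A token is either a whole listed phrase or a run of non-whitespace.
  token_re <- paste0("\\b(?:", literal, ")\\b|\\S+")
  regmatches(text, gregexpr(token_re, text, perl = TRUE))[[1]]
}

text <- "It is only a very poor quality car that can give big problems with automatic gearbox"
patterns <- c("very poor", "big problems")
split_keep_phrases(text, patterns)
```

Because of the trailing `\\b`, a phrase is only kept together when it ends at a word boundary, so "very poorly" would still be split into "very" and "poorly".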

Regarding "r - Split text in R based on exact matches from a list of patterns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54794921/
