
regex - Split and keep repeated delimiters

Reposted. Author: 行者123. Updated: 2023-12-04 01:59:16

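I am trying to use the stringi package to split on a delimiter (which may be repeated) while keeping the delimiter. This is similar to a question I asked a few months ago, "R split on delimiter (split) keep the delimiter (split)", except that here the delimiter can be repeated. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't work out how to write the regex so that it splits on the delimiter when it is repeated and does not leave an empty string at the end of the result.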

Base R, stringr, stringi, or other solutions are all welcome.

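The latter problem (the trailing empty string) occurs because I use the greedy * on \\s. The space isn't guaranteed, so leaving it in was the only thing I could think of: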

MWE

text.var <- c("I want to split here.But also||Why?",
"See! Split at end but no empty.",
"a third string. It has two sentences"
)

library(stringi)
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")

# Outcome
## [[1]]
## [1] "I want to split here." "But also|" "|" "Why?"
## [5] ""
##
## [[2]]
## [1] "See!" "Split at end but no empty." ""
##
## [[3]]
## [1] "a third string." "It has two sentences"

# Desired outcome
## [[1]]
## [1] "I want to split here." "But also||" "Why?"
##
## [[2]]
## [1] "See!" "Split at end but no empty."
##
## [[3]]
## [1] "a third string." "It has two sentences"

Best answer

Using strsplit:

strsplit(text.var, "(?<=[.!|])( +|\\b)", perl=TRUE)
#[[1]]
#[1] "I want to split here." "But also||" "Why?"

#[[2]]
#[1] "See!" "Split at end but no empty."

#[[3]]
#[1] "a third string." "It has two sentences"
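
The answer gives the pattern without comment, so here is a hedged reading of it as a small sketch. The helper name `split_keep_delim` and its `delims` argument are hypothetical and not part of the original answer; they only wrap the same regex so that each piece can be annotated.

```r
# Hypothetical wrapper around the pattern from the answer above.
# `delims` is pasted into a character class, so it is assumed to hold
# only characters that are literal inside [...] (such as . ! |).
split_keep_delim <- function(x, delims = ".!|") {
  # (?<=[.!|]) : zero-width lookbehind; a delimiter must sit just before
  #              the split point, and it is not consumed, so it is kept
  # ( +|\\b)   : either consume one or more spaces (they are dropped), or
  #              match the empty string at a word boundary (no space case)
  pattern <- paste0("(?<=[", delims, "])( +|\\b)")
  strsplit(x, pattern, perl = TRUE)
}

text.var <- c("I want to split here.But also||Why?",
              "See! Split at end but no empty.",
              "a third string. It has two sentences")

pieces <- split_keep_delim(text.var)
print(pieces)
```

Between the two pipes in "||" neither alternative matches, since there is no space and no word boundary, so the repeated delimiter stays in one piece. The same holds after the final "." of the second string, which is why no trailing empty string appears. One limitation to keep in mind: a delimiter followed directly by a character that is neither a space nor a word character (a quote mark, for example) would not produce a split.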

Or, with stringi:

library(stringi)
stri_split_regex(text.var, "(?<=[.!|])( +|\\b)")
#[[1]]
#[1] "I want to split here." "But also||" "Why?"

#[[2]]
#[1] "See!" "Split at end but no empty."

#[[3]]
#[1] "a third string." "It has two sentences"
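
Since the question also welcomes base R solutions, here is a sketch of a different technique from the one in the answer: instead of splitting at boundaries, it extracts each piece with `regmatches()` and `gregexpr()`. The pattern and the variable names `token_pattern` and `tokens` are illustrative assumptions, not taken from the original post.

```r
text.var <- c("I want to split here.But also||Why?",
              "See! Split at end but no empty.",
              "a third string. It has two sentences")

# Each token is a run of non-delimiter characters followed by
# whatever run of delimiters (possibly none) comes right after it.
token_pattern <- "[^.!?|]+[.!?|]*"

# Extract all matches per string, then strip the spaces that
# separated the original sentences.
tokens <- regmatches(text.var, gregexpr(token_pattern, text.var))
tokens <- lapply(tokens, trimws)
print(tokens)
```

Because pieces are matched rather than split, there is no position at the end of the string that could yield an empty element. The trade-offs are that delimiters at the very start of a string are dropped, and a piece consisting only of whitespace would become an empty string after `trimws()`, so this is a sketch for inputs shaped like the example rather than a general replacement.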

For more on regex - splitting while keeping repeated delimiters, see the similar question on Stack Overflow: https://stackoverflow.com/questions/26509700/
