
regex - Split a string using an array of delimiters in R

Reposted. Author: 行者123. Updated: 2023-12-01 11:33:40

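I am new to R. I have to split sentences on phrase delimiters. We can use strsplit to split a string on a single delimiter, but I want to split on several delimiters at once, e.g. [, . : ; ]. How can I do this in one step? Is there a regular expression for it?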

For example:

my_string = "This is a sentence.  This is a question, right?  Yes!  It is."

Expected output:

"This is a sentence", "This is a question", "right", "Yes", "It is"

Best answer

You can use this:

strsplit("This is a sentence. This is a question, right? Yes! It is.", "\\.|,|\\?|!")
#[[1]]
#[1] "This is a sentence" " This is a question" " right"
#[4] " Yes" " It is"

To get rid of those extra spaces, you can do this:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
"\\. *|, |\\? *|! *")
#[[1]]
#[1] "This is a sentence" "This is a question" "right"
#[4] "Yes" "It is"

As thelatemail pointed out, this is simpler:

strsplit("This is a sentence. This is a question, right? Yes! It is.",
"[,.:;?!]\\s*") # \\s* matches zero or more whitespace characters
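As a sketch of how this answers the original "array of delimiters" question, the character class can be built from a vector of delimiters and applied to the asker's my_string. The names delims, pattern and parts are illustrative, and the sketch assumes none of the delimiters is special inside a bracket expression (no "]", "^", "-" or backslash).

```r
my_string <- "This is a sentence.  This is a question, right?  Yes!  It is."

# Delimiters kept in a character vector (assumed: no "]", "^", "-" or "\\").
delims <- c(",", ".", ":", ";", "?", "!")

# Build the pattern "[,.:;?!]\\s*" from the vector.
pattern <- paste0("[", paste(delims, collapse = ""), "]\\s*")

# strsplit() returns a list with one element per input string,
# so [[1]] extracts the pieces for our single string.
parts <- strsplit(my_string, pattern)[[1]]
print(parts)
```

Because \\s* consumes any run of whitespace after a delimiter, the double spaces in my_string do not leave leading blanks in the pieces.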

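You need to escape certain characters that are interpreted as metacharacters. That is why you see \\ in front of . and ? in the first two patterns. The | is a kind of "or". Inside a character class such as [,.:;?!] these characters are taken literally, so no escaping is needed there.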
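To make the escaping rule concrete, here is a small sketch comparing an unescaped dot with three ways of matching a literal dot. The input "a.b" and the variable names are made up for illustration.

```r
x <- "a.b"

# Unescaped: "." matches any single character, so every character
# is treated as a delimiter and only empty strings remain.
unescaped <- strsplit(x, ".")[[1]]

# Three ways to match a literal dot instead.
escaped  <- strsplit(x, "\\.")[[1]]             # backslash escape
in_class <- strsplit(x, "[.]")[[1]]             # inside a character class
literal  <- strsplit(x, ".", fixed = TRUE)[[1]] # no regex interpretation at all

print(unescaped)
print(escaped)
```

fixed = TRUE is handy for a single literal delimiter, but it cannot express alternatives, which is why the answer relies on a regular expression.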

Regarding "regex - Split a string using an array of delimiters in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29788633/
