
R: How to remove duplicate elements in a character vector

Reposted. Author: 行者123. Updated: 2023-12-03 21:57:41

s <- "height(female), weight, BMI, and BMI."

In the string above, the word BMI appears twice. I would like the string to be:

"height (female), weight, and BMI."

I tried the following to split the string into its unique parts:

> unique(strsplit(s, " ")[[1]])
[1] "height(female)," "weight," "BMI," "and" "BMI."

But since "BMI," and "BMI." are not the same string, unique does not get rid of either of them.
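
One way to see the problem, and a possible workaround that is separate from the regex answer given further down, is to strip the punctuation and the leading "and" from each element before calling unique. The following base R lines are only a minimal sketch: the names `parts` and `u` are illustrative, and the rejoining step assumes at least two unique elements remain and that the list should end with ", and <last>.".

```r
s <- "height(female), weight, BMI, and BMI."

# Split on commas so each list element becomes one token
parts <- strsplit(s, ",\\s*")[[1]]
# Normalize the tokens: drop a leading "and " and a trailing period
parts <- sub("^and\\s+", "", parts)
parts <- sub("\\.$", "", parts)

# Now the two BMI tokens are identical, so unique() removes one of them
u <- unique(parts)
u
# [1] "height(female)" "weight"         "BMI"

# Rebuild the sentence (assumes length(u) >= 2)
result <- paste0(paste(head(u, -1), collapse = ", "), ", and ", tail(u, 1), ".")
result
# [1] "height(female), weight, and BMI."
```

This keeps the first occurrence of each element, whereas the regex approach in the answer removes the earlier occurrence and keeps the later one.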

Edit: How can I remove repeated phrases? (i.e. "body mass index" rather than "BMI")

> s <- "height (female), weight, weight, body mass index, body mass index."
> s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
> stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")
[1] "height (female), weight, body mass index, body mass index."

Best answer

It may help to first replace the unwanted duplicate using a regular expression like the following:

(?<=, |^)\b([()\w\s]+),\s(.*?)((?: and)?(?=\1))


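Explanation

  • (?<=, |^)\b: the leading boundary. (\b alone should also work, but it would not anchor the match properly.)
  • ([()\w\s]+),: the element block, captured as group 1
  • \s(.*?)((?: and)?: everything in between; group 2 is what the replacement \2 keeps
  • (?=\1)): a lookahead for the repeated element, so the later occurrence is not consumed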
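
To check which part of the string each group captures, the pattern can be passed to stringr::str_match, which returns the full match followed by the capture groups. This is a minimal sketch; the names `pattern` and `m` are illustrative and not part of the original answer.

```r
library(stringr)

s <- "height(female), weight, BMI, and BMI."
pattern <- "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))"

# Column 1 is the full match, columns 2..4 are capture groups 1..3
m <- stringr::str_match(s, pattern)

m[1, 1]  # full match that gets replaced: "BMI, and "
m[1, 2]  # group 1, the element that is repeated later: "BMI"
m[1, 3]  # group 2, the text kept by the replacement "\\2": "and "
```

Replacing the full match "BMI, and " with group 2 ("and ") is what turns "weight, BMI, and BMI." into "weight, and BMI.".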

Code example:

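# install.packages("stringr")
library(stringr)
s <- "height(female), weight, BMI, and BMI."
# Remove the element that is repeated later; keep the text in between (group 2).
# The result is assigned back to s so that later steps work on the cleaned string.
s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
s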

Output:

[1] "height(female), weight, and BMI."

To separate the parenthesized part with a space, use another, similar replacement:

stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")

Output:

[1] "height (female), weight, and BMI."

Testing and putting it together:

s <- c("height(female), weight, BMI, and BMI.",
       "height(female), weight, whatever it is, and whatever it is.",
       "height(female), weight, age, height(female), and BMI.",
       "weight, weight.")
# Step 1: remove the first repeated element in each string
s <- stringr::str_replace(s, "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))", "\\2")
# Step 2: insert a space before the opening parenthesis
stringr::str_replace(s, "(\\w+)(\\(.*?\\))", "\\1 \\2")

Output:

[1] "height (female), weight, and BMI."      "height (female), weight, and whatever it is."
[3] "weight, age, height (female), and BMI." "weight."
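
Note that str_replace only replaces the first match in each string, which is why the edited example in the question still ends with a repeated "body mass index". A minimal sketch, assuming the same pattern is acceptable for every duplicate in the string, swaps in stringr::str_replace_all; the names `pattern`, `s_edit`, and `out` are illustrative and not from the original answer.

```r
library(stringr)

pattern <- "(?<=, |^)\\b([()\\w\\s]+),\\s(.*?)((?: and)?(?=\\1))"
s_edit <- "height (female), weight, weight, body mass index, body mass index."

# str_replace_all applies the replacement to every non-overlapping match,
# so both "weight, " and "body mass index, " are removed
out <- stringr::str_replace_all(s_edit, pattern, "\\2")
out
# [1] "height (female), weight, body mass index."
```

Because the lookahead (?=\1) only checks that the following text starts with the captured element, "weight" followed later by "weights" would also be treated as a duplicate, so this is a sketch rather than a general solution.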

Regarding R: how to remove duplicate elements in a character vector, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50615730/
