Removing words from a string

Reposted. Author: 行者123. Updated: 2023-12-01 12:16:52

I am trying to remove certain words from a data frame:

name   age  message
James  34   hello, my name is James.
John   30   hello, my name is John. Here is my favourite website https://stackoverflow.com
Jim    27   Hi! I'm another person whose name begins with a J! Here is something that should be filtered out: <filter>

df<-structure(list(name = structure(c(1L, 3L, 2L), .Label = c("James", 
"Jim", "John"), class = "factor"), age = c(34L, 30L, 27L), message = structure(1:3, .Label = c("hello, my name is James. ",
"hello, my name is John. Here is my favourite website https://stackoverflow.com",
"Hi! I'm another person whose name begins with a J! Here is something that should be filtered out: <filter>"
), class = "factor")), .Names = c("name", "age", "message"), class = "data.frame", row.names = c(NA,
-3L))

I am trying to remove all words that contain http or <filter>.

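I want to loop over each row, split the string at spaces, and check whether each word contains http or <filter> (or some other word). If it does, I want to replace that word with a space.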
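The split-and-test idea described above can be sketched in base R roughly as follows. This is a minimal sketch, not the accepted answer: drop_matching_words is a hypothetical helper, and instead of replacing a matching word with a space it drops the word and rejoins the rest with single spaces, which avoids leaving double spaces behind.

```r
# Hypothetical helper: split each string on single spaces, drop every word
# that matches `pattern` (a PCRE regex), then rejoin the remaining words.
drop_matching_words <- function(x, pattern) {
  x <- as.character(x)  # factor columns (as in the dput data) become plain strings
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    keep <- !grepl(pattern, words, perl = TRUE)  # TRUE for words to keep
    paste(words[keep], collapse = " ")
  }, character(1))
}

msgs <- c("hello, my name is James. ",
          "hello, my name is John. Here is my favourite website https://stackoverflow.com",
          "Hi! I'm another person whose name begins with a J! Here is something that should be filtered out: <filter>")
out <- drop_matching_words(msgs, "http|<filter>")
out
```

Because strsplit drops a trailing empty piece, the trailing space in the first message also disappears in this version.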

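There are plenty of questions about removing words that exactly match another word or a list of words, but I could not find much about removing words that match some criterion (e.g. words containing http or www.).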
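The difference between exact matching and criterion matching can be shown on a single vector of words. The word vector below, including "www.example.com", is made-up illustration data; %in% and grepl are base R.

```r
words <- c("my", "favourite", "website", "https://stackoverflow.com", "www.example.com")

# Exact matching: %in% only drops words identical to a list entry,
# so no URL is ever removed here.
exact <- words[!words %in% c("http", "www.")]

# Criterion matching: grepl drops any word that *contains* the pattern.
# The dot in "www." is escaped so it matches a literal ".".
by_pattern <- words[!grepl("http|www\\.", words)]

exact
by_pattern
```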

I have tried gsub, !grepl and tm_map approaches (e.g. this one), but I could not get any of them to produce my expected output:

name   age  message
James  34   hello, my name is James.
John   30   hello, my name is John. Here is my favourite website
Jim    27   Hi! I'm another person whose name begins with a J! Here is something that should be filtered out:

Best answer

To remove any non-whitespace chunk that contains http or <filter> (or some other word) as a whole word, you can use gsub with the following PCRE regex (pass the perl=TRUE argument):

(?:\s+|^)\S*(?<!\w)(?:https?|<filter>)(?!\w)\S*

See the regex demo.

Details

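  • (?:\s+|^) - one or more whitespace characters, or the start of the string
  • \S* - zero or more non-whitespace characters, as many as possible
  • (?<!\w) - no word character is allowed immediately to the left of the current position
  • (?:https?|<filter>) - http, https or <filter>
  • (?!\w) - no word character is allowed immediately to the right of the current position (i.e. right after the word from the alternation group)
  • \S* - zero or more non-whitespace characters, as many as possible.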
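To see what the two lookarounds in the breakdown above actually do, the same pattern can be run on a few made-up strings (the inputs are illustration data, not from the question):

```r
pat <- "(?:\\s+|^)\\S*(?<!\\w)(?:https?|<filter>)(?!\\w)\\S*"
x <- c("run httpd now",        # (?!\w) fails after "http" because "d" follows: kept
       "see (http://a.b) ok",  # "(" is not a word char, so the whole chunk is removed
       "xhttp://a b")          # (?<!\w) fails because "x" precedes "http": kept
res <- gsub(pat, "", x, perl = TRUE)
res
```

Note that the leading (?:\s+|^) consumes the space before a removed chunk, which is why the second string ends up as "see ok" with a single space.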

See the online R demo:

df<-structure(list(name = structure(c(1L, 3L, 2L), .Label = c("James", 
"Jim", "John"), class = "factor"), age = c(34L, 30L, 27L), message = structure(1:3, .Label = c("hello, my name is James. ",
"hello, my name is John. Here is my favourite website https://stackoverflow.com",
"Hi! I'm another person whose name begins with a J! Here is something that should be filtered out: <filter>"
), class = "factor")), .Names = c("name", "age", "message"), class = "data.frame", row.names = c(NA,
-3L))
df$message <- gsub("(?:\\s+|^)\\S*(?<!\\w)(?:https?|<filter>)(?!\\w)\\S*", "", df$message, perl=TRUE)
df$message

Result:

[1] "hello, my name is James. "
[2] "hello, my name is John. Here is my favourite website"
[3] "Hi! I'm another person whose name begins with a J! Here is something that should be filtered out:"
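Two behaviours of the answer's pattern are worth checking: gsub removes every matching chunk, not just the first, and a match at the very start of a string leaves the following space behind. The sketch below, on made-up strings, wraps the call in base R's trimws as one possible way to tidy leading and trailing whitespace; trimws is an addition here, not part of the original answer.

```r
pat <- "(?:\\s+|^)\\S*(?<!\\w)(?:https?|<filter>)(?!\\w)\\S*"
x <- c("https://example.com is my site",  # match at ^ leaves a leading space
       "a http://x b <filter> c",         # both chunks are removed by gsub
       "hello, my name is James. ")       # trailing space from the original data
cleaned <- trimws(gsub(pat, "", x, perl = TRUE))
cleaned
```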

A similar question about removing words from a string can be found on Stack Overflow: https://stackoverflow.com/questions/47348365/
