
r - tm package: removeWords: how do I avoid removing certain "english" stopwords (specifically negations)?

Reposted. Author: 行者123. Updated: 2023-12-04 10:35:42

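I want to use removeWords with stopwords("english") via: corpus <- tm_map(corpus, removeWords, stopwords("english")), but there are some words, such as "not" and other negations, that I want to keep.

Is it possible to use removeWords with stopwords("english") while excluding specified words from that list?

For example, how can I prevent "not" from being removed?

(Secondary) Is it possible to set up this kind of exclusion list so that it covers all negations?

I would like to avoid building my own custom list from scratch out of the stop-list words I am interested in.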

Best answer

You can create a custom stopword list by taking the set difference between stopwords("en") and the words you want to keep:

exceptions   <- c("not")
my_stopwords <- setdiff(stopwords("en"), exceptions)
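
As a minimal sketch of how the custom list might be plugged back into the tm_map call from the question: the two sample sentences and the `cleaned` variable below are hypothetical and exist only for illustration. Note that removeWords is case-sensitive, so the sample text is already lowercase.

```r
library(tm)

# Hypothetical two-document corpus, used only for illustration.
docs   <- c("this is not a good idea", "it was the best of times")
corpus <- VCorpus(VectorSource(docs))

# Stopword list with "not" taken out, as in the answer above.
exceptions   <- c("not")
my_stopwords <- setdiff(stopwords("en"), exceptions)

# Same call as in the question, but with the custom list.
corpus <- tm_map(corpus, removeWords, my_stopwords)
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out as a plain character vector.
cleaned <- vapply(
  corpus,
  function(d) paste(as.character(d), collapse = " "),
  character(1),
  USE.NAMES = FALSE
)
print(cleaned)
```

With this list, "not" should survive in the first document while ordinary stopwords such as "this" and "is" are removed.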

If you need to take all negations out of the stopword list (so that they are kept in your text), you can grep them from the stopwords() list:

exceptions <- grep(pattern = "not|n't", x = stopwords(), value = TRUE)
# [1] "isn't" "aren't" "wasn't" "weren't" "hasn't" "haven't" "hadn't" "doesn't" "don't" "didn't"
# [11] "won't" "wouldn't" "shan't" "shouldn't" "can't" "cannot" "couldn't" "mustn't" "not"
my_stopwords <- setdiff(stopwords("en"), exceptions)
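
To see the effect of excluding the negations, one option is to compare the default list and the custom list on a single string; removeWords also accepts a plain character vector. This is only a sketch: the sample sentence and the names `negations`, `dropped`, and `kept` are made up for illustration. The pattern matches the straight apostrophe used in the stopword list, so text with curly apostrophes would probably need normalizing first.

```r
library(tm)

# Collect the negations from the English stopword list, then exclude them.
negations    <- grep(pattern = "not|n't", x = stopwords("en"), value = TRUE)
my_stopwords <- setdiff(stopwords("en"), negations)

# Hypothetical lowercase sample sentence containing two negations.
text <- "i don't think this is not working"

dropped <- removeWords(text, stopwords("en"))  # default list: negations are removed
kept    <- removeWords(text, my_stopwords)     # custom list: negations are kept

print(dropped)
print(kept)
```

Because the exclusion is computed from stopwords("en") itself, the custom list should stay in sync with whatever list the installed tm version ships, with no hand-maintained word list needed.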

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/33362801/
