
r - Keep only the words in a data frame that are found in a vector

Reposted. Author: 行者123. Updated: 2023-12-05 01:31:13

I need to remove all non-English words from a data frame that looks like this:

ID     text
1 they all went to the store bonkobuns and bought chicken
2 if we believe no exomunch standards are in order then we're ok
3 living among the calipodians seems reasonable
4 given the state of all relimited editions we should be fine

I want to end up with a data frame like this:

ID     text
1 they all went to the store and bought chicken
2 if we believe no standards are in order then we're ok
3 living among the seems reasonable
4 given the state of all editions we should be fine

I have a vector containing all English words: word_vec

I can use the tm package to remove every word in that vector from the data frame:

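library(tm)  # provides removeWords()

# Strip every word in word_vec from each row's text column
for (k in seq_len(nrow(frame))) {
  for (i in seq_along(word_vec)) {
    frame$text[k] <- removeWords(frame$text[k], word_vec[i])
  }
}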

But I want to do the opposite: I want to keep only the words that appear in the vector.

Best answer

Here is a simple approach:

txt <- "Hi this is an example"
words <- c("this", "is", "an", "example")
paste(intersect(strsplit(txt, "\\s")[[1]], words), collapse=" ")
[1] "this is an example"
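
To apply the same idea to every row of the data frame, one possible sketch is shown here. It assumes the column is called text, as in the question, and it uses a tiny made-up word_vec as a stand-in for a real English word list. The helper name keep_known_words is hypothetical. Note that intersect() returns unique values, so a word that occurs twice in a sentence would be kept only once. The sketch uses %in% instead, which preserves both order and repeats.

```r
# Example data mirroring two rows of the question.
# This short word_vec is an assumption standing in for a full English word list.
frame <- data.frame(
  ID = 1:2,
  text = c("they all went to the store bonkobuns and bought chicken",
           "living among the calipodians seems reasonable"),
  stringsAsFactors = FALSE
)
word_vec <- c("they", "all", "went", "to", "the", "store", "and", "bought",
              "chicken", "living", "among", "seems", "reasonable")

# Hypothetical helper: keep only the whitespace-separated tokens found in vocab.
keep_known_words <- function(text, vocab) {
  tokens <- strsplit(text, "\\s+")  # list with one character vector per row
  vapply(tokens,
         function(tok) paste(tok[tok %in% vocab], collapse = " "),
         character(1))
}

frame$text <- keep_known_words(frame$text, word_vec)
frame$text
# [1] "they all went to the store and bought chicken"
# [2] "living among the seems reasonable"
```

Because strsplit() is vectorised over the column, no explicit loop over rows is needed.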

Of course, the devil is in the details, so you may need to tweak this a little to account for apostrophes and other punctuation.
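
One way such a tweak might look is to compare a normalised copy of each token against the word list while keeping the original token in the output. This is a sketch under assumptions: the helper name keep_known_words_loose is hypothetical, the word list is assumed to hold contractions such as "we're" in the same form as the text, and only straight ASCII apostrophes are handled.

```r
# Hypothetical helper: match tokens case-insensitively and ignore attached punctuation.
keep_known_words_loose <- function(text, vocab) {
  vocab <- tolower(vocab)
  tokens <- strsplit(text, "\\s+")
  vapply(tokens, function(tok) {
    # Comparison key: lower-case, with everything except letters and apostrophes removed
    key <- gsub("[^a-z']", "", tolower(tok))
    # Keep the original tokens (capitals and punctuation intact) whose key is known
    paste(tok[key %in% vocab], collapse = " ")
  }, character(1))
}

keep_known_words_loose(
  "If we believe no exomunch standards, then we're ok.",
  c("if", "we", "believe", "no", "standards", "then", "we're", "ok")
)
# [1] "If we believe no standards, then we're ok."
```

The trade-off is that punctuation attached to a dropped word disappears with it, which may or may not be what you want.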

Regarding "r - Keep only the words in a data frame that are found in a vector", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/28891130/
