
r - Extracting ngrams with R

Reposted. Author: 行者123. Updated: 2023-12-04 10:56:03

I am trying to extract 3-grams from the nirvana text with the ngramrr package:

require(ngramrr)
require(tm)
require(magrittr)

nirvana <- c("hello hello hello how low", "hello hello hello how low",
"hello hello hello how low", "hello hello hello",
"with the lights out", "it's less dangerous", "here we are now",
"entertain us", "i feel stupid", "and contagious", "here we are now",
"entertain us", "a mulatto", "an albino", "a mosquito", "my libido",
"yeah", "hey yay")

ngramrr(nirvana[1], ngmax = 3)

Corpus(VectorSource(nirvana))

I get this result:

[1] "hello"      "hello"    "hello"              "how"  "low"       "hello hello"  "hello hello"      
[8] "hello how" "how low" "hello hello hello" "hello hello how" "hello how low"
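The output mixes uni-, bi- and tri-grams because `ngmax = 3` asks for every n-gram length up to 3. The underlying mechanism is small, so here is a minimal base R sketch of it. It assumes words are separated by whitespace, and `word_ngrams()` is a hypothetical helper, not part of ngramrr. Looping over lengths 1 to 3 reproduces the 12 items shown above in the same order, while asking for `n = 3` alone keeps only the three tri-grams.

```r
# Hypothetical helper (not part of ngramrr): word n-grams of a single string,
# assuming words are separated by whitespace.
word_ngrams <- function(text, n) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

line <- "hello hello hello how low"

# Lengths 1 to 3, in the same order as ngramrr(nirvana[1], ngmax = 3)
all_grams <- unlist(lapply(1:3, function(n) word_ngrams(line, n)))

# Tri-grams only
trigrams <- word_ngrams(line, 3)
# c("hello hello hello", "hello hello how", "hello how low")
```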

I would like to know how to build a TermDocumentMatrix in which the terms are the list of tri-grams.

Thanks
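Conceptually, a tri-gram term-document matrix is a count table with one row per distinct tri-gram and one column per document. The following package-free sketch in base R builds that table for the nirvana lines. It assumes whitespace tokenization, and both `word_ngrams()` and `trigram_tdm()` are hypothetical helpers, not functions from tm or ngramrr.

```r
nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it's less dangerous", "here we are now",
             "entertain us", "i feel stupid", "and contagious", "here we are now",
             "entertain us", "a mulatto", "an albino", "a mosquito", "my libido",
             "yeah", "hey yay")

# Hypothetical helper: whitespace-separated word n-grams of one string.
word_ngrams <- function(text, n) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: rows = distinct tri-grams, columns = documents,
# cells = how often the tri-gram occurs in that document.
trigram_tdm <- function(docs) {
  grams <- lapply(docs, word_ngrams, n = 3)
  term <- as.character(unlist(grams, use.names = FALSE))
  # Keep every document as a column, even those with no tri-grams
  doc <- factor(rep(seq_along(docs), lengths(grams)), levels = seq_along(docs))
  unclass(table(term, doc))
}

tdm <- trigram_tdm(nirvana)
tdm["hello hello hello", ]  # 1 in documents 1 to 4, 0 elsewhere
```

The result is a plain integer matrix rather than a tm `TermDocumentMatrix`. Lines with fewer than three words, such as "entertain us", simply get an all-zero column.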

Best answer

My comment above nearly covered it. In full, using the quanteda package, it looks like this:

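library(quanteda)  # supplies tokens(), tokens_ngrams(), dfm() and convert()

nirvana %>%
  tokens() %>%               # tokenize each line into words
  tokens_ngrams(n = 1:3) %>% # build 1- to 3-grams (joined with "_" by default)
  dfm() %>%                  # generate the document-feature matrix
  convert(to = "tm") %>%     # convert to tm's DocumentTermMatrix
  t()                        # transpose it into a TermDocumentMatrix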
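If you would rather stay entirely within tm, a common pattern is to pass a custom tokenizer through the `control` list of `TermDocumentMatrix()`. The sketch below assumes tm and its dependency NLP are installed, and `trigram_tokenizer()` is a hypothetical name. It uses `VCorpus()` instead of `Corpus()`, since some tm versions ignore a custom `tokenize` function on the SimpleCorpus that `Corpus(VectorSource(...))` creates. Unlike the quanteda pipeline, it keeps only tri-grams, joined with spaces.

```r
library(tm)  # tm attaches NLP, which supplies words() and ngrams()

nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it's less dangerous", "here we are now",
             "entertain us", "i feel stupid", "and contagious", "here we are now",
             "entertain us", "a mulatto", "an albino", "a mosquito", "my libido",
             "yeah", "hey yay")

# Hypothetical tokenizer: turn one document into its word tri-grams.
# ngrams() returns a list of 3-word vectors (empty for shorter documents).
trigram_tokenizer <- function(x) {
  vapply(ngrams(words(x), 3L), paste, character(1), collapse = " ")
}

corpus <- VCorpus(VectorSource(nirvana))
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = trigram_tokenizer))

inspect(tdm)
```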

For a similar question on Stack Overflow about r - Extracting ngrams with R, see: https://stackoverflow.com/questions/43807448/
