
r - Error in simple_triplet_matrix: unable to use RWeka to count phrases

Reposted. Author: 行者123. Last updated: 2023-12-05 00:18:52

Using tm, I compare a DocumentTermMatrix against a dictionary list to compute totals:

totals <- inspect(DocumentTermMatrix(x, list(dictionary = d)))

This works well for single words, but I would also like to include bigrams and don't know how to do that.

I tried RWeka:

TrigramTokenizer <- function(x) {
  NGramTokenizer(x, Weka_control(min = 3, max = 3))
}
tdm <- TermDocumentMatrix(v.corpus,
                          control = list(tokenize = TrigramTokenizer))

But I get the following error message:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  : 
'i, j, v' different lengths
In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
NAs introduced by coercion.

Can you help me resolve this error?

Thanks!!

Best answer

See my answer here:

There seem to be problems using RWeka together with the parallel package. I found a workaround here [1].

1: http://r.789695.n4.nabble.com/RWeka-and-multicore-package-td4678473.html#a4678948

The most important point is not to load the RWeka package, and instead to use its namespace (RWeka::) inside an encapsulating function.

So your tokenizer should look like this:

BigramTokenizer <- function(x) {RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))}
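
As a rough sketch of how this tokenizer could be plugged back into the dictionary lookup from the question: the two sample documents and the dictionary d below are made up for illustration, and the options(mc.cores = 1) line is an extra precaution that is not part of the answer above (the warning in the question points at parallel::mclapply, and forcing a single core is a commonly suggested workaround). The demo is guarded so that it is skipped when tm or RWeka (which needs Java) is not available.

```r
# RWeka is referenced through its namespace only; it is never attached with library().
BigramTokenizer <- function(x) {
  RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))
}

# Assumption, not from the answer: keep mclapply-based code on a single core.
options(mc.cores = 1)

# Run the demo only if both packages can be loaded on this machine.
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("RWeka", quietly = TRUE)) {
  # Hypothetical documents and dictionary, for illustration only.
  docs <- c("the quick brown fox", "a quick brown dog")
  d <- c("quick brown", "brown fox")
  corpus <- tm::VCorpus(tm::VectorSource(docs))

  # Count only the dictionary bigrams in each document.
  dtm <- tm::DocumentTermMatrix(
    corpus,
    control = list(tokenize = BigramTokenizer, dictionary = d)
  )
  tm::inspect(dtm)
}
```

If single words and bigrams should be counted in the same matrix, a tokenizer built with Weka_control(min = 1, max = 2) should produce both, so the dictionary can then mix the two kinds of entries.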

Regarding "r - Error in simple_triplet_matrix: unable to use RWeka to count phrases", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20577040/
