
r - TermDocumentMatrix error in R

Reposted. Author: 行者123. Updated: 2023-12-04 00:05:57

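I have been working through many online examples of the {tm} package in R, trying to create a TermDocumentMatrix. Creating and cleaning the corpus is straightforward, but I keep getting an error when I try to create the matrix. The error is: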

Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code



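For example, here is the code from Jon Starkweather's text-mining example. Apologies in advance for the length, but it does produce a reproducible example. Note that the error appears at the very end, in the TermDocumentMatrix() call that creates tdm.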
#Read in data
policy.HTML.page <- readLines("http://policy.unt.edu/policy/3-5")

#Obtain text and remove mark-up
policy.HTML.page[186:202]
id.1 <- 3 + which(policy.HTML.page == " TOTAL UNIVERSITY </div>")
id.2 <- id.1 + 5
text.data <- policy.HTML.page[id.1:id.2]
# Strip opening and closing <p> tags in one pass (case-insensitive)
text.d <- gsub(pattern = "</?p>", replacement = "", x = text.data, ignore.case = TRUE)
rm(text.data)

#Create corpus and clean
library(tm)
library(SnowballC)
txt <- VectorSource(text.d); rm(text.d)
txt.corpus <- Corpus(txt)
txt.corpus <- tm_map(txt.corpus, tolower)
txt.corpus <- tm_map(txt.corpus, removeNumbers)
txt.corpus <- tm_map(txt.corpus, removePunctuation)
txt.corpus <- tm_map(txt.corpus, removeWords, stopwords("english"))
txt.corpus <- tm_map(txt.corpus, stripWhitespace)  # inspect(txt.corpus[1])
txt.corpus <- tm_map(txt.corpus, stemDocument)

# NOTE ERROR WHEN CREATING TDM
tdm <- TermDocumentMatrix(txt.corpus)
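The error says that meta() received a plain "character" object instead of a document. The base-R sketch below imitates that failure with a hypothetical toy_doc class and toy_meta() generic. These are simplified stand-ins for illustration, not tm's actual classes. Calling tolower() on a document returns a bare string, so S3 dispatch on the result no longer finds a method. The sketch also shows a hypothetical wrapper, toy_content_transformer(), written in the spirit of tm's content_transformer(): it changes only the content and hands back the same document object.

```r
# Hypothetical, simplified stand-ins for a tm document and the meta() generic.
# They only illustrate the S3 dispatch problem; this is not tm's real code.
make_doc <- function(text, id) {
  structure(list(content = text, meta = list(id = id)), class = "toy_doc")
}
as.character.toy_doc <- function(x, ...) x$content

toy_meta <- function(x, ...) UseMethod("toy_meta", x)
toy_meta.toy_doc <- function(x, tag, ...) x$meta[[tag]]

doc <- make_doc("TOTAL UNIVERSITY Policy", id = "1")

# Like tm_map(corpus, tolower): tolower() coerces the document with
# as.character() and returns a bare character vector, so the class is lost.
lowered <- tolower(doc)
class(lowered)  # "character"

# Dispatch now fails, just like meta() in the question's error message.
res <- tryCatch(toy_meta(lowered, "id"),
                error = function(e) conditionMessage(e))

# Hypothetical wrapper: transform only the content, return the document.
toy_content_transformer <- function(FUN) {
  function(x, ...) {
    x$content <- FUN(x$content, ...)
    x
  }
}

fixed <- toy_content_transformer(tolower)(doc)
class(fixed)           # "toy_doc"
fixed$content          # "total university policy"
toy_meta(fixed, "id")  # "1"
```

The class survives in the wrapped version because the wrapper assigns into the content field and returns the original object. tm's own content_transformer() is built around the same idea, replacing content(x) and returning x.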

Best answer

The link provided by jazzurro points to the solution. The following line

 txt.corpus <- tm_map(txt.corpus, tolower)

must be changed to
 txt.corpus <- tm_map(txt.corpus, content_transformer(tolower))
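A minimal sketch of the corrected cleaning pipeline, assuming the tm and SnowballC packages are installed. The two input sentences are made up so the example does not depend on the policy page being reachable. The corpus is built with VCorpus() explicitly: in more recent tm releases, Corpus(VectorSource(...)) may return a SimpleCorpus, which handles plain functions differently. With VCorpus(), each document stays a PlainTextDocument as long as base functions such as tolower() are wrapped in content_transformer().

```r
# Sketch only: assumes tm and SnowballC are installed.
library(tm)
library(SnowballC)

# Made-up sample text standing in for the scraped policy paragraphs
text.d <- c("The University policy applies to all TOTAL UNIVERSITY staff.",
            "Employees must report 3 incidents within 10 days.")

# VCorpus keeps every element as a PlainTextDocument
txt.corpus <- VCorpus(VectorSource(text.d))

# tolower() is a base R function, so wrap it to keep documents as documents
txt.corpus <- tm_map(txt.corpus, content_transformer(tolower))

# These tm transformations already work on PlainTextDocument objects
txt.corpus <- tm_map(txt.corpus, removeNumbers)
txt.corpus <- tm_map(txt.corpus, removePunctuation)
txt.corpus <- tm_map(txt.corpus, removeWords, stopwords("english"))
txt.corpus <- tm_map(txt.corpus, stripWhitespace)
txt.corpus <- tm_map(txt.corpus, stemDocument)

# Building the matrix no longer hits the meta() error
tdm <- TermDocumentMatrix(txt.corpus)
inspect(tdm)
```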

Regarding "r - TermDocumentMatrix error in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25551514/
