
r - How to show corpus text in R tm package?

Reposted. Author: 行者123. Updated: 2023-12-04 02:30:29

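I am new to R and the tm package, so please excuse my basic question ;-)
How can I display the text of a plain-text corpus in the R tm package?

I loaded a corpus of 323 plain-text files: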

library(tm)
src <- DirSource("Korpora/technologie")
corpus <- Corpus(src)

But when I access a document in the corpus:
corpus[[1]]

I always get output like this instead of the corpus text itself:
<<PlainTextDocument>>
Metadata: 7
Content: chars: 144
Content: chars: 141
Content: chars: 224
Content: chars: 75
Content: chars: 105

How can I display the text of the corpus?

Thanks!

Update
Reproducible example: I tried it with the built-in sample texts:
> data("crude")
> crude
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 20
> crude[1]
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 1
> crude[[1]]
<<PlainTextDocument>>
Metadata: 15
Content: chars: 527

How can I print the text of the documents?

Update 2: session info:
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tm_0.6-1 NLP_0.1-7

loaded via a namespace (and not attached):
[1] parallel_3.1.3 slam_0.1-32 tools_3.1.3

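Best answer

You can try converting the corpus text into a data frame and then accessing the text you need from the data frame itself. I use the built-in sample data `crude` (from the tm package) as an example.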

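library(tm)
data("crude")
# Extract the "content" element of every document and collect the text in one column
dataframe <- data.frame(text = unlist(sapply(crude, `[`, "content")), stringsAsFactors = FALSE)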

dataframe[1,]
[1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n Diamond is the latest in a line of U.S. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
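
If the goal is only to read the text, the data frame step may not be needed. The following is a minimal sketch, not part of the original answer, that assumes tm 0.6 or later (as in the session info above) and uses the built-in `crude` corpus. It relies on `as.character()` and `content()`, which return the raw text of a document, and on `writeLines()` from base R to print it with real line breaks. The variable names `doc`, `txt` and `all_texts` are chosen here for illustration.

```r
library(tm)    # attaching tm also attaches NLP, which provides content()

data("crude")  # 20 Reuters articles shipped with tm

doc <- crude[[1]]

# as.character() returns the document text as a character vector
txt <- as.character(doc)

# content() is the accessor for the same underlying text
same <- identical(txt, content(doc))

# writeLines() prints the text with actual line breaks instead of "\n"
writeLines(txt)

# Text of every document, as a list with one element per document
all_texts <- lapply(crude, as.character)
```

For the corpus from the question, the same idea would be `writeLines(as.character(corpus[[1]]))`. A list is used for `all_texts` because a document loaded from a file can hold several lines, in which case each element is a character vector with more than one entry.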

A similar question about "r - How to show corpus text in R tm package?" can be found on Stack Overflow: https://stackoverflow.com/questions/30435054/
