
r - How to convert frequencies into text using R?

Reposted. Author: 行者123. Updated: 2023-12-04 10:44:08

I have a data frame like this (an ID column followed by the frequencies of A, B, C, D and E):

ID A B C D E    
1 5 3 2 1 0
2 3 2 2 1 0
3 4 2 1 1 1
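
To make the examples reproducible, the table above can be rebuilt as a plain data frame. This is only a sketch: the name df1 is the one the answer uses, while the column types (integer ID, numeric counts) are an assumption, since the question does not state them.

```r
# Rebuild the example table; df1 is the name used in the answer.
# Column types are assumed, the question only shows the printed values.
df1 <- data.frame(
  ID = 1:3,
  A  = c(5, 3, 4),
  B  = c(3, 2, 2),
  C  = c(2, 2, 1),
  D  = c(1, 1, 1),
  E  = c(0, 0, 1)
)
print(df1)
```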

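I want to convert this data frame into a text-based document like the one below, where each ID has its A, B, C, D, E frequencies expanded into repeated words in a single column. I may then use the LDA algorithm to identify the top topics for each ID.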
ID                     Text
1 "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2 "A" "A" "A" "B" "B" "C" "C" "D"
3 "A" "A" "A" "A" "B" "B" "C" "D" "E"
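
As a side note on the LDA goal: the frequency table already has the shape of a document-term count matrix (one row per ID, one column per word), which is the kind of input that topic-model packages such as topicmodels generally work from. The sketch below only builds that matrix in base R; whether it can be passed to a particular LDA implementation directly is an assumption to check against that package's documentation. The name dtm is hypothetical.

```r
# Rebuild the example data (column types are assumed).
df1 <- data.frame(ID = 1:3, A = c(5, 3, 4), B = c(3, 2, 2),
                  C = c(2, 2, 1), D = c(1, 1, 1), E = c(0, 0, 1))

# Drop the ID column and use the IDs as row names instead, giving a
# documents-by-terms matrix of counts. 'dtm' is a hypothetical name.
dtm <- as.matrix(df1[-1])
rownames(dtm) <- df1$ID
print(dtm)
```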

Best answer

We can use data.table

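library(data.table)
# setDT() converts df1 to a data.table by reference. For each ID, repeat every
# column name (except ID) as many times as its count, and keep it in a list column.
DT <- setDT(df1)[,.(list(rep(names(df1)[-1], unlist(.SD)))) ,ID]
DT$V1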
#[[1]]
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"

#[[2]]
#[1] "A" "A" "A" "B" "B" "C" "C" "D"

#[[3]]
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"

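A base R option is to use split. This expects df1 to be a plain data.frame, so run it on the original data rather than after setDT(), because df1[-1] drops the first row of a data.table instead of the first column.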
lst <- lapply(split(df1[-1], df1$ID), rep, x=names(df1)[-1])
lst
#$`1`
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"

#$`2`
#[1] "A" "A" "A" "B" "B" "C" "C" "D"

#$`3`
#[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
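
The question asks for one text value per ID rather than a list of vectors. A minimal sketch of that last step is to collapse each vector into a space-separated string. The names docs and Text are illustrative, and the list is built here with an explicit rep(..., times = ...) call so the block runs on its own; it is not the exact code from the answer.

```r
# Rebuild the example data (column types are assumed).
df1 <- data.frame(ID = 1:3, A = c(5, 3, 4), B = c(3, 2, 2),
                  C = c(2, 2, 1), D = c(1, 1, 1), E = c(0, 0, 1))

# One character vector per ID: each column name repeated by its count.
lst <- lapply(split(df1[-1], df1$ID),
              function(row) rep(names(row), times = unlist(row)))

# Collapse each vector into a single string, one row per ID.
# 'docs' and 'Text' are hypothetical names.
docs <- data.frame(
  ID   = names(lst),
  Text = unname(vapply(lst, paste, character(1), collapse = " ")),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(docs)
```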

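If we want to write 'lst' to a csv file, one option is to convert the list into a rectangular, data.frame-like structure by padding each element with NA at the end so that all lengths are equal (a data.frame is a list whose columns all have the same length).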
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
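
To see what the padding does, here is a small sketch on a toy list (the names toy and padded are made up for illustration). Assigning a longer length to a vector with `length<-` fills the new positions with NA, and rbind then stacks the equal-length vectors into a character matrix.

```r
# Toy input: two vectors of different lengths (hypothetical example data).
toy <- list(a = c("A", "B", "C"), b = "A")

# Extend every vector to the longest length; new slots become NA.
padded <- lapply(toy, `length<-`, max(lengths(toy)))

# Stack the padded vectors row-wise into a matrix.
res <- do.call(rbind, padded)
print(res)
```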

Or use the convenience function from stringi
library(stringi)
res <- stri_list2matrix(lst, byrow=TRUE)

Then use write.csv
write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
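
By default write.csv writes the padding values as the literal text NA. If empty cells are preferred, one possible variation is to pass na = "". The sketch below writes to a temporary file so it can be run safely; the file location and the choice of na = "" are assumptions, not part of the original answer.

```r
# Rebuild the example data (column types are assumed).
df1 <- data.frame(ID = 1:3, A = c(5, 3, 4), B = c(3, 2, 2),
                  C = c(2, 2, 1), D = c(1, 1, 1), E = c(0, 0, 1))

# One character vector per ID, then pad to equal length and stack into a matrix.
lst <- lapply(split(df1[-1], df1$ID),
              function(row) rep(names(row), times = unlist(row)))
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))

# Write to a temporary file, leaving padded cells empty instead of "NA".
out_file <- tempfile(fileext = ".csv")
write.csv(res, out_file, quote = FALSE, row.names = FALSE, na = "")

# Read the raw lines back to inspect the result.
written <- readLines(out_file)
print(written)
```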

Regarding "r - How to convert frequencies into text using R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38411230/
