
r - Collapse a column by a grouping variable (in base)

Reposted. Author: 行者123. Updated: 2023-12-04 10:49:37

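I have a text variable and a grouping variable. I'd like to collapse the text variable into one string per row, combining consecutive rows that share the same factor level. So for as long as the group column says m, I want the text pasted together, and so on for each run. I have provided a sample data set showing the before and after. I am writing this for a package and have so far avoided all dependencies on other packages except wordcloud, and I would like to keep it that way.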

I suspect rle may be useful together with cumsum, but I haven't been able to figure it out.
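To make the rle/cumsum idea concrete, here is a minimal sketch of two base R ways to number consecutive runs. The vector g is a made-up stand-in for a grouping column, and id_rle and id_cumsum are illustrative names only; neither comes from the original post.

```r
# Hypothetical toy vector standing in for a grouping column
g <- c("m", "m", "f", "m", "m", "f", "f", "m")

# Approach 1: rle() returns the run lengths; repeat each run index by its length
r <- rle(g)
id_rle <- rep(seq_along(r$lengths), r$lengths)

# Approach 2: flag every position where the value differs from the previous one,
# then take the cumulative sum of those flags
id_cumsum <- cumsum(c(TRUE, g[-1] != g[-length(g)]))

id_rle                        # 1 1 2 3 3 4 4 5
identical(id_rle, id_cumsum)  # TRUE
```

Either vector can then serve as the index for a grouped paste. Note that neither sketch handles NA values in the grouping column.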

Thank you in advance.

What the data looks like

                               text group
1     Computer is fun. Not too fun.     m
2             No its not, its dumb.     m
3            How can we be certain?     f
4                  There is no way.     m
5                   I distrust you.     m
6       What are you talking about?     f
7      Shall we move on? Good then.     f
8 Im hungry. Lets eat. You already?     m

What I'd like the data to look like

                                                      text group
1      Computer is fun. Not too fun. No its not, its dumb.     m
2                                   How can we be certain?     f
3                         There is no way. I distrust you.     m
4 What are you talking about? Shall we move on? Good then.     f
5                        Im hungry. Lets eat. You already?     m

The data

dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.", 
"How can we be certain?", "There is no way.", "I distrust you.",
"What are you talking about?", "Shall we move on? Good then.",
"Im hungry. Lets eat. You already?"), group = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text",
"group"), row.names = c(NA, 8L), class = "data.frame")

EDIT: I found that I can add a column with a unique id for each run of the group variable:

x <- rle(as.character(dat$group))[[1]]
dat$new <- as.factor(rep(1:length(x), x))

Yielding:

                               text group new
1     Computer is fun. Not too fun.     m   1
2             No its not, its dumb.     m   1
3            How can we be certain?     f   2
4                  There is no way.     m   3
5                   I distrust you.     m   3
6       What are you talking about?     f   4
7      Shall we move on? Good then.     f   4
8 Im hungry. Lets eat. You already?     m   5

Best answer

This uses rle to create an id that groups consecutive sentences by run. It then uses tapply with paste to put the output together.

## Your example data
dat <- structure(list(text = c("Computer is fun. Not too fun.", "No its not, its dumb.",
"How can we be certain?", "There is no way.", "I distrust you.",
"What are you talking about?", "Shall we move on?  Good then.",
"Im hungry.  Lets eat.  You already?"), group = structure(c(2L,
2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("f", "m"), class = "factor")), .Names = c("text",
"group"), row.names = c(NA, 8L), class = "data.frame")


# Run-length encode the group column (needed for later)
k <- rle(as.numeric(dat$group))
# Create a grouping vector: one id per run of identical group values
id <- rep(seq_along(k$lengths), k$lengths)
# Combine the text within each run
out <- tapply(dat$text, id, paste, collapse = " ")
# Bring it together into a data frame
answer <- data.frame(text = out, group = levels(dat$group)[k$values])
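Since the question mentions using this inside a package, the same rle-based idea could be wrapped in a small function. The following is only a sketch: collapse_runs and its sep argument are hypothetical names that do not appear in the original answer, and the input is made-up toy data. It uses split with vapply in place of tapply, which should give the same grouping.

```r
# collapse_runs() is a hypothetical helper name, not part of base R
collapse_runs <- function(text, group, sep = " ") {
  group <- as.character(group)   # works for factor or character input
  runs <- rle(group)             # lengths and values of consecutive runs
  id <- rep(seq_along(runs$lengths), runs$lengths)
  # Paste together the text belonging to each run
  pieces <- vapply(split(as.character(text), id), paste, character(1),
                   collapse = sep)
  data.frame(text = unname(pieces), group = runs$values,
             stringsAsFactors = FALSE)
}

# Small made-up input: two runs of "m" separated by one "f"
res <- collapse_runs(
  text  = c("a.", "b.", "c.", "d.", "e."),
  group = c("m", "m", "f", "m", "m")
)
res$text   # "a. b." "c." "d. e."
res$group  # "m" "f" "m"
```

Returning the group as a character column rather than a factor is a design choice made for this sketch; a caller that needs a factor can convert it afterwards.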

For "r - Collapse a column by a grouping variable (in base)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/9857787/
