
r - Word count across a subset of columns in a new column

Reposted. Author: 行者123. Updated: 2023-12-05 08:45:08

I have the following data frame:

structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.", 
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try.")), class = "data.frame", row.names = c(NA,
-3L))

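I want to count the words in columns x and y and sum those counts into a single new column that holds, for each row, the total number of words used in those columns. It is important that I can subset the data, i.e. choose which columns are counted. The result should look like this: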

structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.", 
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try."), z = c("6", "8", "8")), class = "data.frame", row.names = c(NA,
-3L))

I tried str_count("") with various regular expressions, in combination with across and apply, but I could not arrive at a solution.

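In the original question I did not anticipate that columns containing NA cells would cause a problem, but they do. Any solution therefore needs to be able to handle NA cells.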

Best answer

Here is a solution using tokenizers:

library(tokenizers)

df <-
structure(list(g = c("1", "2", "3"), x = c("This is text.", "This is text too.",
"This is no text"), y = c("What is text?", "Can it eat text?",
"Maybe I will try.")), class = "data.frame", row.names = c(NA,
-3L))

df$z = tokenizers::count_words(df$x) + tokenizers::count_words(df$y)

df
#>   g                 x                 y z
#> 1 1     This is text.     What is text? 6
#> 2 2 This is text too.  Can it eat text? 8
#> 3 3   This is no text Maybe I will try. 8

If you prefer base R:

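df$z <- rowSums(
  sapply(df[, c("x", "y")], function(col)
    # gregexpr() yields -1 for "no match" and NA for an NA cell; count both as 0
    sapply(gregexpr("\\b\\w+\\b", col), function(m)
      if (is.na(m[[1]]) || m[[1]] < 0) 0 else length(m))))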

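Note that \w+ matches each run of word characters and \b matches a word boundary, although I believe \w+ alone is sufficient: the + is greedy, so every match already spans a whole word.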
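
The question also asks for a freely chosen subset of columns and for NA cells to be tolerated. The following is only a minimal sketch of how the base R approach could be wrapped for that purpose. The helper names count_words_cell and count_words_in are hypothetical (they are not part of the original answer), and treating an NA cell as zero words is an assumption about the desired behaviour.

```r
# Hypothetical helper: number of regex matches per cell.
# Assumption: an NA cell contributes 0 words.
count_words_cell <- function(x, pattern = "\\w+") {
  matches <- gregexpr(pattern, x)
  vapply(matches, function(m) {
    # gregexpr() gives -1 for "no match" and NA for an NA input
    if (is.na(m[1]) || m[1] < 0) 0L else length(m)
  }, integer(1))
}

# Hypothetical helper: per-row total over the chosen columns.
count_words_in <- function(data, cols, pattern = "\\w+") {
  counts <- vapply(data[cols], count_words_cell,
                   integer(nrow(data)), pattern = pattern)
  # Force a matrix so rowSums() also works for a one-row data frame
  rowSums(matrix(counts, nrow = nrow(data)))
}

# Example data with one NA cell (modified from the question's data)
df <- data.frame(
  g = c("1", "2", "3"),
  x = c("This is text.", "This is text too.", NA),
  y = c("What is text?", "Can it eat text?", "Maybe I will try."),
  stringsAsFactors = FALSE
)

df$z <- count_words_in(df, c("x", "y"))

# Both patterns give the same counts on this data
same_counts <- identical(
  count_words_cell(df$y, "\\w+"),
  count_words_cell(df$y, "\\b\\w+\\b")
)
```

Passing a different character vector as cols changes which columns are counted. Note that \w+ splits on apostrophes and hyphens, so a word such as "don't" would be counted as two words under this pattern.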

For more on "r - Word count across a subset of columns in a new column", see the similar question on Stack Overflow: https://stackoverflow.com/questions/74160144/
