
r - Grouping words (from defined lists) into themes in R

Reposted. Author: 行者123. Updated: 2023-12-02 03:01:37

I'm new to Stack Overflow and am trying to learn R.

I want to find a set of predefined words in a text and return the count of each word in a table, grouped by the themes I defined.

Here is my attempt:

text <- c("Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease. When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both. Green mangoes are the best.")

library(qdap)

# Own defined lists

fruit <- c("apples", "green mangoes", "avocados")
veg <- c("lettuce", "spinach", "Swiss chard", "mustard greens")

# Split into sentences

stext <- strsplit(text, split="\\.")[[1]]

# Obtain and count occurrences
library(plyr)
fruitres <- laply(fruit, function(x) grep(x, stext))
vegres <- laply(veg, function(x) grep(x, stext))

# Quick check: this does not return 2 results for "green mangoes"
grep("green mangoes", stext)

# Trying with the stringr package
tag_ex <- paste0('(', paste(fruit, collapse = '|'), ')')
tag_ex

library(dplyr)
library(stringr)


themes = sapply(str_extract_all(stext, tag_ex), function(x) paste(x, collapse=','))[[1]]
themes


#Create data table
library(data.table)
data.table(fruit,fruitres)
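The quick check finds only one hit for two reasons: `grep()` is case-sensitive (the second occurrence is "Green mangoes", capitalised at the start of the last sentence), and it returns the *indices* of matching sentences rather than a count of matches. A minimal base R sketch of both points, run on the first and last sentences of the question's text:

```r
# First and last sentences of the question's text
text <- "Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Green mangoes are the best."
stext <- strsplit(text, split = "\\.")[[1]]

# grep() is case-sensitive and returns the indices of matching sentences
grep("green mangoes", stext)                      # 1
grep("green mangoes", stext, ignore.case = TRUE)  # 1 2

# To count occurrences, find every match with gregexpr(); it returns -1
# for a string with no match, so count only the positive positions
hits <- gregexpr("green mangoes", text, ignore.case = TRUE)[[1]]
sum(hits > 0)                                     # 2
```

Counting per sentence with `grep()` would also undercount any word that appears twice within the same sentence, which is why the sketch counts on the whole text.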

With the qdap and stringr packages respectively, I couldn't get the solution I want.
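In the stringr attempt, the trailing `[[1]]` keeps only the matches from the first sentence, and the case-sensitive pattern misses "Green mangoes". A base R sketch of the same alternation idea that extracts every match in the text, ignores case, and then tabulates. The `\\b` word boundaries are an added assumption (the original `tag_ex` has none), so a term only matches as a whole word:

```r
fruit <- c("apples", "green mangoes", "avocados")
# First and last sentences of the question's text
text <- "Green fruits are such as apples, green mangoes and avocados are good for high blood pressure. Green mangoes are the best."

# Same alternation as tag_ex, plus \\b word boundaries (an added assumption)
tag_ex <- paste0("\\b(", paste(fruit, collapse = "|"), ")\\b")

# Base R counterpart of str_extract_all(): all matches, ignoring case
found <- regmatches(text, gregexpr(tag_ex, text, ignore.case = TRUE, perl = TRUE))[[1]]
found  # "apples" "green mangoes" "avocados" "Green mangoes"

# Lower-case before tabulating so both spellings count as one word
table(tolower(found))
```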

The ideal solution combines fruit and veg in one table:

apples          fruit  1
green mangoes   fruit  2
avocados        fruit  1
lettuce         veg    1
spinach         veg    1
Swiss chard     veg    1
mustard greens  veg    1

Any help would be greatly appreciated. Thanks.

Best answer

I've tried to generalize this to N vectors.

A tidyverse and stringr solution

library(tidyverse)
library(stringr)

Create a data.frame from your vectors

data <- c("fruit","veg")   # vector names
L <- map(data, ~get(.x))
names(L) <- data
long <- map_df(seq_along(L), ~data.frame(category = names(L)[.x], type = L[[.x]], stringsAsFactors = FALSE))

# In R < 4.0, leaving out stringsAsFactors = FALSE may give warnings about coercing factors to characters

#   category          type
# 1    fruit        apples
# 2    fruit green mangoes
# 3    fruit      avocados
# etc
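If only the long table is needed, base R's `stack()` builds the same category/type pairs from a named list, without looking vectors up by name via `get()`. A sketch that reproduces the `long` table above (renaming and reordering `stack()`'s default `values`/`ind` columns):

```r
fruit <- c("apples", "green mangoes", "avocados")
veg <- c("lettuce", "spinach", "Swiss chard", "mustard greens")

# stack() turns a named list into a data frame with `values` (the elements)
# and `ind` (a factor holding the list name each element came from)
long <- stack(list(fruit = fruit, veg = veg))
names(long) <- c("type", "category")
long$category <- as.character(long$category)
long <- long[, c("category", "type")]
head(long, 3)
```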

Count the occurrences of each word

long %>%
  mutate(count = str_count(tolower(text), tolower(type)))

Output

  category          type count
1    fruit        apples     1
2    fruit green mangoes     2
3    fruit      avocados     1
4      veg       lettuce     1
# etc
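`str_count()` with a plain pattern counts substrings, so a short word would also be counted inside longer ones. None of the question's words trigger this, but if a list contained, say, the hypothetical word `green`, it would be found inside both occurrences of "greens". A base R sketch of the difference, on the second sentence of the question's text:

```r
# Second sentence of the question's text
text <- "Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease."
lower_text <- tolower(text)

# Substring matching: the hypothetical word "green" is found inside
# both occurrences of "greens"
n_substring <- lengths(regmatches(lower_text, gregexpr("green", lower_text)))
n_substring  # 2

# \\b word boundaries restrict the match to the whole word "green"
n_word <- lengths(regmatches(lower_text, gregexpr("\\bgreen\\b", lower_text, perl = TRUE)))
n_word  # 0
```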

Bonus

We can easily add another vector:

health <- c("blood", "heart")
data <- c("fruit","veg", "health")

# code as above

Bonus output (tail)

6    veg    Swiss chard     1
7    veg mustard greens     1
8 health          blood     1
9 health          heart     2
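The same generalization can be wrapped in a function that takes the themes as one named list, so adding a theme means adding a list element. A base R sketch; `count_themes()` is a hypothetical helper name, and it counts case-insensitive, fixed-string (substring) matches, which mirrors `str_count(tolower(text), tolower(type))` for words without regex metacharacters:

```r
# Hypothetical helper: count each word of each theme in `text`
count_themes <- function(text, themes) {
  long <- stack(themes)
  names(long) <- c("type", "category")
  long$category <- as.character(long$category)
  lower_text <- tolower(text)
  long$count <- vapply(tolower(long$type), function(p) {
    # fixed = TRUE matches the word literally; gregexpr() returns -1 if absent
    hits <- gregexpr(p, lower_text, fixed = TRUE)[[1]]
    sum(hits > 0)
  }, integer(1), USE.NAMES = FALSE)
  long[, c("category", "type", "count")]
}

# The question's text, reproduced so the block runs on its own
text <- paste(
  "Green fruits are such as apples, green mangoes and avocados are good for high blood pressure.",
  "Vegetables range from greens like lettuce, spinach, Swiss chard, and mustard greens are great for heart disease.",
  "When researchers combined findings with several other long-term studies and looked at coronary heart disease and stroke separately, they found a similar protective effect for both.",
  "Green mangoes are the best."
)
themes <- list(
  fruit  = c("apples", "green mangoes", "avocados"),
  veg    = c("lettuce", "spinach", "Swiss chard", "mustard greens"),
  health = c("blood", "heart")
)
count_themes(text, themes)
```

Passing the list directly avoids looking vectors up by name with `get()`, which only works when the variables exist in the calling environment.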

Original question on Stack Overflow: https://stackoverflow.com/questions/45596986/
