
r - Wordcloud in R using a different feature

Reposted. Author: 行者123. Updated: 2023-12-01 23:39:44

Using the Description feature from the Online retail dataset, I created a word cloud:

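library(tm)         # Corpus(), tm_map(), stopwords(); stemDocument() also needs SnowballC
library(wordcloud)  # wordcloud()
descCorpus <- Corpus(VectorSource(without_weird$Description))
descCorpus <- tm_map(descCorpus, content_transformer(tolower))  # stopwords() are lowercase
descCorpus <- tm_map(descCorpus, removePunctuation)
descCorpus <- tm_map(descCorpus, removeWords, c('the', 'this', stopwords('english')))
descCorpus <- tm_map(descCorpus, stemDocument)
wordcloud(descCorpus, max.words = 100, random.order = FALSE)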

However, I want the word cloud to be driven by sales rather than by frequency: the higher the sales, the bigger the word.

Reproducible example:

description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")

sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)

df <- data.frame(description, sales)
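A minimal base-R sketch of sales-weighted word sizes for this example, assuming each word should receive the full sales value of every description it appears in. It builds a document-term matrix with table() (conceptually the same thing as tm's DocumentTermMatrix), scales each row by that description's sales, and sums the columns. The names tokens, doc_id, dtm and word_sales are illustrative only.

```r
# Reproducible example from the question
description <- c("36 PENCILS TUBE RED RETROSPOT", "HANGING HEART JAR T-LIGHT HOLDER",
                 "VICTORIAN SEWING BOX LARGE", "CINAMMON SET OF 9 T-LIGHTS",
                 "ZINC T-LIGHT HOLDER STARS SMALL", "T-LIGHT HOLDER",
                 "RABBIT NIGHT LIGHT", "WHITE SOAP RACK WITH 2 BOTTLES",
                 "BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE",
                 "STRAWBERRY CERAMIC TRINKET POT")
sales <- c(4.56, 24.96, 11.40, 15.00, 17.85, 10.50, 20.40, 27.04, 20.40, 15.00, 13.00)

# Split each description into lowercase words on whitespace
tokens <- strsplit(tolower(description), "[[:space:]]+")

# Document-term matrix: one row per description, one column per word,
# each cell counting how often that word occurs in that description
doc_id <- rep(seq_along(tokens), lengths(tokens))
dtm <- unclass(table(doc_id, unlist(tokens)))

# Multiply row i by sales[i] (the vector recycles down each column),
# then add up every column to get total sales per word
word_sales <- sort(colSums(dtm * sales), decreasing = TRUE)
head(word_sales)

# These weights could then be drawn with, e.g.,
# wordcloud::wordcloud(names(word_sales), word_sales, min.freq = 0)
```

With this weighting, "holder" and "t-light" come out on top (each appears in three descriptions), even though no single description containing them has the highest sales.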

Best answer

Here is an example using the excellent wordcloud2 package.

Using your small example data:

description <- c("36 PENCILS TUBE RED RETROSPOT","HANGING HEART JAR T-LIGHT HOLDER","VICTORIAN SEWING BOX LARGE","CINAMMON SET OF 9 T-LIGHTS","ZINC T-LIGHT HOLDER STARS SMALL","T-LIGHT HOLDER","RABBIT NIGHT LIGHT","WHITE SOAP RACK WITH 2 BOTTLES","BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE","STRAWBERRY CERAMIC TRINKET POT")    
sales <-c(4.56,24.96,11.40,15.00,17.85,10.50,20.40,27.04,20.40,15.00,13.00)
df <- data.frame(description, sales)

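The wordcloud2 function expects the variables to be named word and freq, so we rename them. The descriptions are long, so I used the size argument to shrink the overall size.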

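library(dplyr)       # rename() and the %>% pipe
library(wordcloud2)  # wordcloud2()
# word = the text to draw, freq = the weight that sets its size (here: sales)
df %>% rename(word=description, freq=sales) %>% wordcloud2(size=.1)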

This produces the following (and it is an interactive HTML widget to boot!):

[Image: the resulting word cloud]
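The same idea also works with the classic wordcloud package used in the question: when a freq vector is supplied, wordcloud() skips its own tm-based counting and uses those numbers as the weights. A minimal sketch, assuming the wordcloud package is installed; the plotting call is guarded with requireNamespace() so the rest still runs without it, and the variable largest is only there to show which entry gets the biggest font.

```r
# Reproducible example from the question
description <- c("36 PENCILS TUBE RED RETROSPOT", "HANGING HEART JAR T-LIGHT HOLDER",
                 "VICTORIAN SEWING BOX LARGE", "CINAMMON SET OF 9 T-LIGHTS",
                 "ZINC T-LIGHT HOLDER STARS SMALL", "T-LIGHT HOLDER",
                 "RABBIT NIGHT LIGHT", "WHITE SOAP RACK WITH 2 BOTTLES",
                 "BOUDOIR SQUARE TISSUE BOX", "WHITE SKULL HOT WATER BOTTLE",
                 "STRAWBERRY CERAMIC TRINKET POT")
sales <- c(4.56, 24.96, 11.40, 15.00, 17.85, 10.50, 20.40, 27.04, 20.40, 15.00, 13.00)

# The description that will be drawn largest (highest sales)
largest <- description[which.max(sales)]

# wordcloud() sizes each entry by its numeric freq argument, so sales can be
# passed directly instead of counts. min.freq = 0 keeps every item (the
# default of 3 would drop smaller weights); a small scale leaves room for
# long phrases.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = description, freq = sales, min.freq = 0,
                       random.order = FALSE, scale = c(1.5, 0.3))
}
```

Phrases that still do not fit are skipped with a warning, so the scale values may need tuning for the real data.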

On your original data I got the following (not sure this is the specific data wrangling you are after; indata is the Excel file read into R):

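# n = number of rows for each Description/Quantity pair
indata %>% group_by(Description) %>% count(Quantity) %>%
  rename(freq=n, word=Description) %>%
  ungroup() %>% select(word, freq) %>%  # words first, weights second
  wordcloud2(size=1, minSize=3)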

which looks like this:

[Image: word cloud of the original data]
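The count() step above tallies how many rows share each Description/Quantity pair, which is not the same thing as sales. If "sales" is meant as revenue, one option is to sum Quantity times UnitPrice per description before plotting. A minimal base-R sketch, assuming indata has Quantity and UnitPrice columns as the UCI Online Retail file does; the three rows below are invented for illustration. In dplyr the aggregation would read indata %>% group_by(Description) %>% summarise(freq = sum(Quantity * UnitPrice)).

```r
# Hypothetical stand-in for the Excel sheet: column names follow the
# Online Retail file, the values are made up
indata <- data.frame(
  Description = c("T-LIGHT HOLDER", "T-LIGHT HOLDER", "RABBIT NIGHT LIGHT"),
  Quantity    = c(6, 12, 4),
  UnitPrice   = c(1.25, 1.25, 5.10),
  stringsAsFactors = FALSE
)

# Revenue of each invoice line
indata$Sales <- indata$Quantity * indata$UnitPrice

# Total revenue per description, one row per product
desc_sales <- aggregate(Sales ~ Description, data = indata, FUN = sum)

# Words in the first column, weights in the second, as in the wordcloud2 call above
names(desc_sales) <- c("word", "freq")
desc_sales
```

Aggregating first also guarantees that each description appears only once in the data passed to the word cloud.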

Update: if you want to count the individual words and display those instead, I would use tidytext:

library(tidytext)
indata %>%
  unnest_tokens(word, Description, token="words") %>%  # one row per (lowercased) word
  group_by(word) %>%
  tally(Quantity) %>%                                  # n = total Quantity per word
  rename(freq=n) %>%
  ungroup() %>%
  wordcloud2(minSize=5)

This gives:

[Image: word cloud of individual words weighted by Quantity]
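For comparison, a rough base-R equivalent of the tidytext pipeline: split the descriptions into lowercase words, repeat each row's Quantity once per word, and sum per word with rowsum(). Splitting on runs of non-alphanumeric characters only approximates the tokenizer behind unnest_tokens(), and the small indata here is invented for illustration.

```r
# Hypothetical stand-in for indata (invented quantities)
indata <- data.frame(
  Description = c("WHITE SOAP RACK WITH 2 BOTTLES",
                  "WHITE SKULL HOT WATER BOTTLE",
                  "T-LIGHT HOLDER"),
  Quantity = c(3, 5, 10),
  stringsAsFactors = FALSE
)

# Lowercase and split on runs of non-alphanumeric characters, so that
# "T-LIGHT" becomes "t" and "light"
tokens <- strsplit(tolower(indata$Description), "[^[:alnum:]]+")

# Repeat each row's Quantity once for every word in that row
words <- unlist(tokens)
qty   <- rep(indata$Quantity, lengths(tokens))

# Sum Quantity per word, like group_by(word) %>% tally(Quantity)
word_qty  <- rowsum(qty, words)
word_freq <- data.frame(word = rownames(word_qty), freq = word_qty[, 1],
                        stringsAsFactors = FALSE)
word_freq <- word_freq[order(word_freq$freq, decreasing = TRUE), ]
word_freq
```

The resulting word/freq data frame has the same two-column shape that the wordcloud2 calls above use.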

You may still have to jump through the hoops of removing numbers and stop words, which you already hinted at in the OP.
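Removing numbers and stop words can be done on the token vector before the weights are summed. A minimal sketch, using sales as the weight as in the question and a deliberately tiny hypothetical stop-word list (stop_words_small); tm::stopwords("english") or tidytext's stop_words table would be fuller choices.

```r
# Three descriptions and their sales from the question's example
description <- c("CINAMMON SET OF 9 T-LIGHTS", "WHITE SOAP RACK WITH 2 BOTTLES",
                 "T-LIGHT HOLDER")
sales <- c(15.00, 27.04, 10.50)

# Hypothetical, deliberately tiny stop-word list
stop_words_small <- c("a", "and", "of", "the", "with")

tokens <- strsplit(tolower(description), "[[:space:]]+")
words  <- unlist(tokens)
weight <- rep(sales, lengths(tokens))

# Keep a token only if it is not made of digits alone and is not a stop word
keep <- !grepl("^[0-9]+$", words) & !(words %in% stop_words_small)

# Total sales per remaining word (split() groups, vapply() sums each group)
word_sales <- vapply(split(weight[keep], words[keep]), sum, numeric(1))
sort(word_sales, decreasing = TRUE)
```

Filtering before the sum means a dropped token such as "9" never contributes weight to anything else.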

A similar question about r - Wordcloud in R using a different feature can be found on Stack Overflow: https://stackoverflow.com/questions/46118149/
