Python: Using WordCloud Parameters to Make Word Clouds

Reposted by: qq735679552   Updated: 2022-09-29 22:32:09


Scenario

Official API

https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html

Implementation

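font_path : string  # Path to the font file, including its extension, e.g. font_path = '黑体.ttf'
width : int (default=400)  # Width of the output canvas in pixels
height : int (default=200)  # Height of the output canvas in pixels
prefer_horizontal : float (default=0.90)  # Proportion of words the layout tries to place horizontally; with the default 0.9, about 0.1 are tried vertically
mask : nd-array or None (default=None)  # If None, words are drawn on the plain width x height canvas. If a mask is given, width and height are ignored and the mask's shape is used instead: all-white (#FFFFFF) areas get no words, and every other area is used for drawing. Example: bg_pic = imread('读取一张图片.png') (read an image file). The mask image must have a pure white (#FFFFFF) background, with the shape to fill drawn in any non-white color; an easy way is to copy the shape onto a pure white canvas in Photoshop and save it.
scale : float (default=1)  # Scales the canvas proportionally; with 1.5, the output image is 1.5 times the canvas width and height
min_font_size : int (default=4)  # Smallest font size to use
font_step : int (default=1)  # Step between font sizes; values above 1 speed up computation but can noticeably worsen the fit
max_words : number (default=200)  # Maximum number of words to show
stopwords : set of strings or None  # Words to exclude; if None, the built-in STOPWORDS list is used
background_color : color value (default="black")  # Background color, e.g. background_color='white' for a white background
max_font_size : int or None (default=None)  # Largest font size to use; if None, the image height is used
mode : string (default="RGB")  # With mode="RGBA" and background_color=None, the background is transparent
relative_scaling : float (default=.5)  # How strongly relative word frequency affects font size
color_func : callable, default=None  # Function that returns a color for each word; if given, it overrides colormap
regexp : string or None (optional)  # Regular expression used to split the input text into words
collocations : bool, default=True  # Whether to include collocations (two-word bigrams)
colormap : string or matplotlib colormap, default="viridis"  # Matplotlib colormap from which each word's color is picked at random; ignored if color_func is given
random_state : int or None  # Seed for the random number generator; a fixed value makes the layout and colors reproducible

fit_words(frequencies)  # Generate a word cloud from word frequencies
generate(text)  # Generate a word cloud from text
generate_from_frequencies(frequencies[, ...])  # Generate a word cloud from word frequencies
generate_from_text(text)  # Generate a word cloud from text
process_text(text)  # Split a long text into words and remove stopwords (this targets English; Chinese text must first be segmented with another library, and the result passed to fit_words(frequencies) above)
recolor([random_state, color_func, colormap])  # Recolor the existing output; much faster than regenerating the whole word cloud
to_array()  # Convert to a numpy array
to_file(filename)  # Write the image to a file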

Supplement: using Python's WordCloud package to generate word clouds

Example output:

[Figure: a word cloud generated with the wordcloud package]

This word cloud was generated with the wordcloud package in Python.

The basic usage of the wordcloud package is introduced below.

class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=0.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode='RGB', relative_scaling=0.5, regexp=None, collocations=True, colormap=None, normalize_plurals=True)

These are all of WordCloud's parameters. The parameters in this signature have the same meanings as in the parameter list in the Implementation section above.

Example:

The shape we want the word cloud to take:

[Figure: mask image defining the word cloud shape]

The black part of the image is where the word cloud will be drawn; the white part will contain no words.

Below is a text document:

How the Word Cloud Generator Works

The layout algorithm for positioning words without overlap is available on GitHub under an open source license as d3-cloud. Note that this is only the layout algorithm and any code for converting text into words and rendering the final output requires additional development.

As word placement can be quite slow for more than a few hundred words, the layout algorithm can be run asynchronously, with a configurable time step size. This makes it possible to animate words as they are placed without stuttering. It is recommended to always use a time step even without animations as it prevents the browser's event loop from blocking while placing the words.

The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:

1. Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.
2. If the word intersects with any previously placed words, move it one step along an increasing spiral. Repeat until no intersections are found.

The hard part is making it perform efficiently! According to Jonathan Feinberg, Wordle uses a combination of hierarchical bounding boxes and quadtrees to achieve reasonable speeds.

Glyphs in JavaScript

There isn't a way to retrieve precise glyph shapes via the DOM, except perhaps for SVG fonts. Instead, we draw each word to a hidden canvas element, and retrieve the pixel data.

Retrieving the pixel data separately for each word is expensive, so we draw as many words as possible and then retrieve their pixels in a batch operation.

Sprites and Masks

My initial implementation performed collision detection using sprite masks. Once a word is placed, it doesn't move, so we can copy it to the appropriate position in a larger sprite representing the whole placement area.

The advantage of this is that collision detection only involves comparing a candidate sprite with the relevant area of this larger sprite, rather than comparing with each previous word separately.

Somewhat surprisingly, a simple low-level hack made a tremendous difference: when constructing the sprite I compressed blocks of 32 1-bit pixels into 32-bit integers, thus reducing the number of checks (and memory) by 32 times.

In fact, this turned out to beat my hierarchical bounding box with quadtree implementation on everything I tried it on (even very large areas and font sizes). I think this is primarily because the sprite version only needs to perform a single collision test per candidate area, whereas the bounding box version has to compare with every other previously placed word that overlaps slightly with the candidate area.

Another possibility would be to merge a word's tree with a single large tree once it is placed. I think this operation would be fairly expensive though compared with the analogous sprite mask operation, which is essentially ORing a whole block.

To generate a word cloud from this text, use the following code:

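#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Import the wordcloud and matplotlib modules (plus numpy/Pillow to read the mask)
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Read a txt file
with open('test.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Read the background (mask) image. scipy.misc.imread no longer exists in
# current SciPy releases, so Pillow + numpy are used instead.
bg_pic = np.array(Image.open('3.png'))

# Generate the word cloud
wordcloud = WordCloud(mask=bg_pic, background_color='white', scale=1.5).generate(text)

# Color generator that takes colors from the mask image; it only has an effect
# if passed to recolor(), e.g. wordcloud.recolor(color_func=image_colors)
image_colors = ImageColorGenerator(bg_pic)

# Display the word cloud image
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

# Save the image
wordcloud.to_file('test.jpg')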

Result:

[Figure: word cloud generated from the text above in the shape of the mask]

The above is based on personal experience and is meant as a reference. If anything is wrong or incomplete, corrections are welcome.

Original article: https://blog.csdn.net/BADAO_LIUMANG_QIZHI/article/details/89708414
