python - KeyError: '\\documentclass'


Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:00:59


I have the following Python script:

import nltk
from nltk.probability import FreqDist
nltk.download('punkt')

frequencies = {}
book = open('book.txt')
read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words)

for w in words:
    frequencies[w] = frequencies[w] + 1 

print (frequencies)

When I try to run the script, I get the following output:

[nltk_data] Downloading package punkt to /home/abc/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    frequencies[w] = frequencies[w] + 1 
KeyError: '\\documentclass'

What am I doing wrong? Also, how can I print each word together with the number of times it occurs in the text file?

You can download book.txt from here.

Best answer

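Your frequencies dictionary starts out empty. The right-hand side of frequencies[w] = frequencies[w] + 1 reads a key that has not been stored yet, so the very first word raises a KeyError. That is expected behavior for a plain dict.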
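The failure can be reproduced without the book. The sketch below uses a hypothetical three-token list in place of the tokenized text, and shows dict.get with a default of 0 as one possible way to keep the original loop working with a plain dict:

```python
# Hypothetical stand-in for the tokens produced by nltk.word_tokenize.
words = ["\\documentclass", "the", "the"]

frequencies = {}
try:
    # Reading frequencies[w] before the key exists raises KeyError.
    frequencies[words[0]] = frequencies[words[0]] + 1
except KeyError as err:
    print("KeyError:", err)

# dict.get returns the default (0) for a word that has not been seen yet.
for w in words:
    frequencies[w] = frequencies.get(w, 0) + 1

print(frequencies)
```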

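I suggest you use collections.Counter instead. It is a specialized dictionary (a bit like defaultdict) designed for counting occurrences: a missing key counts as 0 instead of raising a KeyError.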

import nltk,collections
from nltk.probability import FreqDist
nltk.download('punkt')

frequencies = collections.Counter()
with open('book.txt') as book:
    read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words)

for w in words:
    frequencies[w] += 1 

print (frequencies)
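As a side note, Counter can also count an iterable in a single call, and defaultdict(int) behaves similarly for the += 1 pattern. A small sketch with a hypothetical token list (not the contents of book.txt):

```python
import collections

# Hypothetical tokens used only for illustration.
words = ["the", "cat", "the", "hat"]

# Counter counts an iterable in one call, so the explicit loop is optional.
counts = collections.Counter(words)

# defaultdict(int) supplies 0 for a missing key, so += 1 works as well.
dd = collections.defaultdict(int)
for w in words:
    dd[w] += 1

print(counts)
print(dict(dd))
```

Looking up a word that never occurred, such as counts["dog"], returns 0 rather than raising.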

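Edit: the above answers your question but makes almost no use of the nltk package; it treats nltk as nothing more than a string tokenizer. To be more specific, and to allow further text analysis without reinventing the wheel (thanks to the various comments below), you should do this: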

import nltk
from nltk.probability import FreqDist
nltk.download('punkt')

with open('book.txt') as book:
    read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words)   # no need for the loop, does the count job

print (frequencyDist)

You get (with my text):

<FreqDist with 142 samples and 476 outcomes>

So the result is not a plain dictionary mapping word => count, but a richer object that holds this information and more:

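frequencyDist.items(): gives you the word => count pairs (along with all the classic dict methods)
frequencyDist.most_common(50): returns the 50 most common words with their counts
frequencyDist['the']: returns the number of occurrences of "the"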
...
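To print each word with its count, as the question asks, something like the following should work. It is a sketch on a hypothetical token list; the fallback to collections.Counter is an assumption added so the snippet runs without NLTK installed (in NLTK 3, FreqDist is a subclass of Counter, so the calls used here are the same):

```python
try:
    from nltk.probability import FreqDist
except ImportError:
    # Assumption: use Counter as a stand-in when NLTK is not installed.
    from collections import Counter as FreqDist

# Hypothetical tokens used only for illustration.
words = ["the", "cat", "saw", "the", "hat"]
frequencyDist = FreqDist(words)

# most_common(n) returns (word, count) pairs, most frequent first.
for word, count in frequencyDist.most_common(3):
    print(word, count)

the_count = frequencyDist["the"]      # occurrences of "the"
missing_count = frequencyDist["dog"]  # unseen words count as 0
```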

Regarding python - KeyError: '\\documentclass', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40346561/
