
python - KeyError: '\\documentclass'

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:00:59

I have the following Python script:

import nltk
from nltk.probability import FreqDist
nltk.download('punkt')

frequencies = {}
book = open('book.txt')
read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words)

for w in words:
    frequencies[w] = frequencies[w] + 1

print (frequencies)

When I try to run the script, I get the following:

[nltk_data] Downloading package punkt to /home/abc/nltk_data...
[nltk_data] Package punkt is already up-to-date!
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    frequencies[w] = frequencies[w] + 1
KeyError: '\\documentclass'

What am I doing wrong? Also, how can I print each word together with the number of times it occurs in the text file?

You can download book.txt from here.

Accepted answer

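Your frequencies dictionary is empty, so frequencies[w] on the right-hand side is read before any key has been stored. You get a KeyError on the very first word, which is expected.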
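To illustrate the point without NLTK, here is a minimal sketch of two ways to keep the original loop while avoiding the KeyError. The short word list is a made-up stand-in for the tokenized book, and the names frequencies_dd, word, and count are introduced only for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for the tokens produced from book.txt.
words = ["the", "cat", "saw", "the", "dog"]

# Plain dict: .get() returns 0 the first time a word is seen,
# so reading a missing key no longer raises KeyError.
frequencies = {}
for w in words:
    frequencies[w] = frequencies.get(w, 0) + 1

# defaultdict(int) creates a missing entry as 0 on first access.
frequencies_dd = defaultdict(int)
for w in words:
    frequencies_dd[w] += 1

# Print each word with its count, as the question asks.
for word, count in frequencies.items():
    print(word, count)
```

Both variants produce the same counts; the choice is mostly a matter of taste.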

I suggest using collections.Counter instead. It is a specialized dictionary (somewhat like defaultdict) designed for counting occurrences.

import collections
import nltk
from nltk.probability import FreqDist
nltk.download('punkt')

# A Counter returns 0 for missing keys, so += 1 never raises KeyError
frequencies = collections.Counter()
with open('book.txt') as book:
    read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words)

for w in words:
    frequencies[w] += 1

print(frequencies)

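Edit: the above answers your question but makes almost no use of the nltk package; it treats nltk as if it were only a string tokenizer. To be more specific, and to allow further text analysis without reinventing the wheel (thanks to the various comments below), you should do this instead: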

import nltk
from nltk.probability import FreqDist
nltk.download('punkt')

with open('book.txt') as book:
    read_book = book.read()
words = nltk.word_tokenize(read_book)
frequencyDist = FreqDist(words) # no need for the loop, does the count job

print(frequencyDist)

You get (with my text):

<FreqDist with 142 samples and 476 outcomes>

So this is not a plain dictionary mapping word => count, but a richer object that holds this information and more:

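  • frequencyDist.items(): gives you the word => count pairs (along with all the usual dict methods)
  • frequencyDist.most_common(50): returns the 50 most common words with their counts
  • frequencyDist['the']: returns the number of occurrences of "the"
  • ...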
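The calls listed above can be tried out without NLTK installed. In NLTK 3, FreqDist is a subclass of collections.Counter, so this sketch uses a plain Counter as a stand-in; the sample sentence and the names tokens, counts, the_count, top_two, and missing are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

# Made-up sample text; str.split() stands in for nltk.word_tokenize.
tokens = "the cat saw the dog and the bird".split()
counts = Counter(tokens)  # FreqDist(tokens) accepts the same kind of input

the_count = counts["the"]        # occurrences of one word
top_two = counts.most_common(2)  # list of (word, count), most frequent first
missing = counts["zebra"]        # a word never seen counts as 0, no KeyError

# Print every word with its count, most frequent first.
for word, count in counts.most_common():
    print(word, count)
```

This also covers the second part of the question: iterating over most_common() (or items()) gives each word with the number of times it appears.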

Regarding python - KeyError: '\\documentclass', a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40346561/
