
python - "name 'word_tokenize' is not defined" when counting word frequency in Python

Reposted. Author: 行者123. Updated: 2023-12-03 23:13:23

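I am trying to compute word frequencies from a specific text column.

I also want to remove stop words from the resulting dictionary.

Here is the code:

Code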

import unicodecsv as csv
import nltk
import pandas as pd
import chardet

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize

with open('data.csv','rb') as f:
    result = chardet.detect(f.read())

file_band = file[file['article'].str.contains("first time")]
file.loc[:,'extracted'] = file_band['article']

top_N = 200

a = file_band['extracted'].str.lower().replace(r'\|', ' ').str.cat(sep=' ')
words = nltk.tokenize.word_tokenize(a)
word_dist = nltk.FreqDist(words)
print (word_dist)

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(word_dist)

filtered_sentence = [w for w in word_tokens if not w in stop_words]
filtered_sentence = []

for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print(filtered_sentence)

Error

The error is:

NameError Traceback (most recent call last) in ()
     27 #filter words
     28 stop_words = set(stopwords.words('english'))
---> 29 word_tokens = word_tokenize(word_dist)
     30
     31 filtered_sentence = [w for w in word_tokens if not w in stop_words]

NameError: name 'word_tokenize' is not defined

Best answer

NameError: name 'word_tokenize' is not defined



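What the error is telling you is that you are calling a function, word_tokenize(), that is not available in your code.

Normally, you would define a function like this:
def my_function(my_input):
    words = my_input.split()  # do something with my_input
    return words

Then you can call it later:
words = my_function(my_input)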

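In your case, it looks like you are trying to use a function that is part of the nltk.tokenize module. However, you only imported one name from that module, sent_tokenize (which, by the way, you do not seem to be using):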
from nltk.tokenize import sent_tokenize
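
The same behavior can be reproduced without NLTK. The sketch below uses the standard library's math module purely as a stand-in (an assumption for illustration, not part of the original question): importing floor alone does not make ceil available, just as importing sent_tokenize alone does not make word_tokenize available.

```python
from math import floor  # only floor is imported, like importing only sent_tokenize

try:
    result = ceil(1.2)  # ceil was never imported, so this name cannot be resolved
except NameError as exc:
    result = None
    message = str(exc)
    print("NameError:", message)

from math import ceil  # import the name that is actually being called

fixed = ceil(1.2)
print(fixed)
```

Once the missing name is imported, the same call succeeds.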

So maybe you need to import word_tokenize instead?
from nltk.tokenize import word_tokenize

Or import both, if you plan to use sent_tokenize later?
from nltk.tokenize import sent_tokenize, word_tokenize
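
With the import fixed, one more detail in the question's code may be worth checking: word_tokenize expects a string, while word_dist is a frequency distribution, so the stop words are probably easier to remove before counting. The following is a minimal sketch of that order of operations (tokenize, filter, then count) using only the standard library. The regex tokenizer, the tiny STOP_WORDS set, and the word_frequencies helper are hypothetical stand-ins for nltk.tokenize.word_tokenize, stopwords.words('english'), and nltk.FreqDist, chosen so the example runs without downloading NLTK data.

```python
import re
from collections import Counter

# Hypothetical miniature stop word list; the question uses
# set(stopwords.words('english')) from NLTK instead.
STOP_WORDS = {"the", "a", "was", "it", "i", "for", "my"}


def word_frequencies(text, stop_words, top_n=200):
    """Tokenize text, drop stop words, and return the top_n (word, count) pairs."""
    # Simple regex tokenizer standing in for nltk.tokenize.word_tokenize.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Filter before counting, so stop words never enter the tally.
    filtered = [w for w in tokens if w not in stop_words]
    # Counter plays the role of nltk.FreqDist here.
    return Counter(filtered).most_common(top_n)


sample = "The first time I saw the band was the first time it rained."
top_words = word_frequencies(sample, STOP_WORDS, top_n=3)
print(top_words)
```

In the question's code, the text passed in would be the joined string a rather than a sample sentence, and top_n would be top_N.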

Regarding python - "name 'word_tokenize' is not defined" when counting word frequency in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50524915/
