
python - NLTK: Package errors? punkt and pickle?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 12:20:50

Errors on Command Prompt

Basically, I have no idea why I'm getting this error.

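So that there is more than just an image, here is a similar message in code format. Because it is more recent, the answer to this post is already mentioned in the message itself: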

Preprocessing raw texts ...

---------------------------------------------------------------------------

LookupError Traceback (most recent call last)

<ipython-input-38-263240bbee7e> in <module>()
----> 1 main()

7 frames

<ipython-input-32-62fa346501e8> in main()
32 data = data.fillna('') # only the comments has NaN's
33 rws = data.abstract
---> 34 sentences, token_lists, idx_in = preprocess(rws, samp_size=samp_size)
35 # Define the topic model object
36 #tm = Topic_Model(k = 10), method = TFIDF)

<ipython-input-31-f75213289788> in preprocess(docs, samp_size)
25 for i, idx in enumerate(samp):
26 sentence = preprocess_sent(docs[idx])
---> 27 token_list = preprocess_word(sentence)
28 if token_list:
29 idx_in.append(idx)

<ipython-input-29-eddacbfa6443> in preprocess_word(s)
179 if not s:
180 return None
--> 181 w_list = word_tokenize(s)
182 w_list = f_punct(w_list)
183 w_list = f_noun(w_list)

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in word_tokenize(text, language, preserve_line)
126 :type preserver_line: bool
127 """
--> 128 sentences = [text] if preserve_line else sent_tokenize(text, language)
129 return [token for sent in sentences
130 for token in _treebank_word_tokenizer.tokenize(sent)]

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in sent_tokenize(text, language)
92 :param language: the model name in the Punkt corpus
93 """
---> 94 tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
95 return tokenizer.tokenize(text)
96

/usr/local/lib/python3.7/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
832
833 # Load the resource.
--> 834 opened_resource = _open(resource_url)
835
836 if format == 'raw':

/usr/local/lib/python3.7/dist-packages/nltk/data.py in _open(resource_url)
950
951 if protocol is None or protocol.lower() == 'nltk':
--> 952 return find(path_, path + ['']).open()
953 elif protocol.lower() == 'file':
954 # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
671 sep = '*' * 70
672 resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673 raise LookupError(resource_not_found)
674
675

LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt')

Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/nltk_data'
- '/usr/lib/nltk_data'
- ''
**********************************************************************

Best answer

Do the following:

>>> import nltk
>>> nltk.download()

Then, when the downloader window pops up, select punkt under the Identifier column in the Models tab and download it.

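If no pop-up window can open (for example in a notebook or on a remote machine, which the /root/nltk_data path in the traceback above hints at), the same download can be done without the GUI, as the error message itself suggests with nltk.download('punkt'). The following is a minimal sketch, not part of the original answer: the helper name ensure_nltk_resource and its default arguments are made up for illustration, and it assumes the resource you need is the one named in your own LookupError message.

```python
def ensure_nltk_resource(resource_path="tokenizers/punkt", package="punkt"):
    """Return True if the NLTK resource is available, downloading it if needed.

    Hypothetical helper for illustration; the defaults match the 'punkt'
    resource named in the LookupError shown in the question.
    """
    # Imported here so the helper can be defined even before NLTK is installed.
    import nltk

    try:
        # find() raises LookupError when the resource is in none of the
        # directories listed in nltk.data.path (the "Searched in:" list).
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Non-interactive download; returns True on success, False otherwise.
        return bool(nltk.download(package, quiet=True))
```

Calling ensure_nltk_resource() once before the first word_tokenize call should be enough. If the error message on your installation names a different resource, pass that name instead of 'punkt'.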

Regarding "python - NLTK: Package errors? punkt and pickle?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30822131/
