
Python Spacy KeyError: "[E018] Can't retrieve string for hash"


I am trying to get my code running on a Raspberry Pi 4 and have been stuck on this error for hours. This code segment throws the error, even though it runs perfectly on Windows with the same project:

def create_lem_texts(data):  # data as a list
    def sent_to_words(sentences):
        for sentence in sentences:
            yield gensim.utils.simple_preprocess(str(sentence), deacc=True)  # deacc=True removes punctuations

    data_words = list(sent_to_words(data))
    bigram = gensim.models.Phrases(data_words, min_count=5, threshold=100)  # higher threshold fewer phrases.
    bigram_mod = gensim.models.phrases.Phraser(bigram)

    def remove_stopwords(texts):
        return [[word for word in simple_preprocess(str(doc)) if word not in stop_words] for doc in texts]

    def make_bigrams(texts):
        return [bigram_mod[doc] for doc in texts]

    def lemmatization(texts, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV']):
        """https://spacy.io/api/annotation"""
        texts_out = []
        print(os.getcwd())
        for sent in texts:
            doc = nlp(" ".join(sent))
            texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
        return texts_out

    data_words_nostops = remove_stopwords(data_words)
    data_words_bigrams = make_bigrams(data_words_nostops)
    print(os.getcwd())
    nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

    # Do lemmatization keeping only noun, adj, vb, adv
    data_lemmatized = lemmatization(data_words_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])

    return data_lemmatized

This code is in turn called by the following function:

def assign_topics_tweet(tweets):
    owd = os.getcwd()
    print(owd)
    os.chdir('/home/pi/Documents/pycharm_project_twitter/topic_model/')
    print(os.getcwd())
    lda = LdaModel.load("LDA26")
    print(lda)
    id2word = Dictionary.load('Id2Word')
    print(id2word)
    os.chdir(owd)
    data = create_lem_texts(tweets)
    corpus = [id2word.doc2bow(text) for text in data]
    topics = []
    for tweet in corpus:
        topics_dist = lda.get_document_topics(tweet)
        topics.append(topics_dist)
    return topics

And here is the error message:

Traceback (most recent call last):
  File "/home/pi/Documents/pycharm_project_twitter/Twitter_Import.py", line 193, in <module>
    main()
  File "/home/pi/Documents/pycharm_project_twitter/Twitter_Import.py", line 169, in main
    topics = assign_topics_tweet(data)
  File "/home/pi/Documents/pycharm_project_twitter/TopicModel.py", line 238, in assign_topics_tweet
    data = create_lem_texts(tweets)
  File "/home/pi/Documents/pycharm_project_twitter/TopicModel.py", line 76, in create_lem_texts
    data_lemmatized = lemmatization(data_words_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])
  File "/home/pi/Documents/pycharm_project_twitter/TopicModel.py", line 67, in lemmatization
    texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
  File "/home/pi/Documents/pycharm_project_twitter/TopicModel.py", line 67, in <listcomp>
    texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
  File "token.pyx", line 871, in spacy.tokens.token.Token.lemma_.__get__
  File "strings.pyx", line 136, in spacy.strings.StringStore.__getitem__
KeyError: "[E018] Can't retrieve string for hash '18446744073541552667'. This usually refers to an issue with the `Vocab` or `StringStore`."

Process finished with exit code 1

I have tried reinstalling spaCy and the en model and running the code directly on the Pi; the spaCy versions are identical on my Windows machine and on the Pi. There is also virtually no information about this error online.
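For anyone hitting the same error: a useful sanity check is to confirm that the installed model was actually built for the spaCy version that is running, since a library/model mismatch is one common cause of E018 hash-lookup failures. `python -m spacy validate` reports this from the command line, and the same information is available from the model's metadata. A minimal diagnostic sketch (assuming the model in use is `en_core_web_sm`, as in the code above):

import spacy

# Version of the spaCy library itself; compare this value on the
# Windows machine and on the Pi.
print("spaCy version:", spacy.__version__)

# Load the model and print the spaCy range it was packaged for;
# if the running library falls outside this range, lookups into
# the StringStore can fail with E018.
nlp = spacy.load('en_core_web_sm')
print("model version:", nlp.meta.get('version'))
print("compatible spaCy:", nlp.meta.get('spacy_version'))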

Best Answer

After three days of testing, the problem was solved simply by installing an older version of spaCy, 2.0.1.
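For reference, pinning the older version and re-downloading a matching model would look something like this (the model package name is assumed from the question; adjust to your environment, and note that a model downloaded under a newer spaCy must be re-fetched after the downgrade):

pip uninstall spacy
pip install spacy==2.0.1
python -m spacy download en_core_web_sm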

Regarding Python Spacy KeyError: "[E018] Can't retrieve string for hash", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62076017/
