python - ValueError: [E088] Text of length 1027203 exceeds maximum of 1000000.


I'm trying to build a corpus of words from a text file. I'm using spaCy. Here is my code:

import spacy
nlp = spacy.load('fr_core_news_md')
f = open("text.txt")
doc = nlp(''.join(ch for ch in f.read() if ch.isalnum() or ch == " "))
f.close()
del f
words = []
for token in doc:
    if token.lemma_ not in words:
        words.append(token.lemma_)

f = open("corpus.txt", 'w')
f.write("Number of words:" + str(len(words)) + "\n" + ''.join([i + "\n" for i in sorted(words)]))
f.close()

But it raises this exception:

ValueError: [E088] Text of length 1027203 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the `nlp.max_length` limit. The limit is in number of characters, so you can check whether your inputs are too long by checking `len(text)`.
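As the message suggests, the input length can be checked before calling nlp; a minimal check, using the same text.txt and filtering as above, looks like this:

import spacy

nlp = spacy.load('fr_core_news_md')

with open("text.txt") as f:
    text = ''.join(ch for ch in f.read() if ch.isalnum() or ch == " ")

print(len(text))                    # 1027203 for this input
print(len(text) > nlp.max_length)   # True: the default limit is 1000000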

I tried something like this:

import spacy
nlp = spacy.load('fr_core_news_md')
nlp.max_length = 1027203
f = open("text.txt")
doc = nlp(''.join(ch for ch in f.read() if ch.isalnum() or ch == " "))
f.close()
del f
words = []
for token in doc:
    if token.lemma_ not in words:
        words.append(token.lemma_)

f = open("corpus.txt", 'w')
f.write("Number of words:" + str(len(words)) + "\n" + ''.join([i + "\n" for i in sorted(words)]))
f.close()

But I got the same error:

ValueError: [E088] Text of length 1027203 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the `nlp.max_length` limit. The limit is in number of characters, so you can check whether your inputs are too long by checking `len(text)`.

How can I fix this?

Best Answer

Unlike the answer above, I think nlp.max_length is in fact being applied correctly, but the value you set is too low. It looks like you set it to the exact value from the error message. Increase nlp.max_length to a number slightly higher than the one in the error message:

nlp.max_length = 1030000 # or even higher

Ideally, it should work after that.
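A more robust variant is to derive the limit from the actual input length, so a hard-coded number never falls behind the file size; a minimal sketch, assuming the same text.txt and filtering as in the question:

import spacy

nlp = spacy.load('fr_core_news_md')

with open("text.txt") as f:
    text = ''.join(ch for ch in f.read() if ch.isalnum() or ch == " ")

# Set the limit just above the real input size instead of guessing a constant.
nlp.max_length = len(text) + 100
doc = nlp(text)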

So your code could be changed to this:

import spacy
nlp = spacy.load('fr_core_news_md')
nlp.max_length = 1030000 # or higher
f = open("text.txt")
doc = nlp(''.join(ch for ch in f.read() if ch.isalnum() or ch == " "))
f.close()
del f
words = []
for token in doc:
    if token.lemma_ not in words:
        words.append(token.lemma_)

f = open("corpus.txt", 'w')
f.write("Number of words:" + str(len(words)) + "\n" + ''.join([i + "\n" for i in sorted(words)]))
f.close()
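If memory remains a concern, note that the error message says the limit mainly protects the parser and NER. Since this code only uses token.lemma_, one possible variant (a sketch, assuming the lemmatizer here does not depend on those components, which holds for the v2 French models where lemmas come from the tagger/lookup tables) is to disable them when loading the model:

import spacy

# Load the model without the parser and NER, since only lemmas are needed.
nlp = spacy.load('fr_core_news_md', disable=['parser', 'ner'])
nlp.max_length = 1030000  # or higher

with open("text.txt") as f:
    text = ''.join(ch for ch in f.read() if ch.isalnum() or ch == " ")

doc = nlp(text)

# Collect unique lemmas with a set (faster than repeated list membership checks).
words = sorted({token.lemma_ for token in doc})

with open("corpus.txt", 'w') as out:
    out.write("Number of words:" + str(len(words)) + "\n")
    out.write(''.join(w + "\n" for w in words))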

Regarding "python - ValueError: [E088] Text of length 1027203 exceeds maximum of 1000000.", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57231616/
