gpt4 book ai didi

python - 类型错误 : ufunc 'add' did not contain a loop with signature matching types

转载 作者:IT老高 更新时间:2023-10-28 20:34:33 36 4
gpt4 key购买 nike

我正在创建句子的词袋表示。然后将句子中存在的单词与文件“vectors.txt”进行比较,以获得它们的嵌入向量。在获得句子中存在的每个单词的向量后,我将取句子中单词向量的平均值。这是我的代码:

import nltk
import numpy as np
from nltk import FreqDist
from nltk.corpus import brown


news = brown.words(categories='news')
news_sents = brown.sents(categories='news')

fdist = FreqDist(w.lower() for w in news)
vocabulary = [word for word, _ in fdist.most_common(10)]
num_sents = len(news_sents)

def averageEmbeddings(sentenceTokens, embeddingLookupTable):
listOfEmb=[]
for token in sentenceTokens:
embedding = embeddingLookupTable[token]
listOfEmb.append(embedding)

return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))

embeddingVectors = {}

with open("D:\\Embedding\\vectors.txt") as file:
for line in file:
(key, *val) = line.split()
embeddingVectors[key] = val

for i in range(num_sents):
features = {}
for word in vocabulary:
features[word] = int(word in news_sents[i])
print(features)
print(list(features.values()))
sentenceTokens = []
for key, value in features.items():
if value == 1:
sentenceTokens.append(key)
sentenceTokens.remove(".")
print(sentenceTokens)
print(averageEmbeddings(sentenceTokens, embeddingVectors))

print(features.keys())

不知道为什么,但我得到了这个错误:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-643ccd012438> in <module>()
39 sentenceTokens.remove(".")
40 print(sentenceTokens)
---> 41 print(averageEmbeddings(sentenceTokens, embeddingVectors))
42
43 print(features.keys())

<ipython-input-4-643ccd012438> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
18 listOfEmb.append(embedding)
19
---> 20 return sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
21
22 embeddingVectors = {}

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U9') dtype('<U9') dtype('<U9')

附:嵌入向量看起来像:

the 0.011384 0.010512 -0.008450 -0.007628 0.000360 -0.010121 0.004674 -0.000076 
of 0.002954 0.004546 0.005513 -0.004026 0.002296 -0.016979 -0.011469 -0.009159
and 0.004691 -0.012989 -0.003122 0.004786 -0.002907 0.000526 -0.006146 -0.003058
one 0.014722 -0.000810 0.003737 -0.001110 -0.011229 0.001577 -0.007403 -0.005355
in -0.001046 -0.008302 0.010973 0.009608 0.009494 -0.008253 0.001744 0.003263

使用 np.sum 后出现此错误:

TypeError                                 Traceback (most recent call last)
<ipython-input-13-8a7edbb9d946> in <module>()
40 sentenceTokens.remove(".")
41 print(sentenceTokens)
---> 42 print(averageEmbeddings(sentenceTokens, embeddingVectors))
43
44 print(features.keys())

<ipython-input-13-8a7edbb9d946> in averageEmbeddings(sentenceTokens, embeddingLookupTable)
18 listOfEmb.append(embedding)
19
---> 20 return np.sum(np.asarray(listOfEmb)) / float(len(listOfEmb))
21
22 embeddingVectors = {}

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in sum(a, axis, dtype, out, keepdims)
1829 else:
1830 return _methods._sum(a, axis=axis, dtype=dtype,
-> 1831 out=out, keepdims=keepdims)
1832
1833

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _sum(a, axis, dtype, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32 return umr_sum(a, axis, dtype, out, keepdims)
33
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: cannot perform reduce with flexible type

最佳答案

你有一个 numpy 字符串数组,而不是 float 。这就是 dtype('<U9') 的意思。 -- 一个小端编码的 unicode 字符串,最多 9 个字符。

尝试:

return sum(np.asarray(listOfEmb, dtype=float)) / float(len(listOfEmb))

但是,这里根本不需要 numpy。你真的可以这样做:

return sum(float(embedding) for embedding in listOfEmb) / len(listOfEmb)

或者如果你真的打算使用 numpy。

return np.asarray(listOfEmb, dtype=float).mean()

关于python - 类型错误 : ufunc 'add' did not contain a loop with signature matching types,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35013726/

36 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com