
Python - Text mining - TypeError: __hash__ method should return an integer

Reposted. Author: 行者123. Updated: 2023-11-30 09:21:13

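I'm working on a classification problem in Python. The thing is, I'm not very good at Python yet, so I've been stuck on the same problem for a long time and I don't know how to solve it. I hope you can help me :)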

Here is my code:

tableau = pandas.DataFrame({'Exigence': exigence, 'Résumé': resume})

df2, targets = encode_target(tableau, "Exigence")
features = list(df2.columns[:4])

for line in resume:
    terms = prep.ngram_tokenizer(text=line)
    mx.add_doc(doc_id='some-unique-identifier',
               doc_class=df2["Target"],
               doc_terms=terms,
               frequency=True,
               do_padding=True)

I get this error:

objects are mutable, thus they cannot be hashed
Traceback (most recent call last):

  File "<ipython-input-9-072e9c71917a>", line 7, in <module>
    do_padding=True)

  File "C:\Users\nouguierc\AppData\Local\Continuum\Anaconda3\lib\site-packages\irlib\matrix.py", line 222, in add_doc
    if doc_class in self.classes:

TypeError: __hash__ method should return an integer

When I go to line 222 of matrix.py, I see this:

    if doc_class in self.classes:
        self.classes[doc_class].add(my_doc_terms)
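Line 222 is a membership test on self.classes, which add_doc assigns into like a dictionary (self.classes[doc_class] = my_doc_terms); treating it as a plain dict is an assumption here. A dictionary lookup must hash the key before it can compare anything, so an unhashable doc_class fails right at that line. The stdlib-only sketch below illustrates this; a list stands in for a pandas Series (both are mutable and refuse to be hashed), and `classes` and `try_lookup` are hypothetical names.

```python
# Sketch: `key in some_dict` hashes `key` before looking it up, so an
# unhashable key raises TypeError. A list stands in for a pandas Series
# here: both are mutable and cannot be hashed.
classes = {"spam": ["buy", "now"]}  # hypothetical class -> terms mapping


def try_lookup(doc_class):
    """Return the result of `doc_class in classes`, or the TypeError text."""
    try:
        return doc_class in classes
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_lookup("spam"))            # True
print(try_lookup("eggs"))            # False
print(try_lookup(["spam", "eggs"]))  # a TypeError message about an unhashable list
```

Converting the key to a hashable type (for example a str, or a tuple instead of a list) makes the same lookup succeed.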

The function containing these lines is:

def add_doc(self, doc_id = '', doc_class='', doc_terms=[],
            frequency=False, do_padding=False):
    ''' Add new document to our matrix:
        doc_id: Identifier for the document, eg. file name, url, etc.
        doc_class: You might need this in classification.
        doc_terms: List of terms you got after tokenizing the document.
        frequency: If true, term occurrences is incremented by one.
                   Else, occurrences is only 0 or 1 (a la Bernoulli)
        do_padding: Boolean. Check do_padding() for more info.
    '''
    # Update list of terms if new term seen.
    # And document (row) with its associated data.
    my_doc_terms = SuperList()
    for term in doc_terms:
        term_idx = self.terms.unique_append(term)
        #my_doc_terms.insert_after_padding(self.terms.index(term))
        if frequency:
            my_doc_terms.increment_after_padding(term_idx,1)
        else:
            my_doc_terms.insert_after_padding(term_idx,1)
    self.docs.append({ 'id': doc_id,
                       'class': doc_class,
                       'terms': my_doc_terms})
    # Update list of document classes if new class seen.
    #self.classes.unique_append(doc_class)
    if doc_class in self.classes:
        self.classes[doc_class].add(my_doc_terms)
    else:
        self.classes[doc_class] = my_doc_terms
    if do_padding:
        self.do_padding()
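To make the bookkeeping in add_doc easier to follow, here is a simplified, stdlib-only re-implementation. It is not irlib's code: TinyMatrix is a hypothetical name, SuperList and its padding are replaced by a plain collections.Counter, and per-class vectors are combined by summing counters rather than by irlib's .add(). It keeps the two points that matter here: `frequency` switches between counting occurrences and recording 0/1 presence (Bernoulli), and doc_class becomes a dictionary key, so it has to be hashable.

```python
from collections import Counter


class TinyMatrix:
    """Hypothetical, simplified stand-in for irlib's Matrix (not its real API)."""

    def __init__(self):
        self.term_index = {}  # term -> column index, in order of first appearance
        self.docs = []        # one record per added document
        self.classes = {}     # class label -> aggregated term counts

    def add_doc(self, doc_id="", doc_class="", doc_terms=(), frequency=False):
        doc_counts = Counter()
        for term in doc_terms:
            # Assign the next free column index the first time a term is seen.
            term_idx = self.term_index.setdefault(term, len(self.term_index))
            if frequency:
                doc_counts[term_idx] += 1  # count every occurrence
            else:
                doc_counts[term_idx] = 1   # Bernoulli: present (1) or absent
        self.docs.append({"id": doc_id, "class": doc_class, "terms": doc_counts})
        # doc_class is used as a dict key, so it must be hashable
        # (a str or int works; a list or pandas Series raises TypeError).
        if doc_class in self.classes:
            self.classes[doc_class] += doc_counts
        else:
            self.classes[doc_class] = Counter(doc_counts)


m = TinyMatrix()
m.add_doc("d1", "spam", ["buy", "now", "buy"], frequency=True)
m.add_doc("d2", "spam", ["buy", "cheap"])
print(dict(m.classes["spam"]))  # {0: 3, 1: 1, 2: 1}
```

Here "buy" is counted twice in d1 because frequency=True, but only once in d2, where the default Bernoulli mode records presence only.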

What do you think is causing my problem?

Best answer

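You are passing an object as doc_class. Check what df2['Target'] returns: it is probably a pandas Series, which is mutable and therefore cannot be hashed, so it cannot be used as a key of self.classes. Convert it to a string and pass that instead.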
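A minimal sketch of that fix, assuming each entry of resume lines up with the same row of df2 (i.e. encode_target does not reorder rows). Instead of passing the whole df2["Target"] column for every document, each text is paired with its own row's label, which is passed as a string. `tokenize` and `add_doc` are hypothetical stand-ins for prep.ngram_tokenizer and mx.add_doc, plain lists replace the pandas columns so the sketch runs without pandas or irlib, and the per-row doc_id is also an assumption (the original passes the same 'some-unique-identifier' for every document).

```python
# Sketch of the fix: pass each document its own row's label as a string,
# instead of the whole df2["Target"] column. Plain lists stand in for the
# pandas columns so this runs without pandas or irlib.
resume = ["first summary text", "second summary text"]
targets = [0, 1]  # hypothetical encoded labels, one per row of df2

by_class = {}  # class label -> list of (doc_id, terms); keys must be hashable


def tokenize(text):
    """Hypothetical stand-in for prep.ngram_tokenizer: whitespace split."""
    return text.split()


def add_doc(doc_id, doc_class, doc_terms):
    """Hypothetical stand-in for mx.add_doc: groups documents by class."""
    by_class.setdefault(doc_class, []).append((doc_id, doc_terms))


for row, (line, target) in enumerate(zip(resume, targets)):
    add_doc(doc_id=f"doc-{row}",    # hypothetical per-document id
            doc_class=str(target),  # this row's label, converted to str
            doc_terms=tokenize(line))

print(sorted(by_class))  # ['0', '1']
```

With pandas, iterating over a Series yields its individual values, so the same loop over zip(resume, df2["Target"]) gives one scalar label per row. The str() call follows the answer's advice; a scalar label such as a NumPy integer would already be hashable, whereas the Series as a whole is not.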

For more on Python - Text mining - TypeError: __hash__ method should return an integer, see the similar question on Stack Overflow: https://stackoverflow.com/questions/38009703/

Copyright 2021 - 2024 cfsdn All Rights Reserved. Shu ICP No. 2022000587