
python - How do I save the result of a TextBlob NaiveBayesClassifier?

Reposted. Author: 太空宇宙. Updated: 2023-11-03 13:17:30

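I am using TextBlob's NaiveBayesClassifier to analyze text against a set of topics I chose.

The dataset is large (about 3,000 entries).

I can get results, but I cannot save the classifier for later use, so every time I have to call the function again and wait hours for it to finish.

I tried pickling it like this: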

ab = NaiveBayesClassifier(data)

import pickle

object = ab
file = open('f.obj','w') #tried to use 'a' in place of 'w' ie. append
pickle.dump(object,file)

Then I get the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\pickle.py", line 1370, in dump
Pickler(file, protocol).dump(obj)
File "C:\Python27\lib\pickle.py", line 224, in dump
self.save(obj)
File "C:\Python27\lib\pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "C:\Python27\lib\pickle.py", line 419, in save_reduce
save(state)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 663, in _batch_setitems
save(v)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 600, in save_list
self._batch_appends(iter(obj))
File "C:\Python27\lib\pickle.py", line 615, in _batch_appends
save(x)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 562, in save_tuple
save(element)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 649, in save_dict
self._batch_setitems(obj.iteritems())
File "C:\Python27\lib\pickle.py", line 662, in _batch_setitems
save(k)
File "C:\Python27\lib\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "C:\Python27\lib\pickle.py", line 501, in save_unicode
self.memoize(obj)
File "C:\Python27\lib\pickle.py", line 247, in memoize
self.memo[id(obj)] = memo_len, obj
MemoryError

I also tried sPickle, but it failed too, with these errors:

#saving object with function sPickle.s_dump
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 22, in s_dump
for elt in iterable_to_pickle:
TypeError: 'NaiveBayesClassifier' object is not iterable

#saving object with function sPickle.s_dump_elt
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\sPickle.py", line 28, in s_dump_elt
pickled_elt_str = dumps(elt_to_pickle)
MemoryError: out of memory

Can anyone tell me what I have to do to save the object?

Or is there any other way to save the classifier's results for future use?

Best answer

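I solved the problem myself.

First, use a 64-bit build of Python (this applies to every version from 2.6 through 3.4).

The 64-bit build resolved all of my memory problems, because a 32-bit process can address only a few gigabytes no matter how much RAM is installed.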
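To confirm which build is actually running, a minimal sketch using only the standard library may help. The variable names `bits` and `is_64bit` are illustrative, not part of the original answer.

```python
import struct
import sys

# The size of a C pointer in bytes, times 8, gives the interpreter's word size.
bits = struct.calcsize("P") * 8
print("Running a %d-bit Python" % bits)

# Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
is_64bit = sys.maxsize > 2**32
```

If this reports 32 on a 64-bit operating system, installing the 64-bit interpreter is the first thing to try.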

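Use cPickle (this is a Python 2 module; on Python 3 the standard pickle module already uses the C implementation):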

import cPickle as pickle

Then open your file in binary write mode:

file = open('file_name.pickle','wb') #same as what Robert said in the above post

Write the object to the file:

pickle.dump(object,file)
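The steps above can be put together roughly as follows. This is a sketch under stated assumptions: the helper names `save_object` and `load_object` are hypothetical, a plain dictionary stands in for the trained classifier so the example runs without TextBlob, and the file is written to the system temp directory purely for illustration.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle  # Python 3: the standard module is already C-accelerated


def save_object(obj, path):
    """Write obj to path in binary mode using the newest pickle protocol."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Read back an object previously written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the trained classifier (assumption: any picklable object works).
model = {"labels": ["pos", "neg"], "counts": [3, 5]}
path = os.path.join(tempfile.gettempdir(), "file_name.pickle")

save_object(model, path)
restored = load_object(path)
```

With TextBlob, the classifier `ab` from the question would presumably be passed in place of `model`, and the restored object could then be used with `restored.classify(...)` in a later session. The `with` blocks close the file even if pickling fails, and binary mode (`'wb'` and `'rb'`) avoids the corruption that text mode can cause on Windows.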

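Your object will be dumped to a file. You do need to check how much memory the object uses, though. Pickling itself needs memory, so keep at least 25% of RAM free for the object being pickled.

In my case the laptop has 8 GB of RAM, so there was only enough memory for one such object.

(My classifier is very heavy: 3,000 string instances, each a sentence of about 15-30 words, and 22 sentiments/topics.)

So if your laptop locks up (or, more generally, stops responding), you may have to shut it down, start over, and try again with fewer instances or fewer sentiments/topics.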
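One way to retry with fewer instances is to train on a random subset of the labelled data. The sketch below is only an illustration: `take_subset`, `limit`, and `seed` are hypothetical names, and the toy data merely imitates the shape of the 3,000 sentences and 22 topics described above.

```python
import random


def take_subset(train, limit, seed=0):
    """Return at most `limit` randomly chosen (text, label) pairs."""
    if len(train) <= limit:
        return list(train)
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(train, limit)


# Toy data standing in for the labelled sentences (invented for illustration).
data = [("sentence %d" % i, "topic%d" % (i % 22)) for i in range(3000)]
small = take_subset(data, 500)
```

The smaller list can then be passed to the classifier in place of the full dataset, trading some accuracy for a model that fits in memory.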

cPickle is very useful here because it is much faster than the pure-Python pickle module, so I recommend using it.

Regarding "python - How do I save the result of a TextBlob NaiveBayesClassifier?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24431449/
