python - Speed up a classification task in sklearn/machine learning with pickle?

Reposted. Author: 行者123. Updated: 2023-11-30 09:01:27
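I have trained a classifier and load it via pickle. My main question is whether anything can speed up the classification task. Each text takes about 1 minute (feature extraction plus classification); is that normal? Should I move to multithreading?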

Here are some code snippets showing the overall flow:

for item in items:
    review = ''.join(item['review_body'])
    review_features = getReviewFeatures(review)
    normalized_predicted_rating = getPredictedRating(review_features)
    item_processed['rating'] = str(round(float(normalized_predicted_rating),1))

def getReviewFeatures(review, verbose=True):

    text_tokens = tokenize(review)

    polarity = getTextPolarity(review)

    subjectivity = getTextSubjectivity(review)

    taggs = getTaggs(text_tokens)

    bigrams = processBigram(taggs)
    freqBigram = countBigramFreq(bigrams)
    sort_bi = sortMostCommun(freqBigram)

    adjectives = getAdjectives(taggs)
    freqAdjectives = countFreqAdjectives(adjectives)
    sort_adjectives = sortMostCommun(freqAdjectives)

    word_features_adj = list(sort_adjectives)
    word_features = list(sort_bi)

    features={}
    for bigram,freq in word_features:
        features['contains(%s)' % unicode(bigram).encode('utf-8')] = True
        features["count({})".format(unicode(bigram).encode('utf-8'))] = freq

    for word,freq in word_features_adj:
        features['contains(%s)' % unicode(word).encode('utf-8')] = True
        features["count({})".format(unicode(word).encode('utf-8'))] = freq

    features["polarity"] = polarity
    features["subjectivity"] = subjectivity

    if verbose:
        print "Get review features..."

    return features


def getPredictedRating(review_features, verbose=True):
    start_time = time.time()
    classifier = pickle.load(open("LinearSVC5.pickle", "rb" ))

    p_rating = classifier.classify(review_features) # in the form of "# star"
    predicted_rating = re.findall(r'\d+', p_rating)[0]
    predicted_rating = int(predicted_rating)

    best_rating = 5
    worst_rating = 1
    normalized_predicted_rating = 0
    normalized_predicted_rating = round(float(predicted_rating)*float(10.0)/((float(best_rating)-float(worst_rating))+float(worst_rating)))

    if verbose:
        print "Get predicted rating..."
        print "ML_RATING: ", normalized_predicted_rating
        print("---Took %s seconds to predict rating for the review---" % (time.time() - start_time))

    return normalized_predicted_rating

Best answer

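NLTK is a great tool and a good starting point for natural language processing, but it is sometimes not very useful when speed matters, as its authors implicitly acknowledge: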

NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics using Python,” and “an amazing library to play with natural language.”

So if your problem lies only in the speed of the toolkit's classifier, you will have to use another resource or write the classifier yourself.
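
Before switching tools, it may be worth confirming that the classifier really is the slow stage. The following is only a sketch under stated assumptions, not the asker's code: `timed`, `load_classifier`, `extract_features`, and `classify` are hypothetical names, and a tiny pickled dict stands in for `LinearSVC5.pickle`. It times each stage separately and unpickles the model once, whereas `getPredictedRating` in the question reloads it on every call.

```python
import os
import pickle
import tempfile
import time
from collections import defaultdict
from functools import lru_cache

timings = defaultdict(float)  # stage name -> total seconds spent in that stage


def timed(stage, func, *args):
    """Run func(*args) and add its wall-clock time to timings[stage]."""
    start = time.perf_counter()
    try:
        return func(*args)
    finally:
        timings[stage] += time.perf_counter() - start


@lru_cache(maxsize=None)
def load_classifier(path):
    """Unpickle the model once per path; repeat calls return the cached object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical stand-ins for the real feature extractor and classifier.
def extract_features(review):
    return {"contains(%s)" % word: True for word in review.lower().split()}


def classify(model, features):
    if "contains(great)" in features:
        return model["label_if_great"]
    return model["default_label"]


# Write a placeholder "model" so the sketch runs without LinearSVC5.pickle.
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as tmp:
    pickle.dump({"label_if_great": "5 star", "default_label": "3 star"}, tmp)
model_path = tmp.name

reviews = ["Great phone", "It works"]
labels = []
for review in reviews:
    model = timed("load", load_classifier, model_path)  # disk read happens only once
    features = timed("features", extract_features, review)
    labels.append(timed("classify", classify, model, features))

os.remove(model_path)

# Print the stages from slowest to fastest.
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print("%-10s %.6f s" % (stage, seconds))
```

If most of the time turns out to be spent in feature extraction (tagging, polarity, bigram counting), replacing the classifier alone is unlikely to help much.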

Scikit-learn may help if you want to use a classifier that is probably faster.
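
As a rough illustration of that suggestion, here is a minimal sketch that feeds feature dictionaries shaped like those returned by `getReviewFeatures` directly into scikit-learn, assuming scikit-learn is installed. The training data, labels, and variable names are made up for the example; `DictVectorizer`, `make_pipeline`, and `LinearSVC` are existing scikit-learn APIs. The model is fitted once and then predicts a whole batch of reviews in a single call.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration only.
train_features = [
    {"polarity": 0.9, "contains(great)": True},
    {"polarity": 0.6, "contains(good)": True},
    {"polarity": -0.8, "contains(terrible)": True},
    {"polarity": -0.5, "contains(bad)": True},
]
train_labels = ["5 star", "5 star", "1 star", "1 star"]

# DictVectorizer turns each feature dict into a sparse numeric row;
# LinearSVC is the linear classifier trained on those rows.
model = make_pipeline(DictVectorizer(), LinearSVC(random_state=0))
model.fit(train_features, train_labels)

# Feature dicts for new reviews, classified together in one predict() call.
new_features = [
    {"polarity": 0.7, "contains(great)": True},
    {"polarity": -0.6, "contains(bad)": True},
]
predictions = list(model.predict(new_features))
print(predictions)
```

A fitted pipeline like this can itself be pickled and loaded once at startup, so the per-review cost is limited to feature extraction and one `predict` call.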

Regarding "python - Speed up a classification task in sklearn/machine learning with pickle?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32972138/
