
python - Unable to evaluate scores using decision_function() in LogisticRegression

Reposted. Author: 行者123. Updated: 2023-11-30 09:08:05

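I am working through a University of Washington course assignment. In it, I have to use decision_function() from LogisticRegression to predict the scores for sample_test_matrix (the last few lines of the code). But I get this error: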

    ValueError: X has 145 features per sample; expecting 113092

Here is the code:

    import string

    import pandas as pd
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    products = pd.read_csv('amazon_baby.csv')

    def remove_punct(text):
        # Strip every punctuation character from the review text.
        text = str(text)
        for i in string.punctuation:
            text = text.replace(i, "")
        return text

    products['review_clean'] = products['review'].apply(remove_punct)
    # Drop neutral reviews, then label the rest as +1 (positive) or -1 (negative).
    products = products[products.rating != 3]
    products['sentiment'] = products['rating'].apply(lambda x: +1 if x > 3 else -1)

    train_data_index = pd.read_json('module-2-assignment-train-idx.json')
    test_data_index = pd.read_json('module-2-assignment-test-idx.json')

    train_data = products.loc[train_data_index[0], :]
    test_data = products.loc[test_data_index[0], :]
    train_data = train_data.dropna()
    test_data = test_data.dropna()

    # The post does not show how the vectorizer was created; default settings are assumed here.
    vectorizer = CountVectorizer()

    train_matrix = vectorizer.fit_transform(train_data['review_clean'])
    test_matrix = vectorizer.fit_transform(test_data['review_clean'])

    sentiment_model = LogisticRegression()
    sentiment_model.fit(train_matrix, train_data['sentiment'])
    print(sentiment_model.coef_)

    sample_data = test_data[10:13]
    print(sample_data)

    sample_test_matrix = vectorizer.transform(sample_data['review_clean'])
    scores = sentiment_model.decision_function(sample_test_matrix)
    print(scores)

Here is the product data:

       Name                                               Review                                             Rating
    0  Planetwise Flannel Wipes                           These flannel wipes are OK, but in my opinion ...  3
    1  Planetwise Wipe Pouch                              it came early and was not disappointed. i love...  5
    2  Annas Dream Full Quilt with 2 Shams                Very soft and comfortable and warmer than it l...  5
    3  Stop Pacifier Sucking without tears with Thumb...  This is a product well worth the purchase. I ...   5
    4  Stop Pacifier Sucking without tears with Thumb...  All of my kids have cried non-stop when I trie...  5

Best answer

This line causes the error in the later lines:

    test_matrix = vectorizer.fit_transform(test_data['review_clean'])

Change it to:

    test_matrix = vectorizer.transform(test_data['review_clean'])

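Explanation: calling fit_transform() re-fits the CountVectorizer on the test data. All information learned from the training data is lost, and the vocabulary is computed from the test data only.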
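As a minimal sketch of this re-fitting behavior, the snippet below fits one CountVectorizer twice and compares the learned vocabularies. The tiny `train_docs` and `test_docs` lists are made-up stand-ins for the review columns, not data from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents standing in for train_data / test_data reviews.
train_docs = ["great soft blanket", "terrible cheap blanket", "love this pacifier"]
test_docs = ["soft pacifier", "broke quickly"]

vectorizer = CountVectorizer()

# First fit: the vocabulary is learned from the training documents.
train_matrix = vectorizer.fit_transform(train_docs)
train_vocab = set(vectorizer.vocabulary_)

# Second fit on the same object: the training vocabulary is discarded
# and replaced by one learned from the test documents only.
vectorizer.fit_transform(test_docs)
refit_vocab = set(vectorizer.vocabulary_)

print("training vocabulary size:", len(train_vocab))
print("vocabulary size after re-fitting on test data:", len(refit_vocab))
print("training words lost:", sorted(train_vocab - refit_vocab))
```

Words such as "great" exist only in the training vocabulary, so they vanish once the vectorizer is re-fitted.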

You then use that same vectorizer object to transform sample_data['review_clean'], so the features in the result are only the ones learned from test_data.

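But sentiment_model was trained on the vocabulary from train_data, so the two feature sets differ. That is why the sample matrix has 145 features while the model expects 113092.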
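The mismatch can be reproduced on a small scale. The following sketch uses invented toy documents and labels (not the Amazon data) and assumes default CountVectorizer and LogisticRegression settings. The exact wording of the error message depends on the scikit-learn version, so only the exception type is checked.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; labels follow the +1 / -1 convention from the question.
train_docs = [
    "great soft blanket",
    "terrible cheap blanket",
    "love this pacifier",
    "broke after one day",
]
train_labels = [1, -1, 1, -1]
test_docs = ["soft pacifier", "broke quickly"]

vectorizer = CountVectorizer()
train_matrix = vectorizer.fit_transform(train_docs)

model = LogisticRegression()
model.fit(train_matrix, train_labels)
n_model_features = model.coef_.shape[1]  # columns the model was trained on

# The mistake: re-fitting on the test documents yields a different column count.
refit_matrix = vectorizer.fit_transform(test_docs)
n_refit_features = refit_matrix.shape[1]

try:
    model.decision_function(refit_matrix)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print("model expects", n_model_features, "features; got", n_refit_features)
print("error:", mismatch_error)
```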

Always use transform() on test data, never fit_transform().
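One way to make this rule hard to break, offered here as a suggestion and not as part of the original answer, is to wrap the vectorizer and the classifier in a scikit-learn pipeline. The pipeline fits the vectorizer only inside fit() and calls transform() only when scoring. The documents, labels, and variable names below are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the cleaned reviews and sentiment labels.
train_docs = [
    "great soft blanket",
    "terrible cheap blanket",
    "love this pacifier",
    "broke after one day",
]
train_labels = [1, -1, 1, -1]
test_docs = ["soft pacifier", "broke quickly"]

# fit() calls fit_transform() on the vectorizer, then fits the classifier.
sentiment_pipeline = make_pipeline(CountVectorizer(), LogisticRegression())
sentiment_pipeline.fit(train_docs, train_labels)

# decision_function() only calls transform() on the already fitted vectorizer,
# so the test documents are mapped onto the training vocabulary.
scores = sentiment_pipeline.decision_function(test_docs)
print(scores)
```

With this layout, raw text goes in at both training and scoring time, and the vocabulary is never re-fitted by accident.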

Regarding "python - Unable to evaluate scores using decision_function() in LogisticRegression", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47204919/
