python - ValueError: operands could not be broadcast together with shapes in Naive Bayes classifier

Reposted by: 太空狗. Updated: 2023-10-30 02:53:50

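Straight to the point:

1) My goal is to apply NLP and a machine learning algorithm to classify a dataset of sentences into 5 different categories (numbers). For example, "I want to know my order details" -> 1.

Code: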

import numpy as np
import pandas as pd

dataset = pd.read_csv('Ecom.tsv', delimiter = '\t', quoting = 3)

import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

corpus = []
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the stop-word set once
for i in range(len(dataset)):
    # Keep letters only, lowercase, and split into tokens
    review = re.sub('[^a-zA-Z]', ' ', dataset['User'][i]).lower().split()
    # Drop stop words and stem the remaining tokens
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

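Everything works fine up to this point: the model trains well and predicts the correct results for the test data.

2) Now I want to use this trained model to predict the category of a new sentence. So I preprocessed the text the same way I did for the dataset.

Code: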

#Pre processing the new input
new_text = "Please tell me the details of this order"
new_text = new_text.split()
ps = PorterStemmer()
processed_text = [ps.stem(word) for word in new_text if not word in set(stopwords.words('english'))]

vect = CountVectorizer()
Z = vect.fit_transform(processed_text).toarray()
classifier.predict(Z)

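ValueError: operands could not be broadcast together with shapes (4,4) (33,)

The only thing I can make out is that when I transformed my corpus the first time, while training the model, the numpy array had shape (18, 33). The second time, when I try to predict on new input and transform processed_text with fit_transform(), the numpy array has shape (4, 4).

I can't figure out whether there is a step I applied incorrectly here. What could fix this? Thanks in advance! :)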

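Best answer

You've diagnosed the problem correctly!

Suppose you have a corpus of 33 different words; your bag of words at training time will then have 33 columns. Now you are using another corpus that has only 4 different words. You end up with a matrix with 4 columns, and the model won't like that! So you need to fit the second corpus into the same bag-of-words matrix you had at the beginning, with 33 columns. There are different ways to do this, which are explained well here.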
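To make the column argument concrete, here is a minimal pure-Python sketch of a bag of words whose columns are fixed by the training corpus. The helper names `build_vocabulary` and `vectorize` and the toy sentences are hypothetical, made up for illustration; this is not how scikit-learn implements it internally.

```python
def build_vocabulary(corpus):
    """Map each distinct word in the training corpus to a fixed column index."""
    words = sorted({word for sentence in corpus for word in sentence.split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(sentence, vocabulary):
    """Count the words of `sentence` into a row with one column per vocabulary word."""
    row = [0] * len(vocabulary)
    for word in sentence.split():
        if word in vocabulary:  # words never seen during training are ignored
            row[vocabulary[word]] += 1
    return row


# Hypothetical, already-stemmed training sentences (assumption for illustration)
train_corpus = ["want know order detail", "cancel order", "want refund"]
vocabulary = build_vocabulary(train_corpus)

# The new sentence is counted against the training vocabulary,
# so its row has exactly as many columns as the training matrix.
new_row = vectorize("pleas tell detail order", vocabulary)
print(len(vocabulary), new_row)
```

Because the new sentence is counted against the training vocabulary rather than its own, the row width always matches what the model was trained on, regardless of how many distinct words the sentence contains.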

For example, one way is to save the transform object you fitted with fit() at training time, and then apply it at test time (using only transform())!
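A minimal sketch of that approach with scikit-learn follows. The toy corpus and labels are made up for illustration and stand in for the asker's preprocessed `corpus` and `y`; the stemmed words are assumed to have been produced by the same preprocessing as in training. Note also that the new sentence is passed to `transform()` as a single string inside a list; passing a list of tokens, as in the question, makes each token its own document, which is where the 4 rows came from.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

# Hypothetical, already-preprocessed training sentences and labels
train_corpus = [
    "want know order detail",
    "tell order detail",
    "cancel order",
    "want cancel order pleas",
    "refund money",
    "want refund",
]
y_train = np.array([1, 1, 2, 2, 3, 3])

# Fit the vectorizer once, on the training corpus only
cv = CountVectorizer()
X_train = cv.fit_transform(train_corpus).toarray()

classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Preprocess the new sentence the same way, then join the tokens back
# into ONE string so it is treated as a single document
new_text = "pleas tell detail order"

# Reuse the fitted vectorizer: transform() only, never fit_transform()
Z = cv.transform([new_text]).toarray()
prediction = classifier.predict(Z)

print(X_train.shape, Z.shape)  # both have the same number of columns
print(prediction)
```

To use the model in a later session, the fitted `cv` and `classifier` would need to be persisted together (for example with `pickle` or `joblib`), since the classifier is only meaningful with the vocabulary it was trained on.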

Regarding python - ValueError: operands could not be broadcast together with shapes in Naive Bayes classifier, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48153854/
