python - sklearn model data transform error: CountVectorizer - Vocabulary wasn't fitted

Reposted by: 行者123  Updated: 2023-11-30 08:52:07


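I have trained a topic classification model. When I try to transform new data into vectors for prediction, it fails with "NotFittedError: CountVectorizer - Vocabulary wasn't fitted." However, prediction works when I split the training data into training and test sets inside the training script. The code is as follows: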

from sklearn.externals import joblib
from sklearn.feature_extraction.text import CountVectorizer

import pandas as pd
import numpy as np

# read new dataset
testdf = pd.read_csv('C://Users/KW198/Documents/topic_model/training_data/testdata.csv', encoding='cp950')

testdf.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1800 entries, 0 to 1799
Data columns (total 2 columns):
keywords    1800 non-null object
topics      1800 non-null int64
dtypes: int64(1), object(1)
memory usage: 28.2+ KB

# read columns
kw = testdf['keywords']
label = testdf['topics']

# convert the prediction data to vectors
vectorizer = CountVectorizer(min_df=1, stop_words='english')
x_testkw_vec = vectorizer.transform(kw)

This is where the error occurs:

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
<ipython-input-93-cfcc7201e0f8> in <module>()
      1 # convert the prediction data to vectors
      2 vectorizer = CountVectorizer(min_df=1, stop_words='english')
----> 3 x_testkw_vec = vectorizer.transform(kw)

~\Anaconda3\envs\ztdl\lib\site-packages\sklearn\feature_extraction\text.py in transform(self, raw_documents)
    918             self._validate_vocabulary()
    919 
--> 920         self._check_vocabulary()
    921 
    922         # use the same matrix-building strategy as fit_transform

~\Anaconda3\envs\ztdl\lib\site-packages\sklearn\feature_extraction\text.py in _check_vocabulary(self)
    301         """Check if vocabulary is empty or missing (not fit-ed)"""
    302         msg = "%(name)s - Vocabulary wasn't fitted."
--> 303         check_is_fitted(self, 'vocabulary_', msg=msg),
    304 
    305         if len(self.vocabulary_) == 0:

~\Anaconda3\envs\ztdl\lib\site-packages\sklearn\utils\validation.py in check_is_fitted(estimator, attributes, msg, all_or_any)
    766 
    767     if not all_or_any([hasattr(estimator, attr) for attr in attributes]):
--> 768         raise NotFittedError(msg % {'name': type(estimator).__name__})
    769 
    770 

NotFittedError: CountVectorizer - Vocabulary wasn't fitted.

Best answer

You need to call vectorizer.fit() so that the count vectorizer builds its vocabulary before you call vectorizer.transform(). Alternatively, call vectorizer.fit_transform() to do both in one step.
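
As a minimal sketch of that call order: the toy documents below are hypothetical stand-ins for the real training keywords, and the constructor arguments are copied from the question. The vocabulary_ attribute exists only after fit() or fit_transform() has run, which is what transform() checks for.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the training keywords.
train_docs = ["machine learning model", "deep learning network", "stock market price"]
new_docs = ["learning price", "market crash"]

vectorizer = CountVectorizer(min_df=1, stop_words='english')

# fit() builds vocabulary_; transform() then maps documents onto it.
vectorizer.fit(train_docs)
x_train = vectorizer.transform(train_docs)

# fit_transform() performs both steps in one call and yields the same matrix.
x_train_combined = CountVectorizer(min_df=1, stop_words='english').fit_transform(train_docs)

# New data is only transformed with the already fitted vectorizer, never re-fitted.
x_new = vectorizer.transform(new_docs)
print(x_train.shape, x_new.shape)
```

Both matrices have the same number of columns, because every document is mapped onto the vocabulary learned from train_docs.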

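However, you should not create a new vectorizer for testing or any kind of inference. You must reuse the one that was fitted when training the model; otherwise the results will be random because the vocabulary differs (some words missing, columns not aligned the same way, and so on).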
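
The misalignment can be illustrated with a small sketch. The documents are hypothetical, and default CountVectorizer settings are used here instead of the question's arguments to keep the vocabularies easy to read.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data: the test document shares only one word with training.
train_docs = ["apple banana", "banana cherry"]
test_docs = ["cherry durian"]

train_vec = CountVectorizer().fit(train_docs)   # vectorizer the model was trained with
fresh_vec = CountVectorizer().fit(test_docs)    # new vectorizer fitted on test data (wrong)

# The same word lands in different columns depending on the vectorizer.
print(train_vec.vocabulary_["cherry"])  # column 2 (after 'apple', 'banana')
print(fresh_vec.vocabulary_["cherry"])  # column 0

# The training vectorizer keeps the 3-column layout the model expects
# and silently ignores the unseen word 'durian'.
print(train_vec.transform(test_docs).toarray())  # [[0 0 1]]

# The fresh vectorizer produces 2 columns with a different meaning.
print(fresh_vec.transform(test_docs).toarray())  # [[1 1]]
```

A model trained on the three-column layout would either reject the two-column matrix or, if the widths happened to match, read counts from the wrong words.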

To do this, pickle the vectorizer used in training and load it at inference/test time.
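
A minimal sketch of that workflow, using the standard-library pickle module. The file name vectorizer.pkl, the temporary directory, and the toy documents are assumptions made for illustration; in practice the path would sit next to the saved model.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

# --- training time ---
# Hypothetical toy corpus standing in for the training keywords.
train_docs = ["machine learning model", "deep learning network", "stock market price"]
vectorizer = CountVectorizer(min_df=1, stop_words='english')
x_train = vectorizer.fit_transform(train_docs)

# Hypothetical location for the saved vectorizer.
path = os.path.join(tempfile.mkdtemp(), "vectorizer.pkl")
with open(path, "wb") as f:
    pickle.dump(vectorizer, f)

# --- inference time (typically a separate script or session) ---
with open(path, "rb") as f:
    loaded_vectorizer = pickle.load(f)

# transform() works immediately because the fitted vocabulary was saved too.
x_new = loaded_vectorizer.transform(["learning price"])
print(x_new.shape[1] == x_train.shape[1])  # True
```

In the question's code, this would mean replacing the line that constructs a new CountVectorizer with the load step, then calling transform(kw) on the loaded object. The pickle should be loaded with the same scikit-learn version that created it.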

Regarding python - sklearn model data transform error: CountVectorizer - Vocabulary wasn't fitted, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49547715/
