machine-learning - sklearn : Using CountVectorizer object to get a feature vector of a new string

Reposted by: 行者123 Updated: 2023-11-30 08:33:04

So I create a CountVectorizer object by running the following lines:

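from sklearn.feature_extraction.text import CountVectorizer
# binary=True records term presence (0/1) instead of raw counts
count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)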

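Now I have a new string, and I want to map it onto the TDM (term-document matrix) obtained from the CountVectorizer. In other words, for the string I pass in, I expect to get back its corresponding document-term vector.

I tried: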

count_vectorizer.transform([string])

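It fails with the error AttributeError: transform not found. Here is part of the stack trace; the full trace is very long, so I am only including the relevant bits, i.e. the last few lines: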

  File "/Users/ankit/Desktop/geny/APIServer/RUNTIME/src/controller/sentiment/Sentiment.py", line 29, in computeSentiment
    vec = self.models[model_name]["vectorizer"].transform([string])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/base.py", line 440, in __getattr__
    raise AttributeError(attr + " not found")

Please advise.

Thanks

Ankit S

Best answer

The example you have shown is not reproducible: what is the string variable here? However, the following code seems to work fine:

from sklearn.feature_extraction.text import CountVectorizer

data = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)

# Check that the vocabulary was built correctly
print(count_vectorizer.vocabulary_)

# Transform a couple of new strings; "mm" is not in the vocabulary, so it is ignored
newData = count_vectorizer.transform(["aa dd mm", "aa bb"])
print(newData)

# Get the result as a dense NumPy array
print(newData.toarray())

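Also note that count_vectorizer.transform() takes a list of strings, not a single string. If the vectorizer had not been fitted, transform() should raise "ValueError: Vocabulary wasn't fitted or is empty!". If you get an error like that, please paste the whole traceback (the full exception stack). As it stands, nobody can tell where the AttributeError is coming from: your own code or some internal error in sklearn.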
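Both points can be checked directly. The sketch below assumes a recent scikit-learn, which rejects a bare string passed to transform() with a ValueError (very old releases behaved differently). It shows that a single new string has to be wrapped in a list, and that calling .transform on the sparse matrix returned by fit_transform() fails with an AttributeError. The last frame of the posted traceback is in scipy/sparse/base.py, which is consistent with (though not proof of) the object stored under "vectorizer" being that matrix rather than the CountVectorizer itself. The helper vectorize_one is hypothetical, introduced only for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer


def vectorize_one(vectorizer, text):
    """Hypothetical helper: map one new string to a 1 x n_features sparse row."""
    # transform() expects an iterable of documents, so wrap the single string in a list.
    return vectorizer.transform([text])


corpus = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
matrix = count_vectorizer.fit_transform(corpus)  # a SciPy sparse matrix, not a vectorizer

# Correct usage: keep the fitted vectorizer object and reuse it.
row = vectorize_one(count_vectorizer, "aa dd mm")
print(row.toarray())  # [[1 0 0 1 0]]

# A bare string is rejected by recent scikit-learn versions.
try:
    count_vectorizer.transform("aa dd mm")
except ValueError as exc:
    print("ValueError:", exc)

# The fit_transform() output has no transform method at all.
try:
    matrix.transform(["aa dd mm"])
except AttributeError as exc:
    print("AttributeError:", exc)
```

If the last call reproduces the error from the question, the fix is to store the vectorizer returned by CountVectorizer(...) (or by fit()) alongside the matrix, rather than storing only the result of fit_transform().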

Regarding machine-learning - sklearn : Using CountVectorizer object to get a feature vector of a new string, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30696870/
