python - ValueError: unknown is not supported in sklearn.RFECV

Reposted by: 太空狗 · Updated: 2023-10-29 22:26:37

I am trying to use RFECV to narrow down the features that are actually relevant to my classifier. Here is the code I wrote:

import sklearn
import pandas as p
import numpy as np
import scipy as sp
import pylab as pl
from sklearn import linear_model, cross_validation, metrics
from sklearn.svm import SVC
from sklearn.feature_selection import RFECV
from sklearn.metrics import zero_one_loss
from sklearn import preprocessing
#from sklearn.feature_extraction.text import CountVectorizer
#from sklearn.feature_selection import SelectKBest, chi2

modelType = "notext"

# ----------------------------------------------------------
# Prepare the Data
# ----------------------------------------------------------
training_data = np.array(p.read_table('F:/NYC/NYU/SM/3/SNLP/Project/Data/train.tsv'))
print ("Read Data\n")

# get the target variable and set it as Y so we can predict it
Y = training_data[:,-1]

print(Y)

# not all data is numerical, so we'll have to convert those fields
# fix "is_news":
training_data[:,17] = [0 if x == "?" else 1 for x in training_data[:,17]]

# fix -1 entries in hasDomainLink
training_data[:,14] = [0 if x =="-1" else x for x in training_data[:,10]]

# fix "news_front_page":
training_data[:,20] = [999 if x == "?" else x for x in training_data[:,20]]
training_data[:,20] = [1 if x == "1" else x for x in training_data[:,20]]
training_data[:,20] = [0 if x == "0" else x for x in training_data[:,20]]

# fix "alchemy category":
training_data[:,3] = [0 if x=="arts_entertainment" else x for x in training_data[:,3]]
training_data[:,3] = [1 if x=="business" else x for x in training_data[:,3]]
training_data[:,3] = [2 if x=="computer_internet" else x for x in training_data[:,3]]
training_data[:,3] = [3 if x=="culture_politics" else x for x in training_data[:,3]]
training_data[:,3] = [4 if x=="gaming" else x for x in training_data[:,3]]
training_data[:,3] = [5 if x=="health" else x for x in training_data[:,3]]
training_data[:,3] = [6 if x=="law_crime" else x for x in training_data[:,3]]
training_data[:,3] = [7 if x=="recreation" else x for x in training_data[:,3]]
training_data[:,3] = [8 if x=="religion" else x for x in training_data[:,3]]
training_data[:,3] = [9 if x=="science_technology" else x for x in training_data[:,3]]
training_data[:,3] = [10 if x=="sports" else x for x in training_data[:,3]]
training_data[:,3] = [11 if x=="unknown" else x for x in training_data[:,3]]
training_data[:,3] = [12 if x=="weather" else x for x in training_data[:,3]]
training_data[:,3] = [999 if x=="?" else x for x in training_data[:,3]]

print ("Corrected outliers data\n")

# ----------------------------------------------------------
# Models
# ----------------------------------------------------------
if modelType == "notext":
    print ("no text model\n")
    #ignore features which are useless
    X = training_data[:,list([3, 5, 6, 7, 8, 9, 10, 14, 15, 16, 17, 19, 20, 22, 25])]
    scaler = preprocessing.StandardScaler()
    print("initialized scaler \n")
    scaler.fit(X,Y)
    print("fitted train data and labels\n")
    X = scaler.transform(X)
    print("Transformed train data\n")
    svc = SVC(kernel = "linear")
    print("Initialized SVM\n")
    rfecv = RFECV(estimator = svc, cv = 5, loss_func = zero_one_loss, verbose = 1)
    print("Initialized RFECV\n")
    rfecv.fit(X,Y)
    print("Fitted train data and label\n")
    rfecv.support_
    print ("Optimal Number of features : %d" % rfecv.n_features_)
    np.savetxt('rfecv.csv', rfecv.ranking_, delimiter=',', fmt='%f')

When "rfecv.fit(X,Y)" is called, my code raises the error "ValueError: unknown is not supported" from the metrics.py file.

The error is raised in sklearn.metrics.metrics:

# No metrics support "multiclass-multioutput" format
    if (y_type not in ["binary", "multiclass", "multilabel-indicator", "multilabel-sequences"]):
        raise ValueError("{0} is not supported".format(y_type))

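This is a classification problem, and the target values are only 0 or 1. The dataset can be found at Kaggle Competition Data.

I would appreciate it if someone could point out where I went wrong.

Best answer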

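RFECV checks that the target/training data is one of the types binary, multiclass, multilabel-indicator, or multilabel-sequences:

'binary': y contains <= 2 discrete values and is 1d or a column vector.
'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.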

Your Y, however, is unknown, i.e.:

'unknown': y is array-like but none of the above, e.g. a 3d array, or an array of non-sequence objects.
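
The categories above can be checked on small hand-made arrays. This is a minimal sketch, assuming a reasonably recent scikit-learn in which type_of_target is importable from sklearn.utils.multiclass; the toy arrays are invented for illustration and do not come from the question's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy targets, one per category described above.
examples = {
    "binary": np.array([0, 1, 1, 0]),
    "multiclass": np.array([0, 1, 2, 1]),
    "multiclass-multioutput": np.array([[1, 2], [3, 1]]),
    "multilabel-indicator": np.array([[0, 1], [1, 1]]),
    "unknown": np.zeros((2, 2, 2)),  # a 3d array fits none of the categories
}

# Ask scikit-learn how it classifies each target.
detected = {name: type_of_target(y) for name, y in examples.items()}
for name, kind in detected.items():
    print(f"expected {name!r:26} detected {kind!r}")
```

Each key in examples is the category the array is meant to illustrate, so a mismatch between the two printed columns would point to a target that scikit-learn interprets differently than intended.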

The reason is that your target data are strings (of the form "0" and "1") and were loaded by read_table as objects:

>>> training_data[:, -1].dtype
dtype('O')
>>> from sklearn.utils.multiclass import type_of_target
>>> type_of_target(training_data[:, -1])
'unknown'
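
The same diagnosis can be reproduced without the Kaggle file. The sketch below is only an approximation of the situation in the question: the small array is a hypothetical stand-in for np.array(p.read_table(...)), and it assumes that mixing text and numeric columns is what pushes NumPy to the object dtype. The labels here are stored as generic Python objects rather than as a numeric array.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for np.array(p.read_table(...)): a table that mixes
# text and numeric columns ends up as a single dtype=object array.
training_data = np.array(
    [
        ["http://a.example", 0.7, 1],
        ["http://b.example", 0.2, 0],
        ["http://c.example", 0.9, 1],
    ],
    dtype=object,
)

# Slicing the last column keeps the object dtype of the whole array.
Y = training_data[:, -1]
print(Y.dtype)            # the column is still dtype=object
print(type_of_target(Y))  # how scikit-learn classifies this target
```

Checking Y.dtype right after slicing is a cheap way to catch this before any estimator or metric sees the labels.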

To fix this, you can convert to int:

>>> Y = training_data[:, -1].astype(int)
>>> type_of_target(Y)
'binary'
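
Put together, the conversion slots in just before the RFECV fit. The following is a hedged sketch rather than the asker's exact pipeline: the data are synthetic (make_classification with 15 features is an assumption standing in for the selected Kaggle columns), and scoring="accuracy" replaces the question's loss_func=zero_one_loss, which current scikit-learn releases do not appear to accept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Kaggle data (assumption: 15 numeric features).
X, y = make_classification(
    n_samples=200, n_features=15, n_informative=5, random_state=0
)

# Mimic the original problem: labels held in an object-dtype array ...
Y_obj = y.astype(object)
# ... and the fix from the answer: cast them to a numeric dtype.
Y = Y_obj.astype(int)

# Scale the features, then run recursive feature elimination with 5-fold CV.
X = StandardScaler().fit_transform(X)
rfecv = RFECV(estimator=SVC(kernel="linear"), cv=5, scoring="accuracy")
rfecv.fit(X, Y)

print("Optimal number of features: %d" % rfecv.n_features_)
print("Feature ranking:", rfecv.ranking_)
```

Features with rank 1 in rfecv.ranking_ are the ones RFECV kept; rfecv.support_ gives the same selection as a boolean mask.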

Regarding python - ValueError: unknown is not supported in sklearn.RFECV, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20234851/
