python - KeyError "not in index" during cross-validation


Reposted by: 太空狗 · Updated: 2023-10-30 01:47:34


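I have applied an SVM to my data set. The data set is multi-label, meaning each observation can have more than one label.

When I run KFold cross-validation, it raises a "not in index" error.

It reports that indices 601 through 6007 are not in the index (I have 1...6008 data samples).

Here is my code: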

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR','WD','EF','INF','SSI','DI','others']
X= df[['sentences']]
y = df[['ADR','WD','EF','INF','SSI','DI','others']]
kf = KFold(n_splits=10)
kf.get_n_splits(X)
for train_index, test_index in kf.split(X,y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

SVC_pipeline = Pipeline([
                ('tfidf', TfidfVectorizer(stop_words=stop_words)),
                ('clf', OneVsRestClassifier(LinearSVC(), n_jobs=1)),
            ])

for category in categories:
    print('... Processing {} '.format(category))
    # train the model using X_dtm & y
    SVC_pipeline.fit(X_train['sentences'], y_train[category])

    prediction = SVC_pipeline.predict(X_test['sentences'])
    # Score against the true labels in y_test (X_test only holds the sentences)
    print('SVM Linear Test accuracy is {} '.format(accuracy_score(y_test[category], prediction)))
    print('SVM Linear f1 measurement is {} '.format(f1_score(y_test[category], prediction, average='weighted')))
    print([{X_test[i]: categories[prediction[i]]} for i in range(len(list(prediction)))])

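To be honest, I don't know how to apply KFold cross-validation so that I get the F1 score and accuracy for each label separately. I have read this and this, but they did not help me apply it to my case.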

To make this reproducible, here is a small sample of the data frame. The last seven columns are my labels: ADR, WD, ...

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,0
5,I have no idea when this will end.,0,0,0,0,0,0,1

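Update

When I did what Vivek Kumar suggested, it raised this error

ValueError: Found input variables with inconsistent numbers of samples: [1, 5408]

in the classifier part. Do you know how to fix it?

Several Stack Overflow answers say I need to reshape the training data. I tried that as well, without success (link). Thanks :)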

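Best answer

train_index and test_index are integer indices based on row position. Pandas indexing does not work that way, though: plain bracket indexing and .loc select by label, not by position. Newer versions of pandas are also stricter about how you slice or select data.

You need to use .iloc to access the data by position. More information is available here.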
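To illustrate the difference, here is a minimal sketch. The small frame and its index labels are made up for the example and are not the asker's data. It shows that the position arrays produced by KFold.split() work with .iloc, while label-based lookups with the same integers raise a KeyError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions,
# for example after filtering rows or loading a CSV with its own index.
df = pd.DataFrame({"sentences": ["a", "b", "c", "d"]}, index=[10, 11, 12, 13])

# KFold.split() yields arrays of row positions like this one.
positions = np.array([0, 2])

# Position-based selection: rows 0 and 2, whatever their labels are.
by_position = df.iloc[positions]

# Plain brackets treat the integers as column labels, which do not exist.
try:
    df[positions]
    bracket_lookup_failed = False
except KeyError:
    bracket_lookup_failed = True

# .loc treats them as row labels; 0 and 2 are not in the index either.
try:
    df.loc[positions]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(by_position)
print(bracket_lookup_failed, label_lookup_failed)
```

With a default RangeIndex, .loc would happen to work because labels and positions coincide, which is why the problem often only shows up after the index has changed.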

This is what you need:

for train_index, test_index in kf.split(X,y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    ...
    ...

    # TfidfVectorizer does not work with a DataFrame,
    # because iterating a DataFrame yields the column names, not the actual data.
    # So select the column explicitly to pass the sentences themselves.

    SVC_pipeline.fit(X_train['sentences'], y_train[category])

    prediction = SVC_pipeline.predict(X_test['sentences'])
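Putting the pieces together, one way to get accuracy and F1 per label across folds could look like the sketch below. It is only an illustration under stated assumptions: the six sentences and their labels are made up (they stand in for finalupdatedothers.csv), only two of the seven label columns are used, n_splits is reduced to 3 to fit the tiny data, and LinearSVC is used directly instead of the OneVsRestClassifier wrapper because each label is treated as a separate binary problem. The scoring loop is placed inside the fold loop so that every fold contributes a score.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up stand-in for the real CSV (assumption, not the asker's data).
df = pd.DataFrame({
    "sentences": [
        "severe weight gain and hair loss",
        "dizziness and nausea after the first dose",
        "bad headache and memory loss",
        "I have no idea when this will end",
        "I slowly cut my dosage over months",
        "I am seeing my doctor next week",
    ],
    "ADR":    [1, 1, 1, 0, 0, 0],
    "others": [0, 0, 0, 1, 1, 1],
})
categories = ["ADR", "others"]

# Iterating a DataFrame yields column names, so pass a Series of text instead.
column_names = list(df[["sentences"]])  # ['sentences'], not the sentences
X = df["sentences"]
y = df[categories]

kf = KFold(n_splits=3, shuffle=True, random_state=0)
scores = {c: {"accuracy": [], "f1": []} for c in categories}

for train_index, test_index in kf.split(X):
    # Select rows by position with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    for category in categories:
        # Fresh pipeline per label and fold
        model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
        model.fit(X_train, y_train[category])
        prediction = model.predict(X_test)
        scores[category]["accuracy"].append(
            accuracy_score(y_test[category], prediction))
        scores[category]["f1"].append(
            f1_score(y_test[category], prediction, average="weighted"))

# Average each metric over the folds, per label
mean_scores = {
    c: {metric: sum(vals) / len(vals) for metric, vals in per_label.items()}
    for c, per_label in scores.items()
}
print(mean_scores)
```

On real data, rare labels can leave a training fold with only one class, which LinearSVC rejects; a stratified splitter applied per label may be worth considering in that case.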

Regarding python - KeyError "not in index" during cross-validation, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51852551/
