python - UserWarning: Label not :NUMBER: is present in all training examples

Reposted by: 太空狗, updated: 2023-10-29 20:19:44

I am doing multi-label classification and trying to predict the correct labels for each document. Here is my code:

mlb = MultiLabelBinarizer()
X = dataframe['body'].values 
y = mlb.fit_transform(dataframe['tag'].values)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, 
                                   stop_words='english', 
                                   max_df = 0.8, 
                                   min_df = 10)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

predicted = cross_val_predict(classifier, X, y)

When I run my code, I get multiple warnings:

UserWarning: Label not :NUMBER: is present in all training examples.

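When I print the predicted and the true labels, about half of the documents have an empty set of predicted labels.

Why is this happening, and is it related to the warnings printed while training is running? How can I avoid those empty predictions?

EDIT01: This also happens with estimators other than LinearSVC().

I tried RandomForestClassifier() and it gives empty predictions as well. Strangely, when I use cross_val_predict(classifier, X, y, method='predict_proba') to predict a probability for each label instead of a binary 0/1 decision, every document always has at least one label with probability > 0. So I don't know why that label is not chosen by the binary decision. Or is the binary decision evaluated differently from the probabilities?

EDIT02: I found an old post where the OP was dealing with a similar problem. Is this the same case?

Best answer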

Why is this happening, is it related to warnings it prints out while training is running?

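The problem is probably that some labels occur in only a few documents (see this thread for details). When you split the dataset into training and test sets to validate your model, some labels may be missing from the training data. Let train_indices be an array with the indices of the training samples. If a particular label (with index k) does not occur in any training sample, then all elements in the k-th column of the indicator matrix y[train_indices] are zero.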
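The all-zero-column condition described above can be checked directly. The following is a minimal sketch, not part of the original answer: it reuses the tags of the toy example further down, and the choice of KFold with three splits is an assumption made only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import MultiLabelBinarizer

# Tag lists taken from the toy example; the fold layout is an assumption.
tags = [['python'], ['oop'], ['python'], ['python', 'decorator'],
        ['matlab', 'naming-conventions'], ['matlab'],
        ['performance'], ['matlab-oop']]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# How often each label occurs in the whole dataset.
label_counts = dict(zip(mlb.classes_, y.sum(axis=0)))

missing_per_fold = []
for train_indices, test_indices in KFold(n_splits=3).split(y):
    # A label is absent from the training fold when its column sums to zero.
    absent = np.flatnonzero(y[train_indices].sum(axis=0) == 0)
    missing_per_fold.append(list(mlb.classes_[absent]))

for fold, missing in enumerate(missing_per_fold):
    print("fold", fold, "labels missing from training:", missing)
```

With labels this rare, every fold loses at least one label from its training part, which is the situation that triggers the warning.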

How can I avoid those empty predictions?

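In that scenario the classifier cannot reliably predict the k-th label for the test documents (more on this in the next paragraph). You therefore cannot trust the predictions made by clf.predict and need to implement the prediction function yourself, for example by using the decision values returned by clf.decision_function, as suggested in this answer.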

So I don't know why is this label not chosen with binary decisioning? Or is binary decisioning evaluated in different way than probabilities?

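In a dataset with many labels, most labels usually occur with a rather low frequency. If such rarely occurring labels are fed to a binary classifier (i.e. a classifier that makes a 0-1 prediction), it is very likely that the classifier picks 0 for all labels on all documents.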
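To make the relation between scores and binary decisions concrete, here is a hedged sketch on synthetic data (the feature matrix and the three labels are made up for illustration). It assumes LinearSVC as the base estimator, in which case OneVsRestClassifier marks a label as present only when that label's decision value is above 0, independently of the other labels. A document whose scores are all negative therefore gets no label at all, even though one of the scores is still the largest.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(40, 5)
# Hypothetical indicator matrix: three rare labels, each tied to one feature.
y = np.column_stack([X[:, 0] > 0.8, X[:, 1] > 0.8, X[:, 2] > 0.8]).astype(int)

clf = OneVsRestClassifier(LinearSVC()).fit(X, y)
scores = clf.decision_function(X)   # one real-valued score per label
binary = clf.predict(X)             # one 0/1 decision per label

# The binary decision is the score thresholded at 0, label by label.
same = np.array_equal(binary, (scores > 0).astype(int))
empty_rows = int((binary.sum(axis=1) == 0).sum())
print("predict equals (decision_function > 0):", same)
print("documents without any predicted label:", empty_rows)
```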

I have found an old post where OP was dealing with similar problem. Is this the same case?

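Yes, absolutely. That person was facing exactly the same problem as you, and their code is very similar to yours.

Demo

To explain the issue further, I have worked through a simple toy example using mock data.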

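import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Mock questions mapped to their tags
Q = {'What does the "yield" keyword do in Python?': ['python'],
     'What is a metaclass in Python?': ['oop'],
     'How do I check whether a file exists using Python?': ['python'],
     'How to make a chain of function decorators?': ['python', 'decorator'],
     'Using i and j as variables in Matlab': ['matlab', 'naming-conventions'],
     'MATLAB: get variable type': ['matlab'],
     'Why is MATLAB so fast in matrix multiplication?': ['performance'],
     'Is MATLAB OOP slow or am I doing something wrong?': ['matlab-oop'],
    }
# list() is needed on Python 3, where keys() and values() return views
dataframe = pd.DataFrame({'body': list(Q.keys()), 'tag': list(Q.values())})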

mlb = MultiLabelBinarizer()
X = dataframe['body'].values 
y = mlb.fit_transform(dataframe['tag'].values)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, 
                                   stop_words='english', 
                                   max_df=0.8, 
                                   min_df=1)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

Note that I set min_df=1 because my dataset is much smaller than yours. When I run the following statement:

predicted = cross_val_predict(classifier, X, y)

I get a bunch of warnings

C:\...\multiclass.py:76: UserWarning: Label not 4 is present in all training examples.
  str(classes[c]))
C:\...\multiclass.py:76: UserWarning: Label not 0 is present in all training examples.
  str(classes[c]))
C:\...\multiclass.py:76: UserWarning: Label not 3 is present in all training examples.
  str(classes[c]))
C:\...\multiclass.py:76: UserWarning: Label not 5 is present in all training examples.
  str(classes[c]))
C:\...\multiclass.py:76: UserWarning: Label not 2 is present in all training examples.
  str(classes[c]))

and the following predictions:

In [5]: np.set_printoptions(precision=2, threshold=1000)    

In [6]: predicted
Out[6]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0]])

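A row whose entries are all 0 means that no label was predicted for the corresponding document.

Workaround

To make the analysis easier, let's validate the model manually rather than through cross_val_predict.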

import warnings
from sklearn.model_selection import ShuffleSplit

rs = ShuffleSplit(n_splits=1, test_size=.5, random_state=0)
# split() returns a generator; take its first (and only) split
train_indices, test_indices = next(rs.split(X))

with warnings.catch_warnings(record=True) as received_warnings:
    warnings.simplefilter("always")
    X_train, y_train = X[train_indices], y[train_indices]
    X_test, y_test = X[test_indices], y[test_indices]
    classifier.fit(X_train, y_train)
    predicted_test = classifier.predict(X_test)
    for w in received_warnings:
        print(w.message)

Running the snippet above emits two warnings (I used a context manager to make sure the warnings are caught):

Label not 2 is present in all training examples.
Label not 4 is present in all training examples.

This is consistent with the fact that the labels with indices 2 and 4 are missing from the training samples:

In [40]: y_train
Out[40]: 
array([[0, 0, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 1]])

For some documents the prediction is empty (those documents correspond to the all-zero rows of predicted_test):

In [42]: predicted_test
Out[42]: 
array([[0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 0, 0]])

To overcome this, you can implement your own prediction function like this:

def get_best_tags(clf, X, lb, n_tags=3):
    decfun = clf.decision_function(X)
    best_tags = np.argsort(decfun)[:, :-(n_tags+1): -1]
    return lb.classes_[best_tags]

This way, each document is always assigned the n_tags labels with the highest confidence scores:

In [59]: mlb.inverse_transform(predicted_test)
Out[59]: [('matlab',), (), (), ('matlab', 'naming-conventions')]

In [60]: get_best_tags(classifier, X_test, mlb)
Out[60]: 
array([['matlab', 'oop', 'matlab-oop'],
       ['oop', 'matlab-oop', 'matlab'],
       ['oop', 'matlab-oop', 'matlab'],
       ['matlab', 'naming-conventions', 'oop']], dtype=object)
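A possible variation, sketched here under assumptions and not taken from the original answer: instead of always returning a fixed number of tags, keep the classifier's own binary predictions and only fill in the rows that came out empty with their single highest-scoring label. The helper name fill_empty_predictions and the small arrays below are hypothetical.

```python
import numpy as np

def fill_empty_predictions(binary_pred, scores):
    """Hypothetical helper: give every all-zero row its top-scoring label."""
    filled = binary_pred.copy()
    # Rows for which the classifier predicted no label at all
    rows = np.flatnonzero(filled.sum(axis=1) == 0)
    # For each of those rows, switch on the label with the largest score
    filled[rows, scores[rows].argmax(axis=1)] = 1
    return filled

# Made-up predictions and decision values for three documents
binary_pred = np.array([[0, 1, 0],
                        [0, 0, 0],
                        [0, 0, 0]])
scores = np.array([[-1.0, 0.5, -2.0],
                   [-0.2, -0.9, -0.4],
                   [-1.5, -0.3, -0.1]])

filled = fill_empty_predictions(binary_pred, scores)
print(filled)
```

With the fitted pipeline from the example, the inputs would be classifier.predict(X_test) and classifier.decision_function(X_test), and mlb.inverse_transform can turn the result back into tag names. Keep in mind that a label absent from the training data can still not be predicted meaningfully this way.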

Regarding python - UserWarning: Label not :NUMBER: is present in all training examples, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42821315/
