python - Using sample_weight and class_weight at the same time

Reposted by: 太空狗 · Updated: 2023-10-30 00:47:37

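My dataset already has weighted examples. In this binary classification problem I also have far more samples of the first class than of the second.

Can I use sample_weight and, at the same time, reweight further with class_weight in the model.fit() function?

Or should I first build a new new_weights array and pass it to fit() as sample_weight?

Edit:

To clarify further, I already have an individual weight for every sample in my dataset, and to complicate things, the sum of the sample weights of the first class far exceeds the sum of the sample weights of the second class.

For example, I currently have: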

y = [0,0,0,0,1,1]

sample_weights = [0.01,0.03,0.05,0.02, 0.01,0.02]

So the weights of class "0" sum to 0.11 and those of class "1" sum to 0.03. So should I have:

class_weight = {0 : 1. , 1: 0.11/0.03}
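
The arithmetic above can be reproduced with a short NumPy sketch. It assumes the labels are the integers 0 and 1, as in the example; the name class_totals is introduced here for illustration only.

```python
import numpy as np

y = np.array([0, 0, 0, 0, 1, 1])
sample_weights = np.array([0.01, 0.03, 0.05, 0.02, 0.01, 0.02])

# bincount with weights= sums the sample weights separately for each class label.
class_totals = np.bincount(y, weights=sample_weights)

# Scale class 1 so that its total weight matches the total weight of class 0.
class_weight = {0: 1.0, 1: float(class_totals[0] / class_totals[1])}
```

Here class_totals holds the per-class sums (about 0.11 and 0.03), so class_weight[1] is the same ratio 0.11/0.03 written by hand above.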

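I need to use both the sample_weight and class_weight features. If one overrides the other, I will have to create new sample_weights and then use fit() or train_on_batch().

So my question is: can I use both together, or does one override the other?

Best answer

You can certainly do both if you want to; the question is whether you need to. According to the Keras docs: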

class_weight: Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only). This can be useful to tell the model to "pay more attention" to samples from an under-represented class.

sample_weight: Optional Numpy array of weights for the training samples, used for weighting the loss function (during training only). You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data [...].

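Given that you mention you have "far more of the first class than of the second", I think you should go for the class_weight parameter. There you can indicate the ratio in which your dataset is distributed, so that you compensate for the imbalanced classes. sample_weight is better suited when you want to define a weight or importance for each individual data element.

For example, if you pass: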

class_weight = {0 : 1. , 1: 50.}

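you are saying that every sample from class 1 counts as 50 samples from class 0, thereby giving more "importance" to the elements of class 1 (since you surely have fewer of those samples). You can customize this to suit your own needs. There is more information on imbalanced datasets in this great question.

Note: to compare the two parameters further, keep in mind that passing class_weight as {0:1., 1:50.} is equivalent to passing sample_weight as [1.,1.,1.,...,50.,50.,...], given that your samples have the classes [0,0,0,...,1,1,...].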
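
The equivalence in the note can be made concrete with a small sketch. The label array below is made up for illustration, and equivalent_sample_weight is a name chosen here, not a Keras one.

```python
import numpy as np

y = np.array([0, 0, 0, 1, 1])          # illustrative labels
class_weight = {0: 1.0, 1: 50.0}

# Look up one weight per sample from that sample's class label.
equivalent_sample_weight = np.array([class_weight[c] for c in y.tolist()])
```

Passing this array as sample_weight should weight the loss the same way as passing the dictionary as class_weight.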

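As we can see, class_weight is more practical in this case, while sample_weight is useful when you actually want to give an "importance" to each sample individually. Both can also be used together if the case requires it, but you have to keep their cumulative effect in mind.

Edit: regarding your new question, digging into the Keras source code it seems that sample_weights does indeed override class_weights. Here is the piece of code that does it, in the _standardize_weights method (line 499):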

if sample_weight is not None:
    # ...Does some error handling...
    return sample_weight  # simply returns the weights you passed

elif isinstance(class_weight, dict):
    # ...Some error handling and computations...
    # Then creates an array repeating class weight to match your target classes
    weights = np.asarray([class_weight[cls] for cls in y_classes
                          if cls in class_weight])

    # ...more error handling...
    return weights

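This means you can only use one of them, not both. Therefore you will indeed need to multiply your sample_weights by the ratio needed to compensate for the imbalance.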
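
For Keras versions in which sample_weight takes precedence, the manual workaround could look roughly like the sketch below. It reuses the numbers from the question; new_weights is the name the question proposes, and the fit() call is left as a comment because model and x are hypothetical objects not defined here.

```python
import numpy as np

y = np.array([0, 0, 0, 0, 1, 1])
sample_weights = np.array([0.01, 0.03, 0.05, 0.02, 0.01, 0.02])
class_weight = {0: 1.0, 1: 0.11 / 0.03}

# Expand the class weights to one value per sample, then fold them
# into the existing per-sample weights by elementwise multiplication.
per_sample_class_weight = np.array([class_weight[c] for c in y.tolist()])
new_weights = sample_weights * per_sample_class_weight

# Hypothetical usage; `model` and `x` are assumed to exist elsewhere:
# model.fit(x, y, sample_weight=new_weights)
```

With these numbers the rescaled weights of both classes sum to roughly the same total, which is the rebalancing the question asks for.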

Update: as of the time of this edit (March 27, 2020), looking at the source code of training_utils.standardize_weights() we can see that it now supports both class_weights and sample_weights:

Everything gets normalized to a single sample-wise (or timestep-wise) weight array. If both sample_weights and class_weights are provided, the weights are multiplied together.
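
The behaviour described in that docstring can be imitated in a few lines of NumPy. This is only an illustrative sketch: combine_weights is a hypothetical helper written for this example, not the Keras implementation, and it assumes integer class labels that all appear as keys in class_weight.

```python
import numpy as np

def combine_weights(y, sample_weight=None, class_weight=None):
    """Hypothetical helper imitating the documented behaviour (not Keras code)."""
    labels = np.asarray(y).tolist()
    weights = np.ones(len(labels), dtype=float)
    if sample_weight is not None:
        # Per-sample weights are applied as given.
        weights = weights * np.asarray(sample_weight, dtype=float)
    if class_weight is not None:
        # Class weights are expanded per sample and multiplied in.
        weights = weights * np.array([class_weight[c] for c in labels])
    return weights

combined = combine_weights(
    [0, 0, 1],
    sample_weight=[0.5, 1.0, 2.0],
    class_weight={0: 1.0, 1: 3.0},
)
```

Because the two sets of weights are multiplied, a sample that is already heavily weighted and also belongs to an up-weighted class ends up with the product of both factors, which is the cumulative effect mentioned earlier.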

This post on python - using sample_weight and class_weight at the same time is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48173168/
