python - RuntimeError during one-hot encoding


Reposted by: 行者123. Updated: 2023-12-04 13:26:41

I have a dataset whose class values range from -2 to 2 in steps of 1 (i.e., -2, -1, 0, 1, 2), and where 9 marks unlabeled data.
When I apply one-hot encoding with

self._one_hot_encode(labels)

I get the following error: RuntimeError: index 1 is out of bounds for dimension 1 with size 1, raised by

self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)

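The error should be raised from [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1, 1, 1, 1], where the 9 in my mapping is the entry that gets remapped before encoding. It is unclear to me how to fix it, even after going through past questions and answers to similar problems (e.g., index 1 is out of bounds for dimension 0 with size 1).
The part of the code involved in the error is as follows: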

def _one_hot_encode(self, labels):
    # Get the number of classes
    classes = torch.unique(labels)
    classes = classes[classes != 9] # unlabelled 
    self.n_classes = classes.size(0)

    # One-hot encode labeled data instances and zero rows corresponding to unlabeled instances
    unlabeled_mask = (labels == 9)
    labels = labels.clone()  # defensive copying
    labels[unlabeled_mask] = 0
    self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
    self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
    self.one_hot_labels[unlabeled_mask, 0] = 0

    self.labeled_mask = ~unlabeled_mask

def fit(self, labels, max_iter, tol):
    
    self._one_hot_encode(labels)

    self.predictions = self.one_hot_labels.clone()
    prev_predictions = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)

    for i in range(max_iter):
        # Stop iterations if the system is considered at a steady state
        variation = torch.abs(self.predictions - prev_predictions).sum().item()
        

        prev_predictions = self.predictions
        self._propagate()

Dataset example:

    ID  Target  Weight  Label  Score  Scale_Cat  Scale_num
0   A   D       65.1    1      87     Up         1
1   A   X       35.8    1      87     Up         1
2   B   C       34.7    1      37.5   Down       -2
3   B   P       33.4    1      37.5   Down       -2
4   C   B       33.1    1      37.5   Down       -2
5   S   X       21.4    0      12.5   NA         9

The source code I used as a reference is here: https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb
Full traceback of the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-126-792a234f63dd> in <module>
      4 label_propagation = LabelPropagation(adj_matrix_t)
----> 6 label_propagation.fit(labels_t) # causing error
      7 label_propagation_output_labels = label_propagation.predict_classes()
      8 

<ipython-input-115-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
    100 
    101     def fit(self, labels, max_iter=1000, tol=1e-3):
--> 102         super().fit(labels, max_iter, tol)
    103 
    104 ## Label spreading

<ipython-input-115-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
     58             Convergence tolerance: threshold to consider the system at steady state.
     59         """
---> 60         self._one_hot_encode(labels)
     61 
     62         self.predictions = self.one_hot_labels.clone()

<ipython-input-115-54a7dbc30bd1> in _one_hot_encode(self, labels)
     42         labels[unlabeled_mask] = 0
     43         self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
---> 44         self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
     45         self.one_hot_labels[unlabeled_mask, 0] = 0
     46 

RuntimeError: index 1 is out of bounds for dimension 1 with size 1

Best answer

I went through your notebook (I think you changed 9 to -1 in order to run it) and saw this part of the code:

# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
print("Label Propagation: ", end="")
label_propagation.fit(labels_t)
label_propagation_output_labels = label_propagation.predict_classes()

which eventually calls:

self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)

and that is where it goes wrong.
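Take a moment to read the PyTorch documentation for scatter (torch Scatter). The key is to understand the four pieces involved: dim, index, src, and the self matrix. For one-hot encoding, whether you use dim=1 or dim=0 only depends on whether the classes lie along columns or rows, and our src is simply 1 (more on this later). You are currently calling scatter along dimension 1 with an index matrix of shape [40,1] and a result (self) matrix of shape [40,5].
I see two problems here:

1. You are using the literal class values (-2, -1, 0, 1, 2) as the encoding indices in the index matrix. scatter treats each of these values as a column position in the self matrix, so any value outside 0 to n_classes - 1 is rejected. This is where the index-out-of-bounds error comes from.

2. You mention 6 classes (-2, -1, 0, 1, 2, plus 9 for unlabeled), but you one-hot encode only 5 of them. (Yes, I know you want the unlabeled rows to be all zeros, but that is a bit harder to achieve with scatter. I will explain later.)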
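To make the first problem concrete, here is a minimal sketch that mirrors the steps of `_one_hot_encode` on a small hypothetical label tensor. The tensor is an assumption modeled on the list in the question (only class 1 plus the marker 9), so it is one plausible way to arrive at "size 1", not a confirmed reproduction of the asker's run.

```python
import torch

# Hypothetical labels resembling the question: only class 1 plus the unlabeled marker 9
labels = torch.tensor([1, 1, 1, 9, 1, 1])

# Same steps as _one_hot_encode: count the distinct labeled classes
classes = torch.unique(labels)
classes = classes[classes != 9]
n_classes = classes.size(0)  # a single distinct labeled class, so one column

# Unlabeled rows are pointed at column 0, labeled rows keep their raw value
unlabeled_mask = labels == 9
idx = labels.clone()
idx[unlabeled_mask] = 0

one_hot = torch.zeros((labels.size(0), n_classes), dtype=torch.float)
try:
    # The raw value 1 is used as a column position, but only column 0 exists
    one_hot.scatter(1, idx.unsqueeze(1), 1)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(n_classes, error_message)
```

The number of columns comes from how many distinct classes appear, while the column positions come from the raw class values, and nothing forces the two to agree.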

So how do we fix this?
Problem 1: let's start with a small example:

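import torch
src = torch.tensor([[1],[1],[1],[1],[1],[0]])  # assumed: src is not shown in the original; the final 0 matches the all-zero last row of the output
index = torch.tensor([[5],[0],[3],[5],[1],[4]]); print(index.shape); print(index)
result = torch.zeros(6, 6, dtype=src.dtype).scatter_(1, index, src); print(result.shape); print(result)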

This gives us

torch.Size([6, 1])
tensor([[5],
        [0],
        [3],
        [5],
        [1],
        [4]])
torch.Size([6, 6])
tensor([[0, 0, 0, 0, 0, 1],
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0]])

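The index matrix has 6 observations with 1 value each (the class).
The self matrix has 6 observations, each holding a one-hot encoded vector over 6 classes.
With scatter(dim=1), torch builds the self matrix row by row: for each row (observation), it takes the column number stored in index for that row and writes into that column the value stored in the same row of the src matrix. The documentation states the rule for 3-D tensors as: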

self[i][index[i][j][k]][k] = src[i][j][k]
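For 2-D matrices the rule reduces to `self[i][index[i][j]] = src[i][j]`. The following is a minimal pure-Python sketch of that rule, not PyTorch's implementation. The helper name `scatter_dim1` is hypothetical, and the `src` values are an assumption chosen so that the result matches the output shown above.

```python
def scatter_dim1(self_rows, index, src):
    """Illustrate scatter along dim=1 on 2-D nested lists (hypothetical helper)."""
    out = [row[:] for row in self_rows]  # copy so the input is not modified
    for i, index_row in enumerate(index):
        for j, col in enumerate(index_row):
            # Every index value must be a valid column position of the target row
            if not 0 <= col < len(out[i]):
                raise IndexError(
                    f"index {col} is out of bounds for dimension 1 with size {len(out[i])}"
                )
            out[i][col] = src[i][j]
    return out


index = [[5], [0], [3], [5], [1], [4]]
src = [[1], [1], [1], [1], [1], [0]]  # assumed values; the 0 leaves the last row empty
result = scatter_dim1([[0] * 6 for _ in range(6)], index, src)

for row in result:
    print(row)
```

Passing a class value such as -1, or any value equal to or larger than the number of columns, trips the bounds check, which is the same failure mode as in the question.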

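So in your case, you are trying to write the value 1 into a row of a self matrix of shape [40,1], at the column given by index[0] (which equals 1). Column 1 does not exist, which gives the error in your question. When I ran your notebook, though, the error was "index -1 is out of bounds for dimension 1 with size 5". Both have the same root cause.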
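Problem 2: one-hot encoding
In this case it is easier to use a full one-hot encoding than a one-hot-and-cold encoding (where unlabeled rows are all zeros). The reason is that for one-hot-and-cold encoding you need to put a 0 in the src matrix for every unlabeled observation, which is more painful than just using 1 for src. Also read this link: Is it valid to have full zeros for OHE? I think it makes more sense to give every class its own one-hot column.
So for the second problem, we simply need to map the classes to indices of the result/self matrix. Since we have 6 classes, we map them to 0, 1, 2, 3, 4, 5. A simple lambda function does the job. I used a random sampler to draw data labels from the class list as follows (I randomly created 40 observations from the 6 classes):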

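import random
import torch

classes = [-2, -1, 0, 1, 2, 9]

# Map each class value to a column index: -2..2 -> 0..4, and 9 (unlabeled) -> 5
labels = list()
for i in range(0, 40):
    labels.append([(lambda x: x + 2 if x != 9 else 5)(random.sample(classes, 1)[0])])

index_aka_labels = torch.tensor(labels)
print(index_aka_labels)
print(index_aka_labels.shape)
# dtype is given explicitly because src is not defined in this snippet
one_hot = torch.zeros(40, 6, dtype=torch.int64).scatter_(1, index_aka_labels, 1)
print(one_hot)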

Finally, we get the OHE result we wanted:

tensor([[0, 0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        ... (40 observations)
        [0, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0],
        [1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1],
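For real label tensors, the same mapping can be done without a Python loop. The sketch below assumes the label layout from the question (classes -2..2, with 9 for unlabeled). The tensor `raw` and all variable names are hypothetical, and dropping the last column is only one possible way to get all-zero rows for unlabeled data if that is still wanted.

```python
import torch

# Hypothetical raw labels: class values in -2..2, with 9 marking unlabeled rows
raw = torch.tensor([-2, 9, 0, 2, 1, -1, 9])

unlabeled = raw == 9
# Shift -2..2 to 0..4 and send the unlabeled marker to the extra column 5
index = torch.where(unlabeled, torch.full_like(raw, 5), raw + 2)

# Full one-hot encoding over 6 columns, one of them reserved for "unlabeled"
full = torch.zeros(raw.size(0), 6, dtype=torch.float).scatter_(1, index.unsqueeze(1), 1)

# Optional: drop column 5 so that unlabeled rows become all zeros
labeled_only = full[:, :5]

print(full)
print(labeled_only)
```

Because every value in `index` lies between 0 and 5, scatter never sees an out-of-range column, and the number of columns no longer depends on which classes happen to appear in the data.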

Regarding python - RuntimeError during one-hot encoding, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68045496/
