c++ - CUDA histogram with reduce_by_key fails

Reposted by: 行者123 Updated: 2023-11-28 02:13:33

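I have the following CUDA Thrust code, which uses reduce_by_key to histogram the values [0, 1024) into 256 bins. I expect every bin to have a count of 4, but I see 256 in bin 0, 3 in bin 255, and 4 in all the other bins.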

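#include <stdio.h>
#include <stdlib.h>
#include <iostream>  // std::cout, std::endl

#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

#include <thrust/device_vector.h>
#include <thrust/extrema.h>
#include <thrust/iterator/constant_iterator.h>  // thrust::constant_iterator
#include <thrust/pair.h>
#include <thrust/reduce.h>     // thrust::reduce_by_key
#include <thrust/transform.h>  // thrust::transform
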
#define SIZE 1024

struct binFunc {
    const float minVal;
    const float valRange;
    const int numBins;
    binFunc(float _minVal, float _valRange, int _numBins) :
        minVal(_minVal), valRange(_valRange), numBins(_numBins) {}

    __host__ __device__
    int operator()(float v) const {
        int b = int((v - minVal) / valRange * float(numBins));
        return b;
    }
};

int main() {
    thrust::device_vector<float> d_vec(SIZE);
    for (int i = 0; i < SIZE; ++i)
        d_vec[i] = float(i);

    thrust::device_vector<float>::iterator min;
    thrust::device_vector<float>::iterator max;
    thrust::pair<thrust::device_vector<float>::iterator,
            thrust::device_vector<float>::iterator> minmax =
            thrust::minmax_element(d_vec.begin(), d_vec.end());
    min = minmax.first;
    max = minmax.second;
    float minVal = *min;
    float maxVal = *max;

    std::cout << "The minimum value is " << minVal
            << " and the maximum value is " << maxVal << "." << std::endl;

    float valRange = maxVal - minVal;

    std::cout << "The range is " << valRange << "." << std::endl;

    int numBins = 256;

    thrust::device_vector<int> d_binResults(SIZE);
    thrust::transform(d_vec.begin(), d_vec.end(), d_binResults.begin(),
            binFunc(minVal, valRange, numBins));

    thrust::device_vector<int>::iterator d_binResults_iter =
            d_binResults.begin();
    for (int i = 0; i < 10; ++i) {
        int b = *d_binResults_iter;
        printf("d_binResults[%d]=%d\n", i, b);
        d_binResults_iter++;
    }

    std::cout << "The numBins is " << numBins << "." << std::endl;

    thrust::device_vector<int> d_binsKeys(numBins);
    thrust::device_vector<int> d_binsValues(numBins);

    thrust::pair<thrust::device_vector<int>::iterator,
            thrust::device_vector<int>::iterator> keys_and_values =
            thrust::reduce_by_key(d_binResults.begin(), d_binResults.end(),
                    thrust::constant_iterator<int>(1), d_binsKeys.begin(),
                    d_binsValues.begin());

    thrust::device_vector<int>::iterator d_binsKeys_begin_iter =
            d_binsKeys.begin();
    thrust::device_vector<int>::iterator d_binsValues_begin_iter =
            d_binsValues.begin();
    for (int i = 0; i < numBins; ++i) {
        int key = *d_binsKeys_begin_iter;
        int val = *d_binsValues_begin_iter;
        printf("d_binsValues[%d]=(%d,%d)\n", i, key, val);
        d_binsKeys_begin_iter++;
        d_binsValues_begin_iter++;
    }
    return 0;
}

The relevant part of the output is:

d_binsValues[0]=(0,256)
d_binsValues[1]=(1,4)
d_binsValues[2]=(2,4)
...
d_binsValues[254]=(254,4)
d_binsValues[255]=(255,3)

So bin 0 has 256 elements and bin 255 has 3? What is going on?

Best answer

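If you print all of the d_binResults[] values instead of just the first 10, you will find that the last element (d_binResults[1023]) has the value 256! That is not a valid bin index: for numBins = 256, the valid indices are 0..255.

This happens because of the calculation in your functor: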

    int b = int((v - minVal) / valRange * float(numBins));

Plugging in the values for the last element, we have:

(1023 - 0)/1023*256 = 256
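
The arithmetic above can be checked on the host with a few lines of plain C++. This is only a minimal sketch: `binIndex` is a hypothetical free-function copy of the question's `binFunc::operator()`, and `maxBinIndex` is an illustrative helper, neither of which is part of Thrust.

```cpp
#include <cassert>

// Hypothetical host-side copy of binFunc::operator() from the question.
int binIndex(float v, float minVal, float valRange, int numBins) {
    assert(numBins > 0 && valRange > 0.0f);
    return int((v - minVal) / valRange * float(numBins));
}

// Largest bin index produced for the inputs 0, 1, ..., count - 1,
// using minVal = 0 and valRange = count - 1 as in the question.
int maxBinIndex(int count, int numBins) {
    assert(count > 1);
    int maxBin = 0;
    for (int i = 0; i < count; ++i) {
        int b = binIndex(float(i), 0.0f, float(count - 1), numBins);
        if (b > maxBin) maxBin = b;
    }
    return maxBin;
}
```

With the question's parameters, `binIndex(1023.0f, 0.0f, 1023.0f, 256)` evaluates to 256, and `maxBinIndex(1024, 256)` is also 256, one past the last valid index 255.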

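But 256 is not a valid bin index. It turns out that this breaks the reduce_by_key operation: there are now 257 distinct keys, while d_binsKeys and d_binsValues hold only numBins = 256 elements, so the reduction writes past the end of its output. The visible result is that the last bin has 3 elements and the first bin is "corrupted".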
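
To see why one stray key matters, recall that reduce_by_key emits one output pair per run of consecutive equal keys. The following plain C++ sketch emulates that behavior on the host for the case where every value is 1. The helper names `countRuns` and `questionKeys` are hypothetical, and this is an illustration of the semantics, not of how Thrust implements the reduction.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Host-side emulation of what reduce_by_key computes when every value is 1:
// one (key, count) pair per run of consecutive equal keys.
std::vector<std::pair<int, int>> countRuns(const std::vector<int>& keys) {
    std::vector<std::pair<int, int>> runs;
    for (int k : keys) {
        if (!runs.empty() && runs.back().first == k)
            ++runs.back().second;   // same key as the previous element
        else
            runs.push_back({k, 1}); // a new run starts here
    }
    return runs;
}

// Bin keys for the inputs 0 .. count - 1, using the question's formula
// with minVal = 0 and valRange = count - 1.
std::vector<int> questionKeys(int count, int numBins) {
    assert(count > 1 && numBins > 0);
    std::vector<int> keys(count);
    for (int i = 0; i < count; ++i)
        keys[i] = int(float(i) / float(count - 1) * float(numBins));
    return keys;
}
```

For the question's data, `countRuns(questionKeys(1024, 256))` yields 257 pairs, ending in (255, 3) and (256, 1). The output vectors in the question have room for only 256 pairs, so the last one has nowhere valid to go.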

If you fix this, you will fix both of the problems you describe (the first bin having 256 elements and the last bin having 3).

As a simple proof, add this line of code:

d_binResults[1023] = 255;

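immediately after your thrust::transform operation, and the results become correct. How you choose to correct your bin calculation is up to you. (It could perhaps be "fixed" by adding 1 to valRange, but that may have implications for your intended histogram values.)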
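
One possible correction, sketched here as an assumption rather than as the answer's own recommendation, is to clamp the computed index to the valid range so that the maximum value falls into the last bin. The names `clampedBinIndex` and `clampedHistogram` are hypothetical, and the code runs on the host only; inside the Thrust functor the same clamp would have to live in the `__host__ __device__` `operator()`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical variant of the question's bin formula that clamps the result,
// so v == maxVal lands in bin numBins - 1 instead of at index numBins.
int clampedBinIndex(float v, float minVal, float valRange, int numBins) {
    assert(numBins > 0 && valRange > 0.0f);
    int b = int((v - minVal) / valRange * float(numBins));
    return std::min(std::max(b, 0), numBins - 1);
}

// Histogram of the inputs 0 .. count - 1 using the clamped formula,
// with minVal = 0 and valRange = count - 1 as in the question.
std::vector<int> clampedHistogram(int count, int numBins) {
    assert(count > 1);
    std::vector<int> bins(numBins, 0);
    for (int i = 0; i < count; ++i)
        ++bins[clampedBinIndex(float(i), 0.0f, float(count - 1), numBins)];
    return bins;
}
```

For the question's data, `clampedHistogram(1024, 256)` gives a count of 4 in each of the 256 bins, which has the same effect as the one-line `d_binResults[1023] = 255;` patch. The trade-off is that the last bin becomes closed at its upper edge while all other bins stay half-open.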

Regarding "c++ - CUDA histogram with reduce_by_key fails", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/34843956/
