c++ - CUB template similar to thrust

Reposted by: 行者123  Updated: 2023-11-30 03:31:48

Here is the Thrust code:

h_in_value[7] = thrust::reduce(thrust::device, d_in1 + a - b, d_ori_rho_L1 + a);

Here, thrust::reduce takes the first and last input iterators, and Thrust returns the value to the CPU (it is copied into h_in_value).

Can this functionality be obtained using CUB?

First and last iterators as input
Result returned to the host
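
For concreteness, the two properties listed above can be seen in a minimal Thrust sketch. The helper name sum_window is hypothetical, and it assumes both ends of the range come from the same device array (the line in the question uses two different pointer names), with 0 <= b <= a <= array length.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/reduce.h>

// Hypothetical helper (not from the question): sums the b elements that end
// at offset a of a raw device array, i.e. the range [d_in + a - b, d_in + a).
// Assumes d_in points to device memory and 0 <= b <= a <= array length.
int sum_window(const int *d_in, int a, int b) {
  // thrust::device tells Thrust that the raw pointers refer to device memory.
  // The reduction runs on the GPU and the sum is returned by value to the
  // host caller, with no explicit cudaMemcpy in user code.
  return thrust::reduce(thrust::device, d_in + a - b, d_in + a);
}
```

Called on a device array of ones with a = 7 and b = 3, this would sum three elements.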

Best answer

Can this functionality be obtained using CUB?

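Yes, it is possible to do something similar using CUB. Most of what you need is contained in the example snippet for sum reduce (linked in the original answer). In addition, CUB does not automatically copy the result back to host code, so we need to manage that ourselves. Here is one possible implementation: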

$ cat t125.cu
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/device_vector.h>
#include <cub/cub.cuh>
#include <iostream>

typedef int mytype;

const int dsize = 10;
const int val  = 1;


template <typename T>
T my_cub_reduce(T *begin, T *end){

  size_t num_items = end-begin;
  T *d_in = begin;
  T *d_out, res;
  cudaMalloc(&d_out, sizeof(T));
  void     *d_temp_storage = NULL;
  size_t   temp_storage_bytes = 0;
  // First call with d_temp_storage == NULL only computes temp_storage_bytes
  cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  // Allocate temporary storage
  cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Run sum-reduction
  cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
  cudaMemcpy(&res, d_out, sizeof(T), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  cudaFree(d_temp_storage);
  return res;
}

template <typename T>
typename thrust::iterator_traits<T>::value_type
my_cub_reduce(T begin, T end){

  return my_cub_reduce(thrust::raw_pointer_cast(&(begin[0])), thrust::raw_pointer_cast(&(end[0])));
}

int main(){

  mytype *d_data, *h_data;
  cudaMalloc(&d_data, dsize*sizeof(mytype));
  h_data = (mytype *)malloc(dsize*sizeof(mytype));
  for (int i = 0; i < dsize; i++) h_data[i] = val;
  cudaMemcpy(d_data, h_data, dsize*sizeof(mytype), cudaMemcpyHostToDevice);
  std::cout << "thrust reduce: " << thrust::reduce(thrust::device, d_data, d_data+dsize) << std::endl;
  std::cout << "cub reduce:    " << my_cub_reduce(d_data, d_data+dsize) << std::endl;
  thrust::device_vector<int> d(5,1);
  // using thrust style container iterators and pointers
  std::cout << my_cub_reduce(d.begin(), d.end()) << std::endl;
  std::cout << my_cub_reduce(thrust::device_pointer_cast(d.data()), thrust::device_pointer_cast(d.data()+d.size())) << std::endl;
  // Release the buffers allocated at the top of main
  cudaFree(d_data);
  free(h_data);
}
$ nvcc -arch=sm_61 -o t125 t125.cu
$ ./t125
thrust reduce: 10
cub reduce:    10
5
5
$

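Edit: With a few extra lines of code, we can add support for thrust-style device container iterators and pointers. I have also updated the code above to demonstrate this.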
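
The listing above ignores the cudaError_t values that cudaMalloc, cudaMemcpy, and cub::DeviceReduce::Sum return. As a rough sketch only, a variant that propagates the first error might look like the following. The name checked_cub_sum and the out-parameter signature are assumptions for illustration and are not part of the original answer.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical variant of my_cub_reduce: writes the sum of the device range
// [begin, end) to *result (a host pointer) and returns the first CUDA error
// encountered, or cudaSuccess if every step succeeded.
template <typename T>
cudaError_t checked_cub_sum(const T *begin, const T *end, T *result) {
  // Assumption: the range is small enough for the item count to fit in an int.
  const int num_items = static_cast<int>(end - begin);
  T *d_out = NULL;
  void *d_temp_storage = NULL;
  size_t temp_storage_bytes = 0;

  cudaError_t err = cudaMalloc(&d_out, sizeof(T));
  // Size query: with d_temp_storage == NULL, CUB only fills temp_storage_bytes.
  if (err == cudaSuccess)
    err = cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                                 begin, d_out, num_items);
  if (err == cudaSuccess)
    err = cudaMalloc(&d_temp_storage, temp_storage_bytes);
  // Second call performs the actual sum reduction on the device.
  if (err == cudaSuccess)
    err = cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes,
                                 begin, d_out, num_items);
  // CUB leaves the result on the device, so copy it back to the host here.
  if (err == cudaSuccess)
    err = cudaMemcpy(result, d_out, sizeof(T), cudaMemcpyDeviceToHost);

  // cudaFree on a null pointer performs no operation, so cleanup is safe
  // even if an earlier step failed.
  cudaFree(d_temp_storage);
  cudaFree(d_out);
  return err;
}
```

A caller would pass two raw device pointers and the address of a host variable, then compare the return value against cudaSuccess before using the sum. Like the original wrapper, this sketch allocates and frees its buffers on every call.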

Regarding "c++ - CUB template similar to thrust", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43924639/
