cuda - Using thrust::sort inside a thread

Reposted by: 行者123. Last updated: 2023-12-04 17:20:32

I would like to know whether thrust::sort() can be used inside a thread:

__global__
void mykernel(float* array, int arrayLength)
{
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    // array is a vector in device global memory
    // is it possible to call thrust::sort here, inside the thread?
    thrust::sort(array, array+arrayLength);
    // do something else with the array
}

If so, would the sort launch other kernels to parallelize the sorting?

Best answer

Yes. thrust::sort can be combined with the thrust::seq execution policy to sort numbers sequentially within a single CUDA thread (or sequentially within a single CPU thread):

#include <thrust/sort.h>
#include <thrust/execution_policy.h>

__global__
void mykernel(float* array, int arrayLength)
{
  int threadID = blockIdx.x * blockDim.x + threadIdx.x;

  // each thread sorts array
  // XXX note this causes a data race
  thrust::sort(thrust::seq, array, array + arrayLength);
}

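Note that your example causes a data race, because every CUDA thread attempts to sort the same data in parallel. A correct, race-free program would partition array according to the thread index.
The thrust::seq execution policy required for this feature is only available in Thrust v1.8 or later.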
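
The partitioning mentioned above could look roughly like the following sketch. It assumes that each thread owns one contiguous chunk of the array, which is only one of several possible ways to split the data. The kernel name sortChunks, the chunkLength parameter, and the host helper sortChunksOnDevice are illustrative names and are not part of the original answer.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// Each thread sorts its own disjoint chunk [begin, end), so no two threads
// ever touch the same element. chunkLength is a hypothetical parameter.
__global__
void sortChunks(float* array, int arrayLength, int chunkLength)
{
  int threadID = blockIdx.x * blockDim.x + threadIdx.x;

  int begin = threadID * chunkLength;
  if (begin >= arrayLength) return;            // surplus threads do nothing

  int end = begin + chunkLength;
  if (end > arrayLength) end = arrayLength;    // last chunk may be shorter

  // Sequential sort executed entirely by this one thread
  thrust::sort(thrust::seq, array + begin, array + end);
}

// Hypothetical host helper: copies data to the device, sorts each chunk
// in place, and copies the result back. Returns false on a CUDA error.
bool sortChunksOnDevice(std::vector<float>& data, int chunkLength)
{
  int n = static_cast<int>(data.size());
  if (n == 0 || chunkLength <= 0) return false;

  float* d_array = nullptr;
  size_t bytes = static_cast<size_t>(n) * sizeof(float);
  if (cudaMalloc(&d_array, bytes) != cudaSuccess) return false;

  cudaError_t err = cudaMemcpy(d_array, data.data(), bytes, cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    int numChunks = (n + chunkLength - 1) / chunkLength;   // one thread per chunk
    int threads = 128;
    int blocks = (numChunks + threads - 1) / threads;
    sortChunks<<<blocks, threads>>>(d_array, n, chunkLength);
    err = cudaDeviceSynchronize();
  }
  if (err == cudaSuccess)
    err = cudaMemcpy(data.data(), d_array, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_array);
  return err == cudaSuccess;
}

int main()
{
  std::vector<float> data = {9, 3, 7, 1,  8, 2, 6, 4,  5, 0};
  if (!sortChunksOnDevice(data, 4)) {
    printf("CUDA error\n");
    return 1;
  }
  for (float x : data) printf("%g ", x);
  printf("\n");
  return 0;
}
```

With a chunk length of 4, the ten input values are sorted as three independent chunks (the last one holding only two elements); the array as a whole is not sorted. Whether per-thread chunks are the right granularity depends on the application, so treat this as a starting point rather than a recommended design.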

Regarding "cuda - Using thrust::sort inside a thread", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/23403653/
