cuda - Passing a CUDA array to thrust::inclusive_scan

Reposted by: 行者123. Updated: 2023-12-02 17:24:57

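I can run an inclusive scan on an array on the CPU, but is it possible to do the same on an array on the GPU? (The commented-out line is the way I know works but do not need.) Alternatively, is there another easy way to perform an inclusive scan on an array in device memory?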

Code:

#include <stdio.h>
#include <stdlib.h> /* for drand48() */
#include <math.h>   /* for floor() */
#include <iostream>
#include <thrust/scan.h>
#include <cuda_runtime.h> /* for cudaMalloc(), cudaMemcpy() */



#ifdef DOUBLE
 #define REAL double
 #define MAXT 256
#else
 #define REAL float
 #define MAXT 512
#endif

#ifndef MIN
#define MIN(x,y) (((x) < (y)) ? (x) : (y))
#endif

using namespace std;

// Reports the most recent CUDA error, if any; returns true when an error occurred.
bool errorAsk(const char *s="n/a")
{
    cudaError_t err=cudaGetLastError();
    if(err==cudaSuccess)
        return false;
    printf("CUDA error [%s]: %s\n",s,cudaGetErrorString(err));
    return true;
}

// Fills c_idata with N random integers from 0 to 9, each scaled by constant.
double *fillArray(double *c_idata,int N,double constant) {
    int n;
    for (n = 0; n < N; n++) {
        c_idata[n] = constant*floor(drand48()*10);
    }
    return c_idata;
}

int main(int argc,char *argv[])
{
    int N,blocks,threads;
    N = 100;
    threads=MAXT;
    blocks=N/threads+(N%threads==0?0:1);

    double *c_data,*g_data;

    c_data = new double[N];
    c_data = fillArray(c_data,N,1);
    cudaMalloc(&g_data,N*sizeof(double));

    cudaMemcpy(g_data,c_data,N*sizeof(double),cudaMemcpyHostToDevice);
    thrust::inclusive_scan(g_data, g_data + N, g_data); // in-place scan
    cudaMemcpy(c_data,g_data,N*sizeof(double),cudaMemcpyDeviceToHost);

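    // Known to work (scan on the host), but not what I need:
    // thrust::inclusive_scan(c_data, c_data + N, c_data); // in-place scan

    for(int i = 0; i < N; i++) {
        cout<<c_data[i]<<endl;
    }

    cudaFree(g_data);
    delete[] c_data;
    return 0;
}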

Best answer

If you read the Thrust quick start guide, you will find one suggestion for handling "raw" device data: use thrust::device_ptr:

You may wonder what happens when a "raw" pointer is used as an argument to a Thrust function. Like the STL, Thrust permits this usage and it will dispatch the host path of the algorithm. If the pointer in question is in fact a pointer to device memory then you'll need to wrap it with thrust::device_ptr before calling the function.

To fix your code, you need to

#include <thrust/device_ptr.h>

and replace the existing call to thrust::inclusive_scan with the following two lines:

thrust::device_ptr<double> g_ptr = thrust::device_pointer_cast(g_data);
thrust::inclusive_scan(g_ptr, g_ptr + N, g_ptr); // in-place scan
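
For context, here is a minimal sketch of how those two lines might fit into a complete round trip (host to device, scan, device to host). The helper name scan_with_device_ptr and its std::vector interface are hypothetical and not part of the original post; only the device_ptr wrapping and the scan call come from the answer.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>

// Hypothetical helper (not from the original post): copies `host` to the GPU,
// runs an in-place inclusive scan there, and returns the scanned values.
std::vector<double> scan_with_device_ptr(const std::vector<double>& host)
{
    const size_t n = host.size();
    std::vector<double> result(n);
    if (n == 0) return result;

    double* g_data = nullptr;
    cudaError_t err = cudaMalloc(&g_data, n * sizeof(double));
    assert(err == cudaSuccess);
    err = cudaMemcpy(g_data, host.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    assert(err == cudaSuccess);

    // Wrapping the raw pointer tells Thrust that the data lives in device memory,
    // so the device path of the algorithm is selected.
    thrust::device_ptr<double> g_ptr = thrust::device_pointer_cast(g_data);
    thrust::inclusive_scan(g_ptr, g_ptr + n, g_ptr); // in-place scan on the GPU

    err = cudaMemcpy(result.data(), g_data, n * sizeof(double), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(g_data);
    return result;
}
```

Calling this helper with the values 1, 2, 3, 4 should return the running sums 1, 3, 6, 10.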

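Another approach is to use Thrust execution policies (thrust::device is declared in <thrust/execution_policy.h>, which must be included) and modify your call like this: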

thrust::inclusive_scan(thrust::device, g_data, g_data + N, g_data);
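
As a rough sketch of the execution-policy variant, the call can be wrapped in a small function that takes a raw device pointer. The function name scan_in_place_on_device is hypothetical, and the sketch assumes the caller has already allocated and filled g_data with cudaMalloc and cudaMemcpy, as in the question.

```cuda
#include <thrust/execution_policy.h> // declares thrust::device
#include <thrust/scan.h>

// Hypothetical helper (not from the original post).
// Assumes g_data points to n doubles in device memory.
void scan_in_place_on_device(double* g_data, int n)
{
    // The explicit thrust::device policy selects the device path,
    // so the raw pointer does not need to be wrapped in thrust::device_ptr.
    thrust::inclusive_scan(thrust::device, g_data, g_data + n, g_data);
}
```

The design trade-off is small: the policy keeps raw pointers in the calling code, while thrust::device_ptr encodes the memory space in the pointer type itself.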

There are various other possibilities as well.
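
The answer does not say which other possibilities it has in mind. One that may apply, offered here as an assumption rather than as the answerer's suggestion, is to let thrust::device_vector own the device memory instead of calling cudaMalloc and cudaMemcpy by hand. The helper name scan_with_device_vector is hypothetical.

```cuda
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Hypothetical helper (not from the original post): thrust::device_vector
// manages the device allocation, so no raw device pointers are involved.
std::vector<double> scan_with_device_vector(const std::vector<double>& host)
{
    // Constructing from a host range copies the data to the GPU.
    thrust::device_vector<double> d(host.begin(), host.end());

    // device_vector iterators already identify device memory,
    // so Thrust dispatches the device path without any wrapping.
    thrust::inclusive_scan(d.begin(), d.end(), d.begin());

    // Copy the scanned values back to host memory.
    std::vector<double> result(host.size());
    thrust::copy(d.begin(), d.end(), result.begin());
    return result;
}
```

This variant suits code that does not already hold a cudaMalloc'ed buffer; when the buffer comes from elsewhere, the device_ptr or execution-policy forms avoid an extra copy.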

Source: a similar question on Stack Overflow about passing a CUDA array to thrust::inclusive_scan: https://stackoverflow.com/questions/33156534/
