
c++ - cuBLAS matrix LU decomposition

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 02:02:41

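I am having trouble calling dgetrf in CUDA. As far as I can tell, only the batched version is available ( http://docs.nvidia.com/cuda/cublas/#cublas-lt-t-gt-getrfbatched ). When I call it, it returns error value 7, and I cannot find the enum that corresponds to that error code. My code is below; any help would be appreciated.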

void cuda_matrix_inverse (int m, int n, double* a){

cublasHandle_t handle;
cublasStatus_t status;
double **devPtrA = 0;
double **devPtrA_dev = NULL;
int *d_pivot_array;
int *d_info_array;
int rowsA = m;
int colsA = n;
int matrixSizeA;
cudaError_t error;

fprintf(stderr,"starting cuda inverse\n");

error = cudaMalloc((void **)&d_pivot_array, sizeof(int));
if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));
error = cudaMalloc((void **)&d_info_array, sizeof(int));
if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

fprintf(stderr,"malloced pivot and info\n");

status = cublasCreate(&handle);
if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error %i\n",status);

matrixSizeA = rowsA * colsA;

devPtrA =(double **)malloc(1 * sizeof(*devPtrA));

fprintf(stderr,"malloced devPtrA\n");

error = cudaMalloc((void **)&devPtrA[0], matrixSizeA * sizeof(devPtrA[0][0]));
if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

error = cudaMalloc((void **)&devPtrA_dev, 1 * sizeof(*devPtrA));
if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

fprintf(stderr,"malloced device variables\n");

error = cudaMemcpy(devPtrA_dev, devPtrA, 1 * sizeof(*devPtrA), cudaMemcpyHostToDevice);
if (error != cudaSuccess) fprintf(stderr,"\nError: %s\n",cudaGetErrorString(error));

fprintf(stderr,"copied from devPtrA to d_devPtrA\n");

status = cublasSetMatrix(rowsA, colsA, sizeof(a[0]), a, rowsA, devPtrA[0], rowsA);
if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error %i\n",status);


status = cublasDgetrfBatched(handle, m, devPtrA_dev,m,d_pivot_array,d_info_array,1); //cannot get this to work
if (status != CUBLAS_STATUS_SUCCESS) fprintf(stderr,"error in dgetrf %i\n",status);


fprintf(stderr,"done with cuda inverse\n");
}

Best answer

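cuBLAS error code 7 is CUBLAS_STATUS_INVALID_VALUE. Matrix inversion in cuBLAS only works for square matrices, so I assume m == n in your case. That said, the function cublas<t>getrfBatched requires the pivot array to hold n entries for each matrix, so you should allocate d_pivot_array as: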

error = cudaMalloc((void **)&d_pivot_array, n * sizeof(int));

More generally, for a batch of batchSize matrices, it is allocated as:

error = cudaMalloc((void **)&d_pivot_array, n * batchSize * sizeof(int));
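The two points above (decoding the numeric status and sizing the buffers) can be sketched in plain C++ without a GPU. This is only an illustration: the helper names cublas_status_name, pivot_array_bytes and info_array_bytes are hypothetical, and the numeric values are copied by hand from the cublasStatus_t enum, so check them against the cublas_api.h header of your CUDA version. The info array size (one int per matrix) is my reading of the cuBLAS documentation rather than something stated in the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: map a numeric cuBLAS status to its enum name.
// Values mirror cublasStatus_t as I understand it; verify against cublas_api.h.
std::string cublas_status_name(int status) {
    switch (status) {
        case 0:  return "CUBLAS_STATUS_SUCCESS";
        case 1:  return "CUBLAS_STATUS_NOT_INITIALIZED";
        case 3:  return "CUBLAS_STATUS_ALLOC_FAILED";
        case 7:  return "CUBLAS_STATUS_INVALID_VALUE";
        case 8:  return "CUBLAS_STATUS_ARCH_MISMATCH";
        case 11: return "CUBLAS_STATUS_MAPPING_ERROR";
        case 13: return "CUBLAS_STATUS_EXECUTION_FAILED";
        case 14: return "CUBLAS_STATUS_INTERNAL_ERROR";
        case 15: return "CUBLAS_STATUS_NOT_SUPPORTED";
        case 16: return "CUBLAS_STATUS_LICENSE_ERROR";
        default: return "unknown cuBLAS status";
    }
}

// Hypothetical helper: bytes needed for the pivot array of a batched LU,
// i.e. n pivot indices for each of the batchSize matrices.
std::size_t pivot_array_bytes(int n, int batchSize) {
    assert(n >= 0 && batchSize >= 0);
    return static_cast<std::size_t>(n) * static_cast<std::size_t>(batchSize) * sizeof(int);
}

// Hypothetical helper: bytes needed for the info array (assumed: one int per matrix).
std::size_t info_array_bytes(int batchSize) {
    assert(batchSize >= 0);
    return static_cast<std::size_t>(batchSize) * sizeof(int);
}
```

In the question's code, the pivot allocation would then request pivot_array_bytes(n, 1) bytes instead of sizeof(int), and the error message could print cublas_status_name(status) next to the raw number.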

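Here is the square-matrix inversion code I wrote while testing cuBLAS functions. The function's input and output are square matrices of type float allocated on the device.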
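To see why the pivot array needs n entries per matrix, it may help to look at a CPU reference of LU factorization with partial pivoting. The sketch below is not the answer's linked code and not how cuBLAS implements the routine; lu_factor is a hypothetical name, and the conventions (column-major storage, 1-based pivot indices, an info value that reports the first zero pivot) are assumed to follow LAPACK's getrf.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference: in-place LU factorization with partial pivoting
// of a single n x n matrix stored in column-major order (element (i, j) is a[i + j * n]).
// On return, the strict lower triangle holds L (unit diagonal implied) and the
// upper triangle holds U. pivots receives n 1-based row indices, one per
// elimination step. The return value is 0 on success, or k + 1 for the first
// step k whose pivot is exactly zero.
int lu_factor(std::vector<double>& a, int n, std::vector<int>& pivots) {
    assert(n >= 0 && a.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    pivots.assign(n, 0);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        // Pick the row with the largest magnitude in column k, at or below the diagonal.
        int p = k;
        for (int i = k + 1; i < n; ++i) {
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        }
        pivots[k] = p + 1;  // one pivot index per step, hence n entries in total
        if (a[p + k * n] == 0.0) {
            if (info == 0) info = k + 1;  // record the first zero pivot and skip this step
            continue;
        }
        // Swap rows k and p across all columns.
        if (p != k) {
            for (int j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
        }
        // Store the multipliers and update the trailing submatrix.
        for (int i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (int j = k + 1; j < n; ++j) {
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
            }
        }
    }
    return info;
}
```

Every elimination step writes one pivot index, so a single int, as allocated in the question, cannot hold the result for any matrix with n > 1.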

Regarding c++ - cuBLAS matrix LU decomposition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22501502/
