2017-04-15 63 views
0

我是GPU的新手,我想用GPU解决大型矩阵矢量乘法。我试图用“cublasDgbmv”解决它,因为矩阵是带状矩阵。我试图用一个简单的例子来实现这个命令。这是我写的代码:使用GPU的cublasDgbmv的系数矩阵

/* system of equations sol=A*b: 
    A=[1 2 3 0 0 0 
    2 -1 4 1 0 0 
    3 4 5 -1 7 0 
    0 1 -1 3 8 9 
    0 0 7 8 2 6 
    0 0 0 9 6 10] 
    b=[0 1 2 3 4 5] 
    solution supposed to be (using matlab) [8 10 39 85 76 101]*/ 

#include "cuda_runtime.h" 
#include "device_launch_parameters.h" 
#include <cublas_v2.h> 
#include <stdlib.h> 

#include <stdio.h> 

#define MAX(x, y) (((x) > (y)) ? (x) : (y)) 
#define MIN(x, y) (((x) < (y)) ? (x) : (y)) 

int main() 
{ 
cublasHandle_t handle; 

int i, j, k=2, m, n=6; // i,j are used as counters, k is the bandwidth, n is the size of the matrix and m is a constant that will be used in the code 
double A[36] = {1,2,3,0,0,0, 2,-1,4,1,0,0, 3,4,5,-1,7,0, 0,1,-1,3,8,9, 0,0,7,8,2,6, 0,0,0,9,6,10}; 

double* Ab; 
Ab = (double*)malloc(n*n*sizeof(double)); 

double* b; 
b = (double*)malloc(n*sizeof(double)); 

double* sol; 
sol = (double*)malloc(n*sizeof(double)); 

const double alph1 = 1; 
const double *alpha_1 = &alph1; 

const double alph0 = 0; 
const double *alpha_0 = &alph0; 

double* d_Ab; 
cudaMalloc(&d_Ab, n*n*sizeof(double)); 

double* d_b; 
cudaMalloc(&d_b, n*sizeof(double)); 

double* d_sol; 
cudaMalloc(&d_sol, n*sizeof(double)); 

for(i=0;i<6;i++) 
    b[i] = i; 

for(j=1;j<=n;j++) 
{ 
    m=k+1-j; 
    for(i=MAX(1,j-k);i<=MIN(n,j+k);i++) 
     Ab[(m+i-1)*n+(j-1)] = A[(i-1)*n + (j-1)]; 
} 

cudaMemcpy(d_Ab, Ab, n*n*sizeof(double), cudaMemcpyHostToDevice); 
cudaMemcpy(d_b , b, n*sizeof(double), cudaMemcpyHostToDevice); 

cublasCreate(&handle); 
cublasDgbmv(handle, CUBLAS_OP_N , n, n, k, k, alpha_1, d_Ab, n, d_b, 1, alpha_0, d_sol, 1); 
cublasDestroy(handle); 

cudaMemcpy(sol, d_sol, n*sizeof(double), cudaMemcpyDeviceToHost); 

for(i=0;i<n;i++) 
    printf("%f ", sol[i]); 

free(Ab); 
free(b); 
free(sol); 

cudaFree(d_Ab); 
cudaFree(d_b); 
cudaFree(d_sol); 
} 

我有解决的办法是:

4.000000 8.000000 33.000000 -31387192811020962000000000000000000000000000000000000000000000000000.000000 -31387192811020962000000000000000000000000000000000000000000000000000.000000 -31387192811020962000000000000000000000000000000000000000000000000000.000000 

我知道矩阵应该是在带宽形式和列主要形式。我做到了这一点link

回答

0

我知道答案,我发布给其他人可能想知道它。

应该被写为在问题中提到的带状矩阵...但是,这是格式和它应该被保存在列形式...因此,对于所提到的示例

A=[1 2 3 0 0 0 
    2 -1 4 1 0 0 
    3 4 5 -1 7 0 
    0 1 -1 3 8 9 
    0 0 7 8 2 6 
    0 0 0 9 6 10] 

作为链接中提到的呈条纹状我加入之前

A_banded = [3 1 7 9 0 0 
      2 4 -1 8 6 0 
      1 -1 5 3 2 10 
      0 2 4 -1 8 6 
      0 0 3 1 7 9] 

为了写在列主要形式是应该被保存在下面的表格

A_column_major_form = {3,2,1,0,0,0, 1,4,-1,2,0,0, 7,-1,5,4,3,0, 9,8,3,-1,1,0, 0,6,2,8,7,0, 0,0,10,6,9,0}