2011-01-30 26 views
2

Thread indexing: I am running the following code on my 8000-series device (which supports CUDA):

#include <stdio.h> 
__global__ void testSet(int * MyBlock) 
{ 
    unsigned int ThreadIDX= threadIdx.x+blockDim.x*blockIdx.x; 
    MyBlock[ThreadIDX]=ThreadIDX; 
} 

int main() 
{ 
    int * MyInts; 
    int Result[1024]; 
    cudaMalloc((void**) &MyInts,sizeof(int)*1024); 
    testSet<<<2,512>>>(MyInts); 
    cudaMemcpy(Result,MyInts,sizeof(int)*1024,cudaMemcpyDeviceToHost); 
    for(unsigned int t=0; t<1024/8;t++) { 
     printf("Results: %d %d %d %d %d %d %d %d\n", 
       Result[t], Result[t+1],Result[t+2], 
       Result[t+3],Result[t+4],Result[t+5], 
       Result[t+6],Result[t+7]); 
    } 
    return 0; 
} 

The output I get is:

Results: 0 1 2 3 4 5 6 7 
Results: 1 2 3 4 5 6 7 8 
Results: 2 3 4 5 6 7 8 9 
Results: 3 4 5 6 7 8 9 10 
Results: 4 5 6 7 8 9 10 11 
Results: 5 6 7 8 9 10 11 12 
Results: 6 7 8 9 10 11 12 13 
Results: 7 8 9 10 11 12 13 14 
Results: 8 9 10 11 12 13 14 15 
Results: 9 10 11 12 13 14 15 16 
Results: 10 11 12 13 14 15 16 17 
Results: 11 12 13 14 15 16 17 18 
Results: 12 13 14 15 16 17 18 19 
Results: 13 14 15 16 17 18 19 20 
Results: 14 15 16 17 18 19 20 21 
Results: 15 16 17 18 19 20 21 22 
Results: 16 17 18 19 20 21 22 23 
Results: 17 18 19 20 21 22 23 24 
Results: 18 19 20 21 22 23 24 25 
Results: 19 20 21 22 23 24 25 26 
Results: 20 21 22 23 24 25 26 27 
Results: 21 22 23 24 25 26 27 28 
Results: 22 23 24 25 26 27 28 29 
Results: 23 24 25 26 27 28 29 30 
Results: 24 25 26 27 28 29 30 31 
Results: 25 26 27 28 29 30 31 32 
Results: 26 27 28 29 30 31 32 33 
Results: 27 28 29 30 31 32 33 34 
Results: 28 29 30 31 32 33 34 35 
Results: 29 30 31 32 33 34 35 36 
Results: 30 31 32 33 34 35 36 37 
Results: 31 32 33 34 35 36 37 38 
Results: 32 33 34 35 36 37 38 39 
Results: 33 34 35 36 37 38 39 40 
Results: 34 35 36 37 38 39 40 41 
Results: 35 36 37 38 39 40 41 42 
Results: 36 37 38 39 40 41 42 43 
Results: 37 38 39 40 41 42 43 44 
Results: 38 39 40 41 42 43 44 45 
Results: 39 40 41 42 43 44 45 46 
Results: 40 41 42 43 44 45 46 47 
Results: 41 42 43 44 45 46 47 48 
Results: 42 43 44 45 46 47 48 49 
Results: 43 44 45 46 47 48 49 50 
Results: 44 45 46 47 48 49 50 51 
Results: 45 46 47 48 49 50 51 52 
Results: 46 47 48 49 50 51 52 53 
Results: 47 48 49 50 51 52 53 54 
Results: 48 49 50 51 52 53 54 55 
Results: 49 50 51 52 53 54 55 56 
Results: 50 51 52 53 54 55 56 57 
Results: 51 52 53 54 55 56 57 58 
Results: 52 53 54 55 56 57 58 59 
Results: 53 54 55 56 57 58 59 60 
Results: 54 55 56 57 58 59 60 61 
Results: 55 56 57 58 59 60 61 62 
Results: 56 57 58 59 60 61 62 63 
Results: 57 58 59 60 61 62 63 64 
Results: 58 59 60 61 62 63 64 65 
Results: 59 60 61 62 63 64 65 66 
Results: 60 61 62 63 64 65 66 67 
Results: 61 62 63 64 65 66 67 68 
Results: 62 63 64 65 66 67 68 69 
Results: 63 64 65 66 67 68 69 70 
Results: 64 65 66 67 68 69 70 71 
Results: 65 66 67 68 69 70 71 72 
Results: 66 67 68 69 70 71 72 73 
Results: 67 68 69 70 71 72 73 74 
Results: 68 69 70 71 72 73 74 75 
Results: 69 70 71 72 73 74 75 76 
Results: 70 71 72 73 74 75 76 77 
Results: 71 72 73 74 75 76 77 78 
Results: 72 73 74 75 76 77 78 79 
Results: 73 74 75 76 77 78 79 80 
Results: 74 75 76 77 78 79 80 81 
Results: 75 76 77 78 79 80 81 82 
Results: 76 77 78 79 80 81 82 83 
Results: 77 78 79 80 81 82 83 84 
Results: 78 79 80 81 82 83 84 85 
Results: 79 80 81 82 83 84 85 86 
Results: 80 81 82 83 84 85 86 87 
Results: 81 82 83 84 85 86 87 88 
Results: 82 83 84 85 86 87 88 89 
Results: 83 84 85 86 87 88 89 90 
Results: 84 85 86 87 88 89 90 91 
Results: 85 86 87 88 89 90 91 92 
Results: 86 87 88 89 90 91 92 93 
Results: 87 88 89 90 91 92 93 94 
Results: 88 89 90 91 92 93 94 95 
Results: 89 90 91 92 93 94 95 96 
Results: 90 91 92 93 94 95 96 97 
Results: 91 92 93 94 95 96 97 98 
Results: 92 93 94 95 96 97 98 99 
Results: 93 94 95 96 97 98 99 100 
Results: 94 95 96 97 98 99 100 101 
Results: 95 96 97 98 99 100 101 102 
Results: 96 97 98 99 100 101 102 103 
Results: 97 98 99 100 101 102 103 104 
Results: 98 99 100 101 102 103 104 105 
Results: 99 100 101 102 103 104 105 106 
Results: 100 101 102 103 104 105 106 107 
Results: 101 102 103 104 105 106 107 108 
Results: 102 103 104 105 106 107 108 109 
Results: 103 104 105 106 107 108 109 110 
Results: 104 105 106 107 108 109 110 111 
Results: 105 106 107 108 109 110 111 112 
Results: 106 107 108 109 110 111 112 113 
Results: 107 108 109 110 111 112 113 114 
Results: 108 109 110 111 112 113 114 115 
Results: 109 110 111 112 113 114 115 116 
Results: 110 111 112 113 114 115 116 117 
Results: 111 112 113 114 115 116 117 118 
Results: 112 113 114 115 116 117 118 119 
Results: 113 114 115 116 117 118 119 120 
Results: 114 115 116 117 118 119 120 121 
Results: 115 116 117 118 119 120 121 122 
Results: 116 117 118 119 120 121 122 123 
Results: 117 118 119 120 121 122 123 124 
Results: 118 119 120 121 122 123 124 125 
Results: 119 120 121 122 123 124 125 126 
Results: 120 121 122 123 124 125 126 127 
Results: 121 122 123 124 125 126 127 128 
Results: 122 123 124 125 126 127 128 129 
Results: 123 124 125 126 127 128 129 130 
Results: 124 125 126 127 128 129 130 131 
Results: 125 126 127 128 129 130 131 132 
Results: 126 127 128 129 130 131 132 133 
Results: 127 128 129 130 131 132 133 134 

Why isn't 0..1024 printed, as I expected?

Am I misunderstanding something? I read the introductory sections of the NVIDIA CUDA Programming Guide, and this is how I thought it worked.

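Granted, I have already run into plenty of annoying bugs and design limitations (for example, the 8000 series' lack of double-precision support), as well as CUDA producing errors for iomanip manipulators (setw, setprecision) when they are written as "std::..." instead of relying on a general "using namespace std;".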

So I suppose I was expecting some wackiness...

But I am desperate to figure out what is actually going on...

+1

Please don't mark the question as answered in the title; anyone can see that you have accepted an answer :) – 2011-01-31 02:55:58

Answer

4

Change:

for(unsigned int t=0; t<1024/8;t++) { 

to:

for(unsigned int t=0; t<1024; t+=8) { 

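You have 2 x 512 = 1024 threads, with indices ranging from 0..1023. Each thread writes its own index to the corresponding position in MyBlock, so you should expect to see an array in which every value equals its index. The kernel is doing exactly that; the problem is the print loop. It advances t by 1 per row while printing 8 values per row, so consecutive rows overlap and only Result[0..134] is ever read.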
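The difference between the two loops can be checked on the host alone, without a GPU. The following is a minimal sketch in plain C, under the assumption that a pair of nested loops is an acceptable stand-in for the `<<<2,512>>>` launch. The names `fill_like_kernel`, `last_index_read`, and `print_rows` are hypothetical helpers introduced for illustration; they are not part of the original code or of the CUDA API.

```c
#include <stdio.h>

#define BLOCKS 2                 /* grid size from testSet<<<2,512>>> */
#define THREADS_PER_BLOCK 512    /* block size from testSet<<<2,512>>> */
#define N (BLOCKS * THREADS_PER_BLOCK)

/* Host-side stand-in for the kernel (an assumption, not real CUDA):
   every (block, thread) pair writes its global index, computed the
   same way as threadIdx.x + blockDim.x * blockIdx.x. */
void fill_like_kernel(int *out)
{
    for (unsigned int block = 0; block < BLOCKS; block++) {
        for (unsigned int thread = 0; thread < THREADS_PER_BLOCK; thread++) {
            unsigned int idx = thread + THREADS_PER_BLOCK * block;
            out[idx] = (int)idx;
        }
    }
}

/* Highest array index read by a loop that prints 8 values per row,
   with rows starting at t = 0, step, 2*step, ... while t < limit. */
unsigned int last_index_read(unsigned int limit, unsigned int step)
{
    unsigned int last = 0;
    for (unsigned int t = 0; t < limit; t += step) {
        last = t + 7;
    }
    return last;
}

/* Print n values, 8 per row, with non-overlapping rows.
   Assumes n is a multiple of 8. */
void print_rows(const int *data, unsigned int n)
{
    for (unsigned int t = 0; t < n; t += 8) {
        printf("Results: %d %d %d %d %d %d %d %d\n",
               data[t],     data[t + 1], data[t + 2], data[t + 3],
               data[t + 4], data[t + 5], data[t + 6], data[t + 7]);
    }
}
```

Calling `last_index_read(1024/8, 1)` models the original loop and returns 134, while `last_index_read(1024, 8)` models the corrected loop and returns 1023. After `fill_like_kernel(Result)`, `print_rows(Result, N)` prints every element once.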

+2

Oh, I must be going crazy... thanks... yeah, a simple mistake. – 2011-01-30 22:48:48
