2012-08-08 91 views
5

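How can I vectorize the following code in C (using SIMD intrinsics, of course)? I am having trouble understanding SIMD intrinsics, and a worked example would help a lot: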

int sum_naive(int n, int *a) 
{ 
    int sum = 0; 
    for(int i = 0; i < n; i++) 
        sum += a[i]; 
    return sum; 
} 
+0

Which SIMD instruction set did you have in mind? SSE2? – harold 2012-08-08 20:52:35

+0

SSE. The following intrinsics are available: __m128i _mm_setzero_si128(); __m128i _mm_loadu_si128(__m128i *p); __m128i _mm_add_epi32(__m128i a, __m128i b), which returns (a0+b0, a1+b1, a2+b2, a3+b3); void _mm_storeu_si128(__m128i *p, __m128i a) – user1585869 2012-08-08 21:03:27

+0

OK, SSE2 then. What have you tried? – harold 2012-08-08 21:08:13

Answers

8

Here is a fairly simple implementation (warning: untested code):

#include <emmintrin.h> // SSE2 intrinsics
#include <stdint.h>

int32_t sum_array(const int32_t a[], const int n) 
{ 
    __m128i vsum = _mm_set1_epi32(0);  // initialise vector of four partial 32 bit sums 
    int32_t sum; 
    int i; 

    for (i = 0; i < n; i += 4) 
    { 
        __m128i v = _mm_load_si128((const __m128i *)&a[i]); // aligned load of 4 x 32 bit values 
        vsum = _mm_add_epi32(vsum, v);  // accumulate to 32 bit partial sum vector 
    } 
    // horizontal add of four 32 bit partial sums and return result 
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8)); // lanes 0,1 += lanes 2,3 
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4)); // lane 0 += lane 1 
    sum = _mm_cvtsi128_si32(vsum);  // extract lane 0 
    return sum; 
} 

Note that the input array, a[], needs to be 16-byte aligned, and n should be a multiple of 4.
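
Both restrictions can be lifted using only the four intrinsics listed in the comments on the question. The following is a minimal sketch, not part of the original answer: the function name `sum_array_unaligned` and the `partial` buffer are made up for illustration. It uses unaligned loads for the vector loop, stores the four partial sums to memory to add them up, and finishes any leftover elements with a scalar loop.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical variant: a[] may be unaligned and n need not be a multiple of 4. */
int32_t sum_array_unaligned(const int32_t *a, int n)
{
    __m128i vsum = _mm_setzero_si128();  /* four partial 32 bit sums, all zero */
    int32_t partial[4];
    int32_t sum;
    int i;

    /* vector loop: runs only while a full group of 4 elements remains */
    for (i = 0; i <= n - 4; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)&a[i]);  /* unaligned load */
        vsum = _mm_add_epi32(vsum, v);
    }

    /* horizontal add: spill the partial sums to memory and add them */
    _mm_storeu_si128((__m128i *)partial, vsum);
    sum = partial[0] + partial[1] + partial[2] + partial[3];

    /* scalar tail: the remaining 0 to 3 elements */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

Like the version above, this accumulates in 32 bits, so sums that exceed the range of int32_t will overflow. Unaligned loads may be slower than aligned ones on older CPUs, so the aligned version is still preferable when the alignment of a[] can be guaranteed.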