局部变量与阵列访问

其中哪些会更具计算效率，为什么？局部变量与阵列访问

A）重复数组访问：

for(i=0; i<numbers.length; i++) { 
    result[i] = numbers[i] * numbers[i] * numbers[i]; 
}

B）设置一个局部变量：

for(i=0; i<numbers.length; i++) { 
    int n = numbers[i]; 
    result[i] = n * n * n; 
}

岂不重复数组访问版本已经被使用指针运算来计算（），使得第一个选项较慢，因为它是这样做的：

for(i=0; i<numbers.length; i++) { 
    result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i); 
}

来源

2013-10-08 jsj

你可以只使用一个Timer对象的代码？看看结果会是有趣的。我提到这一点，因为你已经编码:) –

我想编译器会生成相同的指令。 – Michael

@Michael你在说每个平台上的每个编译器的行为都一样。 – svick

任何足够复杂的编译器将为所有三种解决方案生成相同的代码。我把你的三个版本到一个小的C程序（具有轻微adjustement，我改变了访问numbers.length到宏调用其给出的阵列的长度）：

#include <stddef.h> 

size_t i; 
static const int numbers[] = { 0, 1, 2, 4, 5, 6, 7, 8, 9 }; 

#define ARRAYLEN(x) (sizeof((x))/sizeof(*(x))) 
static int result[ARRAYLEN(numbers)]; 

void versionA(void) 
{ 
    for(i=0; i<ARRAYLEN(numbers); i++) { 
     result[i] = numbers[i] * numbers[i] * numbers[i]; 
    } 
} 

void versionB(void) 
{ 
    for(i=0; i<ARRAYLEN(numbers); i++) { 
     int n = numbers[i]; 
     result[i] = n * n * n; 
    } 
} 

void versionC(void) 
{ 
    for(i=0; i<ARRAYLEN(numbers); i++) { 
      result[i] = *(numbers + i) * *(numbers + i) * *(numbers + i); 
    } 
}

我然后使用优化编译它（和调试符号，为漂亮拆卸）与Visual Studio 2012：

C:\Temp>cl /Zi /O2 /Wall /c so19244189.c 
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50727.1 for x86 
Copyright (C) Microsoft Corporation. All rights reserved. 

so19244189.c

最后，这里的拆解：

C:\Temp>dumpbin /disasm so19244189.obj 

[..] 

_versionA: 
    00000000: 33 C0    xor   eax,eax 
    00000002: 8B 0C 85 00 00 00 mov   ecx,dword ptr _numbers[eax*4] 
      00 
    00000009: 8B D1    mov   edx,ecx 
    0000000B: 0F AF D1   imul  edx,ecx 
    0000000E: 0F AF D1   imul  edx,ecx 
    00000011: 89 14 85 00 00 00 mov   dword ptr _result[eax*4],edx 
      00 
    00000018: 40     inc   eax 
    00000019: 83 F8 09   cmp   eax,9 
    0000001C: 72 E4    jb   00000002 
    0000001E: A3 00 00 00 00  mov   dword ptr [_i],eax 
    00000023: C3     ret 

_versionB: 
    00000000: 33 C0    xor   eax,eax 
    00000002: 8B 0C 85 00 00 00 mov   ecx,dword ptr _numbers[eax*4] 
      00 
    00000009: 8B D1    mov   edx,ecx 
    0000000B: 0F AF D1   imul  edx,ecx 
    0000000E: 0F AF D1   imul  edx,ecx 
    00000011: 89 14 85 00 00 00 mov   dword ptr _result[eax*4],edx 
      00 
    00000018: 40     inc   eax 
    00000019: 83 F8 09   cmp   eax,9 
    0000001C: 72 E4    jb   00000002 
    0000001E: A3 00 00 00 00  mov   dword ptr [_i],eax 
    00000023: C3     ret 

_versionC: 
    00000000: 33 C0    xor   eax,eax 
    00000002: 8B 0C 85 00 00 00 mov   ecx,dword ptr _numbers[eax*4] 
      00 
    00000009: 8B D1    mov   edx,ecx 
    0000000B: 0F AF D1   imul  edx,ecx 
    0000000E: 0F AF D1   imul  edx,ecx 
    00000011: 89 14 85 00 00 00 mov   dword ptr _result[eax*4],edx 
      00 
    00000018: 40     inc   eax 
    00000019: 83 F8 09   cmp   eax,9 
    0000001C: 72 E4    jb   00000002 
    0000001E: A3 00 00 00 00  mov   dword ptr [_i],eax 
    00000023: C3     ret

注意装配如何在所有情况下完全一样。因此，您的问题的正确答案

哪一个会更有效率计算，为什么？

这个编译器是：mu。你的问题无法回答，因为它是基于错误的假设。没有一个答案比其他任何答案都快。

来源

2013-10-08 10:02:53

谢谢你们，很高兴看到大会 – jsj

理论答案：

一个相当不错的优化编译器应该将版本A转换为版本B，并且只从内存执行一个加载。如果优化已启用，则应该没有性能差异。

如果优化禁用，版本A将会变慢，因为地址必须计算3次，并且有3个内存负载（其中2个缓存且速度非常快，但仍然比重用寄存器要慢）。

实际上，答案将取决于您的编译器，您应该通过基准测试来检查。

来源

2013-10-08 09:59:15

这取决于编译器，但它们都应该是相同的。

首先让我们来看看案例B智能编译器会生成代码，以便只将值载入寄存器一次，因此如果您使用某个附加变量也没关系，编译器会生成指令的操作码，并将值写入寄存器。因此B与A相同。

现在让我们来比较A和C。我们应该看看操作员内嵌实现。a[b]实际上是*(a + b)所以*(numbers + i)相同numbers[i]这意味着箱子A和C是相同的。

所以我们必须(A==B) && (A==C)所有的一切(A==B==C)如果你知道我的意思:)。

来源

2013-10-08 10:11:08 ST3

局部变量与阵列访问

回答

相关问题