查找设置在位数组中的最高位（最左边）

38

GCC有__builtin_clz这相当于BSR在x86/x64操作系统，CLZ在ARM等，并模拟了指令，如果硬件没有实现它。
Visual C++ 2005及更高版本有_BitScanReverse。

来源

2010-04-07 00:19:32

+0

TY为链接 – Claudiu 2010-04-07 01:36:37

+2

注意参数为0时未定义的行为。 – user2357112 2017-06-21 23:35:09

+0

是。在这种情况下，“未定义的行为”的意思是“返回一个非确定性的随机数。” – johnwbyrd 2017-11-24 22:04:22

5

查找BSR（位扫描反向）x86 asm指令，以便以最快的方式执行此操作。从英特尔的doc： Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand).

来源

2010-04-06 23:48:18 ggiroux

+0

或者在PowerPC上，cntlwi – Crashworks 2010-04-07 00:01:27

2

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

来源

2010-04-06 23:58:04

+0

heh，我的回答中包含完全相同的URL，即＃IntegerLogObvious。 – Arkku 2010-04-07 00:04:38

11

有多种方法可以做到这一点，不同的实现的相对性能是有点依赖于机器（我碰巧基准这在一定程度上类似用途）。在某些机器上，甚至还有一个内置指令（如果可用，可以使用可移植性）。

查看一些实现here（在“整数对数基2”下）。如果您使用GCC，请查看功能__builtin_clz和__builtin_clzl（分别对非零无符号整数和无符号长整数进行此操作）。 “clz”代表“统计前导零”，这是另一种描述相同问题的方式。

当然，如果您的位阵列不适合合适的机器字，您需要遍历数组中的单词以查找第一个非零字，然后仅对该单词执行此计算。

来源

2010-04-06 23:59:11 Arkku

+0

+1，用于指出'__builtin_clz'和'__builtin_clzl'对于0个输入未定义（由[GCC文档]（https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html）备份））。 – 2017-11-22 17:49:38

0

下面是字节的任意大小的数组简单，蛮力算法：

int msb(unsigned char x); // prototype for function that returns 
          // most significant bit set 

unsigned char* p; 

for (p = arr + num_elements; p != arr;) { 
    --p; 
    if (*p != 0) break; 
} 

// p is with pointing to the last byte that has a bit set, or 
// it's pointing to the first byte in the array 

if (*p) { 
    return ((p - arr) * 8) + msb(*p); 
} 

// what do you want to return if no bits are set? 
return -1;

我会离开它作为一个练习留给读者拿出一个合适的msb()功能以及优化工作在int或long long大小的数据。

来源

2010-04-07 00:03:25

0

恩，你的标签指示32位，但它看起来像你使用的值是16位。如果你的意思是32位，那么我认为0x00a1的答案应该是24而不是8.

假设你正在寻找左边的MSB位索引，并且你知道你只会处理与uint32_t的年代，这里是明显的，头脑简单的算法：

#include <stdlib.h> 
#include <stdio.h> 
#include <stdint.h> 

int main() 
{ 
    uint32_t test_value = 0x00a1; 
    int i; 

    for (i=0; i<32; ++i) 
    { 
     if (test_value & (0x80000000 >> i)) 
     { 
      printf("i = %d\n", i); 
      exit(0); 
     } 
    } 

    return 0; 
}

来源

2010-04-07 00:09:27 tikiboy

+1

是的，但它太慢=（ – Claudiu 2010-04-07 01:35:33

2

两个最好的方法，我知道这样做纯C：

第一线性搜索字节/字阵列发现的第一个字节/字那是非零，然后做一个展开的二进制搜索字节/你找到的词。

if (b>=0x10) 
    if (b>=0x40) 
    if (b>=0x80) return 0; 
    else return 1; 
    else 
    if (b>=0x20) return 2; 
    else return 3; 
else 
    if (b>=0x4) 
    if (b>=0x8) return 4; 
    else return 5; 
    else 
    if (b>=0x2) return 6; 
    else return 7;

3（顺便说一下，这是log2（8））条件跳转来获得答案。在现代x86机器上，最后一个将被优化为条件mov。

或者，使用查找表将该字节映射到设置的第一位的索引。

您可能要查找的相关主题是整数log2函数。如果我记得，ffmpeg有一个很好的实现。

编辑：实际上，你可以使上面的二进制搜索到网点二进制搜索，但我不知道这是否会是在这种情况下更有效......

来源

2010-07-08 06:42:59

1

不是最快的，但它的作品。 ..

//// C program 
#include <math.h> 

#define POS_OF_HIGHESTBIT(a) /* 0th position is the Least-Signif-Bit */ \ 
((unsigned) log2(a))   /* thus: do not use if a <= 0 */ 

#define NUM_OF_HIGHESTBIT(a) ((!(a))   \ 
     ? 0 /* no msb set*/     \ 
     : (1 << POS_OF_HIGHESTBIT(a))) 
// could be changed and optimized, if it is known that the following NEVER holds: a <= 0 



int main() 
{ 
    unsigned a = 5; // 0b101 
    unsigned b = NUM_OF_HIGHESTBIT(a); // 4 since 4 = 0b100 
    return 0; 
}

来源

2011-06-28 15:41:14 Jeff

2

这里的代码片段解释__builtin_clz（）

////// go.c //////// 
#include <stdio.h> 

unsigned NUM_BITS_U = ((sizeof(unsigned) << 3) - 1); 
#define POS_OF_HIGHESTBITclz(a) (NUM_BITS_U - __builtin_clz(a)) /* only works for a != 0 */ 

#define NUM_OF_HIGHESTBITclz(a) ((a)        \ 
          ? (1U << POS_OF_HIGHESTBITclz(a))  \ 
          : 0) 


int main() 
{ 
    unsigned ui; 

    for (ui = 0U; ui < 18U; ++ui) 
    printf("%i \t %i\n", ui, NUM_OF_HIGHESTBITclz(ui)); 

    return 0; 
}

来源

2011-07-07 22:22:19 George

2

如果您使用的x86，可以使用几乎击败任何逐字节或字的字解决方案SSE2操作，结合find-first-bit指令，其中（在gcc世界中）最低位发音为“ffs”，最高位发音为“fls”。请原谅我在麻烦（！@＃$％^）格式化答案中的“C”代码;退房： http://mischasan.wordpress.com/2011/11/03/sse2-bit-trick-ffsfls-for-xmm-registers/

来源

2011-11-05 00:48:39 Mischa

26

作为一个性能迷我已经尝试了吨MSB组的变化，以下是我所遇到的最快速，

unsigned int msb32(unsigned int x) 
{ 
    static const unsigned int bval[] = 
    {0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4}; 

    unsigned int r = 0; 
    if (x & 0xFFFF0000) { r += 16/1; x >>= 16/1; } 
    if (x & 0x0000FF00) { r += 16/2; x >>= 16/2; } 
    if (x & 0x000000F0) { r += 16/4; x >>= 16/4; } 
    return r + bval[x]; 
}

来源

2012-04-23 01:14:38

+2

此代码是这个代码产生的结果与其他答案中的结果相差一个，即msb（1）== 1，与其他定义不同的是，msb（1）== 1， 1）== 0. – johnwbyrd 2017-05-29 20:49:04

+1

这是一个可怕的答案，谁upvoted这个？ – snb 2017-06-09 13:21:51

+3

这是StackOverflow和其他“最受欢迎的答案赢”类型网站的缺陷之一，最常见的答案是Everyman认为是正确的答案。普通人并不总是对的，人群智慧不能代替基准。 – johnwbyrd 2017-08-02 01:30:39

-2

#define FFS(t) \ 
({ \ 
register int n = 0; \ 
      \ 
if (!(0xffff & t)) \ 
    n += 16; \ 
     \ 
if (!((0xff << n) & t)) \ 
    n += 8; \ 
     \ 
if (!((0xf << n) & t)) \ 
    n += 4; \ 
     \ 
if (!((0x3 << n) & t)) \ 
    n += 2; \ 
     \ 
if (!((0x1 << n) & t)) \ 
    n += 1; \ 
     \ 
n; \ 
})

来源

2013-03-27 03:16:05

+1

如何解释一下在那段代码上离子？ – Jesse 2013-03-27 03:34:47

+1

如果它是一个宏，'t'应该放在括号里。或者更好，但也可以将它放在局部变量中，这样它并不总是被计算出来。 – Claudiu 2013-03-27 03:47:27

+0

它只是使用二进制搜索，我同意你的意见Claudiu，但我认为应该有一个更有效的方式来获得结果，而不使用clz bsr类似的指令 – 2013-03-27 07:16:25

1

我曾与许多工作的函数来获取最重要的位，但问题通常出现在32位和64位数之间或在x86_64和x86盒之间移动。功能__builtin_clz,__builtin_clzl和__builtin_clzll适用于32/64位数和x86_64和x86机器。但是，需要三个功能。我找到了一个简单的MSB，它依赖右移来处理所有正数的情况。至少在使用我做的，它已经成功了，其他人都失败了：

int 
getmsb (unsigned long long x) 
{ 
    int r = 0; 
    if (x < 1) return 0; 
    while (x >>= 1) r++; 
    return r; 
}

通过指定输入为unsigned long long它可以处理来自unsigned char所有号码类unsigned long long和给出的标准定义，它是跨兼容x86_64和x86版本。 0的情况定义为返回0，但可以根据需要进行更改。一个简单的测试，并输出有：

int 
main (int argc, char *argv[]) { 

    unsigned char c0 = 0; 
    unsigned char c = 216; 
    unsigned short s = 1021; 
    unsigned int ui = 32768; 
    unsigned long ul = 3297381253; 
    unsigned long long ull = 323543844043; 

    int i = 32767; 

    printf (" %16u MSB : %d\n", c0, getmsb (c0)); 
    printf (" %16u MSB : %d\n", c, getmsb (c)); 
    printf (" %16u MSB : %d\n", s, getmsb (s)); 
    printf (" %16u MSB : %d\n", i, getmsb (i)); 
    printf (" %16u MSB : %d\n", ui, getmsb (ui)); 
    printf (" %16lu MSB : %d\n", ul, getmsb (ul)); 
    printf (" %16llu MSB : %d\n", ull, getmsb (ull)); 

    return 0; 
}

输出：

   0 MSB : 0 
      216 MSB : 7 
      1021 MSB : 9 
     32767 MSB : 14 
     32768 MSB : 15 
    3297381253 MSB : 31 
    323543844043 MSB : 38

注：速度的考虑，使用单一的函数来完成围绕__builtin_clzll为中心的相同的事情仍然较快的因数约6.

来源

2014-06-02 08:25:28

10

tl：dr;对于32位，请使用de Bruijn multiplication。

这是"fastest"便携式算法。它比这个线程中的所有其他便携式32位MSB算法快得多，也更正确。

当输入为零时，de Bruijn算法也返回正确的结果。当输入为零时，__builtin_clz和_BitScanReverse指令return incorrect results。

在X86-64，德布鲁因乘法运行在可比的当量（有缺陷的）硬件指令的速度，具有仅约3％的性能差。

这是代码。

u32 msbDeBruijn32(u32 v) 
{ 
    static const int MultiplyDeBruijnBitPosition[32] = 
    { 
     0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 
     8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 
    }; 

    v |= v >> 1; // first round down to one less than a power of 2 
    v |= v >> 2; 
    v |= v >> 4; 
    v |= v >> 8; 
    v |= v >> 16; 

    return MultiplyDeBruijnBitPosition[(u32)(v * 0x07C4ACDDU) >> 27]; 
}

本主题中的所有其他答案要么比他们的作者建议的要差得多，要么不正确地计算结果，或者两者兼而有之。让我们对它们进行基准测试，然后让我们验证它们是否做了他们声称做的事情。

下面是一个简单的C++ 11工具来测试所有这些实现。它编译干净的Visual Studio，但应该适用于所有现代编译器。它允许您在性能模式（bVerifyResults = false）和检查模式（bVerifyResults = true）下运行基准测试。

下面是结果验证模式：

Verification failed for msbNative64: input was 0; output was 818af060; expected 0 
Verification failed for msbFfs: input was 22df; output was 0; expected d 
Verification failed for msbPerformanceJunkie32: input was 0; output was ffffffff; expected 0 
Verification failed for msbNative32: input was 0; output was 9ab07060; expected 0

“性能迷”和Microsoft本地实现做不同的事情，当输入为零。 msbPerformanceJunkie32产生-1，微软的_BitScanReverse产生一个与底层硬件指令一致的随机数。此外，msbPerformanceJunkie32实现产生的结果与所有其他答案中的结果相同。

以下是性能模式的结果，我i7-4600笔记本电脑上运行，在释放模式编译：

msbLoop64 took 2.56751 seconds    
msbNative64 took 0.222197 seconds    

msbLoop32 took 1.43456 seconds    
msbFfs took 0.525097 seconds     
msbPerformanceJunkie32 took 1.07939 seconds 
msbDeBruijn32 took 0.224947 seconds   
msbNative32 took 0.218275 seconds

德布鲁因版本击败了其他实现香甜因为它是网点，因此，它可以很好地对付产生均匀分布式输出的输入。由于对现代CPU的分支预测失误的惩罚，所有其他版本对任意输入都比较慢。 smbFfs函数产生不正确的结果，因此可以忽略它。

一些实现工作在32位输入上，一些工作在64位输入上。无论输入大小如何，模板都可以帮助我们比较苹果和苹果。

这是代码。如果你喜欢，可自行下载并运行基准测试。

#include <iostream> 
#include <chrono> 
#include <random> 
#include <cassert> 
#include <string> 
#include <limits> 

#ifdef _MSC_VER 
#define MICROSOFT_COMPILER 1 
#include <intrin.h> 
#endif // _MSC_VER 

const int iterations = 100000000; 
bool bVerifyResults = false; 
std::random_device rd; 
std::default_random_engine re(rd()); 
typedef unsigned int u32; 
typedef unsigned long long u64; 

class Timer 
{ 
public: 
    Timer() : beg_(clock_::now()) {} 
    void reset() { 
     beg_ = clock_::now(); 
    } 
    double elapsed() const { 
     return std::chrono::duration_cast<second_> 
      (clock_::now() - beg_).count(); 
    } 

private: 
    typedef std::chrono::high_resolution_clock clock_; 
    typedef std::chrono::duration<double, std::ratio<1> > second_; 
    std::chrono::time_point<clock_> beg_; 
}; 

unsigned int msbPerformanceJunkie32(u32 x) 
{ 
    static const unsigned int bval[] = 
    { 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4 }; 
    unsigned int r = 0; 
    if (x & 0xFFFF0000) { 
     r += 16/1; 
     x >>= 16/1; 
    } 
    if (x & 0x0000FF00) { 
     r += 16/2; 
     x >>= 16/2; 
    } 
    if (x & 0x000000F0) { 
     r += 16/4; 
     x >>= 16/4; 
    } 
    return r + bval[x]; 
} 

#define FFS(t) \ 
{ \ 
register int n = 0; \ 
if (!(0xffff & t)) \ 
n += 16; \ 
if (!((0xff << n) & t)) \ 
n += 8; \ 
if (!((0xf << n) & t)) \ 
n += 4; \ 
if (!((0x3 << n) & t)) \ 
n += 2; \ 
if (!((0x1 << n) & t)) \ 
n += 1; \ 
return n; \ 
} 

unsigned int msbFfs32(u32 x) 
{ 
    FFS(x); 
} 

unsigned int msbLoop32(u32 x) 
{ 
    int r = 0; 
    if (x < 1) return 0; 
    while (x >>= 1) r++; 
    return r; 
} 

unsigned int msbLoop64(u64 x) 
{ 
    int r = 0; 
    if (x < 1) return 0; 
    while (x >>= 1) r++; 
    return r; 
} 

u32 msbDeBruijn32(u32 v) 
{ 
    static const int MultiplyDeBruijnBitPosition[32] = 
    { 
     0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, 
     8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 
    }; 

    v |= v >> 1; // first round down to one less than a power of 2 
    v |= v >> 2; 
    v |= v >> 4; 
    v |= v >> 8; 
    v |= v >> 16; 

    return MultiplyDeBruijnBitPosition[(u32)(v * 0x07C4ACDDU) >> 27]; 
} 

#ifdef MICROSOFT_COMPILER 
u32 msbNative32(u32 val) 
{ 
    unsigned long result; 
    _BitScanReverse(&result, val); 
    return result; 
} 
u32 msbNative64(u64 val) 
{ 
    unsigned long result; 
    _BitScanReverse64(&result, val); 
    return result; 
} 
#endif // MICROSOFT_COMPILER 

template <typename InputType> 
void test(unsigned int msbFunc(InputType), 
    const std::string &name, 
    const std::vector<InputType> &inputs, 
    std::vector< unsigned int > &results, 
    bool bIsReference = false 
) 
{ 
    if (bIsReference) 
    { 
     int i = 0; 
     for (int i = 0; i < iterations; i++) 
      results[i] = msbFunc(inputs[i]); 
    } 
    InputType result; 
    if (bVerifyResults) 
    { 
     bool bNotified = false; 
     for (int i = 0; i < iterations; i++) 
     { 
      result = msbFunc(inputs[i]); 
      if ((result != results[i]) && !bNotified) 
      { 
       std::cout << "Verification failed for " << name << ": " 
        << "input was " << std::hex << inputs[i] 
        << "; output was " << result 
        << "; expected " << results[i] 
        << std::endl; 
       bNotified = true; 
      } 
     } 
    } 
    else 
    { 
     Timer t; 
     for (int i = 0; i < iterations; i++) 
     { 
      result = msbFunc(inputs[i]); 
     } 
     double elapsed = t.elapsed(); 
     if (!bIsReference) 
      std::cout << name << " took " << elapsed << " seconds" << std::endl; 
     if (result == -1.0f) 
      std::cout << "this comparison only exists to keep the compiler from " << 
      "optimizing out the benchmark; this branch will never be called"; 
    } 
} 

void main() 
{ 
    std::uniform_int_distribution <u64> dist64(0, 
     std::numeric_limits<u64>::max()); 
    std::uniform_int_distribution <u32> shift64(0, 63); 
    std::vector<u64> inputs64; 
    for (int i = 0; i < iterations; i++) 
    { 
     inputs64.push_back(dist64(re) >> shift64(re)); 
    } 
    std::vector<u32> results64; 
    results64.resize(iterations); 

    test<u64>(msbLoop64, "msbLoop64", inputs64, results64, true); 
    test<u64>(msbLoop64, "msbLoop64", inputs64, results64, false); 
#ifdef MICROSOFT_COMPILER 
    test<u64>(msbNative64, "msbNative64", inputs64, results64, false); 
#endif // MICROSOFT_COMPILER 
    std::cout << std::endl; 

    std::uniform_int_distribution <u32> dist32(0, 
     std::numeric_limits<u32>::max()); 
    std::uniform_int_distribution <u32> shift32(0, 31); 
    std::vector<u32> inputs32; 
    for (int i = 0; i < iterations; i++) 
     inputs32.push_back(dist32(re) >> shift32(re)); 
    std::vector<u32> results32; 
    results32.resize(iterations); 


    test<u32>(msbLoop32, "msbLoop32", inputs32, results32, true); 

    test<u32>(msbLoop32, "msbLoop32", inputs32, results32, false); 
    test<u32>(msbFfs32, "msbFfs", inputs32, results32, false); 
    test<u32>(msbPerformanceJunkie32, "msbPerformanceJunkie32", 
     inputs32, results32, false); 
    test<u32>(msbDeBruijn32, "msbDeBruijn32", inputs32, results32, false); 
#ifdef MICROSOFT_COMPILER 
    test<u32>(msbNative32, "msbNative32", inputs32, results32, false); 
#endif // MICROSOFT_COMPILER 
}

来源

2015-07-30 07:52:17 johnwbyrd

+0

不错的工作，但是你现在正在包含'msbLoop32'在其时序中完成的初始化工作，这意味着它看起来要慢两倍。 – 2017-11-22 18:39:00

+0

我也很想知道如何乘以一个小于* 2的幂乘的'v'。链接的PDF只解释了为什么乘法对应于移位的原因是它是2，所以我会认为加1是必要的。 – 2017-11-22 18:44:30

+0

你认为msbLoop32正在做什么初始化工作？ – johnwbyrd 2017-11-24 20:25:12

1

我会加一个！

typedef unsigned long long u64; 
typedef unsigned int  u32; 
typedef unsigned char  u8; 


u8 findMostSignificantBit (u64 u64Val) 
{ 
    u8 u8Shift; 
    u8 u8Bit = 0; 

    assert (u64Val != 0ULL); 

    for (u8Shift = 32 ; u8Shift != 0 ; u8Shift >>= 1) 
    { 
    u64 u64Temp = u64Val >> u8Shift; 
    if (u64Temp) 
    { 
     u8Bit |= u8Shift; // notice not using += 
     u64Val = u64Temp; 
    } 
    } 

    return u8Bit; 
}

当然，这是一个64位数字（无符号long long），而不是一个数组。另外，很多人都指出了我不知道的内置g ++函数。多么有趣。

无论如何，这将在6次迭代中找到最重要的位，并且如果向函数传递0，则会发出断言。如果您有权访问芯片组的指令，则不是最好的功能。

我也使用| =而不是+ =，因为这些总是幂的两倍，并且OR（典型地）比加法快。由于我只是一起增加2个独特的力量，所以我从来没有翻身。

这是一个二进制搜索，这意味着它总是在6次迭代中找到结果。

再次，这是更好的：

u8 findMostSignificantBit2 (u64 u64Val) 
{ 
    assert (u64Val != 0ULL); 

    return (u8) (__builtin_ctzll(u64Val)); 
}

来源

2017-07-18 07:01:22

查找设置在位数组中的最高位（最左边）

回答

相关问题