C Population Count of unsigned 64-bit integer with a maximum value of 15

Posted by BitTwiddler1011 on Stack Overflow See other posts from Stack Overflow or by BitTwiddler1011
Published on 2010-06-02T18:27:35Z Indexed on 2010/06/02 18:44 UTC
Read the original article Hit count: 167

Filed under:
|
|
|
|

I use a population count (hamming weight) function intensively in a windows c application and have to optimize it as much as possible in order to boost performance. More than half the cases where I use the function I only need to know the value to a maximum of 15. The software will run on a wide range of processors, both old and new. I already make use of the POPCNT instruction when Intel's SSE4.2 or AMD's SSE4a is present, but would like to optimize the software implementation (used as a fall back if no SSE4 is present) as much as possible.

Currently I have the following software implementation of the function:

inline int population_count64(unsigned __int64 w) {
    w -= (w >> 1) & 0x5555555555555555ULL;
    w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
    w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return int(w * 0x0101010101010101ULL) >> 56;
}

So to summarize:

(1) I would like to know if it is possible to optimize this for the case when I only want to know the value to a maximum of 15.

(2) Is there a faster software implementation (for both Intel and AMD CPU's) than the function above?

© Stack Overflow or respective owner

Related posts about c++

Related posts about c