There are 2 very big series of elements, the second 100 times bigger than the first. For each element of the first series, there are 0 or more elements on the second series. This can be traversed and processed with 2 nested loops. But the unpredictability of the amount of matching elements for each member of the first array makes things very, very slow.
The actual processing of the 2nd series of elements involves logical and (&) and a population count.
I couldn't find good optimizations using C but I am considering doing inline asm, doing rep* mov* or similar for each element of the first series and then doing the batch processing of the matching bytes of the second series, perhaps in buffers of 1MB or something. But the code would be get quite messy.
Does anybody know of a better way? C preferred but x86 ASM OK too. Many thanks!
Sample/demo code with simplified problem, first series are "people" and second series are "events", for clarity's sake. (the original problem is actually 100m and 10,000m entries!)
#include <stdio.h>
#include <stdint.h>
#define PEOPLE 1000000 // 1m
struct Person {
uint8_t age; // Filtering condition
uint8_t cnt; // Number of events for this person in E
} P[PEOPLE]; // Each has 0 or more bytes with bit flags
#define EVENTS 100000000 // 100m
uint8_t P1[EVENTS]; // Property 1 flags
uint8_t P2[EVENTS]; // Property 2 flags
void init_arrays() {
for (int i = 0; i < PEOPLE; i++) { // just some stuff
P[i].age = i & 0x07;
P[i].cnt = i % 220; // assert( sum < EVENTS );
}
for (int i = 0; i < EVENTS; i++) {
P1[i] = i % 7; // just some stuff
P2[i] = i % 9; // just some other stuff
}
}
int main(int argc, char *argv[])
{
uint64_t sum = 0, fcur = 0;
int age_filter = 7; // just some
init_arrays(); // Init P, P1, P2
for (int64_t p = 0; p < PEOPLE ; p++)
if (P[p].age < age_filter)
for (int64_t e = 0; e < P[p].cnt ; e++, fcur++)
sum += __builtin_popcount( P1[fcur] & P2[fcur] );
else
fcur += P[p].cnt; // skip this person's events
printf("(dummy %ld %ld)\n", sum, fcur );
return 0;
}
gcc -O5 -march=native -std=c99 test.c -o test