numpy calling sse2 via ctypes

Posted by Daniel on Stack Overflow See other posts from Stack Overflow or by Daniel
Published on 2010-06-15T12:39:52Z Indexed on 2010/06/15 12:42 UTC
Read the original article Hit count: 350

Filed under:

In brief, I am trying to call into a shared library from python, more specifically, from numpy. The shared library is implemented in C using sse2 instructions. Enabling optimisation, i.e. building the library with -O2 or –O1, I am facing strange segfaults when calling into the shared library via ctypes. Disabling optimisation (-O0), everything works out as expected, as is the case when linking the library to a c-program directly (optimised or not). Attached you find a snipped which exhibits the delineated behaviour on my system. With optimisation enabled, gdb reports a segfault in __builtin_ia32_loadupd (__P) at emmintrin.h:113. The value of __P is reported as optimised out.

test.c:

#include <emmintrin.h>
#include <complex.h>
void test(const int m, const double* x, double complex* y) {

    int i;
    __m128d _f, _x, _b;
    double complex f __attribute__( (aligned(16)) );
    double complex b __attribute__( (aligned(16)) );
    __m128d* _p;

    b = 1;
    _b = _mm_loadu_pd( (double *) &b );

    _p = (__m128d*) y;

    for(i=0; i<m; ++i) {
        f = cexp(-I*x[i]);
        _f = _mm_loadu_pd( (double *) &f );
        _x = _mm_loadu_pd( (double *) &x[i] );      
        _f = _mm_shuffle_pd(_f, _f, 1);
        *_p = _mm_add_pd(*_p, _f);
        *_p = _mm_add_pd(*_p, _x); 
        *_p = _mm_mul_pd(*_p,_b);
        _p++;
    }
    return;
}

Compiler flags: gcc -o libtest.so -shared -std=c99 -msse2 -fPIC -O2 -g -lm test.c

test.py:

import numpy as np
import os

def zerovec_aligned(nr, dtype=np.float64, boundary=16):
    '''Create an aligned array of zeros.
    '''
    size = nr * np.dtype(dtype).itemsize
    tmp = np.zeros(size + boundary, dtype=np.uint8)
    address = tmp.__array_interface__['data'][0]
    offset = boundary - address % boundary
    return tmp[offset:offset + size].view(dtype=dtype)


lib = np.ctypeslib.load_library('libtest', '.' )
lib.test.restype = None
lib.test.argtypes = [np.ctypeslib.ctypes.c_int,
                     np.ctypeslib.ndpointer(np.float64, flags=('C', 'A') ),
                     np.ctypeslib.ndpointer(np.complex128, flags=('C', 'A', 'W') )]


n = 13
y = zerovec_aligned(n, dtype=np.complex128)
x = np.ones(n, dtype=np.float64)
# x = zerovec_aligned(n, dtype=np.float64)
# x[:] = 1.

lib.test(n,x,y)

My system:

Ubuntu Linux i686 2.6.31-22-generic
Compiler: gcc (Ubuntu 4.4.1-4ubuntu9)
Python: Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15) [GCC 4.4.1]
Numpy: 1.4.0

I have taken provisions (cf. python code) that y is aligned and the alignment of x should not matter (I think; explicitly aligning x does not solve the problem though).

Note also that i use _mm_loadu_pd instead of _mm_load_pd when loading b and f. For the C-only version _mm_load_pd works (as expected). However, when calling the function via ctypes using _mm_load_pd always segfaults (independent of optimisation).

I have tried several days to sort out this issue without success ... and I am on the verge beating my monitor to death. Any input welcome. Daniel

Developer IT

numpy calling sse2 via ctypes - Developer IT

numpy calling sse2 via ctypes

python

numpy

ctypes

sse2

Related posts about python

unmet dependencies in Ubuntu 12.04

How can I get sikuli-ide to work?

Getting PATH right for python after MacPorts install

call python with system() in R to run a python script emulating the python console

Python - Calling a non python program from python?

Related posts about numpy

append a numpy array to a numpy array

How do you construct an array suitable for numpy sorting?

Building an interleaved buffer for pyopengl and numpy

Helping install mrcwa and solve problems with f2py in Ubuntu 14.04 LTS

Compound assignment operators in Python's Numpy library

Categories cloud